Pip’s shortcomings and techniques to manage them

While the NPM ecosystem is a dumpster fire of 20-line functions that should mostly be copy-and-pasted directly into a project, its technical underpinnings seem pretty solid. Rarely have I had an issue with the package manager itself. It has dependencies and development dependencies as first-class concepts. It has lockfiles, and manages transitive dependencies in a way that only bites me a couple of times a year. NPM also has alternatives (yarn, pnpm, etc.), but having a standard package manager ship with the language has ultimately been a benefit to the language, even if NPM is now owned by Microsoft.

Python’s package management ecosystem is the wild west. Pip has long served as the default package manager for Python. Unlike modern PHP, Ruby, JS or Go, packages are installed globally by default.

Local package installations in Pip also don’t create any kind of package manifest by default. This is where Python package management begins to fall apart.

Package manifest problems

Many developers use the command pip freeze > requirements.txt to generate a manifest of all packages their app depends on. pip freeze lists all packages installed and their current versions. This technique has several problems.

Packages are global by default

If you have multiple Python projects on your machine that have installed global dependencies, these will be included in your newly created requirements.txt. Your Flask app could now have Django and its dependencies in the requirements file.
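
For example, freezing on a machine that also has a globally installed Django project might produce something like this (versions illustrative):

# Frozen from a Flask project, yet Django and its
# dependencies (asgiref, sqlparse) tag along
asgiref==3.7.2
Django==5.0.3
Flask==3.0.2
sqlparse==0.4.4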

Using venv to create isolated virtual Python environments can mitigate this, but it’s not a bulletproof solution. You can forget to activate the virtual environment and install dependencies globally, or you can create a requirements file while the virtual environment is inactive. Editors like VS Code that integrate deeply with venv make these mistakes harder to commit, but they’re still possible.
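
For reference, the basic venv workflow looks like this:

python -m venv .venv
source .venv/bin/activate
pip install <package_name>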

pip freeze makes no distinction between dependencies and transitive dependencies

Installing Flask also installs Werkzeug, Jinja, MarkupSafe, ItsDangerous, Click, and Blinker. Using pip freeze dumps all of these, even though Flask is the only direct dependency. If Flask ever dropped one of these as a dependency, it could linger in your requirements.txt file forever. However, that might not be a problem because of the next issue.
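
To make this concrete, here’s roughly what pip freeze emits after installing only Flask (versions illustrative):

blinker==1.7.0
click==8.1.7
Flask==3.0.2
itsdangerous==2.1.2
Jinja2==3.1.3
MarkupSafe==2.1.5
Werkzeug==3.0.1

Nothing in this output records that Flask was the only package you actually asked for.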

pip freeze locks packages to the exact install version by default

If you have Flask 3.0.2 installed, the requirements file will list it as Flask==3.0.2. If there’s a critical security fix in Flask 3.0.3, you aren’t going to get it without manually editing the requirements.txt file to specify what range of Flask versions is acceptable, or by running something like pip install <package_name> --upgrade for each package in the requirements file.
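
If you do go the upgrade-everything route, a rough sketch is to strip the pins and feed the names back to pip (this assumes every line is a simple name==version pin):

# Upgrade each package in the requirements file to its latest release
cut -d'=' -f1 requirements.txt | xargs -n1 pip install --upgrade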

Dependencies can be orphaned during upgrades

If you manually upgrade a dependency and it no longer requires one of its transitive dependencies, the orphaned package will continue to exist in pip, and will still be present when running pip freeze.

There’s no first-class way of marking development dependencies

If you need pytest or another similar package that’s only used for testing, there’s no way to mark that dependency as development-only within Pip or the requirements file dumped by pip freeze. If these packages are installed in your virtual environment, they’ll be dumped to requirements.txt and shipped to production.

Practices for working around Pip’s deficiencies

Given how much of the world runs Python and how much of Python runs Pip, it’s astonishing that the world has not descended into chaos. There are numerous tools and techniques for working around Pip’s shortcomings. We will first focus on some techniques.

Techniques for taming Pip center around creating multiple requirements files, and sometimes even creating multiple virtual environments. Let’s explore these.

Only use requirements.txt for top-level dependencies

The requirements.txt file doesn’t have to be programmatically generated from the output of pip freeze. You can create it manually and hand-populate it with your app’s requirements. This means that dependencies are no longer installed by running pip install <package_name>, but are instead added to requirements.txt and installed with pip install -r requirements.txt.

# requirements.txt

Flask
gunicorn

While this is a good start, running pip install -r requirements.txt will always install the latest version of Flask and Gunicorn. This introduces risk: Flask 3.0 had a breaking change in a transitive dependency that broke much of the Flask ecosystem. Ideally, we want security updates, but not major version upgrades that could break our apps.

# requirements.txt

Flask == 3.0.*
gunicorn == 21.*

You can also specify ranges (thanks Django forums):

Django>=3.2,<4.0

Pip supports numerous version specifiers, as defined in PEP 440, but in most of my cases grabbing the latest release within a major version will probably be enough.
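
A few of the more useful specifiers, for reference:

Flask == 3.0.*     # any 3.0.x release
Flask ~= 3.0.2     # compatible release: >= 3.0.2, < 3.1
Flask >= 3.0, < 4  # explicit range
Flask != 3.0.1     # exclude a known-bad release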

This gives an at-a-glance view of all non-transitive dependencies within an application, but if this is the only technique used to specify dependencies, the development environment and production environment could be running dramatically different versions of dependencies, which may cause bugs.

To ensure that the server runs the exact same dependencies that are used locally, you can create a lock file.

source .venv/bin/activate
pip install -r requirements.txt
pip freeze > requirements-lock.txt

Once the lock file is created, it should be used to install dependencies locally and in production by running pip install -r requirements-lock.txt. When it comes time to upgrade dependencies, you can run the same sequence of commands above to generate a new lock file.

While this technique separates top-level application dependencies from transitive dependencies and provides a plausible upgrade path, it doesn’t solve the problem of orphaned dependencies or of defining development dependencies.

Remove orphaned dependencies

If you want to ensure dependencies that were orphaned during an upgrade are not listed when running pip freeze, you must start with a clean environment before creating a lock file. To do this, you must completely destroy your virtual environment and rebuild it before installing top-level dependencies and creating a lock file.

deactivate
rm -rf .venv
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip freeze > requirements-lock.txt

Goodness–that is tedious!

Handling development dependencies

Okie dokie, next we need to find a way to deal with development dependencies. For that we need–you guessed it–another requirements file!

# requirements-dev.txt

pytest==8.*

Now that we have our development dependencies, let’s say we wanted to create lock files for both our dependencies and development dependencies. Here’s where things get really obnoxious:

deactivate
rm -rf .venv
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip freeze > requirements-lock.txt
pip install -r requirements-dev.txt
pip freeze > requirements-dev-lock.txt

The requirements-dev-lock.txt file will include everything in requirements-lock.txt, along with all of the development dependencies and their transitive dependencies. Therefore, when installing locally or in CI, you will run pip install -r requirements-dev-lock.txt. For production, you will instead run pip install -r requirements-lock.txt. Since this process is potentially error-prone, it should probably live in a bash script.
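
A minimal sketch of what that script might look like (the filename is my own invention):

#!/usr/bin/env bash
# regenerate-locks.sh: rebuild the venv from scratch, then
# regenerate both lock files
set -euo pipefail

rm -rf .venv
python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
pip freeze > requirements-lock.txt

pip install -r requirements-dev.txt
pip freeze > requirements-dev-lock.txt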

Testing

Nothing about the previously listed techniques is foolproof. A developer could run pip install <package_name> and then run pip freeze > requirements-lock.txt, which would circumvent the requirements.txt file completely. The next time the lock file gets regenerated from requirements.txt, that dependency would be completely absent from the application. The best way to catch that before pushing to production is making liberal use of automated testing within CI so that a broken app never makes it to production.
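
Even a minimal CI step that installs from the lock file and runs the test suite will surface a missing package at import time, long before a deploy:

# Sketch of a CI test step; a dependency absent from the lock
# file fails here instead of in production
python -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev-lock.txt
pytest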

Tools for working around Pip’s deficiencies

We had to do a lot of work just to get to a more error-prone version of what NPM provides out of the box. As a result, the Python community has built a multitude of tools to provide a saner package management experience. Whatever the intentions of the authors, this has created a wild west of package management tools.

I will not discuss any tools in detail here, but I will mention a few that I think are interesting. I will also mention that I don’t currently use any of these: I typically try to stay on the paved path that most developers in an ecosystem use, even if that paved path sucks. When one of these tools becomes more popular than using Pip, I’ll adopt it.

pipreqs and pigar generate a requirements.txt file for an application’s top-level dependencies by analyzing the code itself. These would be useful if you have a project that currently dumps all top-level and transitive dependencies in requirements.txt. The pip-chill package seems to do something similar.
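
For what it’s worth, pipreqs usage appears to be a single command pointed at the project directory (the path is a placeholder):

pipreqs /path/to/project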

pip-compile from pip-tools takes top-level requirements defined in requirements.in, pyproject.toml, setup.cfg or setup.py and turns them into a requirements.txt file that includes transitive dependencies, without needing to destroy and recreate virtual environments like the examples above. You can also specify development dependencies as optional dependencies, as shown in the pip-tools README:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-cool-django-app"
version = "42"
dependencies = ["django"]

[project.optional-dependencies]
dev = ["pytest"]

A dev-requirements.txt can then be generated with the following command:

pip-compile --extra dev -o dev-requirements.txt pyproject.toml
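
The base requirements file is generated the same way, minus the extra:

pip-compile -o requirements.txt pyproject.toml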

pipenv is a tool that creates virtual environments and installs packages. It looks like it can specify the version of Python used by a project and set the virtual environment to use that version. It also has a Pipfile and Pipfile.lock, which seem analogous to NPM’s package.json and package-lock.json. Also like NPM, it allows specifying development dependencies.
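
I haven’t used pipenv myself, but a minimal Pipfile apparently looks something like this (package names and Python version illustrative):

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
flask = "==3.0.*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.12"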

Poetry and PDM are alternative package managers that use the pyproject.toml format defined in PEP 518. From my limited understanding, Poetry is older and more established, but PDM follows the pyproject.toml standard more closely. I could be wrong.

Rye is an all-in-one package management and Python development experience by Armin Ronacher, the creator of Flask. It stitches together various open source tools like pip-tools, ruff and others to create a start-to-finish experience with Python. At the time of writing this post, it is still early days.

pur is a tool for updating the dependencies listed in requirements.txt to their latest version. It looks like a fairly blunt instrument that could be dangerous.

Reactions to tools

While I don’t see myself adopting Poetry, PDM or Rye anytime soon, using the new pyproject.toml standard alongside pip-compile seems particularly compelling. Using something like pipreqs to extract top-level dependencies from an existing project also seems like a great option, and if I need to do that in the future I’ll explore using that tool.

Parting thoughts

The tools and techniques mentioned here are really for managing applications. Managing library dependencies may require different approaches. You can read thoughts about that in this Reddit thread.

References

These three Stack Overflow questions were immensely helpful to me in putting together this list of tools:

This article introduced me to pur:

The article Why I moved away from Poetry for Python, while mostly useful for explaining why Poetry might not be a great solution, generated a ton of online conversation with links to several helpful articles. The general consensus is to use pip-compile from pip-tools.
