Conda for Python Packages & Environments


I've been using Conda for several years to manage packages and environments in my Python projects. In this article, I'll share my experience with Conda and highlight its potential advantages over other solutions.

Working With Virtual Environments

Plenty of nice libraries are available for Python, but dependency management can quickly become a big problem. Let me illustrate with a short story.

Bob is a beginner Python developer. He has an exciting project in mind, so he installs Python 3.7 and starts coding.

His project requires some external modules, like NumPy and OpenCV, so he installs them. After a week of hard work, and 20 more modules installed, the code is finally working 🙂 time to move on to a new project.

A few projects (and installed modules) later, Bob wants to upgrade OpenCV to the newest version for a new machine-learning-potato-detector project, and he also wants to try some Python 3.8 features. He upgrades everything and successfully finishes the project!

After a few other projects (and on-the-fly updates of some modules), for some reason, Bob needs to run an old project he was working on months earlier. And…

As you can imagine, it doesn’t work anymore!

Is that because of some OpenCV updates? The Python version update? Maybe both? Hard to say without an in-depth check.

Bob doesn't remember the exact version of OpenCV used in this old project, and if he downgrades his current version, will his more recent projects (and recently installed modules) still work properly? Probably not.

Long story short, virtual environments are a way to avoid those problems. Virtual environments allow multiple isolated installations of Python, with different modules, on a single system.

For each project, you can work in a fresh, dedicated environment, with the Python version you need and just the modules you need 🙂 !

With virtual environments, your projects are also more portable: you know exactly which Python version and modules each of them requires.

Pip, Virtualenv, Pyenv, Venv & Friends

In this section, I’ll describe the “standard” (non-Conda) ways of dealing with packages and virtual environments with Python.

What is Pip?

Pip (Package Installer for Python) is shipped with recent versions of Python (2.7.9+ and 3.4+). It allows you to download and install packages from PyPI (the Python Package Index), the official Python package repository.

Examples of Pip commands:

# Install a package
pip install numpy

# Install a specific version
pip install numpy==1.18

# List installed packages
pip list

# Save a list of installed packages to a text file
pip freeze > requirements.txt

Ways to Setup Virtual Environments in Python

There are actually many ways to manage virtual environments for Python.

Third-party (PyPI) libraries:

  • virtualenv is probably the most popular one. It allows creating environments with different modules but doesn’t help to manage different Python versions.
  • pyenv is a tool to manage different versions of Python on your system.
  • pyenv-virtualenv is a pyenv extension that makes pyenv (different Python versions) work with virtualenv (isolated environments with different modules). It is also compatible with venv if you are using Python 3.3+.
  • virtualenvwrapper is an extension on top of virtualenv providing useful commands to speed up working with virtual environments.
  • pyenv-virtualenvwrapper is a pyenv extension that makes pyenv (different Python versions) work with virtualenvwrapper.
  • pipenv is a nice effort to make Python package management feel a bit like npm / Cargo. It kind of combines Pip, Pipfile, and virtualenv.

Standard Library:

  • venv is a lightweight package shipped with Python 3. It’s pretty similar to virtualenv but with a more limited set of features.

A bit confusing, isn't it (no, this is not JavaScript 😆)? And I'm probably missing some libraries.

Venv is quite promising, but most people continue to use virtualenv + pyenv because it just does the job and offers more features, and some shameless folks are still using Python 2!
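For reference, a minimal non-Conda workflow with the standard-library venv looks roughly like this (virtualenv works almost the same way, just swap the first command):

# Create an isolated environment in the .venv folder
python3 -m venv .venv

# Activate it (Linux/macOS; on Windows run .venv\Scripts\activate)
source .venv/bin/activate

# Install the project's dependencies into this environment only
pip install -r requirements.txt

# Leave the environment when you're done
deactivate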

Why Conda?

Conda is a generic, open-source, multi-platform package and environment manager. It allows you to set up virtual environments and to download & install Conda packages from the Anaconda Cloud repository.

Python Packages vs Python as a Package

Even though Conda depends on Python, it is not Python-specific. Actually, CPython (the standard Python interpreter) is shipped as a Conda package itself.

The following command:

conda create -n "my-env" python=3.7

Will create a new conda environment named “my-env” and install the latest CPython 3.7.x package.

To enter/activate your newly created environment, simply execute:

conda activate my-env

You should see (my-env) on the left of your shell prompt: your virtual environment is activated, and you can start installing packages!
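A few other environment commands quickly become second nature once you juggle several projects:

# List all Conda environments on your system
conda env list

# Leave the current environment
conda deactivate

# Delete an environment and everything installed in it
conda env remove -n my-env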

It is absolutely possible to install PyPI packages using pip in a Conda environment!

So yes, if you want to use conda as a drop-in replacement for virtualenv, pyenv, and friends, it’s perfectly fine. But it also allows you to install Conda packages from the Anaconda Cloud repository.

Examples of Conda commands:

# Install a package in the current environment
conda install numpy

# Install a specific version/source of a package in the current environment
conda install numpy=1.18

# List installed packages in the current environment
conda list

# Save a list of the current environment installed packages to a text file
conda env export > environment.yml
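
To recreate the environment on another machine from that exported file:

# Recreate the environment described in environment.yml
conda env create -f environment.yml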

But why would you need Conda packages? Isn't Pip doing the job?

Python… But Not Only

Indeed, a lot of Python modules (OpenCV, TensorFlow, NumPy, etc.) are largely written in compiled (faster) languages like C/C++. This way, we can write simple Python code that runs at a decent speed.

Conda is by nature a binary package manager, so the installation of these packages is straightforward, as long as they are available for your platform.

For a long time, it was quite complicated to install those libraries with Pip, as it would try to build them from source on the fly. Conda addressed this problem early on.

But since the introduction of binary wheels on PyPI, their number keeps increasing. Binary wheels are precompiled packages; the concept is quite similar to Conda's: you don't need to build anything, just let Pip pick the right wheel for your platform and Python version.

So, nowadays, running pip install numpy or pip install opencv-python should grab a binary wheel and work without issues.
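
If you want to make sure Pip never silently falls back to building from source, you can tell it to accept wheels only; the command will then fail instead of compiling when no wheel exists for your platform:

# Refuse source distributions: install only from prebuilt wheels
pip install --only-binary :all: numpy opencv-python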

…We're still at the same point: are there clear advantages to using Conda over Pip for package management?

Performance

Conda is used a lot for scientific programming, machine learning, computer vision, etc., so its precompiled packages are built with performance in mind.

Taking NumPy as an example, the PyPI version is built against OpenBLAS, whereas the Conda version links to Intel MKL. I ran a quick performance test on my computer (using this piece of code), measuring the execution time of some NumPy methods in both the Pip and Conda versions:

[Chart: NumPy operations in the Pip and Conda NumPy packages, execution time in seconds (less is better)]
[Chart: NumPy operations in the Pip and Conda NumPy packages, relative execution time (less is better)]

As you can see, the Conda version is noticeably faster!
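
If you want to check which BLAS implementation your own NumPy build is linked against, NumPy can print its build configuration (the exact output format varies between versions, but you should spot either mkl or openblas in there):

# Show NumPy's build configuration
python -c "import numpy; numpy.show_config()"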

Conda's TensorFlow also seems to be built against Intel MKL and offers better CPU performance than the default PyPI version.

In the specific case of TensorFlow-GPU, the Conda version bundles Nvidia cuDNN, which is way easier than logging into the Nvidia website and downloading it manually.
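At the time of writing, that means a single command along these lines should pull in a matching cudatoolkit and cuDNN as regular Conda dependencies (package names and availability change over time, so check the current TensorFlow/Anaconda documentation):

# Install the GPU build of TensorFlow; Conda resolves cudatoolkit and cudnn for you
conda install tensorflow-gpu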

Conda-Forge

This is probably the most important reason why I use Conda.

Take PyPI or Anaconda Cloud, for example: they both share some challenges that are common to almost all public package repositories:

  • How to make sure thousands of open-source packages are well maintained?
  • Is there any error in this new release?
  • How to avoid duplicates?
  • What do we do if the authors of a package don't want to maintain its distribution on the repository?

You, me, anyone can upload a package to PyPI or Anaconda Cloud, so how do we avoid ending up with a big mess?

On PyPI, most of the packages (like NumPy) are actually backed by their authors (and it works quite well like that). But that's not always the case.

For example, the most popular OpenCV PyPI wheels are built by Olli-Pekka Heinisuo, who is doing an amazing job at it (with nice CI/CD)! But what if he decides to stop? Maybe this nice CI/CD work could also benefit other packages?

This is where conda-forge is really smart.

A community-led collection of recipes, build infrastructure and distributions for the conda package manager.

conda-forge description from https://conda-forge.org/

Conda-forge is a GitHub organization hosting repositories of Conda recipes. CI/CD runs automatically, thanks to open-source-friendly providers (AppVeyor, Azure Pipelines, CircleCI, and Travis CI), to build binaries for Windows, macOS, and Linux and upload them to the conda-forge channel on Anaconda Cloud.
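In practice, using those packages is just a matter of pointing Conda at the conda-forge channel, roughly like this:

# Install a single package from the conda-forge channel
conda install -c conda-forge opencv

# Or make conda-forge the preferred channel on this machine
conda config --add channels conda-forge
conda config --set channel_priority strict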

So even if you’re not a CI/CD expert, you can submit your own packages (binary or pure Python), check if they work on all platforms, and help to maintain existing packages in a very unified/clean way.

I really like how dependencies are handled: some common C/C++ libraries like libtiff or Intel MKL are available as "pinned dependencies" on conda-forge, which means all packages can share the same versions of those libs and avoid clashes at runtime.

If you want to learn more, feel free to visit the conda-forge website.

Wrap Up

Conda & Anaconda

Conda and Anaconda are not exactly the same thing.

Anaconda is a distribution of CPython (the official Python interpreter) with Conda, and a ton of pre-installed libraries like NumPy, SciPy, Pandas, etc…

If, like me, you just need CPython and Conda, I would recommend installing Miniconda or Miniforge.
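On Linux, for example, installing Miniconda boils down to downloading and running a shell script (the URL below is the usual "latest installer" location at the time of writing; double-check it on the official Miniconda page, and see the Miniforge GitHub repository for the conda-forge-based equivalent):

# Download and run the latest Miniconda installer (Linux, x86_64)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh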

As I said in the intro, I've been using Conda for several years, and I'm pretty happy with it.

The virtual environment management is dead simple and does a perfect job for me. So I almost always use Conda as the base for my Python installations.

A Conda environment is an isolated standard Python install, so it can do everything a standard Python install does, plus installing Conda packages 😉

About packages, as I said earlier, you can still use pip in a Conda environment.

You can also (to a small extent) mix pip and conda packages, but I try to avoid that as much as possible. Conda keeps improving its support for PyPI dependencies to allow proper mixing.

So depending on the project:

  • If I'm working on image processing / machine learning stuff that requires a lot of precompiled binary packages, I use Conda (conda-forge) packages as much as possible, and PyPI if some packages are not available on Conda (see the sketch after this list). Building from source is also an option for performance-critical packages.
  • For smaller projects that I wish to share, with mainly pure-Python dependencies, I use a Conda environment but install dependencies with pip, so non-Conda users can still set up the project.
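
As an illustration of the first case, setting up one of my vision projects might look roughly like this (the environment and package names below are just examples, not an exact recipe):

# Dedicated environment for the project
conda create -n potato-detector python=3.8

conda activate potato-detector

# Heavy, compiled dependencies from conda-forge
conda install -c conda-forge numpy opencv

# Pure-Python packages that are not on Conda, via pip (hypothetical package name)
pip install some-pure-python-package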

What about Docker?

Docker is a really nice option to work in isolated environments.

For 3D-scanning reasons, I work mainly on Windows. At the moment, Docker Desktop doesn't have proper GPU support, so I can't rely only on Docker for my virtual environments.

I use Docker mostly for web projects that will be deployed on a Linux server, and it's totally fine to install Conda inside a container.
