I’ve been using Conda for several years in my Python projects to manage packages and environments. I’m going to share my experience with Conda and show its potential advantages compared to other solutions.
Working With Virtual Environments
Plenty of nice libraries are available for Python, but dependencies management can quickly become a big problem. Let’s try to illustrate with a short story.
Bob is a beginner Python developer. He has an exciting project in mind, so he installs Python 3.7 and starts coding.
His project requires some external modules, like NumPy and OpenCV, so he installs them. After one week of hard work, and 20 other modules installed, the code is finally working 🙂 Time to move on to a new project.
A few projects (and installed modules) later, Bob wants to upgrade OpenCV to the newest version for the needs of a new machine-learning-potato-detector project, and he also wants to try some features of Python 3.8. He upgrades everything and successfully finishes the project!
After a few other projects (and on-the-fly updates of some modules), for some reason, Bob needs to run an old project he was working on months earlier. And…
As you can imagine, it doesn’t work anymore!
Is that because of some OpenCV updates? The Python version update? Maybe both? Hard to say without an in-depth check.
Bob doesn’t remember the exact version of OpenCV used on this old project, and if he downgrades his current version, will the more recent projects and recently installed modules still work properly? Probably not.
Long story short, virtual environments are a way to avoid those problems. Virtual environments allow multiple isolated installations of Python, with different modules, on a single system.
For each project, you can work in a fresh dedicated environment, with the Python version you need and just the modules you need 🙂 !
With virtual environments, your projects are also more portable. You know exactly the Python version and modules required to run them individually.
Pip, Virtualenv, Pyenv, Venv & Friends
In this section, I’ll describe the “standard” (non-Conda) ways of dealing with packages and virtual environments with Python.
What Is Pip?
Pip (the Package Installer for Python) is shipped with recent versions of Python (2.7.9+ and 3.4+). It allows you to download and install packages from PyPI (the Python Package Index), the official Python package repository.
Examples of Pip commands:
# Install a package
pip install numpy
# Install a specific version
pip install numpy==1.18
# List installed packages
pip list
# Save a list of installed packages to a text file
pip freeze > requirements.txt
Ways to Set Up Virtual Environments in Python
There are actually many ways to manage virtual environments for Python.
Third-party (PyPI) libraries:
- virtualenv is probably the most popular one. It allows creating environments with different modules but doesn’t help to manage different Python versions.
- pyenv is a tool to manage different versions of Python on your system.
- pyenv-virtualenv is a pyenv extension to make pyenv (different Python versions) work with virtualenv (isolated environments with different modules). It is also compatible with venv if you are using Python 3.3+.
- virtualenvwrapper is an extension on top of virtualenv providing useful commands to speed up working with virtual environments.
- pyenv-virtualenvwrapper is a pyenv extension to make pyenv (different Python versions) work with virtualenvwrapper.
- pipenv is a nice effort to make Python package management feel a bit like Npm / Cargo. It kind of combines Pip, Pipfile, and virtualenv together.
- venv is a lightweight package shipped with Python 3. It’s pretty similar to virtualenv but with a more limited set of features.
Venv is quite promising, but most people continue to use virtualenv + pyenv because it just does the job, offers more features, and some shameless folks are still using Python 2!
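For reference, here is a minimal venv workflow. The commands below assume a Unix-like shell and a `.venv` directory name of my own choosing; on Windows the activation script lives under `.venv\Scripts` instead:

```shell
# Create an isolated environment in the .venv directory
# (venv ships with Python 3, no extra install needed)
python3 -m venv .venv

# Activate it (Unix shells; on Windows run .venv\Scripts\activate)
. .venv/bin/activate

# The environment has its own interpreter and its own site-packages
python --version
pip list

# Leave the environment when you are done
deactivate
```

Anything you `pip install` while the environment is active lands in `.venv`, not in your system Python.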
Why Conda?
Conda is a generic multi-platform & open-source package manager and environment manager. It allows you to set up virtual environments, and to download & install conda packages from the Anaconda Cloud repository.
Python Packages vs Python as a Package
Even if Conda depends on Python, it is not Python-specific. Actually, CPython (the standard Python interpreter) is shipped as a conda package itself.
The following command:
conda create -n "my-env" python=3.7
will create a new conda environment named “my-env” and install the latest CPython 3.7.x package.
To enter/activate your newly created environment, simply execute:
conda activate my-env
You should see (my-env) on the left of your shell prompt: your virtual environment is activated, and you can start installing packages!
It is absolutely possible to install PyPI packages using pip in a Conda environment!
So yes, if you want to use conda as a drop-in replacement for virtualenv, pyenv, and friends, it’s perfectly fine. But it also allows you to install Conda packages from the Anaconda Cloud repository.
Examples of Conda commands:
# Install a package in the current environment
conda install numpy
# Install a specific version/source of a package in the current environment
conda install numpy=1.18
# List installed packages in the current environment
conda list
# Save a list of the current environment's installed packages to a text file
conda env export > environment.yml
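An environment.yml file is just a small piece of YAML, and you can also write one by hand. The package names and versions below are illustrative, not a real project:

```yaml
name: my-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.7
  - numpy=1.18
  - opencv
  # PyPI-only dependencies can be listed in a pip subsection
  - pip
  - pip:
      - some-pypi-only-package
```

Running conda env create -f environment.yml then rebuilds the same environment on another machine.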
But why would you need Conda packages? Isn’t Pip doing the job?
Python… But Not Only
Indeed, a lot of Python modules (OpenCV, TensorFlow, NumPy, etc.) are written in a compiled (faster) language like C/C++. This way, we can write easy Python code that runs at a decent speed.
Conda is by nature a binary package manager, so the installation of these packages is straightforward, as long as they are available for your platform.
For a long time, it was quite complicated to install those libraries with pip, as it was trying to build them from source on the fly. Conda addressed this problem at the time.
But since the introduction of binary wheels on PyPI, their number keeps increasing. A binary wheel is a precompiled package; the concept is a bit similar to Conda’s: you don’t need to build anything, just let Pip pick the right wheel for your platform and Python version.
So, nowadays, running pip install numpy or pip install opencv-python should grab a binary wheel and work without issues.
…So we’re still at the same point: are there clear advantages to using Conda over Pip for package management?
Conda is used a lot for scientific programming, machine learning, computer vision, etc., so precompiled packages are built with performance in mind.
Taking NumPy as an example, the PyPI version is built against OpenBLAS, whereas the Conda version links to Intel MKL. I ran a quick performance test on my computer (using this piece of code), measuring the execution time of some NumPy methods in both the Pip and Conda versions:
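The original benchmark script isn’t reproduced here, but a sketch of this kind of micro-benchmark is easy to write. The matrix sizes and operations below are my own choices, not the original script’s:

```python
import time

import numpy as np

# Time a few BLAS/LAPACK-heavy NumPy operations.
# Run this once in a pip-based environment and once in a
# conda-based one, then compare the timings.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 500))
b = rng.standard_normal((500, 500))

for name, fn in [
    ("matmul", lambda: a @ b),
    ("svd", lambda: np.linalg.svd(a)),
    # a @ a.T + 500*I is symmetric positive definite, so Cholesky works
    ("cholesky", lambda: np.linalg.cholesky(a @ a.T + 500 * np.eye(500))),
]:
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.4f} s")
```

As a bonus, np.show_config() tells you which BLAS/LAPACK implementation (OpenBLAS, MKL, …) the installed NumPy build is linked against.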
As you can see, the Conda version is noticeably faster!
The Conda TensorFlow package also seems to be built against Intel MKL and offers better CPU performance than the default PyPI version.
In the specific case of tensorflow-gpu, the Conda version embeds Nvidia cuDNN, which is way easier than logging onto the Nvidia website and downloading it manually.
This is probably the most important reason why I use Conda.
Take PyPI or Anaconda Cloud, for example: they share some challenges that are common to almost all public package repositories:
- How do you make sure thousands of open-source packages are well maintained?
- Is there any error in this new release?
- How do you avoid duplicates?
- What do you do if the authors of a package don’t want to maintain its distribution on the repository?
You, me, everyone can upload a package to PyPI or Anaconda Cloud, so how do you avoid ending up with a big mess?
On PyPI, most of the packages (like NumPy) are actually backed by their authors (and it works quite well like that). But it’s not always the case.
For example, the most popular OpenCV PyPI wheels are built by Olli-Pekka Heinisuo, who is doing an amazing job at it (with nice CI/CD). But what if he decides to stop? And maybe this nice work on CI/CD could also benefit other packages?
This is where conda-forge is really smart.
“A community-led collection of recipes, build infrastructure and distributions for the conda package manager.” (conda-forge description, from https://conda-forge.org/)
Conda Forge is a GitHub organization hosting repositories of Conda recipes. CI/CD runs automatically thanks to open-source-friendly providers (AppVeyor, Azure Pipelines, CircleCI, and Travis CI) to create binaries for Windows, macOS, and Linux, and upload them to the Anaconda Cloud conda-forge repository.
So even if you’re not a CI/CD expert, you can submit your own packages (binary or pure Python), check if they work on all platforms, and help to maintain existing packages in a very unified/clean way.
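A conda-forge recipe is essentially a meta.yaml file describing where to fetch the sources and how to build and test the package. A heavily trimmed, hypothetical recipe might look like this (the package name, URL, and checksum are placeholders, not a real package):

```yaml
package:
  name: my-package
  version: "1.0.0"

source:
  url: https://pypi.io/packages/source/m/my-package/my-package-1.0.0.tar.gz
  sha256: 0000000000000000000000000000000000000000000000000000000000000000

build:
  number: 0
  script: python -m pip install . --no-deps -vv

requirements:
  host:
    - python
    - pip
  run:
    - python
    - numpy

test:
  imports:
    - my_package
```

Once the recipe is merged, the conda-forge CI builds and uploads the binaries for every supported platform.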
I really like how dependencies are handled: some common C++ libraries like LibTiff or Intel MKL are available as “pinned dependencies” on conda-forge, which means all the packages can share the same versions of those libs and avoid any clash at runtime.
If you want to learn more, feel free to visit the conda-forge website.
As I said in the intro, I’ve been using Conda for multiple years, and I’m pretty happy with it.
The virtual environment management is dead simple and does a perfect job for me. So I almost always use Conda as a base for my Python installations.
A Conda environment is an isolated standard Python install, so it can do everything a standard Python install does, plus installing Conda packages 😉
About packages, as I said earlier, you can still use pip in a Conda environment.
You can also (to a small extent) mix pip and conda packages, but I try to avoid that as much as possible. Conda keeps improving its support for PyPI dependencies to allow proper mixing.
So depending on the project:
- If I’m working on some image processing / machine learning stuff that requires a lot of precompiled binary packages, I use conda(-forge) packages as much as possible, and PyPI if some packages are not available on Conda. Building from source is also an option for performance-critical packages.
- For smaller projects that I wish to share, with mainly pure-Python dependencies, I use a Conda environment but pip to install dependencies, so non-Conda users can still set up the project.
What About Docker?
Docker is a really nice option to work in isolated environments.
For some 3D-scanning reasons, I work mainly on Windows. At the moment, Docker Desktop doesn’t have proper GPU support, so I can’t rely only on Docker for my virtual environments.
I use Docker especially for web projects that will be deployed on a Linux server, and it’s totally fine to install Conda in the container.