Creating a private Python package with Git and using it in Docker

Date posted: 2020-07-08

If you develop proprietary software in Python, you will likely need to create and host private Python packages. One way to host them is to run your own PyPI server and protect it with a VPN and/or authentication, but I think the simplest way to do this, which eliminates the need for VMs and setting up an authentication system, is to use private Git repositories to host the packages. The idea is simple: let GitHub (or whichever version control host you are using) handle the hosting and authentication, and get your package directly from there. Pip supports this functionality out of the box - all you have to do is authenticate with GitHub, and you will have access to the package.
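
For example, once your SSH key is registered with GitHub, installing a package straight from a private repository is a single command. Here is a sketch using the repository and tag we will create later in this post:

pip install "git+ssh://git@github.com/hscasn/myprivatepypackage.git@v1.0#subdirectory=packages/mypackage1"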

Creating the package is relatively simple, but containerizing the application is a bit trickier. To do this, you will need to inject your SSH keys into the container in order to authenticate and download the packages. This is the tricky step - you don't want to leave your SSH keys in the container, since anyone who gains access to the image would also gain access to the keys. Even if you delete the key files after downloading the packages, the keys may still be recoverable from the image layers, so this step needs extra care.

Creating the package

I'll start by creating a private repository that will contain the package. Here is the file tree:

.
└── packages
    └── mypackage1
        ├── mypackage1
        │   ├── __init__.py
        │   └── utils.py
        ├── requirements.txt
        └── setup.py

I made things a bit more complicated by nesting subdirectories, so the repository can hold more than one package. The package I created is called mypackage1, and it consists of the setup file, the requirements file, and the code under the subdirectory packages/mypackage1/mypackage1.

Here are the file contents:

# requirements.txt
pandas==1.0.1

# utils.py
import pandas as pd

def print_a_table():
    table = pd.DataFrame({
        'col1': [1, 2, 3],
        'col2': ['a', 'b', 'c'],
    })
    print(table)

# setup.py
from setuptools import setup, find_packages

# Read the dependencies from requirements.txt; splitlines() strips the
# trailing newlines that readlines() would leave in
with open('requirements.txt') as req_file:
    install_reqs = req_file.read().splitlines()

setup(name='mypackage1',
      version='1.0',
      description='Python utilities',
      packages=find_packages(),
      install_requires=install_reqs,
      zip_safe=False,
      python_requires='>3.0.0')

The __init__.py file is empty.
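
If you prefer to import print_a_table directly from the package, you could optionally re-export it there - a sketch, not required for anything in this post:

# packages/mypackage1/mypackage1/__init__.py (optional)
# Re-export utilities so callers can write `from mypackage1 import print_a_table`
from .utils import print_a_table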

There is not much going on here, but it is complex enough to show how this works. The task now is to install mypackage1 and run the print_a_table function; a quick local sanity check is sketched below.
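
Before tagging and pushing anything, you can test the package by installing it straight from the local path - pip accepts a path to any directory that contains a setup.py:

# From the root of the repository
pip install ./packages/mypackage1
python -c "from mypackage1 import utils; utils.print_a_table()"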

Versioning

Releasing different versions of the package is very simple, since we can use git tags. I'm going to tag the package with the version v1.0 and push it to GitHub:

> git tag -a v1.0 -m "First release"

> git push origin v1.0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 162 bytes | 162.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:hscasn/myprivatepypackage.git
 * [new tag]         v1.0 -> v1.0

All done. This process is very easy to automate with a CI/CD pipeline (a sketch follows below). Now the package can be used.
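
As a rough sketch, the release step of such a pipeline could boil down to two commands. Here RELEASE_VERSION is a hypothetical variable provided by the CI system:

# Hypothetical CI step: tag and publish a release
git tag -a "v${RELEASE_VERSION}" -m "Release v${RELEASE_VERSION}"
git push origin "v${RELEASE_VERSION}"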

Installing the package locally

Let's try downloading the package locally. Here is a very simple test:

.
├── main.py
└── requirements.txt

Just a Python file and a requirements file.

# main.py

# Here we will import my package and call the print_a_table function

from mypackage1 import utils

utils.print_a_table()

And now for the requirements file (the full documentation for this syntax can be found in pip's documentation on VCS support). This is what my file looks like:

# requirements.txt

# github.com/hscasn/myprivatepypackage - this is my repository
# @v1.0 - This is how I specify the version
# #subdirectory=packages/mypackage1 - Path to the root of my package (setup.py)
git+ssh://git@github.com/hscasn/myprivatepypackage.git@v1.0#subdirectory=packages/mypackage1

It won't be the prettiest line in your requirements.txt file, but it will work. All we have to do is install the package with pip install -r requirements.txt and then run the main file:

> python3 main.py
   col1 col2
0     1    a
1     2    b
2     3    c
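
If SSH is not an option (some CI environments only provide an HTTPS token), pip can also install over HTTPS. Here is a sketch, assuming a hypothetical GITHUB_TOKEN environment variable holding a personal access token - pip expands ${VAR} references in requirements files:

# requirements.txt (HTTPS alternative)
git+https://${GITHUB_TOKEN}@github.com/hscasn/myprivatepypackage.git@v1.0#subdirectory=packages/mypackage1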

Running it locally is one thing, but dockerizing the application is more difficult.

Installing the package in a Docker container

As this article explains, using an SSH key inside a Dockerfile is dangerous even if you delete the key after using it, since the key can still be retrieved from the intermediate layers of the image. Using the --squash option is also not optimal. The best way to guarantee your keys will not be leaked is to use a multi-stage build.
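
To see why deleting the key is not enough, here is a sketch of how it could be recovered from a naively-built image (hypothetically tagged leakyimage). An exported image contains one tar archive per layer, and the layer that wrote the key still contains it, no matter what later layers deleted:

# Export the image and unpack it
docker save leakyimage -o leakyimage.tar
tar -xf leakyimage.tar
# Search every layer for the "deleted" key
# (the exact layer layout varies between Docker versions)
for layer in */layer.tar; do tar -tf "$layer" | grep id_rsa; done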

So here is the plan: the first stage of the Dockerfile will receive the SSH key and set up authentication, and after that we can download the dependencies. With the dependencies in place, we jump to the next stage of the build, which copies over the directory where the dependencies were saved.

Here is my Dockerfile with a multi-stage build:

FROM python:3.8 AS build

# Install git
RUN apt-get update -y && \
    apt-get install -y git

# Add the SSH key and make sure github.com is in the known_hosts
ARG SSH_PRIVATE_KEY
RUN mkdir -p /root/.ssh/ && \
    echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_rsa && \
    chmod 600 /root/.ssh/id_rsa && \
    ssh-keyscan github.com >> /root/.ssh/known_hosts

# Install the requirements
# Note: the --user flag is important here, so the dependencies
# are saved under /root/.local
COPY ./requirements.txt requirements.txt
RUN python3 -m pip install --user -r requirements.txt

#------------------------------
# Production image

FROM python:3.8-slim-buster

COPY . .

# Copy the downloaded packages from the build stage and put
# their executables on the PATH
COPY --from=build /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

ENTRYPOINT ["python", "main.py"]

In multi-stage builds, each stage starts completely blank - the modifications we make in one stage do not affect the others. If we want files from a previous stage, we need to copy them over explicitly. In the first stage (which I called build) we configure the keys and download the packages, which are saved under /root/.local. In the last stage of the build (the part I commented as "production image"), I copied the files downloaded in the build stage with COPY --from=build /root/.local /root/.local. This means the downloaded packages get transferred to the second stage, but the SSH keys do not.
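
This works because pip's --user flag installs into Python's user site-packages under /root/.local, which the interpreter adds to sys.path automatically - as long as both stages use the same Python minor version (3.8 here). You can check the exact path yourself:

# Both stages must agree on this path, so keep the Python versions in sync
python3 -c "import site; print(site.getusersitepackages())"
# -> /root/.local/lib/python3.8/site-packages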

With the Dockerfile ready, we can build the image:

docker build \
    -t testmypackage \
    --build-arg SSH_PRIVATE_KEY="$(cat ~/path/to/key_id_rsa)" \
    .

And now we can run it:

> docker run --rm testmypackage
   col1 col2
0     1    a
1     2    b
2     3    c
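
If you want to convince yourself that the key did not make it into the final image, you can poke at it directly - both of these should come back empty-handed:

# The .ssh directory only ever existed in the build stage
docker run --rm --entrypoint ls testmypackage /root/.ssh
# -> ls: cannot access '/root/.ssh': No such file or directory

# The final image's history contains no SSH_PRIVATE_KEY build argument
docker history --no-trunc testmypackage | grep SSH_PRIVATE_KEY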

If you are running this in a CI/CD pipeline, it is a good idea to create a dedicated SSH key for this step.
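
Generating a dedicated key is a one-liner; you would then register the public half as a read-only deploy key on the repository and store the private half as a secret in your CI system. A sketch with a hypothetical file name:

# Generate a passphrase-less key dedicated to the pipeline
ssh-keygen -t ed25519 -f ./ci_deploy_key -N ""
# Upload ci_deploy_key.pub as a read-only deploy key on GitHub;
# keep ci_deploy_key in your CI system's secret store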