
A DEEP DIVE INTO THE OFFICIAL DOCKER IMAGE FOR PYTHON

 


The official Python image for Docker is quite popular, and in fact I recommend one of its variations as a base image. But many people don’t quite understand what it does, which can lead to confusion and brokenness.

In this post I will therefore go over how it’s constructed, why it’s useful, how to use it correctly, as well as its limitations. In particular, I’ll be reading through the python:3.8-slim-buster variant, as of August 19, 2020, and explaining it as I go along.

Reading the Dockerfile

The base image

We start with the base image:

FROM debian:buster-slim

That is, the base image is Debian GNU/Linux 10, the current stable release of the Debian distribution, also known as Buster because Debian names all their releases after characters from Toy Story. In case you’re wondering, Buster is Andy’s pet dog.

So to begin with, this is a Linux distribution that guarantees stability over time, while providing bug fixes. The slim variant has fewer packages installed, so it has no compilers, for example.

Environment variables

Next, some environment variables. The first makes sure /usr/local/bin is early in the $PATH:

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

Basically, the Python image works by installing Python into /usr/local, so this ensures the executables it installs are the default ones used.
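To see why the ordering matters, here is a minimal Python sketch of the lookup. It builds two throwaway directories standing in for /usr/local/bin and /usr/bin and asks shutil.which which python3 wins. The temporary layout and the helper names (make_fake_executable, resolve_python3) are my own illustration, not anything the image ships.

```python
import os
import shutil
import tempfile


def make_fake_executable(directory, name):
    """Create an empty, executable placeholder script and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


def resolve_python3(root):
    """Return (resolved, local, system) for a fake PATH rooted at `root`."""
    local = make_fake_executable(os.path.join(root, "usr", "local", "bin"), "python3")
    system = make_fake_executable(os.path.join(root, "usr", "bin"), "python3")
    # Same ordering as ENV PATH /usr/local/bin:$PATH, with the local directory first.
    search_path = os.pathsep.join([os.path.dirname(local), os.path.dirname(system)])
    return shutil.which("python3", path=search_path), local, system


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        resolved, local, system = resolve_python3(root)
        print("resolved:", resolved)
        print("is the /usr/local copy:", resolved == local)
```

Whichever directory comes first in the search path wins, which is exactly what the ENV PATH line relies on.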

Next, the locale is set:


# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

As far as I can tell modern Python 3 will default to UTF-8 even without this, so I’m not sure it’s necessary these days.
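If you want to check this yourself, here is a small sketch that starts a fresh interpreter with a given LANG and reports the encoding it picks for stdout. The helper name stdout_encoding_under is hypothetical. On Python 3.7 and later I would expect UTF-8 even with LANG=C, because of PEP 538 and PEP 540, but that is an assumption you should verify on your own image rather than take from me.

```python
import os
import subprocess
import sys


def stdout_encoding_under(lang):
    """Start a fresh interpreter with LANG=`lang` and report sys.stdout.encoding."""
    env = dict(os.environ)
    # Drop settings that would override LANG or force a particular encoding.
    for name in ("LC_ALL", "LC_CTYPE", "PYTHONIOENCODING", "PYTHONUTF8", "PYTHONCOERCECLOCALE"):
        env.pop(name, None)
    env["LANG"] = lang
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for lang in ("C", "C.UTF-8"):
        print("LANG=%s -> %s" % (lang, stdout_encoding_under(lang)))
```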

There’s also an environment variable that tells you the current Python version:

ENV PYTHON_VERSION 3.8.5

And an environment variable with a GPG key, used to verify the Python source code when it’s downloaded.

Runtime dependencies

In order to run, Python needs some additional packages:

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        netbase \
    && rm -rf /var/lib/apt/lists/*

The first, ca-certificates, is the list of standard certificate authorities' certificates, comparable to what your browser uses to validate https:// URLs. This allows Python, wget, and other tools to validate certificates provided by servers.

The second, netbase, installs a few files in /etc that are needed to map certain names to corresponding ports or protocols. For example, /etc/services maps service names like https to corresponding port numbers, in this case 443/tcp.
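From Python, that mapping is visible through socket.getservbyname. The sketch below wraps it in a hypothetical helper, lookup_port, that returns None instead of raising when the name cannot be resolved, which is what I would expect on a system without /etc/services.

```python
import socket


def lookup_port(service, protocol="tcp"):
    """Return the port for a service name, or None if it cannot be resolved."""
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        return None


if __name__ == "__main__":
    for name in ("https", "http", "ssh"):
        print(name, "->", lookup_port(name))
```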

Installing Python

Next, a compiler toolchain is installed, Python source code is downloaded, Python is compiled, and then the unneeded Debian packages are uninstalled:

RUN set -ex \
    \
    && savedAptMark="$(apt-mark showmanual)" \
    && apt-get update && apt-get install -y --no-install-recommends \
        dpkg-dev \
        gcc \
        libbluetooth-dev \
        libbz2-dev \
        libc6-dev \
        libexpat1-dev \
        libffi-dev \
        libgdbm-dev \
        liblzma-dev \
        libncursesw5-dev \
        libreadline-dev \
        libsqlite3-dev \
        libssl-dev \
        make \
        tk-dev \
        uuid-dev \
        wget \
        xz-utils \
        zlib1g-dev \
# as of Stretch, "gpg" is no longer included by default
        $(command -v gpg > /dev/null || echo 'gnupg dirmngr') \
    \
    && wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz" \
    && wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc" \
    && export GNUPGHOME="$(mktemp -d)" \
    && gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$GPG_KEY" \
    && gpg --batch --verify python.tar.xz.asc python.tar.xz \
    && { command -v gpgconf > /dev/null && gpgconf --kill all || :; } \
    && rm -rf "$GNUPGHOME" python.tar.xz.asc \
    && mkdir -p /usr/src/python \
    && tar -xJC /usr/src/python --strip-components=1 -f python.tar.xz \
    && rm python.tar.xz \
    \
    && cd /usr/src/python \
    && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
    && ./configure \
        --build="$gnuArch" \
        --enable-loadable-sqlite-extensions \
        --enable-optimizations \
        --enable-option-checking=fatal \
        --enable-shared \
        --with-system-expat \
        --with-system-ffi \
        --without-ensurepip \
    && make -j "$(nproc)" \
        LDFLAGS="-Wl,--strip-all" \
    && make install \
    && rm -rf /usr/src/python \
    \
    && find /usr/local -depth \
        \( \
            \( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
            -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name '*.a' \) \) \
            -o \( -type f -a -name 'wininst-*.exe' \) \
        \) -exec rm -rf '{}' + \
    \
    && ldconfig \
    \
    && apt-mark auto '.*' > /dev/null \
    && apt-mark manual $savedAptMark \
    && find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' \
        | awk '/=>/ { print $(NF-1) }' \
        | sort -u \
        | xargs -r dpkg-query --search \
        | cut -d: -f1 \
        | sort -u \
        | xargs -r apt-mark manual \
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
    && rm -rf /var/lib/apt/lists/* \
    \
    && python3 --version

There’s a lot in there, but the basic outcome is:

  1. Python is installed into /usr/local.

  2. All .pyc files are deleted.

  3. The packages (gcc and so on) needed to compile Python are removed once they are no longer needed.

Because this all happens in a single RUN command, the image does not end up storing the compiler in any of its layers, keeping it smaller.
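The cleanup pass done by the find command in that RUN step can be sketched in Python. This is only an approximation of part of it, written for illustration: it removes test directories and .pyc/.pyo files, and skips the '*.a' and 'wininst-*.exe' patterns. The prune function and its constants are hypothetical names of mine.

```python
import os
import shutil

TEST_DIR_NAMES = {"test", "tests", "idle_test"}
BYTECODE_SUFFIXES = (".pyc", ".pyo")


def prune(root):
    """Delete test directories and bytecode files under `root`; return removed paths."""
    removed = []
    # topdown=False visits children before their parents, much like find -depth.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for filename in filenames:
            if filename.endswith(BYTECODE_SUFFIXES):
                path = os.path.join(dirpath, filename)
                os.remove(path)
                removed.append(path)
        for dirname in dirnames:
            if dirname in TEST_DIR_NAMES:
                path = os.path.join(dirpath, dirname)
                if os.path.islink(path):
                    os.unlink(path)  # a symlink to a directory: remove the link only
                else:
                    shutil.rmtree(path)
                removed.append(path)
    return removed
```

Only run something like this against a directory you can afford to lose files from, for example a scratch copy of an installation.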

One thing you might notice is that Python requires libbluetooth-dev to compile. I found this surprising, so I asked, and apparently Python can create Bluetooth sockets, but only if compiled with this package installed.
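A quick way to check whether a given interpreter was built with that support is to look for the AF_BLUETOOTH constant in the socket module. This is a sketch with a hypothetical helper name. It only tells you the address family is exposed, not that the machine has any Bluetooth hardware.

```python
import socket


def bluetooth_sockets_available():
    """True if this interpreter exposes the AF_BLUETOOTH address family."""
    return hasattr(socket, "AF_BLUETOOTH")


if __name__ == "__main__":
    print("Bluetooth sockets compiled in:", bluetooth_sockets_available())
```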

Setting up aliases

Next, /usr/local/bin/python3 gets an alias /usr/local/bin/python, so you can call it either way:

# make some useful symlinks that are expected to exist
RUN cd /usr/local/bin \
    && ln -s idle3 idle \
    && ln -s pydoc3 pydoc \
    && ln -s python3 python \
    && ln -s python3-config python-config

Installing pip

The pip package download tool has its own release schedule, distinct from Python’s. For example, this Dockerfile is installing Python 3.8.5, released in July 2020. pip 20.2.2 was released in August, after that, but the Dockerfile makes sure to include that newer pip:

# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
ENV PYTHON_PIP_VERSION 20.2.2
# https://github.com/pypa/get-pip
ENV PYTHON_GET_PIP_URL https://github.com/pypa/get-pip/raw/5578af97f8b2b466f4cdbebe18a3ba2d48ad1434/get-pip.py
ENV PYTHON_GET_PIP_SHA256 d4d62a0850fe0c2e6325b2cc20d818c580563de5a2038f917e3cb0e25280b4d1

RUN set -ex; \
    \
    savedAptMark="$(apt-mark showmanual)"; \
    apt-get update; \
    apt-get install -y --no-install-recommends wget; \
    \
    wget -O get-pip.py "$PYTHON_GET_PIP_URL"; \
    echo "$PYTHON_GET_PIP_SHA256 *get-pip.py" | sha256sum --check --strict -; \
    \
    apt-mark auto '.*' > /dev/null; \
    [ -z "$savedAptMark" ] || apt-mark manual $savedAptMark; \
    apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
    rm -rf /var/lib/apt/lists/*; \
    \
    python get-pip.py \
        --disable-pip-version-check \
        --no-cache-dir \
        "pip==$PYTHON_PIP_VERSION" \
    ; \
    pip --version; \
    \
    find /usr/local -depth \
        \( \
            \( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
            -o \
            \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
        \) -exec rm -rf '{}' +; \
    rm -f get-pip.py

 

Again, all .pyc files are deleted.
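The sha256sum --check step is what protects against a tampered or truncated get-pip.py download. If you ever need the same check from Python, a minimal sketch looks like this. The function name sha256_matches is my own, and the expected digest would come from wherever you pin it, as the Dockerfile does with PYTHON_GET_PIP_SHA256.

```python
import hashlib


def sha256_matches(path, expected_hex):
    """Return True if the file at `path` has the given SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

In a build script you would stop with an error when this returns False, rather than running the downloaded file.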

The entrypoint

Finally, the Dockerfile specifies the entrypoint:

CMD ["python3"]

By using CMD with an empty ENTRYPOINT, you get python by default when you run the image:

$ docker run -it python:3.8-slim-buster
Python 3.8.5 (default, Aug  4 2020, 16:24:08)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

But you can also specify other executables if you want:

$ docker run -it python:3.8-slim-buster bash
root@280c9b73e8f9:/# 
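The rule at work here can be modeled in a few lines of Python. This is a deliberately simplified sketch of how Docker combines ENTRYPOINT, CMD, and the arguments given to docker run. It assumes the exec (JSON array) form and ignores the shell form and the --entrypoint flag. The function name effective_argv is hypothetical.

```python
def effective_argv(entrypoint, cmd, run_args):
    """Simplified model of the command a container starts with.

    Arguments passed to `docker run` after the image name replace CMD;
    whichever of the two is used gets appended to ENTRYPOINT.
    """
    return list(entrypoint) + list(run_args if run_args else cmd)


if __name__ == "__main__":
    # This image: empty ENTRYPOINT, CMD ["python3"].
    print(effective_argv([], ["python3"], []))        # docker run IMAGE
    print(effective_argv([], ["python3"], ["bash"]))  # docker run IMAGE bash
```

With a non-empty ENTRYPOINT the extra arguments would be passed to that executable instead, which is why the empty ENTRYPOINT is what makes "docker run IMAGE bash" work here.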

What have we learned?

Again, focusing specifically on the slim-buster variant, here are some takeaways.

The python official image includes Python

While this point may seem obvious, it’s worth noticing how it’s included: it’s a custom install in /usr/local.

A common mistake for people using this base image is to install Python again, by using Debian’s version of Python:

FROM python:3.8-slim-buster

# THIS IS NOT NECESSARY:
RUN apt-get update && apt-get install python3-dev

That installs an additional Python install in /usr, rather than /usr/local, and it will typically be a different version of Python. You probably don’t want two different versions of Python in the same image; mostly it just leads to confusion.

If you really want to use the Debian version of Python, use debian:buster-slim as the base image instead.
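If you suspect an image has ended up with more than one interpreter, a sketch like the following lists every python3 on the search path in lookup order. The find_all helper is a hypothetical name of mine. The first entry is the one a bare "python3" would run.

```python
import os
import sys


def find_all(name, search_path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", os.defpath)
    matches = []
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    print("running under:", sys.executable)
    for path in find_all("python3"):
        print("found:", path)
```

Seeing both a /usr/local/bin/python3 and a /usr/bin/python3 in the output would be a sign that the Debian package was installed on top of the image's own Python.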

The python official image includes the latest pip

For example, the last release of Python 3.5 was in November 2019, but the Docker image for python:3.5-slim-buster includes pip from August 2020. This is (usually) a good thing: it means you get the latest bug fixes, performance improvements, and support for newer wheel variants.

The python official image deletes all .pyc files

If you want to speed up startup very slightly, you may wish to compile the standard library source code to .pyc in your own image with the compileall module.
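As a minimal sketch of what that involves, the compileall module can byte-compile a whole directory tree. The precompile helper below is a hypothetical wrapper of mine. Pointing it at the standard library directory, which sysconfig can locate, needs write access there, so in practice it would run as part of your image build.

```python
import compileall
import sysconfig


def precompile(directory):
    """Byte-compile every .py file under `directory`; return True on success."""
    # quiet=1 suppresses the per-file listing and prints only errors.
    return bool(compileall.compile_dir(directory, quiet=1))


if __name__ == "__main__":
    # Only report the location here; compiling it requires write permission.
    print("standard library lives in:", sysconfig.get_paths()["stdlib"])
```

The same thing is available from the command line as "python -m compileall DIRECTORY", which is probably the more convenient form inside a Dockerfile.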

The python official image does not install Debian security updates

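While the base debian:buster-slim and python images do get regenerated often, there are windows where a new Debian security fix has been released but the images have not yet been regenerated. You should therefore install security updates to the base Linux distribution yourself when you build your own image, for example by running apt-get update && apt-get -y upgrade early in your Dockerfile.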


