I use Docker a lot to build and deploy the software that I (try to) write. I also write a lot of Python. One of the things that really annoys me is the size of Docker images, especially Python ones. I often laugh about the size of a hello world in Go. But in Go you can deploy your application in Docker at no extra cost. So, compiling this hello world in Go:
//test.go
package main
import "fmt"
func main() {
fmt.Println("Hello, World!")
}
This leads to a 2 MB binary:
╭─remi@laptop ~/test/go
╰─$ ls -alih
total 2.0M
8259338 drwxr-xr-x 2 remi remi 4.0K Apr 18 10:15 .
7471105 drwx------ 56 remi remi 4.0K Apr 19 14:40 ..
8258966 -rw-r--r-- 1 remi remi 61 Apr 16 08:44 Dockerfile
8258953 -rwxr-xr-x 1 remi remi 2.0M Apr 16 08:43 test
8258965 -rw-r--r-- 1 remi remi 72 Apr 16 08:43 test.go
And the corresponding Dockerfile is quite straightforward:
FROM scratch
COPY test /
USER 1325
ENTRYPOINT [ "/test" ]
PS: don’t worry about the USER line.
╭─remi@laptop ~/test
╰─$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
test-go latest d9663a505be5 8 seconds ago 2MB
OK, 2 MB. Not bad in the end. A real-world app that follows this principle, traefik, is 71 MB. And that’s just the binary, in a single layer.
So now I’ll build a hello world in Python:
print("Hello, World!")
OK, then the Dockerfile:
FROM python:3.7
COPY . /
ENTRYPOINT ["python3", "/test.py"]
Build it, and then:
╭─remi@laptop ~/test/go
╰─$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
test-python latest d85038701a7e 10 seconds ago 929MB
Yeah, 929 MB. That’s large. And we didn’t install anything from pip. So you’ll tell me, “Yeah, use alpine then!” Sure, but when I use Python on alpine:
- I get a libc that is not well supported (musl), maintained by one person;
- we have to build most of the packages that ship binaries (like psycopg2), and I really don’t like building Python extensions; it’s kind of fragile.
So I need a Python that is built and linked against glibc (sorry, musl and uclibc).
Idea phase
So I had this idea:
Python is C, right? But Python is linked dynamically. Can we build Python statically?
The answer is yes. But there are trade-offs. Large trade-offs, like no crypto. I wanted to build a really small Python image in order to build upon it, not to build ultra-embeddable stuff.
A note about the python image
Have you ever looked at the python image built upon debian/ubuntu? It’s scary. In python:3.7, based on debian stretch, you have python2.7 embedded in the debian image and also python3.5, because the dev dependencies downloaded to build python3.7 require python3, so it pulls python3.5 from the debian repos. I could go on like this for a long time, but you get the point. This image is unnecessarily heavy. So let’s build a small Python image.
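To check this claim on a given image, here is a minimal sketch that lists every executable on the PATH whose name looks like a Python interpreter. The function name `find_interpreters` and the name pattern are my own assumptions for illustration; they are not part of any Docker or Python tooling.

```python
import os
import re

# Assumed naming pattern: python, python2, python3, python2.7, python3.7m, ...
PYTHON_NAME = re.compile(r"^python(\d+(\.\d+)?m?)?$")


def find_interpreters(path_dirs):
    """Return the sorted full paths of executables that look like Python interpreters."""
    found = set()
    for directory in path_dirs:
        if not os.path.isdir(directory):
            continue  # PATH entries may point to directories that do not exist
        for name in os.listdir(directory):
            full = os.path.join(directory, name)
            if (PYTHON_NAME.match(name)
                    and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                found.add(full)
    return sorted(found)


if __name__ == "__main__":
    # Scan the directories of the current PATH
    for path in find_interpreters(os.environ.get("PATH", "").split(os.pathsep)):
        print(path)
```

Run inside the container, it prints one line per interpreter found, symlinks included, so the same interpreter can show up under several names.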
First attempt: using the minideb image
I thought that I could build on the bitnami/minideb image. But I ended up with a super large image without wanting to.
Second attempt: building it myself
I ended up with this Dockerfile:
FROM debian:stretch as builder
RUN set -ex; \
apt-get update; \
apt-get install -y --no-install-recommends \
autoconf \
automake \
bzip2 \
dpkg-dev \
file \
g++ \
gcc \
imagemagick \
libbz2-dev \
libc6-dev \
libcurl4-openssl-dev \
libdb-dev \
libevent-dev \
libffi-dev \
libgdbm-dev \
libgeoip-dev \
libglib2.0-dev \
libgmp-dev \
libjpeg-dev \
libkrb5-dev \
liblzma-dev \
libmagickcore-dev \
libmagickwand-dev \
libncurses5-dev \
libncursesw5-dev \
libpng-dev \
libpq-dev \
libreadline-dev \
libsqlite3-dev \
libssl-dev \
libtool \
libwebp-dev \
libxml2-dev \
libxslt-dev \
libyaml-dev \
make \
patch \
unzip \
xz-utils \
zlib1g-dev \
\
# https://lists.debian.org/debian-devel-announce/2016/09/msg00000.html
$( \
# if we use just "apt-cache show" here, it returns zero because "Can't select versions from package 'libmysqlclient-dev' as it is purely virtual", hence the pipe to grep
if apt-cache show 'default-libmysqlclient-dev' 2>/dev/null | grep -q '^Version:'; then \
echo 'default-libmysqlclient-dev'; \
else \
echo 'libmysqlclient-dev'; \
fi \
) \
; \
rm -rf /var/lib/apt/lists/*
ENV LANG C.UTF-8
# extra dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
tk-dev \
uuid-dev \
git \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /opt
RUN git clone -b 3.7 --depth=1 https://github.com/python/cpython.git --single-branch
WORKDIR /opt/cpython
RUN ./configure
RUN make -j8
RUN make install -j8
FROM busybox:glibc
COPY --from=builder /usr/local/lib/python3.7 /usr/local/lib/python3.7
COPY --from=builder /usr/local/lib/libpython3.7m.a /usr/local/lib/
COPY --from=builder /usr/local/bin /usr/local/bin
COPY --from=builder /lib/x86_64-linux-gnu/*.so.* /lib/
COPY --from=builder /usr/lib/*.so.* /usr/lib/
CMD ["/usr/local/bin/python3.7"]
The beginning of this Dockerfile
For readers who know how Python is built: in the official docker image, libpython is linked as a dynamic library (libpython3.7m.so), and in my version it’s linked statically (libpython3.7m.a; the lib, not the python executable!). This was not always the case; it was changed to make Python embeddable (issue here).
OK, so it works. The image is 218 MB. But we have to build Python ourselves. So can we rely on the already built binaries? We would also get libpython.so, which is nice for some uses.
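To see where those megabytes go, a small script can sum file sizes per top-level entry of a directory such as /usr/local/lib/python3.7. This is only a sketch: `size_by_subdir` is a hypothetical helper name, and symlinks are skipped so that nothing is counted twice.

```python
import os
import sys


def size_by_subdir(root):
    """Map each immediate child of root to its total size in bytes."""
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            # os.walk does not follow symlinked directories by default
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    if not os.path.islink(full):
                        total += os.path.getsize(full)
            totals[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            totals[entry.name] = entry.stat(follow_symlinks=False).st_size
    return totals


if __name__ == "__main__":
    # First argument: directory to inspect (defaults to the current one)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    sizes = size_by_subdir(root)
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        print(f"{size / 1e6:8.1f} MB  {name}")
```

Since the image contains a working interpreter, the script can be run with the image's own python3.7 to list the largest entries first.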
Third attempt: just copy it, dude
FROM python:3.7-stretch as base
FROM busybox:glibc
# The first stage is named "base", so --from must reference that name
COPY --from=base /usr/local/lib/python3.7 /usr/local/lib/python3.7
# The official image ships libpython as a shared library, not as libpython3.7m.a
COPY --from=base /usr/local/lib/libpython3.7m.so.1.0 /usr/local/lib/
COPY --from=base /usr/local/bin /usr/local/bin
COPY --from=base /lib/x86_64-linux-gnu/*.so.* /lib/
COPY --from=base /usr/lib/*.so.* /usr/lib/
ENV PYTHONHOME /usr/local
ENV LD_LIBRARY_PATH /usr/local/lib
CMD ["/usr/local/bin/python3.7"]
I added LD_LIBRARY_PATH because in busybox the .so files are located in /lib, and ldconfig is not available. I’m not an expert, but ld.so.conf doesn’t seem to be present either, and I thought it was deprecated (but I’m not sure about that).
We now have a 114 MB image. Nice !
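A quick way to check that the copied interpreter is actually usable is to try importing the standard-library modules that rely on shared libraries. The sketch below assumes a list of modules I picked for illustration (`MODULES`), and `check_imports` is a hypothetical helper, not an existing tool.

```python
import importlib

# Assumed list of stdlib modules backed by C extensions and external libraries
MODULES = ["ssl", "sqlite3", "zlib", "bz2", "lzma", "ctypes", "readline", "hashlib"]


def check_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # A shared library that cannot be loaded also surfaces as ImportError
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_imports(MODULES).items():
        print(f"{name:10} {'ok' if error is None else 'FAILED: ' + error}")
```

Running it with the image's python3.7 shows which modules load and which ones fail, together with the error message.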
Fourth attempt: just copying the right libs
Run ldd on the python3 binary in docker:
╭─remi@laptop test/
╰─$ docker run -ti --rm python:3.7 bash
root@aba27e7938c4:/# ldd /usr/local/bin/python3.7
linux-vdso.so.1 (0x00007ffc41337000)
libpython3.7m.so.1.0 => /usr/local/lib/libpython3.7m.so.1.0 (0x00007f8208fc6000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f8208d8e000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8208b71000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f820896d000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f820876a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8208466000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f82080c7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8209722000)
The Python binary needs these libs in order to work. The problem is that it will work until you import, let’s say, pip, which uses ssl. And ssl needs libcrypto, which is not included here…
Since I want the image to be embeddable
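The list of libraries to copy can be extracted from the ldd output rather than typed by hand. Here is a minimal sketch of that parsing step; `parse_ldd` is a hypothetical helper, `SAMPLE` is a shortened copy of the output shown above, and the generated COPY lines assume the stage is named `base` as in the third attempt. It only covers the python binary itself: extension modules such as ssl have their own dependencies and would need the same treatment.

```python
# Shortened copy of the ldd output shown above
SAMPLE = """\
linux-vdso.so.1 (0x00007ffc41337000)
libpython3.7m.so.1.0 => /usr/local/lib/libpython3.7m.so.1.0 (0x00007f8208fc6000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f8208d8e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f82080c7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8209722000)
"""


def parse_ldd(output):
    """Return the absolute library paths listed in ldd output, in order."""
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            # "libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x...)"
            target = line.split("=>", 1)[1].strip()
        else:
            # "/lib64/ld-linux-x86-64.so.2 (0x...)" or "linux-vdso.so.1 (0x...)"
            target = line
        path = target.split(" (", 1)[0].strip()
        # Skip the virtual vdso entry and unresolved ("not found") libraries
        if path.startswith("/"):
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Emit one COPY instruction per library, keeping the original location
    for path in parse_ldd(SAMPLE):
        print(f"COPY --from=base {path} {path}")
```

Keeping each library at its original path avoids the need for LD_LIBRARY_PATH for those files, at the cost of one COPY line per library.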
Conclusion
As we can see, the Docker images for Python that are not alpine-based are not optimized at all. The thing I’m not really sure about is: what about copying libs that have been built against one glibc and using them with another one at runtime? I didn’t dig deep enough to see the difference between the debian glibc and the busybox one. That will be for another time :-).