Docker Pipeline: Multi-Stage Docker Builds

cem akpolat
5 min read · Sep 19, 2022


https://blog.hypriot.com/post/build-smallest-possible-docker-image/

Docker build operations involve many steps that I already covered in the previous posts: Building a Dockerfile and Dockerizing a Simple Web App. Whenever a build process starts, we basically download a base image, add the required libraries, specify the folder in which we will work, copy all required files from the local environment into the Docker environment, and then build it. In most cases the result of this build is used as-is in many projects, since it is indeed good enough, especially for development environments. However, when you move from development to production, you may not need all of the source code, just the compiled or built version of it. For languages such as C/C++ or Java that require a compiler, the source code should not end up in the Docker image. The same holds for frontend projects built with NPM, since the build step generates a distribution version for the production environment. In short, the question is how we can eliminate the files we do not need. Docker answers this question with the multi-stage build approach. This approach works like a pipeline: every time, you pass the required information from one stage to the next. In other words, part of the previous stage's build result becomes the input of the next stage.
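A useful consequence of this pipeline structure is that you can also stop at any intermediate stage. As a minimal sketch (assuming a stage named builder, as in the example below, and an illustrative tag myapp), the --target flag of docker build builds only up to the named stage:

docker build --target builder -t myapp:build .   # stop after the builder stage
docker build -t myapp:prod .                     # run all stages, keep only the last one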

In order to show how this works, there is no need to implement a new example, since there are plenty of them already. A typical and quite illustrative instance is a frontend application that is based on the node image and, after its build, runs on an nginx server. This example is given below:

#1
FROM node:fermium as builder
#2
WORKDIR /app
#3
COPY package.json .
#4
RUN npm install
#5
COPY . .
#6
RUN npm run build
# Second stage starts
#7
FROM nginx
#8
COPY --from=builder /app/build /usr/share/nginx/html

Now, we can explain the steps marked in the Dockerfile above:

  1. The node:fermium image is downloaded, usually from Docker Hub; if you host your own registry instead of the standard one, you can adapt this step to your requirements so that all images stay on your own server.
  2. We specify the directory in which we will do the necessary operations. As long as we do not change the folder, everything we generate will be available under this /app folder. You may ask where this folder ends up in the Docker image: node:fermium is a Linux image, so by saying /app we place the folder directly under /, the root path.
  3. Our intention is to prepare all the required libraries for our software. package.json lists these libraries, and we copy it from our project directory to the /app folder.
  4. All NPM libraries are installed.
  5. All source code files are copied from the project folder to the /app folder.
  6. The source code and libraries are now available in the image, so we can build the project. All built files are stored under the /app/build folder.

Now our intention is to serve the files under the /app/build folder with an nginx server, rather than a development server. We could actually install nginx directly in this stage and run it there as well; however, that is not a clean approach, since we do not need the node libraries or the additional packages provided by node:fermium. Instead, we use the nginx image directly.

7. We take nginx as the base image; if it is not available in the local cache, it is downloaded from Docker Hub.

8. All files under the /app/build folder are copied from the first stage into the second image under /usr/share/nginx/html/. Once this operation is completed, an nginx-based image is generated, and whenever it is run you will have a production-ready website. In order to run the created image, execute the following command in the terminal:

docker run -p 8080:80 <image-id>
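The <image-id> above refers to the image produced by building the Dockerfile first. As a minimal sketch, with frontend-app as an illustrative tag, the full sequence is:

docker build -t frontend-app .
docker run -p 8080:80 frontend-app

Port 80 is where nginx listens inside the container; port 8080 is where the site becomes reachable on the host.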

Other Multi-Stage Examples

The previous example was for a basic frontend project; for Java, I will quote here a Dockerfile from this link.

FROM maven:3.5.2-jdk-9 AS build 
COPY src /usr/src/app/src
COPY pom.xml /usr/src/app
RUN mvn -f /usr/src/app/pom.xml clean package
# Second docker image starts...
FROM openjdk:9
COPY --from=build /usr/src/app/target/flighttracker-1.0.0-SNAPSHOT.jar /usr/app/flighttracker-1.0.0-SNAPSHOT.jar
EXPOSE 8080
ENTRYPOINT ["java","-jar","/usr/app/flighttracker-1.0.0-SNAPSHOT.jar"]

As you see, the flighttracker-1.0.0-SNAPSHOT.jar file is copied from the previous build stage to the /usr/app folder under the same name.
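Since the image exposes port 8080 and defines an ENTRYPOINT, building and running it follows the same pattern as before (flighttracker is an illustrative tag):

docker build -t flighttracker .
docker run -p 8080:8080 flighttracker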

Another multi-stage example is from the official docker web page as given below:

# syntax=docker/dockerfile:1
FROM golang:1.16 AS builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go ./
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
# Second docker image starts...
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app ./
CMD ["./app"]

And here is a final example for the C/C++ language, taken from the Microsoft dev blog. It is more complex than the other Dockerfiles, but as you will notice it follows the same structure.

FROM alpine:latest as build

LABEL description="Build container - findfaces"

RUN apk update && apk add --no-cache \
autoconf build-base binutils cmake curl file gcc g++ git libgcc libtool linux-headers make musl-dev ninja tar unzip wget

RUN cd /tmp \
&& wget https://github.com/Microsoft/CMake/releases/download/untagged-fb9b4dd1072bc49c0ba9/cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh \
&& chmod +x cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh \
&& ./cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh --prefix=/usr/local --skip-license \
&& rm cmake-3.11.18033000-MSVC_2-Linux-x86_64.sh

RUN cd /tmp \
&& git clone https://github.com/Microsoft/vcpkg.git -n \
&& cd vcpkg \
&& git checkout 1d5e22919fcfeba3fe513248e73395c42ac18ae4 \
&& ./bootstrap-vcpkg.sh -useSystemBinaries

COPY x64-linux-musl.cmake /tmp/vcpkg/triplets/

RUN VCPKG_FORCE_SYSTEM_BINARIES=1 ./tmp/vcpkg/vcpkg install boost-asio boost-filesystem fmt http-parser opencv restinio

COPY ./src /src
WORKDIR /src
RUN mkdir out \
&& cd out \
&& cmake .. -DCMAKE_TOOLCHAIN_FILE=/tmp/vcpkg/scripts/buildsystems/vcpkg.cmake -DVCPKG_TARGET_TRIPLET=x64-linux-musl \
&& make
# Second image starts ...
FROM alpine:latest as runtime

LABEL description="Run container - findfaces"

RUN apk update && apk add --no-cache \
libstdc++

RUN mkdir /usr/local/faces
COPY --from=build /src/haarcascade_frontalface_alt2.xml /usr/local/faces/haarcascade_frontalface_alt2.xml

COPY --from=build /src/out/findfaces /usr/local/faces/findfaces

WORKDIR /usr/local/faces

CMD ./findfaces

EXPOSE 8080
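One practical trick with long builds like this one: because the stages are named, you can build only the heavy build stage and enter it interactively to debug compilation problems. A minimal sketch, with findfaces as an illustrative tag:

docker build --target build -t findfaces:build .
docker run -it findfaces:build sh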

Summary

The Docker multi-stage concept reduces the image size, and a small image can be run by resource-constrained devices; an IoT device, as presented in this link, can profit from this size reduction. You may have noticed that the title mentions "Docker Pipeline": indeed, with multi-stage builds we chain different Docker stages together. From your point of view, what could be other usage areas?
