At the moment, we copy the src directory into the image and then use it to build our application. However, once the project is built, the src directory and other files such as package.json and yarn.lock are no longer required to run the application:
$ docker exec -it 27459e1123d4 sh
~ $ pwd
/home/node
~ $ du -ahd1
4.0K ./.ash_history
588.0K ./dist
4.0K ./.babelrc
4.0K ./package.json
20.0K ./spec
128.0K ./yarn.lock
564.0K ./src
138.1M ./.cache
8.0K ./.yarn
98.5M ./node_modules
237.9M .
You can see that the Yarn cache (.cache) alone takes up 138.1 MB, more than half of the 237.9 MB total, and we don’t need it at runtime. Therefore, we should remove these leftover artifacts and keep only the dist and node_modules directories.
After the RUN ["yarn", "run", "build"] instruction, add an additional instruction to remove the obsolete files:
RUN find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \;
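Because this expression ends in rm -rf, it can be worth trying it on a throwaway directory before baking it into the image. The following is a minimal sketch, not part of the project: the prune_build_dir function name and the scratch directory layout are made up for illustration, and the find options are the same ones used in the instruction above, only listed with the depth limits first.

```shell
#!/bin/sh
# Hypothetical helper: delete every direct child of "$1"
# except the dist and node_modules directories.
prune_build_dir() {
  # -mindepth 1 skips the directory itself; -maxdepth 1 stops find
  # from descending into (and matching inside) the children.
  find "$1" -mindepth 1 -maxdepth 1 \
    ! -name dist ! -name node_modules \
    -exec rm -rf {} \;
}

# Build a scratch directory that mimics /home/node after the build.
scratch=$(mktemp -d)
mkdir "$scratch/dist" "$scratch/node_modules" "$scratch/src" "$scratch/.cache"
touch "$scratch/package.json" "$scratch/yarn.lock" "$scratch/.babelrc"

prune_build_dir "$scratch"

# List what survived, including dotfiles.
ls -A "$scratch"

# Clean up the scratch directory.
rm -rf "$scratch"
```

If the expression behaves as intended, the listing should contain only dist and node_modules; hidden entries such as .cache and .babelrc are removed along with the rest.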
However, if you run docker build on this new Dockerfile, you may be surprised to see that the size of the image has not decreased. This is because each layer only records the changes relative to the previous layer. Deleting a file in a later layer merely hides it from the final filesystem; the file is still stored in the earlier layer that added it, and it still counts toward the size of the image.
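To make this concrete, here is a toy model in plain shell; it does not use Docker at all. Two tar archives stand in for two layers, and it assumes, as in the OCI image format, that a deletion in a later layer is recorded as a small whiteout marker named .wh.<name>. The file name cache.bin and its 1 MiB size are invented for illustration.

```shell
#!/bin/sh
# Toy model of image layers: each "layer" is a tar archive of the
# changes made by one instruction. This is an illustration only.
work=$(mktemp -d)
mkdir "$work/layer1" "$work/layer2"

# Layer 1: an earlier instruction adds a 1 MiB file
# (a stand-in for the Yarn cache).
dd if=/dev/zero of="$work/layer1/cache.bin" bs=1024 count=1024 2>/dev/null

# Layer 2: a later instruction "deletes" the file. The layer can only
# record an empty whiteout marker; it cannot shrink layer 1.
: > "$work/layer2/.wh.cache.bin"

tar -cf "$work/layer1.tar" -C "$work/layer1" .
tar -cf "$work/layer2.tar" -C "$work/layer2" .

# The image ships every layer, so its size is the sum of both archives.
size1=$(wc -c < "$work/layer1.tar" | tr -d ' ')
size2=$(wc -c < "$work/layer2.tar" | tr -d ' ')
total=$((size1 + size2))

echo "layer 1: $size1 bytes"
echo "layer 2: $size2 bytes"
echo "total:   $total bytes"

rm -rf "$work"
```

In this model, the second archive is a small fraction of the first, yet the total is still larger than 1 MiB: the deletion adds a little data instead of taking any away.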
To minimize the image's size, we must remove the artifacts within the same instruction that creates them, before its layer is committed. This means that we must squash all of our installation and build commands, together with the cleanup, into a single RUN instruction:
FROM node:8-alpine
USER node
WORKDIR /home/node
COPY --chown=node:node . .
RUN yarn && yarn run build && find . ! -name dist ! -name node_modules -maxdepth 1 -mindepth 1 -exec rm -rf {} \;
CMD ["node", "dist/index.js"]
Now, the image is just 122 MB, which is a 42% space saving!
$ docker images
REPOSITORY   TAG     IMAGE ID       CREATED         SIZE
hobnob       0.1.0   fc57d9875bb5   3 seconds ago   122MB
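However, squashing everything into a single RUN instruction forfeits the benefits we get from caching: because installation and build now share one layer, any change that invalidates it forces the dependencies to be reinstalled as well. Luckily, Docker supports a feature called multi-stage builds, which allows us to keep our cached layers and still produce a small image.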