At the moment, we are copying our entire application code, installing its dependencies, and then building the application.
But what if we change our application code without introducing any new dependencies? With our current approach, we’d have to run all three steps again, and the RUN ["yarn"] step is likely to take a long time, as it has to download thousands of files:
COPY --chown=node:node . .
RUN ["yarn"]
RUN ["yarn", "run", "build"]
Fortunately, Docker implements a clever caching mechanism. Whenever Docker generates an image, it stores the underlying layers in the filesystem. When Docker is asked to build a new image, instead of blindly following the instructions again, Docker will check its existing cache of layers to see if there are layers it can simply reuse.
As Docker steps through each instruction, it will try to use the cache whenever possible, and will only invalidate the cache under the following circumstances:
- Starting from the same parent image, there are no cached layers that were built with exactly the same instruction as the next instruction in our current Dockerfile.
- If the next instruction is ADD or COPY, Docker will create a checksum for each file, based on the contents of each file. If any of the checksums do not match the ones in the cached layer, the cache is invalidated.
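Once the cache is invalidated at one instruction, every instruction after it is run afresh as well, because each layer is built on top of the previous one. Therefore, we should copy the files that change least often first, and can replace the preceding three instructions (COPY, RUN, RUN) with the following four instructions: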
COPY --chown=node:node ["package*.json", "yarn.lock", "./"]
RUN ["yarn"]
COPY --chown=node:node . .
RUN ["yarn", "run", "build"]
Now, if our dependencies, which are specified solely inside our package.json, package-lock.json, and yarn.lock files, have not changed, then the first two steps here will not be run again. Instead, Docker will reuse the cached layers that we previously generated.
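To see why the ordering matters, the two cache rules described earlier can be modeled in a few lines of Python. This is only a toy sketch, not Docker's actual implementation: the build function, the cache set, and the sample file names and contents are all hypothetical, and we assume that a layer is identified by its parent layer, its instruction text, and the content checksums of any files it copies.

```python
import hashlib


def build(instructions, files, cache):
    """Simulate one build and report, per instruction, whether the
    layer was taken from the cache or had to be rebuilt.

    instructions: list of (instruction_text, copied_file_names) tuples
    files: dict mapping file name -> file contents (the build context)
    cache: set of layer keys from earlier builds (updated in place)
    """
    parent = "base-image"  # assumed starting point, standing in for FROM
    results = []
    for text, copied in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())  # rule 1: same parent layer...
        h.update(text.encode())    # ...and exactly the same instruction
        for name in sorted(copied):
            # rule 2: COPY also depends on a checksum of each file's contents
            h.update(name.encode())
            h.update(hashlib.sha256(files[name]).hexdigest().encode())
        key = h.hexdigest()
        results.append("cached" if key in cache else "rebuilt")
        cache.add(key)
        parent = key  # the next layer is built on top of this one
    return results


EVERYTHING = ["package.json", "yarn.lock", "src/index.js"]
MANIFESTS = ["package.json", "yarn.lock"]

original = [
    ("COPY . .", EVERYTHING),
    ('RUN ["yarn"]', []),
    ('RUN ["yarn", "run", "build"]', []),
]
reordered = [
    ('COPY ["package*.json", "yarn.lock", "./"]', MANIFESTS),
    ('RUN ["yarn"]', []),
    ("COPY . .", EVERYTHING),
    ('RUN ["yarn", "run", "build"]', []),
]


def rebuild_after_source_change(instructions):
    """Build once to fill the cache, edit only application code, build again."""
    files = {
        "package.json": b'{"name": "app"}',
        "yarn.lock": b"# lockfile",
        "src/index.js": b"console.log(1)",
    }
    cache = set()
    build(instructions, files, cache)
    files["src/index.js"] = b"console.log(2)"  # no dependency change
    return build(instructions, files, cache)


original_result = rebuild_after_source_change(original)
reordered_result = rebuild_after_source_change(reordered)
print(original_result)   # ['rebuilt', 'rebuilt', 'rebuilt']
print(reordered_result)  # ['cached', 'cached', 'rebuilt', 'rebuilt']
```

In this model, changing only src/index.js forces all three of the original steps to run again, whereas with the reordered instructions the manifest COPY and the RUN ["yarn"] step are both served from the cache.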