Understanding the concepts around Docker images and containers is crucial for anyone starting out in cloud-native, regardless of whether you're in development, DevOps, or program management (or any other technical role :). Once you grasp the basics of Docker, it will be much easier for you to understand how Kubernetes, service meshes, and pretty much any other cloud-native tool works. You can think of this guide as your first practical guide to cloud-native.
What is Docker?
Docker containers were made popular by the company of the same name. As a concept, containers are not new - they have existed in Linux for more than ten years. It was Docker that made them widely popular.
The idea behind containers is to partition an operating system in a way that lets you securely run multiple applications side by side. The Linux features that make this possible are namespaces and cgroups. In short, namespaces let you slice different components of the operating system and create isolated workspaces, while cgroups allow for fine-grained control of resources - for example, they prevent a single container from using up all of the host's resources.
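You don't configure namespaces or cgroups by hand - Docker does that for you. As a small preview (the docker run command is covered later in this guide), the resource limits you pass when starting a container are a sketch of cgroups in action:

# Run a container limited to 256 MB of memory and half a CPU core.
# Docker translates these flags into cgroup limits for the container.
docker run --memory=256m --cpus=0.5 alpine:3.10.3 echo "hello from a constrained container"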
What is the difference between virtual machines (VMs) and containers?
You can emulate a particular hardware system using a piece of software that runs on physical computers (hypervisor). A hypervisor is used to create and run virtual machines. It sits between the computer hardware and the virtual machine. Each virtual machine runs its guest operating system and it has its binaries, applications, etc. The virtual images are usually huge in terms of size and have a big memory footprint.
On the other hand, containers run directly on top of the host operating system. They share the operating system and kernel and the hardware is not virtualized.
Containers are more lightweight compared to virtual machines. They don't require a hypervisor, which results in faster startup times. Container startup time is usually measured in seconds or less, whereas virtual machine startup is measured in minutes.
Containers, images, and how to create them?
Before you can run a container, you need to create it - and to create it, you need an image. Just like with virtual machines: to run a virtual machine, you first have to create an image for it.
A Docker image is a read-only template that contains instructions on how to create or run a container. A Docker image is created from a Dockerfile that looks like this:
FROM ubuntu:18.04
WORKDIR /app
COPY hello.sh /app
RUN chmod +x hello.sh
RUN apt-get update
RUN apt-get install curl -y
CMD ["./hello.sh"]
In most cases, your images will be based on existing Docker images. In the above Dockerfile, we are basing the image on an existing image called ubuntu:18.04, which represents Ubuntu 18.04.
The remaining instructions in the Dockerfile are doing the following:
setting the working directory (note that this is the working directory inside the image, not on your host computer)
copying hello.sh file from the host computer to the /app folder inside the image
running a couple of commands
using CMD to set the default command for the container - i.e. this is what we want to execute when we run the container
Using a Dockerfile like the one above, we can create a Docker image. A Docker image is a collection of layers (one instruction in the Dockerfile = one layer in the image). The layers are stacked one on top of the other and they are all read-only, with the exception of the topmost, writeable layer that gets added when a container runs.
You can think of Docker images as templates and containers as instances of those templates. When you run a container, you are creating an instance of a Docker image with the writeable layer on top. Any changes made to the running container are made on that writeable layer. For example, if an application running inside the container writes to a file, that file is stored on the writeable layer. One thing to remember is that any modifications you make to the writeable layer will be lost once the container is deleted.
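If you're curious, you can see the writeable layer in action with the docker diff command - it lists the files a container has added, changed, or deleted compared to its image. A minimal sketch (mycontainer is just a hypothetical container name):

# Lines are prefixed with A (added), C (changed), or D (deleted)
docker diff mycontainer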
Features such as volumes can be used to store data outside of the running container. When running a container, you can mount these volumes inside it. When the container is stopped or deleted, the data remains, as it's not part of the container.
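As a minimal sketch (using the alpine image that appears later in this guide and a hypothetical volume named mydata):

# Create a named volume and write a file to it from one container...
docker volume create mydata
docker run -v mydata:/data alpine:3.10.3 sh -c 'echo "persisted" > /data/note.txt'
# ...then read the same file from a brand new container.
docker run -v mydata:/data alpine:3.10.3 cat /data/note.txt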
Image naming
All Docker images are referenced by their names. The image name is made up of three parts: the repository name, the image name, and an image tag. The image tag (or version) can be omitted; however, any image without a tag gets the default tag called latest. It's good practice to always use explicit image tags and never rely on the latest tag.
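For example, the image name used in the build example later in this guide breaks down like this:

# <repository name>/<image name>:<tag>
myrepository/imagename:0.1.0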
You can store Docker images on your computer. If you want to share them and make them available to others, you will have to upload (or push) the images to a Docker registry.
Later in the article, we will use an image called alpine:3.10.3. Notice how this image name is only made up of two parts. That's because it is an official image on Docker Hub. There's a collection of official images that are all named like that - without the repository name. You can read more about official images here.
Docker registry
A Docker registry is a place where you can upload and store Docker images. Each cloud provider also offers a Docker or image registry as a managed service. You can also run and host your own registry, or use a free registry such as Docker Hub. Registries can be public or private. The public registry allows anyone to pull or download an image, while for the private registry you need to be authenticated.
Common scenarios
With basic concepts out of the way, let's look at a couple of common scenarios you will run into when working with Docker.
Build and push
Docker build refers to the act of taking a Dockerfile, a build context (the folder with the code or files you want to include in the image), and an image name, and using the Docker CLI to build an image. The result of this action is a Docker image.
Once you have a built image (or an existing image) you can push the image to a Docker registry. Pushing is simply uploading the image.
Using the Docker CLI, the command to build an image looks like this:
docker build -t myrepository/imagename:0.1.0 .
The `-t` flag is used to provide the image name and the `.` at the end is how you provide the build context. The dot simply means that we want the current folder to be the build context. However, your build context can be a different folder.
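For example, if your Dockerfile and files lived in a subfolder, you could point the build at that folder instead (a sketch - the my-app folder name is just an example):

docker build -t myrepository/imagename:0.1.0 ./my-app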
What's the point of the build context?
Let's take the following instruction from the Dockerfile above:
COPY hello.sh /app
This instruction tells the Docker CLI to copy the hello.sh file from the host to the /app folder inside the image. But how does the Docker CLI know where that file is? This is where the build context comes into play - Docker looks for the hello.sh file inside the build context.
To push an image to the registry, you need to make sure you are logged in to your registry first, and then you can run docker push myrepository/imagename:0.1.0.
Pull and run
Once the image is in the registry, you can use the pull command to download it. You provide the name of the image you want to pull and the Docker CLI goes to the registry and downloads the image to your local machine.
The command to pull an image using the Docker CLI is similar to the push command:
docker pull myrepository/imagename:0.1.0
Finally, you can run the Docker image and create a container from it. Note that you don't need to build the image, push it, and pull it in order to run it. If you don't wish to push the image to a registry, you don't have to - you can still use the run command to run the image locally.
Second note: if you want to run the image, you don't need to pull it first either. If you try to run an image that doesn't exist on your computer (i.e. you haven't pulled it yet), the Docker CLI is smart enough to pull the image first and then run it.
The docker run command has a lot of options you can pass in to control how the container is executed - whether it exposes any ports to the host or mounts any volumes, for example. You can also pass in environment variables and more.
An example Docker run command might look like this:
docker run --name mycontainername -p 5000:8080 myrepository/imagename:0.1.0
Note that in the above command we are naming the container (mycontainername); however, that is not required. If you don't provide a name, Docker will come up with a clever, random name for you: interesting_tharp, strange_hypatia, practical_morse, and so on.
Docker uses a list of scientists and notable hackers to come up with container names. You can check the list they use here. Note that your container will never be named boring_wozniak as Steve Wozniak is not boring :)
Docker in practice
With basic terminology and concepts out of the way, let's get our hands dirty and try Docker! As part of this section, you will install the Docker CLI, create a Docker Hub account (if you don't have one already), and try different Docker CLI commands.
Installing Docker Desktop
Docker Desktop for Mac and Windows is a collection of tools that make working with Docker easier. It contains the Docker Engine, the CLI client, and a bunch of other tools.
Once you've created a Docker Hub account and signed in, you can create different repositories. If you don't create a separate repository, you can just use your Docker Hub username. The repository name is the value that precedes the Docker image name. For example, one of my repositories is called learncloudnative. The full name for an image called hello-web with tag 0.1.0 would be learncloudnative/hello-web:0.1.0.
When you've completed Docker installation, open the terminal and run the docker version command:
$ docker version
Client: Docker Engine - Community
Version: 19.03.8
API version: 1.40
Go version: go1.12.17
Git commit: afacb8b
Built: Wed Mar 11 01:21:11 2020
OS/Arch: darwin/amd64
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: afacb8b
Built: Wed Mar 11 01:29:16 2020
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: v1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
The output of the command should be similar to the output above.
Dockerfile and other files
Create an empty folder somewhere on your computer and a file called Dockerfile with the following contents:
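FROM alpine:3.10.3
WORKDIR /app
COPY hello.sh /app
RUN chmod +x hello.sh
RUN apk update
RUN apk add curl
CMD ["./hello.sh"]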
This Dockerfile copies a file called hello.sh into the image, makes it executable, runs apk update and apk add curl, and finally sets hello.sh as the command to execute when the container runs.
Note
What is apk? apk is the tool used to manage software packages on Alpine Linux.
Create the hello.sh file in the same folder as the Dockerfile:
#!/bin/sh
echo "Hello Docker!"
With this set up, we can start looking into Docker layers.
Docker layers
In this section, you will build your first Docker image and use the Docker CLI to inspect its layers.
We are going to start by building the Docker image called docker-layers. You can run the following command from the same folder your Dockerfile (and hello.sh file) is in:
docker build -t docker-layers:0.1.0 .
In the command output, you will see how the build context is sent to the Docker daemon, the base image is pulled, and then each instruction from the Dockerfile is executed. The last line in the output should say that your image (docker-layers:0.1.0) was built and tagged.
I haven't mentioned image tagging before - you can think of it as renaming the image, or creating a new name that references the existing image. For example, you can tag an image called myimagename:0.1.0 as somethingelse:2.0.0 by running the docker tag command. The command creates another image name that references the original image. Note that Docker does not create another copy of the image.
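Using the names from the example above, the command looks like this:

docker tag myimagename:0.1.0 somethingelse:2.0.0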
You can run docker images to get the list of images that are on your machine.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker-layers 0.1.0 6a5b8912d27f 5 minutes ago 8.31MB
alpine 3.10.3 965ea09ff2eb 6 months ago 5.55MB
If you haven't installed or used Docker before, these will be the only two images on your machine: the docker-layers image you just built and the alpine image that's used as the base image in the Dockerfile.
Let's also try to run the container to get the "Hello Docker!" output:
$ docker run docker-layers:0.1.0
Hello Docker!
When you build a Docker image, Docker creates a layer for each instruction in the Dockerfile. The nice thing about layers is that they can be reused, either when you're rebuilding the same image or by other Docker images. This means faster builds! The layers are stacked on top of each other and each layer only contains the differences from the preceding layer.
If you build the image again, you will notice that the command finishes significantly faster than the first time (yes, even excluding the time it took to download the base image on the first try, subsequent builds are still faster).
On my machine, the second Docker build took less than a second:
$ time docker build -t docker-layers:0.1.0 .
real 0m0.501s
user 0m0.088s
sys 0m0.061s
Let's look at the layers that are created for the image:
$ docker history docker-layers:0.1.0
IMAGE CREATED CREATED BY SIZE COMMENT
6a5b8912d27f 4 minutes ago /bin/sh -c #(nop) CMD ["./hello.sh"] 0B
3030475d0a23 4 minutes ago /bin/sh -c apk add curl 1.34MB
81cd9a8738f0 4 minutes ago /bin/sh -c apk update 1.42MB
e68bc418551f 4 minutes ago /bin/sh -c chmod +x hello.sh 30B
14d207dc283c 4 minutes ago /bin/sh -c #(nop) COPY file:c2c91b54b63f7c0e… 30B
4628741a4e97 4 minutes ago /bin/sh -c #(nop) WORKDIR /app 0B
965ea09ff2eb 6 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 6 months ago /bin/sh -c #(nop) ADD file:fe1f09249227e2da2… 5.55MB
Notice there are two separate layers created for the apk update and apk add curl commands. We can make this more efficient by combining the two commands into a single RUN instruction in the Dockerfile.
Open the Dockerfile and combine the two lines like this:
FROM alpine:3.10.3
WORKDIR /app
COPY hello.sh /app
RUN chmod +x hello.sh
RUN apk update && apk add curl
CMD ["./hello.sh"]
Let's rebuild the image again and check the layers:
$ docker build -t docker-layers:0.1.0 .
...
$ docker history docker-layers:0.1.0
IMAGE CREATED CREATED BY SIZE COMMENT
8470cb0b2e17 19 seconds ago /bin/sh -c #(nop) CMD ["./hello.sh"] 0B
ed2a17906a01 19 seconds ago /bin/sh -c apk update && apk add curl 2.76MB
...
This time we have a single layer with the apk commands. Next, we will add another file to the image to see how it affects the layers.
Create a file called hello.txt with a simple message:
echo "Hello!" >> hello.txt
Next, let's update the Dockerfile as well to include this file in the image:
FROM alpine:3.10.3
WORKDIR /app
COPY hello.sh /app
RUN chmod +x hello.sh
COPY hello.txt /app
RUN apk update && apk add curl
CMD ["./hello.sh"]
I am copying the new file before running the apk command. If you rebuild the image again (docker build -t docker-layers:0.1.0 .), you will notice that the apk commands were executed again instead of being reused. The reason for that is the way layers are stacked: each layer is a difference from the previous one, and since we added the COPY instruction before the apk command, we invalidated the cache for that layer and for every layer after it.
The good news is that we can do better. If we move the apk command right after the FROM instruction, the apk layer becomes the second layer in the image. Let's see this in practice.
Move the RUN apk update && apk add curl command right under the first line in the Dockerfile:
FROM alpine:3.10.3
RUN apk update && apk add curl
WORKDIR /app
COPY hello.sh /app
RUN chmod +x hello.sh
COPY hello.txt /app
CMD ["./hello.sh"]
We need to rebuild the image again to create the layers. If you inspect the layers you will notice the apk command is closer to the bottom of the stack now. Let's prove that the command won't execute again if we add another file and rebuild the image.
Create a bye.txt file:
echo "Bye!" >> bye.txt
Add the COPY command to the bottom of the Dockerfile:
FROM alpine:3.10.3
RUN apk update && apk add curl
WORKDIR /app
COPY hello.sh /app
RUN chmod +x hello.sh
COPY hello.txt /app
COPY bye.txt /app
CMD ["./hello.sh"]
If you rebuild the image this time, you will notice that it is significantly faster. The reason is that Docker is re-using the layer and it's not re-running the apk commands anymore.
Understanding how layers work is important because it lets you significantly speed up your Docker builds.
Pushing and tagging Docker images
You need to be logged in to the registry to push Docker images; otherwise, the push command will fail. You can log in to the registry through Docker Desktop or use the docker login command from the terminal.
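For example, to log in to Docker Hub from the terminal (replace the placeholder with your username; you will be prompted for your password):

docker login --username [your-username]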
Once you're logged in, you are almost ready to push the image. In the previous section, we named the image docker-layers:0.1.0 - without a repository name. To push to the registry, the repository name is required. The repository name is the name you used when you signed up for Docker Hub.
We could rebuild the image again and provide a full name with the repository, however, it is faster to use the tag command and provide a new name for the image. Replace the [your-repository] with your repository name:
docker tag docker-layers:0.1.0 [your-repository]/docker-layers:0.1.0
If the command succeeds, you won't see any output. Note that we could use the tag command to update other parts of the image name as well.
Now you can go ahead and push the image to the registry by running:
docker push [your-repository]/docker-layers:0.1.0
Docker pushes the image to the registry, and (assuming the repository is public) anyone can pull or download it!
Pulling Docker images and running containers
Pulling or downloading an image from the registry can be done with the pull command. Let's pull the alpine image for example:
$ docker pull alpine
Using default tag: latest
latest: Pulling from library/alpine
cbdbe7a5bc2a: Pull complete
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
docker.io/library/alpine:latest
Since we didn't provide a specific version for the image, Docker pulled the image with the :latest tag.
You could also use the run command directly and Docker will pull the image for you:
docker run alpine
Running the image with the above command will start a container, but it will exit right away. What we can do instead is get a shell prompt inside the container to keep it alive. We need to provide the -i and -t flags to make the container interactive and to allocate a pseudo-TTY. Finally, we also need to provide a command we want to run when the container starts. Since we want a shell, we run /bin/sh.
Let's try and run the command that gives us a shell inside the Alpine container:
docker run -it alpine /bin/sh
You will notice the prompt has changed, and if you run the env command you will see that the environment variables come from the container, not from your host machine.
To exit from the container you can type exit. Each running container also gets an ID - you can check that ID by opening a separate terminal window and running docker ps command:
$ docker ps
CONTAINER ID   IMAGE    COMMAND     CREATED         STATUS         PORTS   NAMES
32a19aa0c35f   alpine   "/bin/sh"   2 minutes ago   Up 2 minutes           sharp_sanderson
Next, you can kill the container using this ID: docker kill 32a19aa0c35f. As the container dies, you will see the prompt in the other terminal window disappear.
Mapping the ports
Another fairly common task is running a container locally and mapping a container port to the port on the local machine. For example, you can run a service inside the container, but in order to access that service, you will need to map the container port to your host machine port so you can access it.
Let's use a simple Node.js application that's available in the learncloudnative/helloworld:0.1.0 image.
If you just run the image, you won't be able to access the application running inside it, as the port the application is listening on within the container is not mapped to your machine.
To map the container port to a host port, you can use the -p flag when running the container, like this:
docker run -p 8080:3000 learncloudnative/helloworld:0.1.0
Once the image is downloaded, you will see the message "Listening on port 3000". This is the message from the application running inside the container, which listens on port 3000. With the -p flag, we mapped container port 3000 to host port 8080.
If you open your browser and navigate to http://localhost:8080, you will get the "Hello World!" web page. You will also see any logs the container writes in the terminal output. To stop the running container, press CTRL+C.

Port mapping is sometimes also referred to as 'exposing' or 'publishing' a port. A Dockerfile can also have an instruction called EXPOSE, and the combination of the two terms can be confusing. The purpose of the EXPOSE instruction is to document which port the application running inside the container listens on. For example, the Dockerfile for the helloworld image should have an EXPOSE 3000 instruction, as that's the port the application listens on.
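For illustration, the relevant part of such a Dockerfile might look like this (a sketch, not necessarily the actual learncloudnative/helloworld Dockerfile):

# EXPOSE only documents the port the application listens on; it does not publish it
EXPOSE 3000
CMD ["npm", "start"]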
So, even if you have the EXPOSE instruction in your Dockerfile, you still need to use the -p flag. You can use a different flag (-P) and in that case, Docker maps the port from the EXPOSE instruction to a random port on the host.
In that case, the run command looks like this:
docker run -P learncloudnative/helloworld:0.1.0
Note the absence of the actual port numbers. The container starts just like before, but now if you list all running containers you will notice a random port on the host (32768) is mapped to the container port (3000):
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
94cfd042df18 learncloudnative/helloworld:0.1.0 "npm start" 9 seconds ago Up 8 seconds 0.0.0.0:32768->3000/tcp hardcore_northcutt
Conclusion
In this article, I tried to explain what Docker is, how it differs from virtual machines, and some common concepts and terminology you will encounter when working with or reading about Docker. I hope the tutorial at the end helped illustrate some of the concepts in practice.
There are many other Docker features and this article only scratches the surface, but it should give you enough information to get started. In one of the upcoming posts, I'll explain how to use Docker volumes and Docker Compose.
If you're interested in diving deeper and seeing more practical examples, check out the guide to gateways I wrote. That guide explains the basics you need to know about gateways and proxies, and shows how to run a proxy using Docker Compose in the practical section.