Motivation
I recently dealt with an application composed of multiple services running in containers. Even though every part of this application is correctly split into a separate microservice, the independence of each service is not enforced.
This lack of independence has several drawbacks, one of which is that the containers must be started in a pre-defined order. Otherwise, some containers might terminate because of an application error (the application breaks when an unexpected error occurs, e.g. when it relies on a linked service that is not yet ready to accept connections).
Not all applications suffer from this kind of problem: the application I was dealing with was not born with microservices in mind, but was rather split and converted to separate containers over its lifetime. It is certainly not the only application with this limitation: plenty of other applications out there have been converted into a Franken-microservice-stein “monster”.
Workarounds
I am going to explore the possible workarounds to define and follow a startup order when launching containerized applications that span multiple containers.
Depending on the scenario, we may not want (or may not be able) to change the containers and the application itself, for several reasons:
- the complexity of the application
- whether the sources are available
- whether changes to the Dockerfiles are possible (especially to the ENTRYPOINTs)
- the time required to change the architecture of the application
docker-compose and healthcheck
Using docker-compose, we can specify:
- a healthcheck: the test (command) used to check whether the container is working. The test is executed at regular intervals (interval) and retried up to retries times:
db:
  image: my-db-image
  container_name: db-management
  ports:
    - 31337:31337
  healthcheck:
    test: ["CMD", "curl", "-fk", "https://localhost:31337"]
    interval: 300s
    timeout: 400s
    retries: 10
- a depends_on field, to start the container only after its dependency has been started, together with restart: on-failure:
web:
  image: my-web-image
  restart: on-failure
  depends_on:
    - db
  links:
    - db
What is happening here?
- docker-compose starts the services, bringing up the db container first (the web one depends on it)
- the web container is started shortly after (it does not wait for db to be ready, because docker-compose does not know what “ready” means for us). Until the db container is ready to accept connections, the web container keeps being restarted (restart: on-failure).
- the db service is marked as healthy as soon as curl -fk https://localhost:31337 exits with 0 (the db-management image ships with an HTTP controller that responds successfully only when the database is ready to accept connections). Marking the service as healthy means that the service is working as expected (because the test returns what we expect). When the service is no longer healthy, the container must be restarted, and other policies and actions might be introduced.
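As a side note (this is not part of the original setup above), newer compose file formats (3.4 and later) also accept a start_period field inside healthcheck: probe failures during that initial window are not counted against retries, which helps with services that take a while to boot. A minimal sketch, reusing the same my-db-image service with an arbitrary 60-second grace period:

db:
  image: my-db-image
  healthcheck:
    test: ["CMD", "curl", "-fk", "https://localhost:31337"]
    interval: 300s
    timeout: 400s
    retries: 10
    start_period: 60s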
NOTE: in docker-compose file format versions earlier than 3, depends_on could also wait for the health check of the dependency, but starting from file format version 3, depends_on only accepts a list of other services as parameters.
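For reference, with the older 2.1 file format the health-check-aware dependency could be expressed roughly like this (a minimal sketch reusing the services above):

version: "2.1"
services:
  db:
    image: my-db-image
    healthcheck:
      test: ["CMD", "curl", "-fk", "https://localhost:31337"]
      interval: 300s
      timeout: 400s
      retries: 10
  web:
    image: my-web-image
    depends_on:
      db:
        condition: service_healthy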
This solution is not ideal, as the web container keeps being restarted until the dependency is satisfied: that can be a huge problem if we are using that container for running tests, because a container exiting with a failure can be mistaken for failed tests.
wait-for-it wrapper script
This approach is slightly better than the previous one, but it is still a workaround. We are going to use docker-compose and the wait-for-it script.
In the docker-compose.yml file we insert a depends_on (as described in the previous section) and a command:
db:
  image: my-db-image
  container_name: db-management
  ports:
    - 31337:31337
  healthcheck:
    test: ["CMD", "curl", "-fk", "https://localhost:31337"]
    interval: 300s
    timeout: 400s
    retries: 10

web:
  image: my-web-image
  depends_on:
    - db
  links:
    - db
  command: ["./wait-for-it.sh", "db:31337", "--", "./webapp"]
The wait-for-it script waits for host:port to be open (TCP only). Again, this does not guarantee that the application is ready to serve requests but, compared to the previous workaround, we are not restarting the web container until its dependency is ready.
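One detail worth mentioning: wait-for-it also accepts a --timeout option (15 seconds by default, 0 to wait forever) and a --strict flag (run the command only if the check succeeded), so the web service can fail fast when its dependency never comes up. A minimal sketch of the same web service, with an arbitrary 60-second limit:

web:
  image: my-web-image
  depends_on:
    - db
  links:
    - db
  command: ["./wait-for-it.sh", "db:31337", "--timeout=60", "--strict", "--", "./webapp"]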
One drawback of this workaround is that it is invasive: it requires the container image to be rebuilt to add the wait-for-it script (you can use a multi-stage build to do so).
Re-architect the application
This is not a workaround: it is rather the solution, and the best one we can achieve. It takes effort and it might cost a lot, as the application architecture needs to be modified to make it resilient against failures. There are no general guidelines on how to successfully re-architect an application to be failproof and microservice-ready, even though I strongly suggest following the 12 guidelines listed on the 12-factor applications website.