2023-Mar

Health Checks

Motivation

A health check lets you confirm that your application is running smoothly and responding to requests properly. Health checks monitor the status of your application and report whether it is healthy. By implementing a health check, you can:

Monitor application uptime: Ensure that the application is available to users. This matters most for applications that are critical to business operations, where downtime can result in lost revenue, reduced productivity, and lower user satisfaction.
Detect issues early: Pick up issues before they become critical. By regularly monitoring the status of your application, you can quickly identify and address problems as they arise, such as server errors or network connectivity issues.
Improve application performance: By monitoring response times and resource utilization, you can identify bottlenecks and optimize the application.
Enable automatic failover: If a health check determines that an instance of your application is unhealthy, a load balancer can automatically redirect traffic to a healthy instance, so the application remains available to users.

HTTP or TCP?

TCP and HTTP are both protocols used for communication over a network, but they serve different purposes. TCP provides a reliable, ordered, and error-checked delivery of data, while HTTP is a higher-level application protocol that is typically used for web browsing, file transfer, and other web-based applications.

When it comes to health checks in a background service running in a Kubernetes cluster, using TCP as the health check protocol is generally preferred over HTTP for a few reasons:

Simplicity: TCP health checks are simpler to implement and have lower overhead than HTTP health checks. With TCP, you simply establish a connection to the target service and check if it responds with a successful handshake. With HTTP, you need to send an HTTP request and wait for an HTTP response, which requires more processing and can be slower.
Speed: TCP health checks are generally faster than HTTP health checks because they require fewer network round trips. Since health checks are typically performed frequently, minimizing the time spent on health checks can help improve the overall performance and responsiveness of the system.
Accuracy: TCP health checks give a direct view of the availability of the service, since they verify that the process is listening and accepting connections on its port. With HTTP health checks, a successful response does not necessarily mean that the service is fully operational or responsive. Keep in mind that a TCP check only proves the port is open, so it says nothing about the work behind it either.

That being said, there may be situations where using HTTP as the health check protocol is more appropriate. For example, if your service relies on specific HTTP endpoints for functionality, using HTTP health checks may provide more insight into the service’s health and readiness. Ultimately, the choice of health check protocol depends on the specific requirements and constraints of your system.
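To make the difference concrete, here is a minimal sketch of what each kind of check does from the caller's side. It is an illustration, not how Kubernetes implements its probes: the class and method names (ProbeSketch, TcpCheckAsync, HttpCheckAsync) are hypothetical, and it assumes .NET 6 or later.

```csharp
using System;
using System.Net.Http;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class ProbeSketch
{
    // TCP check: healthy if the connection is established before the timeout.
    // Nothing is sent or read after the handshake.
    public static async Task<bool> TcpCheckAsync(string host, int port, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        using var client = new TcpClient();
        try
        {
            await client.ConnectAsync(host, port, cts.Token);
            return true;
        }
        catch (Exception ex) when (ex is SocketException or OperationCanceledException)
        {
            // Connection refused, host unreachable, or timed out.
            return false;
        }
    }

    // HTTP check: healthy only if a full request/response round trip
    // completes with a 2xx status before the timeout.
    public static async Task<bool> HttpCheckAsync(HttpClient http, Uri url, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            using var response = await http.GetAsync(url, cts.Token);
            return response.IsSuccessStatusCode;
        }
        catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
        {
            return false;
        }
    }
}
```

The TCP version stops as soon as the connection is open, while the HTTP version needs the same connection plus a request and a response on top of it. This sketch accepts only 2xx responses; a Kubernetes HTTP probe treats any status from 200 to 399 as a success.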

Probes

In .NET, both probes are registered with .Services.AddHealthChecks().AddCheck<T>(), which can be confusing, but they are slightly different. I've seen some teams define them as:

Health: Check ALL dependencies
Ready: Check the critical dependencies

I don't really feel that both are needed; the Health check alone is probably the most valuable.
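For teams that do keep both, the Health/Ready split described above can be sketched with tags in an ASP.NET Core minimal API. This is one possible wiring, not the only one: the check names ("database", "reporting-api") and the "critical" tag are assumptions made for illustration, and the lambdas are placeholders for real checks.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Hypothetical checks: replace each lambda with a real probe of the dependency.
    .AddCheck("database", () => HealthCheckResult.Healthy(), tags: new[] { "critical" })
    .AddCheck("reporting-api", () => HealthCheckResult.Healthy());

var app = builder.Build();

// Health: runs every registered check (the default when no predicate is set).
app.MapHealthChecks("/healthz");

// Ready: runs only the checks tagged "critical".
app.MapHealthChecks("/ready", new HealthCheckOptions
{
    Predicate = registration => registration.Tags.Contains("critical")
});

app.Run();
```

Both endpoints share one set of registrations, so dropping the Ready endpoint later only means deleting the second MapHealthChecks call.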

Readiness (Health)

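K8s checks that the service is healthy and that its downstream dependencies are available. A pod that fails its readiness probe stops receiving traffic until the probe passes again.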

Also known as healthz, healthcheck, or _health for the local app
- Downstream API dependencies: you can just call their /ping endpoint
- Database health: try to connect to the DB and run something like SELECT 1
Example readiness for HTTP API Health Checks (C#)
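As a rough sketch of the database check described above, the class below opens a connection and runs SELECT 1. It assumes nothing about the database provider: DatabaseHealthCheck and its connection factory are hypothetical names, and the factory is expected to return a connection from whichever ADO.NET provider the application already uses.

```csharp
using System;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class DatabaseHealthCheck : IHealthCheck
{
    // Assumption: the caller supplies a factory for its own provider's connection.
    private readonly Func<DbConnection> _connectionFactory;

    public DatabaseHealthCheck(Func<DbConnection> connectionFactory)
        => _connectionFactory = connectionFactory;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            await using var connection = _connectionFactory();
            await connection.OpenAsync(cancellationToken);

            // A trivial query proves the server answers, not just that the socket opened.
            await using var command = connection.CreateCommand();
            command.CommandText = "SELECT 1";
            await command.ExecuteScalarAsync(cancellationToken);

            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            // Report the failure status configured at registration (Unhealthy by default).
            return new HealthCheckResult(
                context.Registration.FailureStatus, "Database check failed", ex);
        }
    }
}
```

A check like this would be registered with AddCheck<DatabaseHealthCheck>("database"), with the Func<DbConnection> factory registered in the service container so the check can be constructed.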

Liveness (Ready)

In K8s terms, this probe checks the container health (this is not the application health). If the application is dead, the kubelet kills the container and restarts it according to the pod's restart policy.

aka ready
Examples
- API: you can just have a /ready endpoint that returns Ok() or checks the critical dependencies; K8s can also just use the /ping endpoint
- Worker example: TCP Socket Action Probe In Worker (C#)
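For the worker case, a TCP socket probe needs something in the process to accept connections. The sketch below is one way to do that with a hosted service, assuming .NET 6 or later; the class name TcpProbeListener and port 8081 are assumptions, not values taken from the original example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed class TcpProbeListener : BackgroundService
{
    // Assumed port: it must match the port configured on the probe.
    public const int Port = 8081;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var listener = new TcpListener(IPAddress.Any, Port);
        listener.Start();
        try
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                // Accept the probe's connection and close it straight away;
                // a completed handshake is all a TCP probe looks for.
                using TcpClient client = await listener.AcceptTcpClientAsync(stoppingToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown: the host is stopping.
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

Register it alongside the worker with services.AddHostedService<TcpProbeListener>() and point a tcpSocket probe at the same port. A listener like this shows that the process is alive, not that the worker loop is making progress, so it fits the container-level check described above.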

References

Kubernetes best practices: Setting up health checks with readiness and liveness probes