Spring Cloud Sleuth: A Simple Approach to Adding Distributed Tracing
So you and your team have just migrated from a monolithic application to a set of microservices built with Spring Boot and running on a distributed platform. On the old platform, you could track incoming requests and each step of the process flow through a log file on that single server; how do you manage that now on a distributed platform? Distributed tracing via Spring Cloud Sleuth is here to help!
Distributed tracing tags and monitors a variety of activities throughout a system (across multiple services) using a unique identifier. The power of distributed tracing is that this unique identifier is carried into each of the established services, and can be done through a variety of paths (e.g. RESTful requests, messaging protocols, etc.). If your service stack is built on Spring Boot, Spring Cloud Sleuth (https://spring.io/projects/spring-cloud-sleuth) is a great way to add distributed tracing into your application suite with very little additional configuration needed. Spring Cloud Sleuth works with a wide range of libraries and can also report back to a Zipkin instance (https://zipkin.io/), if so configured.
Implementation & Usage
To start, we need to add the following dependencies to the application's pom.xml file (where release.train.version is the latest stable Spring Cloud release train version, e.g. 2021.0.1):
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-dependencies</artifactId>
      <version>${release.train.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
  </dependency>
</dependencies>
For most services, adding the Spring Cloud Sleuth dependency is enough to get distributed tracing up and running. For requests into the service, you'll now see two new unique IDs added to log statements, appearing after the log-level component:
2022-03-16 11:40:00.400 INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Received request for work
These IDs are the "trace" and "span" IDs, both of which are unique and auto-generated by Spring Cloud Sleuth if not provided by an upstream call. The trace ID is the parent ID (representing the entire request chain across all services), and the span ID is the unique ID for the current span (e.g. the current service, or a key logic block within a service). When Spring Cloud Sleuth is used across multiple services, the trace ID is automatically included as a header on outgoing requests (for both HTTP and messaging protocols), so it can be carried into all downstream services. The default header propagation format is B3 (https://github.com/openzipkin/b3-propagation), so you'll notice headers like X-B3-TraceId and X-B3-SpanId on HTTP requests, or a single b3 header for messaging protocols.
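To make the B3 format concrete, here is a minimal plain-Java sketch (not Sleuth code; the class and method names are illustrative) of how the single b3 header value is laid out: "{traceId}-{spanId}-{samplingState}-{parentSpanId}", with the last two fields optional.

```java
// Illustrative only: splits a single "b3" header value into its
// hyphen-delimited fields (traceId, spanId, samplingState, parentSpanId).
class B3HeaderDemo {

    static String[] parse(String b3) {
        return b3.split("-");
    }

    public static void main(String[] args) {
        String[] fields = parse("d8dff210398819ab-c5d3e679e3ec9933-1");
        System.out.println("traceId = " + fields[0]);
        System.out.println("spanId  = " + fields[1]);
        System.out.println("sampled = " + fields[2]);
    }
}
```

In practice you never parse this header yourself; Sleuth's instrumentation reads and writes it automatically on each hop.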
With these IDs, you'll be able to search a centralized logging platform (e.g. an Elasticsearch/Logstash/Kibana stack, or AWS CloudWatch) to see all log statements, across all services, relevant to a specific request. These IDs can also be used for performance monitoring and debugging, as you can see how long it took to process the entire request as well as each step within it. Spans can also be started and ended manually via simple Java code or Spring annotations, which is useful when you want a span to cover a specific block of logic.
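The manual approach can be sketched as follows, assuming the Spring Cloud Sleuth 3.x Tracer API (the span name and surrounding method are illustrative):

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;

public class ManualSpanExample {

    @Autowired
    private Tracer tracer;

    public void doWork() {
        // Start a new span under the current trace.
        Span span = tracer.nextSpan().name("db-query").start();
        // Put the span in scope so log statements pick up its ID.
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // ... work covered by the new span ID ...
        } finally {
            span.end();
        }
    }
}
```

Ending the span in a finally block ensures it is reported even if the work throws.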
For performance monitoring, let's say that you have a service that takes in an HTTP request, makes a database query, and then returns a response. With just the basic code, each piece of the process flow will be logged with the same trace and span IDs:
2022-03-16 11:40:00.400  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Received request
2022-03-16 11:40:00.401  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Calling JPA method
2022-03-16 11:40:00.600  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Starting DB query
2022-03-16 11:40:02.800  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Ended DB query
2022-03-16 11:40:02.900  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Finished JPA method
2022-03-16 11:40:03.100  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Completed HTTP request
However, if we update that DAO object to include the “@NewSpan” annotation on the public method, we’ll get separate span IDs for that individual piece of the flow:
2022-03-16 11:40:00.400  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Received request
2022-03-16 11:40:00.401  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Calling JPA method
2022-03-16 11:40:00.600  INFO [TestApplication,d8dff210398819ab,c5d3e679e3ec9933] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Starting DB query
2022-03-16 11:40:02.800  INFO [TestApplication,d8dff210398819ab,c5d3e679e3ec9933] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Ended DB query
2022-03-16 11:40:02.900  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Finished JPA method
2022-03-16 11:40:03.100  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Completed HTTP request
With the separate span IDs, it can be easier to filter to just the log statements that are relevant for that piece of the flow (which can be incredibly useful when there are many log statements, or if that particular span covers a sizable number of classes).
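The annotated DAO described above can be sketched like this, assuming spring-cloud-sleuth's @NewSpan annotation on a Spring-managed bean (the class and method names are illustrative):

```java
import org.springframework.cloud.sleuth.annotation.NewSpan;
import org.springframework.stereotype.Service;

@Service
public class DataService {

    // Each call to this method runs in a new span (with its own span ID)
    // under the current trace, as seen in the log output above.
    @NewSpan("db-query")
    public String runQuery() {
        // ... perform the database query ...
        return "result";
    }
}
```

Note that the annotation only takes effect when the method is called through the Spring proxy, i.e. from outside the bean itself.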
Common Pitfalls
In my experience, there have only been a couple of scenarios where Spring Cloud Sleuth didn’t work as expected:
- If the application uses a custom log pattern, Spring Cloud Sleuth won't be able to automatically insert the IDs into log statements. Either add the trace and span ID fields to the custom pattern, or switch back to the standard pattern.
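For example, with the default Logback setup the IDs live in the MDC, so a custom pattern can pull them back in via %X placeholders. A sketch for application.properties, assuming the Sleuth 3.x MDC keys traceId and spanId:

```properties
# Restore the [app,traceId,spanId] block that Sleuth adds to the log level.
logging.pattern.level=%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]
```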
- Spring Cloud Sleuth can only instrument Spring beans. If you create a client with the "new" keyword instead, Spring Cloud Sleuth can't hook into that object and the trace context will be lost.
- Good implementation:
@Autowired
private RestTemplate restTemplate;

public String makeARequest() {
    return restTemplate.getForObject("http://example.com", String.class);
}
- Bad implementation:
public String makeARequest() {
    return new RestTemplate().getForObject("http://example.com", String.class);
}
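The good implementation relies on RestTemplate being registered as a Spring bean in the first place; a minimal configuration sketch (the class name is illustrative):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestClientConfig {

    // Exposing RestTemplate as a bean lets Sleuth's auto-configuration
    // instrument it, so outgoing requests carry the trace headers.
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}
```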
Roundup
This article just scratches the surface of what's available within Spring Cloud Sleuth. Still, you can see that with minimal code changes, your application suite can quickly harness the power of distributed tracing, making tracing and debugging via logs much easier in a distributed environment.