Go has become the backbone of many modern distributed systems – from high-performance ad tech platforms to cloud-native infrastructure and cybersecurity services. But as companies scale, so does the complexity of understanding what’s going on inside these systems.
To dig deeper, we contacted 72 Go developers across a variety of industries and roles to better understand observability challenges and practices in real-world settings. Fourteen agreed to speak with us, offering candid feedback from environments that span high-scale microservice fleets, cybersecurity platforms, and mission-critical backend systems. Their experiences covered a wide range of observability stacks – Prometheus, Grafana, OpenTelemetry, Jaeger, ELK, and more – and reflected both early and advanced stages of adoption.
What follows is a distillation of those conversations – shared pain points, recurring patterns, and opportunities to improve the state of observability in Go.
Observability in Go is still a puzzle
Go offers incredible performance and simplicity – but when it comes to observability, it lags behind its peers.
Across interviews, we consistently heard the same theme: instrumentation in Go is painful. Compared to Java or Python, where OpenTelemetry auto-instrumentation largely works out of the box, Go often requires significant manual effort, boilerplate, and careful context propagation.
"Instrumentation in Go took us 6–8 months. It was worth it – but a huge pain."
Others simply gave up on manual instrumentation entirely.
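To make the pain concrete, here is a minimal sketch of what manual OpenTelemetry instrumentation in Go typically looks like (the ProcessOrder/chargeCard names are invented for illustration): every function that should appear in the trace needs its own span preamble, and every caller must remember to thread the context through.

```go
package checkout

import (
	"context"

	"go.opentelemetry.io/otel"
)

// A tracer has to be obtained in every instrumented package.
var tracer = otel.Tracer("checkout")

func ProcessOrder(ctx context.Context, orderID string) error {
	// This preamble repeats in every function you want traced.
	ctx, span := tracer.Start(ctx, "ProcessOrder")
	defer span.End()

	// Forgetting to pass ctx here would silently break the trace.
	return chargeCard(ctx, orderID)
}

func chargeCard(ctx context.Context, orderID string) error {
	_, span := tracer.Start(ctx, "chargeCard")
	defer span.End()
	// ... payment logic ...
	return nil
}
```

Multiply that preamble across hundreds of packages and the 6–8 month figure quoted above starts to look plausible.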
The cost of too much data
Another recurring pain point was data overload. As one user put it:
"We generate trillions of requests. Filtering that down is everything."
Whether it’s high cardinality metrics, missing spans, or overloaded dashboards, teams are struggling to manage the sheer volume of telemetry. They’re using techniques like:
Sampling on a subset of servers
Declarative metric definitions via YAML
Telegraf or dependency injection filters
Only recording spans that exceed a duration threshold (e.g., >1ms)
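As a sketch of that last technique, here is one way a duration threshold could be implemented as a custom OpenTelemetry SpanProcessor that wraps a real exporter pipeline. The thresholdProcessor type and constructor are hypothetical names, and note the tradeoff: dropping short spans also removes them from their parent/child tree.

```go
package telemetry

import (
	"context"
	"time"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// thresholdProcessor forwards only spans whose duration meets a minimum,
// dropping everything else before it reaches the exporter.
type thresholdProcessor struct {
	next sdktrace.SpanProcessor
	min  time.Duration
}

func NewThresholdProcessor(next sdktrace.SpanProcessor, min time.Duration) sdktrace.SpanProcessor {
	return &thresholdProcessor{next: next, min: min}
}

func (p *thresholdProcessor) OnStart(ctx context.Context, s sdktrace.ReadWriteSpan) {
	p.next.OnStart(ctx, s)
}

func (p *thresholdProcessor) OnEnd(s sdktrace.ReadOnlySpan) {
	// Only export spans at or above the threshold (e.g., 1ms).
	if s.EndTime().Sub(s.StartTime()) >= p.min {
		p.next.OnEnd(s)
	}
}

func (p *thresholdProcessor) Shutdown(ctx context.Context) error   { return p.next.Shutdown(ctx) }
func (p *thresholdProcessor) ForceFlush(ctx context.Context) error { return p.next.ForceFlush(ctx) }
```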
Still, even with all this tuning, telemetry costs and data quality issues remain a problem for many.
Tracing: loved and hated
There’s universal recognition of the power of distributed tracing, but also widespread frustration with the current tools and workflows.
Common issues:
Lack of automation in Go
Forgetting to pass context (see the sketch after this list)
Weak trace search (especially with Jaeger)
Span gaps (especially around databases and queues)
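The context pitfall deserves a closer look, because it fails silently. In this minimal sketch (queryDB is a hypothetical helper), one wrong argument splits the trace in two:

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel"
)

var tracer = otel.Tracer("example")

func handler(ctx context.Context) {
	ctx, span := tracer.Start(ctx, "handler")
	defer span.End()

	queryDB(ctx)                  // child span, correctly parented
	queryDB(context.Background()) // orphan: silently starts a brand-new trace
}

func queryDB(ctx context.Context) {
	// The child span is linked to its parent only through ctx.
	_, span := tracer.Start(ctx, "queryDB")
	defer span.End()
	// ... database call ...
}
```

The compiler accepts both calls; only the trace view reveals the gap.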
Despite these challenges, nearly everyone wanted better tracing – not fewer traces.
"Tracing helped us catch a memory spike that would have taken weeks otherwise."
What do developers want?
The wishlist for better observability tooling was surprisingly consistent:
Auto-instrumentation: Especially for HTTP/gRPC and third-party libraries
Span coverage tools: "Why is this path not instrumented?"
Minimal code pollution: Nobody wants another otel.Tracer() call in every function
Go developers don’t want magic – they want control without boilerplate.
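The closest thing available today is per-library middleware, such as the contrib otelhttp wrapper, which instruments an entire HTTP server with a single wrapper rather than per-handler spans. A minimal sketch:

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		// r.Context() already carries the server span created by the wrapper.
		w.Write([]byte("ok"))
	})

	// One wrapper covers every route: server spans, status codes, and
	// propagation from incoming headers -- no per-handler boilerplate.
	http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "http.server"))
}
```

It covers the HTTP boundary, but anything deeper still needs manual spans – which is exactly the gap interviewees kept pointing at.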
Tooling stack: the usual suspects
Most teams rely on the following stack:
| Domain    | Tools Used                        |
|-----------|-----------------------------------|
| Metrics   | Prometheus, Mimir, Chronosphere   |
| Tracing   | OpenTelemetry, Jaeger, LightStep  |
| Logs      | Loki, ELK (Elasticsearch), Sentry |
| Debugging | pprof, runtime/trace              |
Other tools like Telegraf, VictoriaMetrics, and custom dashboards also play a role, especially in cost-sensitive or highly scaled environments.
Legacy is real
In large organizations, OpenTracing is still widely used, even though OpenTelemetry is its designated successor.
One engineer described a fleet of over 4,000 applications relying on OpenTracing. While there's interest in migrating to OpenTelemetry, the scope of change is massive. Bridge solutions are in place, but full adoption is slow and complex.
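For reference, the usual bridge pattern relies on OpenTelemetry's OpenTracing bridge, which lets legacy OpenTracing call sites write into an OpenTelemetry pipeline while new code adopts OTel directly. A minimal sketch (exporter configuration omitted):

```go
package main

import (
	opentracing "github.com/opentracing/opentracing-go"
	"go.opentelemetry.io/otel"
	otelbridge "go.opentelemetry.io/otel/bridge/opentracing"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// A real setup would configure exporters and sampling on the provider.
	provider := sdktrace.NewTracerProvider()

	// NewTracerPair returns an OpenTracing-compatible tracer that feeds
	// the OpenTelemetry SDK, plus a wrapper provider for new OTel code.
	bridgeTracer, wrapperProvider := otelbridge.NewTracerPair(
		provider.Tracer("legacy-bridge"))

	opentracing.SetGlobalTracer(bridgeTracer) // legacy call sites keep working
	otel.SetTracerProvider(wrapperProvider)   // new code goes through OTel
}
```

The bridge keeps both worlds emitting into one backend, but as the 4,000-application example shows, it is a waypoint rather than a destination.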
Where we see opportunity
If you're building tooling in this space, here are some clear areas where innovation is needed:
Go auto-instrumentation tooling driven by YAML or similar configuration
Trace coverage analyzers to highlight gaps in instrumentation
Unified correlation of metrics, logs, and spans – without jumping between tools (see the sketch after this list)
Security-aware instrumentation to meet internal audit demands
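As a small example of what correlation can look like at the code level, a log line can carry the active trace and span IDs so it can later be joined to its trace without switching tools. The logWithTrace helper below is hypothetical:

```go
package example

import (
	"context"
	"log/slog"

	"go.opentelemetry.io/otel/trace"
)

// logWithTrace stamps a log line with the trace and span IDs from ctx,
// making it trivially joinable with the corresponding trace.
func logWithTrace(ctx context.Context, logger *slog.Logger, msg string) {
	sc := trace.SpanFromContext(ctx).SpanContext()
	logger.InfoContext(ctx, msg,
		slog.String("trace_id", sc.TraceID().String()),
		slog.String("span_id", sc.SpanID().String()),
	)
}
```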
The real people behind the feedback
These insights didn’t come from surveys – they came from candid conversations with:
Engineers running load balancers handling millions of RPS
Infra teams managing 500 Grafana dashboards across services
Cybersecurity devs learning observability by instrumenting IoT pipelines
Cloud platform engineers instrumenting critical auth microservices
Some work at hyperscalers. Others are solo operators. But they all share one thing: Go is core to their stack, and observability needs to be easier.
The path forward
We’re at a turning point. OpenTelemetry has won mindshare – but the Go developer experience is still catching up. Developers want tools that help them debug faster, ship safer, and see clearer – without rewriting their apps.
We’d like to address many of these challenges through OpenTelemetry Go Compile Instrumentation – an effort to bring flexible, low-friction instrumentation to Go without polluting your codebase. If you’re struggling with observability in Go, we’d love your feedback, ideas, or contributions.
Gopher artwork inspired by Ashley McNamara’s Gopher, licensed under CC BY-NC-SA 4.0. Modified from the original.