I’ve always found logging in Java/Scala to be overly complicated for what seems to be simply pushing text to files.
There are plenty of options, and standardising is difficult because dependencies often make their own choice. SLF4J is supposed to solve this by providing one-interface-to-rule-them-all, but it only partly helps.
There remain several problems:
- too many implementations/choices
- too broad an interface
- lack of standardisation over formats
It turns out that logging is pretty complicated after all.
Some of this complexity is warranted - logging is a key source of poor application performance and so care must be taken to avoid blocking. At the Guardian we experienced this recently. Several apps were performing badly and returning errors and autoscaling wasn’t helping, if anything it was making things worse. After a day or two thinking and also, crucially, looking at some thread dumps, it became clear the applications were blocking in the main threads on logging - they were Play apps so this is a big no no. It turned out each app was writing to the same Kinesis stream and shards and when the shards hit capacity the client library would block. The failure here was pathalogical; a single app could, just by logging, bring down multiple services! Needless to say, swimlaning is more subtle than people realise. But even accounting for performance, the libraries still seem unjustifiably complex.
Let’s step back a moment. What does good logging look like? I would say:
- it should be built into the language (at least the interface), to avoid bikeshedding and dependency hell
- asynchronous (because performance - yes you might lose records but for web apps at least availability is typically more important)
- a simple interface (info and debug only, with only the latter toggleable)
- sent to standard out/err only (don’t make assumptions about your environment)
- with an agreed, machine-friendly, format that is not fragile to changes (i.e. not a text-based format that is order-dependent)
The simple interface part was particularly interesting to me when I first read Dave Cheney’s article. To summarise it here, the argument is that need just two logging levels, info and debug, and only the latter should be optional (via a flag of some kind).
And note, some of these are clearly in tension; for example, your volume and performance requirements may dictate your format.
In reality, we’re stuck with what we’ve got, but we can do some things to help:
- refuse to use most log levels
- pick a good format and stick to it
- don’t make assumptions about the file system
- don’t block