As we happily listen directly to the whole world for requests on port
443, and sometimes don't know how to fulfill a request, the names of those unknown domains end up in our logs:
```shell
journalctl -u boringproxy | grep -o 'no certificate available for .*' | cut -d' ' -f5 | sort -u | less
```
Looking at the digest of these ominous requests, we learn that
'*', i.e. literally anything in the world, gets requested at times. Setting that aside, along with the obviously unobvious attempts to speak with our host directly
and the sites we abandoned long ago, there are plenty of automatically generated subdomains for each of them, but also large amounts of other garbage.
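Since the same names keep coming back, counting them can help separate a few persistent scanners from one-off garbage. The sketch below reuses the grep/cut extraction from the one-liner above and adds `uniq -c`. It assumes the same log message format the original pipeline relies on (the domain is the fifth space-separated word of the match), and the function name `rank_unknown_domains` is made up for illustration:

```shell
#!/bin/sh
# rank_unknown_domains (hypothetical helper): read boringproxy log lines on
# stdin and print each unknown name with how often it was requested,
# most frequent first. Extraction is the same grep/cut as the one-liner.
rank_unknown_domains() {
  grep -o 'no certificate available for .*' \
    | cut -d' ' -f5 \
    | sort \
    | uniq -c \
    | sort -rn
}
```

Usage: `journalctl -u boringproxy | rank_unknown_domains | head -n 20` shows the twenty most requested unknown names.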
How should we deal with this in the long term when not running behind another filtering layer? Is it in any way dangerous? And what should we do about legitimate, preexisting addresses that we no longer want to see here?
I wonder about this sometimes. Basically we’re completely blind to many changes in the behavior of our systems, unless the change is big enough to be obvious, like bringing down a server.
I would love to have a bit more visibility into this. For example, if you suddenly got 10x the usual number of requests at 2am, it would be nice if boringproxy sent you an email. You could imagine all sorts of useful graphs and data points. I’m sure there are already great tools for this that can probably even ingest logs if they're formatted properly. But it would be fun to at least add some to boringproxy.
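The 2am alert idea boils down to comparing a current count against a baseline. A minimal sketch, assuming the number of log lines is a usable stand-in for request volume (boringproxy may not log every request, so treat it as a rough proxy); the name `check_spike` and the default factor of 10 are illustrative choices, not anything boringproxy provides:

```shell
#!/bin/sh
# check_spike CURRENT BASELINE [FACTOR]  (hypothetical helper)
# Succeeds (exit 0) when CURRENT is at least FACTOR times BASELINE.
# FACTOR defaults to 10, matching the "10x the usual requests" example.
check_spike() {
  current=$1
  baseline=$2
  factor=${3:-10}
  # Treat an empty baseline as 1 so a quiet hour does not turn every
  # single line into an "infinite" spike.
  if [ "$baseline" -lt 1 ]; then
    baseline=1
  fi
  [ "$current" -ge $((baseline * factor)) ]
}
```

Run from cron, the counts could come from `now=$(journalctl -q -u boringproxy --since "1h ago" | wc -l)` and `usual=$(journalctl -q -u boringproxy --since "25h ago" --until "24h ago" | wc -l)`; comparing against the same hour yesterday keeps normal nightly lulls from being flagged. `check_spike "$now" "$usual" && ...` then hands off to whatever mailer is installed.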
A good start would be to add a JSON logging output option, so the logs can easily be ingested by such systems. Different logging verbosity levels are also a common way to tell log lines apart by severity.
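Until there is a native JSON option, the existing lines could be wrapped in JSON on their way to a log shipper. This is only a stopgap sketch: the `event` and `domain` field names are hypothetical rather than a schema boringproxy emits, and it drops characters outside a conservative hostname set so that no JSON escaping is needed (lossy by design):

```shell
#!/bin/sh
# unknown_domains_as_json (hypothetical helper): turn each
# "no certificate available for NAME" line on stdin into one JSON object
# per line. Other log lines are ignored.
unknown_domains_as_json() {
  awk '
    /no certificate available for / {
      name = $0
      sub(/.*no certificate available for /, "", name)
      sub(/ .*/, "", name)                # first word only, like cut -f5
      gsub(/[^A-Za-z0-9._*-]/, "", name)  # safe subset: no JSON escaping needed
      printf "{\"event\":\"no_certificate\",\"domain\":\"%s\"}\n", name
    }'
}
```

Usage: `journalctl -q -u boringproxy -o cat | unknown_domains_as_json`. Note that `journalctl -o json` already wraps every entry in a JSON envelope, but the message inside it stays unstructured text, which is why a structured option in boringproxy itself would still help.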
As chance would have it, a couple of weeks ago I started a deep dive on logging/monitoring for my day job. I’ve already learned a ton that I plan on incorporating into boringproxy eventually. Basically it comes down to structured logging (possibly with JSON), as you indicated. I think for the most part monitoring makes sense as a separate system that operates on the output logs, most likely Prometheus.
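With monitoring as a separate system reading the logs, one low-tech bridge to Prometheus is node_exporter's textfile collector, which picks up `*.prom` files written in the Prometheus text exposition format. A sketch under that assumption; the metric name `boringproxy_unknown_domain_log_lines` is hypothetical, not something boringproxy exports:

```shell
#!/bin/sh
# unknown_domain_metrics (hypothetical helper): count "no certificate"
# log lines on stdin and print them as a Prometheus gauge.
unknown_domain_metrics() {
  # grep -c prints 0 but exits 1 when nothing matches; keep going anyway.
  n=$(grep -c 'no certificate available for ' || true)
  printf '# HELP boringproxy_unknown_domain_log_lines Log lines reporting no certificate for the requested name.\n'
  printf '# TYPE boringproxy_unknown_domain_log_lines gauge\n'
  printf 'boringproxy_unknown_domain_log_lines %s\n' "$n"
}
```

For example, `journalctl -q -u boringproxy --since "5min ago" | unknown_domain_metrics` written to a temporary file and then `mv`'d into the directory given by node_exporter's `--collector.textfile.directory`, so the collector never reads a half-written file. Alerting on spikes would then live in Prometheus rather than in boringproxy.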
Great to see this on the roadmap now, ever-changing scope and all.