All things monitoring-related.
My preferred stack: Prometheus, Grafana, Loki
- Node Exporter: Prometheus exporter for server/OS statistics
- ELK Stack for log monitoring: ELK tends to be a bit heavy, but I'm keeping this around just in case
- Changd: Notifies you when a web UI changes.
- Performance-related articles at http://www.brendangregg.com/index.html
- Internet Monitoring (globally)
  - Pingdom’s State of the Internet
  - Down Detector
  - Oracle Internet Intelligence
  - The Outage Mailing List - Network admins chatting about global issues
- Internet Monitoring (locally)
  - Open Speed Test: Browser based, no client login required.
- strace - Almost always available, but can have a large performance impact
- Sysdig: Combines strace and tcpdump, with less performance impact
- Sysdig Inspect: Potential GUI for sysdig output
- eBPF.io: Resources for eBPF
- KubeCTL Trace: Easily run eBPF from kubectl
- Pixie Labs: Troubleshoot K8s apps relatively easily, leveraging eBPF
- Scaling Mastodon - Also some great general tips for Rails, Sidekiq, and Redis.
- Scaling Mastodon to 128K Users
Also see plan for actual retrospective stories, as those are the basis for planning improvements
- Everyone Should Be On-Call - with appropriate life balance and compensation
- End Of Life: Quick End of Life Reference