Now that my day job involves more specific DevOps practices, I've grown my Sysadmin philosophy to be more all-encompassing.
Beyond just automating system configuration, modern practice requires a broader set of disciplines, such as tight integration with developers, version control, test-driven development, and continuous integration and delivery.
- Collaboration and organization: A primary focus of DevOps is culture. Instead of an “us versus them” mentality between production and development teams, tight integration of those roles is necessary.
- Automate everything mindset: In today's fast-paced world, automation is key. This includes everything from infrastructure deployment to automated testing frameworks and CI/CD tooling, so that releases require no manual work.
- Lean flow accelerates delivery: In line with an Agile mindset, reduce the number of “work in progress” tasks, reduce the batch sizes themselves, and keep work queues short. Instead of a few large releases, aim for continuous delivery of many small batches.
- Measurement of Everything: Accurate analytics are necessary to quickly track down any issues that occur in a continuous delivery framework. As the Scaled Agile Framework site puts it, rely on data, since “the facts are always friendly,” rather than on intuition.
- Recovery Enabled Low Risk Releases: In order to implement a continuous delivery mindset, there must be the ability to quickly and easily roll back releases that experience unexpected issues. This includes having the tooling in place to perform such operations and actually performing practice runs of pulling back or fixing forward.
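The recovery-enabled release idea above can be sketched in a few lines. This is only a minimal illustration, not a real deployment tool: `release_with_rollback`, `deploy`, and `is_healthy` are hypothetical names, and the callables stand in for whatever tooling actually pushes a version and checks its health.

```python
from typing import Callable, List


def release_with_rollback(
    new_version: str,
    current_version: str,
    deploy: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> str:
    """Deploy new_version; redeploy current_version if the health check fails.

    Returns the version that is left running.
    """
    deploy(new_version)
    if is_healthy():
        return new_version
    # Health check failed: roll back to the last known-good version.
    deploy(current_version)
    return current_version


if __name__ == "__main__":
    # Practice run: simulate a release whose health check fails.
    history: List[str] = []
    running = release_with_rollback(
        "v2", "v1", deploy=history.append, is_healthy=lambda: False
    )
    print(running, history)
```

Because the deploy and health-check steps are passed in, the same function can be exercised in a practice run with fakes, as above, before it is trusted with a real release.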
- Amazon Kinesis Aug 25 2020
- Matrix Apr 2019
- Google Cloud June 2019
- Danluu's Post Mortem Repo
- Kubernetes Fail Stories
- Monzo Payment Outage
- First Kubernetes Outage - Helm Related
System Build Tools
- TheForeman: Currently used in some of my work, but a little larger and more cumbersome than I would like.
- DigitalRebar: A new project that I'd like to investigate more heavily.
- Kubernetes: Main Kubernetes page. Docs are fairly good and in depth.
- Kubernetes Production Check List: Good list of best practices when running Kubernetes in production
- Kubernetes YAML Validation Tools
- Bitnami Kubernetes Production Runtime: A great starting point for a lot of services that are potentially needed when running Kubernetes in production. I don't run the full production runtime, but I have used a majority of these services successfully.
- Kapitan: Templating system for Kubernetes and Terraform. Potential replacement for Helm/Kustomize/etc.
- Kyverno: Native k8s Policy Management
- CRI: Container Runtimes
- CNI: Network Plugins
- CNI Comparison
- CSI: Storage Plugins
- Docker: Extremely lightweight container system.
Infrastructure As Code
- Infracost: Terraform Infrastructure Cost estimation, which can be baked into a CD pipeline for better review of what infrastructure costs will be.
- Git: All configurations should be in a version management system, and git is probably the best available. For open source code, GitHub is pretty much the de facto host for a lot of projects.
- CFEngine: The oldest, and also the most complex to set up.
- Puppet: Designed to be easier than CFEngine. True for the most part, but requires a bit of work to bootstrap, at least for the pure open source version.
- Salt: Need to look at this one. Requires a client install (unlike Ansible), but with 0MQ could be far more scalable.
- Ansible: I really like the lightweight distribution and that no client-side libraries are required. However, I do worry about scalability because of the use of SSH as a transport protocol. This could be mitigated by a pull architecture, although report collection then becomes a little more difficult.
- Vagrant: Designed to quickly deploy test virtual machines in a specific configuration. Can pull in configuration management from Ansible, Salt, Puppet, etc.
- Goss: Simple server testing framework (lightweight version of InSpec/ServerSpec)
- Jenkins: Generally used for CI on code, could be integrated with the above to perform full integration testing on a stack
- Collectd: Monitoring OS system stats.
- ELK Stack for Log Monitoring
- Performance related articles at http://www.brendangregg.com/index.html
- Internet Monitoring (globally)
- strace - Almost always available. Potentially A LOT of performance impact
- Sysdig: Combo of strace and tcpdump - and with less performance impact
- Sysdig Inspect: Potential GUI for sysdig output
- eBPF.io: Resources for eBPF
- KubeCTL Trace: Easily run eBPF from kubectl
(need to research…probably Bacula or Amanda)
- ServerSpec: Perhaps start for TDD for entire stack.
- ipify: Lightweight API to get your public IP address. Basic IP info is free to use. Can register to get GeoIP information.
- ip-api.com: Provides IP, geo-location and ISP information at http://ip-api.com/json. Free for non-commercial purposes.
- NoIP and XIP: Automatic DNS based on IP address
- NSA Proof Your Email System
- Awesome Sysadmin
- JQPlay: Quick way to debug jq filters. Also available as a Docker image
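As a quick illustration of the IP lookup services listed above, here is a minimal Python sketch against the http://ip-api.com/json endpoint. The field names it reads (`status`, `message`, `query`, `city`, `country`, `isp`) are my assumption of what the JSON response contains and should be checked against the service's own documentation; `lookup` and `summarize` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

IP_API_URL = "http://ip-api.com/json"


def lookup(url: str = IP_API_URL, timeout: float = 5.0) -> dict:
    """Fetch the JSON document describing the caller's public IP."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(payload: dict) -> str:
    """Turn a response dict into a one-line summary.

    Assumes the field names listed in the lead-in; missing fields become "?".
    """
    if payload.get("status") != "success":
        return "lookup failed: " + str(payload.get("message", "unknown error"))
    return "{} ({}, {}) via {}".format(
        payload.get("query", "?"),
        payload.get("city", "?"),
        payload.get("country", "?"),
        payload.get("isp", "?"),
    )
```

Running `print(summarize(lookup()))` performs a live request. Keeping the parsing in `summarize` separate from the network call means it can be tested offline with a hand-written dict.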