Scalable Internet Architectures - Theo Schlossnagle
I have had the benefit of seeing Theo speak a couple of times at the Surge conferences in 2010 and 2011, and he definitely knows what he is talking about. He has written a really good book that goes over a number of techniques and strategies for scaling internet applications, and he contributes to many open source projects such as Apache Traffic Server. He has also developed a really cool open source monitoring platform called Reconnoiter, which he then spun into a company to run it in the cloud.
- The pathology of failure drives what we do in operations. If you haven't failed, then you haven't learned, because you have to experience a failure to really learn from it.
- When designing the architecture of a system, you need to think about everything rather than just the software being designed. You need to know the entire system, from power and cooling to client-side JavaScript, in order to make informed decisions.
- Lack of awareness of other disciplines is a bad thing. You can't know everything, but you should be aware of everything. Without that awareness, decisions are made in isolation, which leads to unreasonable requirements from others.
- Know and practice with your tools during the good times in order to make their use effortless when times get bad. The last tool you want to use when you are firefighting is man!
- To be excellent you need to treat operations as a craft. Become a craftsman through experience and learn discipline. Through practice achieve excellence.
- Build the tools that are needed to do your work for you so that you can focus on the more interesting part of your job.
- Think about the biggest technical mistake that you have made. What did you learn from it? This is one of my favorite interview questions.
- Version control and monitoring are crucial to running a successful business. Monitor everything, because without historical information you are just looking at numbers that mean nothing.
- Strive for computational reuse and caching, as you shouldn't do work you don't have to.
- You should never prevent caching but rather control it.
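
To make the last two points concrete, here is a minimal sketch of my own (not something from the talk) of reusing an expensive computation while still controlling how long the result may be served. The `TTLCache` class, its `get_or_compute` and `invalidate` methods, and the ten second lifetime are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache (hypothetical helper) whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value if it is still fresh, otherwise recompute and store it."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh entry: reuse the earlier work
        value = compute()  # missing or stale entry: do the work once
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop one entry explicitly, for example after the underlying data changes."""
        self._entries.pop(key, None)


# Usage with a fake clock, so the example runs instantly.
fake_now = [0.0]
calls = []


def expensive_report() -> int:
    calls.append(1)  # record that the expensive work actually ran
    return len(calls)


cache = TTLCache(ttl_seconds=10, clock=lambda: fake_now[0])
first = cache.get_or_compute("report", expensive_report)   # computed
second = cache.get_or_compute("report", expensive_report)  # reused from cache
fake_now[0] = 11.0                                         # past the lifetime
third = cache.get_or_compute("report", expensive_report)   # recomputed
```

The point of the sketch is the pair of controls: a lifetime that bounds how stale a result can get, and an explicit `invalidate` for when you know the data changed. Both let you keep the cache rather than switching it off.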