My Ramblings

If you are reading this you must be pretty bored…

Scalable Internet Architectures

This is one of my favorite books because so many of the topics discussed within it are issues that I have had to deal with firsthand at my current position. It gives some good insight into what it means to work on large, complex systems, as well as some instructive solutions to real-world problems.

Here are a few of my notes from the book, which I like to check in on from time to time for a refresher.

  • It is important to understand how what you are building will be used, and to design for scalability in the parts that actually need to scale
  • Horizontal scaling is when the capacity of a system can be increased by adding more of the same hardware or software; this is the only "real" way to scale
  • Vertical scaling is accomplished by adding faster and bigger hardware.  It is an expensive strategy and should only be used for solving problems where no good solutions are available on the market
  • The goal should be to build a system that can sustain n users and architect it in a way that n+/n- (scaling out/scaling down) users won't require changes in architecture or application design
  • Adding independent components to an architecture complicates it linearly; adding components on which other components depend complicates it exponentially
  • An architecture with efficient and maintainable components can scale much better than an efficient and maintainable monolithic architecture
  • Anticipating change is what makes a good Internet architect
  • Scaling systems is a balance between cost, complexity, maintainability, and implementation latency
  • Having the operations group available and participating regularly in business and development meetings is extremely valuable; it is one of the most important ingredients of good design
  • Developers have no problem pushing code to satisfy business needs without regard for the fact that it can take down the production environment
  • It is important to identify a set of principles for avoiding failure
  • Buy today's commodity hardware for a better return on investment as you will eventually saturate a machine when trying to scale it up
  • Uncontrolled change is the biggest cause of failure
  • You need a plan when implementing any change in the production environment; this plan should identify the steps to get from A to B, from B back to A, a restore from bare metal, and a test of the first two scenarios
  • It is always a good idea to have a tested plan for reverting a change
  • Having a plan will result in 100% confidence that you can recover and that downtime will be kept to a minimum when failures occur
  • Unit testing is great for ensuring that a system will arrive at an expected outcome given a certain input; it can't test every possible condition
  • Any reasonable means of reducing the number of failures and increasing overall product quality should be considered, so even though unit testing takes a little more time, it is worth it
  • Version control allows you to understand how code and configuration changed, by whom, and for what reason; it is critical for troubleshooting production issues
  • Having certain directories on hosts automatically checked in helps to lessen the madness of undocumented and emergency changes
  • The ability to restore a configuration is much less valuable than understanding how it changed over time
  • Language selection and scalability have little to do with each other; architectural design and implementation strategy dictate how scalable a system is
  • It is essential that you can see the overall architectural plans and understand the purpose of the overall system
  • It is not essential that every participant be an expert in any or all of those areas; it is essential that they be wholly competent in at least one area and always cognizant of the others
  • The criticality of an environment has nothing to do with its scale
  • Load balancing attempts to combine multiple resources to handle higher loads and is completely related to scale
  • HA is simply taking a single service and ensuring that a failure of one of its components will not result in an outage
  • Building a system that guarantees 100% availability is impossible; 99.999% availability allows only about 5 minutes of downtime a year (0.001% of 525,600 minutes is roughly 5.3 minutes)
  • Availability can be thought of as a lack of unplanned outages; planned maintenance is a good thing
  • Monitoring the architecture from top to bottom and the bottom up is necessary to ensure that failed pieces are caught early and dealt with quickly
  • Monitoring should account for system metrics as well as business metrics
  • Monitoring things that are no longer of importance while failing to monitor newly introduced metrics can result in a false sense of security
  • Monitors should never be taken offline; services should still be monitored during maintenance, but failures should not be escalated
  • Staging should be an exact copy of production
  • A good architecture must allow operations and engineering to watch things break, as watching failures happen leads to understanding their causes and, in turn, to finding solutions
  • One aspect of being cost effective is minimizing the required infrastructure; another is minimizing the cost of maintaining the architecture
  • Independent architecture components added to a system complicate it linearly while dependent components added complicate it exponentially
  • Scalability means that the architecture can grow and shrink without fundamental change
  • The performance of any one component can drastically affect how efficiently a system can scale
  • The only way to increase the performance of a complex system is to reduce the resource consumption of one of its individual components; it is fundamental that tuning strategies be employed throughout the entire stack
  • The 90/10 principle indicates that 90% of the execution time is spent in 10% of the code; do not focus on the slowest component or code, but rather on that which lies on the most common execution path
  • The most valuable lessons in performance tuning come from building things wrong; it teaches the analytical processes required to anticipate further problems before they manifest themselves
  • If you rely on redundant hardware for handling routine load then it isn't redundant hardware
  • Highly available architectures result in more costs, equipment, services and complexity
  • Load balancing is not HA
  • Traditional HA systems take the failover approach, with many pairs of machines in which one machine is always idle
  • In a peer-based HA system the cluster is responsible for providing services; each machine in the cluster assumes responsibility for a subset of those services
  • A systems engineer is the performance and availability engineer responsible for ensuring the system continues to work in the face of failure
  • RR (round-robin) load balancing is flawed because it doesn't give an overworked server a chance to settle down (unless a health check fails); a sketch of round-robin vs. least-connection selection follows this list
  • Least connection load balancing is not a good idea if there are several load balancers making independent decisions
  • Weighted load balancing is strange because load balancing is about effectively allocating available resources rather than total resources
  • Having multiple load balancers always complicates things no matter what the load balancing algorithm
  • Linear scaling is a falsehood because our algorithms for allocating requests are not perfect
  • Expect 70% utilization on each server in clusters of three or more nodes
  • Using session affinity has some pretty obvious implications on fault tolerance
  • Most content, by volume and by count, is static
  • Web requests are short lived, and this results in a lot of context switching of processes on and off the CPU
  • Any time a task must perform I/O (network, disk, etc.) it has to wait and is thus switched out
  • Reverse proxy caches/accelerators have high throughput and high concurrency; they reduce traffic to backend servers as well as TCP overhead that backend servers would have if dealing with slow clients
  • ARP spoofing is done by sending unsolicited/gratuitous ARP responses to devices on the local network
  • By providing static resources closer to the user we reduce latency and reduce the number of congested networks through which the requests flow
  • DNS round-trip time can be used by local name servers to choose authoritative servers that are closer and thus optimize the query
  • Anycast works by giving multiple machines the same IP address and having the networks to which they are attached announce that address, so traffic follows the shortest route and reaches the closest server
  • Anycast works really well with UDP protocols that require just a single request and response, such as DNS
  • Application tuning can only increase performance; it won't help if the system can't scale horizontally
  • A proxy cache is an accelerator that sits between the client and the application; it reduces latency and connection overhead for the application, since the accelerator sits on the application's network and absorbs slow clients, and it lets the application focus on doing the "real" work of the request: generating the content
  • Integrated caches sit within the application and are used for computational reuse and expensive data storage; they require a cache invalidation strategy
  • The vast majority of data on the Internet has a high read-to-write ratio
  • Write-back caches hold expensive write operations in cache for data that is commonly read; once the cache is full and entries are evicted, they are written to the backing store; they are not fault tolerant unless there is a backup device such as a battery in a RAID controller
  • Write-through caches exploit the fact that reads often occur on data most recently written; they write the data to both the cache and the backing store (a minimal sketch of both cache types appears after this list)
  • Distributed caches spread data across multiple nodes; when one node stores the data in cache, the others benefit as well (see the key-partitioning sketch after this list)
  • Caching solutions speed things up but more importantly they create scalability by reducing the contention on a shared resource; this resource access takes time, requires context switches, and is generally good to avoid
  • As applications scale horizontally the stress on shared resources increases so the goal should be to eliminate most of the shared resource usage
  • TTL-based caches are not great because the timeouts are arbitrary and not reflective of the underlying data, which might or might not have changed; this can be a good solution for applications that can tolerate a margin of error
  • Ideally we can cache things forever and purge entries when they change (both approaches are sketched after this list)
  • The cookie is a great tool for scaling the storage of user-pertinent data (a signed-cookie sketch appears after this list)
  • The key to successful caching is understanding the true nature of the data, how frequently it changes, and where it is used
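
The round-robin vs. least-connection point is easier to see in code. This is a minimal sketch of the two selection strategies, not code from the book; the Server class and the web1/web2/web3 names are made up, and a real balancer would add health checks and weights on top of this.

    import itertools

    class Server:
        def __init__(self, name):
            self.name = name
            self.active_connections = 0  # tracked by the balancer itself

    servers = [Server("web1"), Server("web2"), Server("web3")]

    # Round-robin: hand out servers in a fixed rotation, ignoring current load,
    # which is why an overworked server never gets a chance to settle down.
    rr_cycle = itertools.cycle(servers)

    def pick_round_robin():
        return next(rr_cycle)

    # Least-connection: pick the server handling the fewest requests right now.
    # With several independent balancers, each one only sees its own counts,
    # so the decisions are made on partial information.
    def pick_least_connection():
        return min(servers, key=lambda s: s.active_connections)

    if __name__ == "__main__":
        servers[0].active_connections = 10   # pretend web1 is busy
        print(pick_round_robin().name)       # web1, regardless of its load
        print(pick_least_connection().name)  # web2 (fewest active connections)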
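
The write-back vs. write-through distinction from the caching notes, sketched with plain dictionaries standing in for the cache and the backing store. This is my own illustration under those assumptions, not the book's code.

    from collections import OrderedDict

    class WriteThroughCache:
        """Every write goes to the cache and the backing store immediately."""
        def __init__(self, store):
            self.store = store   # dict standing in for a database or disk
            self.cache = {}

        def write(self, key, value):
            self.cache[key] = value   # recently written data is often read soon
            self.store[key] = value   # durable on every write

        def read(self, key):
            return self.cache.get(key, self.store.get(key))

    class WriteBackCache:
        """Writes land only in the cache; the store is updated on eviction."""
        def __init__(self, store, capacity=2):
            self.store = store
            self.capacity = capacity
            self.cache = OrderedDict()   # insertion order doubles as eviction order

        def write(self, key, value):
            self.cache[key] = value
            self.cache.move_to_end(key)
            if len(self.cache) > self.capacity:
                old_key, old_value = self.cache.popitem(last=False)
                self.store[old_key] = old_value   # flushed only on eviction

        def read(self, key):
            return self.cache.get(key, self.store.get(key))

The write-back version is why the notes call out battery-backed RAID controllers: anything still sitting in the cache simply disappears on failure unless something durable backs it up.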
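
For the distributed-cache note, the usual trick is to hash each key to a node so every application server agrees on where a value lives; once one server populates an entry, the others find it in the same place. The node names below are hypothetical, and simple modulo hashing remaps most keys whenever a node is added or removed (consistent hashing is the common fix), but it shows the basic idea.

    import hashlib

    # Hypothetical memcached-style nodes shared by all application servers.
    NODES = ["cache1:11211", "cache2:11211", "cache3:11211"]

    def node_for(key):
        # Hash the key so every server maps the same key to the same node.
        digest = hashlib.md5(key.encode()).hexdigest()
        return NODES[int(digest, 16) % len(NODES)]

    print(node_for("user:42:profile"))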
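
The TTL vs. purge-on-change notes, as a sketch. Both classes are toy in-process versions built on my own assumptions; a real system would put this behind a shared cache, but the trade-off is the same: arbitrary timeouts vs. explicit invalidation from the code path that changes the data.

    import time

    class TTLCache:
        """Entries expire after an arbitrary timeout, whether or not the data changed."""
        def __init__(self, ttl_seconds=60):
            self.ttl = ttl_seconds
            self.entries = {}   # key -> (value, expiry timestamp)

        def get(self, key):
            value, expiry = self.entries.get(key, (None, 0))
            return value if time.time() < expiry else None

        def set(self, key, value):
            self.entries[key] = (value, time.time() + self.ttl)

    class PurgeOnChangeCache:
        """Entries live forever; whoever updates the data purges the entry."""
        def __init__(self):
            self.entries = {}

        def get(self, key):
            return self.entries.get(key)

        def set(self, key, value):
            self.entries[key] = value

        def purge(self, key):
            # Called from the code path that writes the underlying data.
            self.entries.pop(key, None)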
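
And for the cookie note: keeping user-pertinent data in the cookie itself moves that storage onto the client, which naturally scales with the number of users, as long as the server can verify the data hasn't been tampered with. A minimal HMAC-signed cookie sketch with a made-up secret; my illustration, not the book's.

    import hashlib
    import hmac
    import json

    SECRET = b"change-me"   # hypothetical server-side signing key

    def encode_cookie(data):
        payload = json.dumps(data, separators=(",", ":"))
        signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        return payload + "|" + signature

    def decode_cookie(cookie):
        payload, _, signature = cookie.rpartition("|")
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(signature, expected):
            return None   # tampered or corrupted; fall back to server-side data
        return json.loads(payload)

    cookie = encode_cookie({"user_id": 42, "theme": "dark"})
    print(decode_cookie(cookie))   # {'user_id': 42, 'theme': 'dark'}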