This is a summary of what I got out of this paper on the Akamai network.
Overview
- Akamai pioneered the concept of CDN (content delivery network).
- 61,000 servers in 1000 networks across 70 countries.
- Akamai delivers around 15% to 20% of all web traffic worldwide.
- Beyond content delivery, Akamai also provides web and IP acceleration, EdgeComputing, delivery of live/on-demand HD media, high availability storage, analytics and authoritative DNS services.
Internet Delivery Challenges
- Peering points, where internet traffic exchange occurs, suffer from a lack of investment. This turns them into bottlenecks that cause packet loss and latency.
- Border Gateway Protocol (BGP) is inefficient as it mainly relies upon AS hop count to decide routes, without considering topology, latency or congestion. BGP is also slow to react to outages.
- Networks are unreliable. Cable cuts, DDoS attacks, misconfigured routers, power outages, natural disasters etc. contribute to this.
- Inefficient communication protocols. TCP, designed for reliability and congestion-avoidance, carries significant overhead and performs suboptimally for routes with high latency or packet loss.
- Scaling an internet application requires installing enough server capacity to handle peak traffic, and that capacity then sits underutilized most of the time. In addition to origin scalability, adequate bandwidth must be available at all points between the origin and the user in order to achieve a good user experience. This is a serious problem for internet video.
- Application limitations and the slow adoption of change also hinder the introduction of new techniques and protocols that could overcome these performance challenges.
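To put a rough number on the TCP bullet above, here is a minimal sketch using the well-known Mathis approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) · (C / sqrt(p)). The formula, the 1460-byte MSS and the sample round-trip times and loss rates are my own assumptions for illustration; none of them are taken from the paper.

```python
import math

MSS_BYTES = 1460              # assumed segment size (typical Ethernet path)
MATHIS_C = math.sqrt(3 / 2)   # constant in the Mathis approximation (~1.22)


def tcp_throughput_mbps(rtt_ms: float, loss_rate: float) -> float:
    """Approximate ceiling on steady-state TCP throughput in Mbit/s."""
    if rtt_ms <= 0 or not 0 < loss_rate < 1:
        raise ValueError("rtt_ms must be > 0 and loss_rate must be in (0, 1)")
    rtt_s = rtt_ms / 1000.0
    bytes_per_second = (MSS_BYTES / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))
    return bytes_per_second * 8 / 1e6


if __name__ == "__main__":
    # Hypothetical paths: (label, round-trip time in ms, packet loss rate)
    samples = [
        ("same metro", 5, 0.001),
        ("cross-continent", 50, 0.01),
        ("intercontinental", 150, 0.02),
    ]
    for label, rtt_ms, loss in samples:
        rate = tcp_throughput_mbps(rtt_ms, loss)
        print(f"{label:>16}: {rate:8.1f} Mbit/s")
```

The point of the sketch is only the shape of the curve: throughput falls in proportion to round-trip time and to the square root of the loss rate, which is why long, lossy routes perform so poorly.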
Anatomy of Delivery Network
- DNS mapping system: converts domain names to a nearby edge server IP address. This mapping relies upon a huge amount of historical and current data about global network conditions.
- Edge server: part of the edge server platform, responsible for processing requests from nearby users. The edge server platform comprises a large deployment of edge servers distributed across thousands of sites around the world.
- Transport system: used to connect to the origin server (if required) with high reliability and performance in order to serve user requests.
- Communication and control system: used to disseminate status information, control messages and configuration updates in a fault tolerant and timely fashion.
- Data collection and analysis system: collects and analyses data from various sources such as server logs, client logs, network and server information. Collected data can be used for monitoring, alerting, analytics, reporting and billing.
- Management portal: helps enterprise customers to have fine grained control of how their content is served to end users. This portal also helps enterprise customers gain visibility on how end users are interacting with their application and content, including reports on user demographics and traffic metrics.
Design Principles
- Design for reliability: multiple levels of fault tolerance, plus protocols such as Paxos for decentralized leader election.
- Design for scalability: more than 60,000 machines across the globe. The platform needs to handle more traffic, content and customers, analyze increasingly large volumes of log data, and scale the communication, control and mapping systems to an ever-increasing number of distributed machines.
- Limit human management: Design for automatic failure recovery. Helps keep cost low. Currently only 60 operations personnel to manage 60,000 machines across the globe.
- Design for performance: having fewer machines handle more traffic would help save energy costs.
High Performance Streaming
- The edge servers are placed not only in large Tier 1 and Tier 2 data centers, but also in a large number of end-user ISPs.
- Bandwidth usage for the Obama inauguration was around 2 Tbps. In a few years, the peak bandwidth usage for such one-time events is expected to reach 100 Tbps. A single well-connected data center, or even a handful of such data centers, cannot handle that much traffic. On the other hand, thousands of edge servers can serve tens of Gbps and together can easily handle such traffic.
- Stream delivery quality is measured by Akamai using the following metrics: stream availability, startup time, frequency and duration of interruptions, effective bandwidth. Akamai has deployed monitoring/measurement ‘agents’ around the world to simulate users playing video streams and testing the quality.
- Tiered distribution helps reduce the load on the origin server.
- Tiered distribution combined with overlay networks (edge server reaching out to multiple parent servers on disjoint network routes) helps enhance live streaming performance with minimal packet loss.
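Two of the claims in this section come down to simple arithmetic, sketched below. The first function estimates how many delivery nodes are needed for a given peak load; the 10 Gbps per-node figure is my reading of "tens of Gbps" and is an assumption. The second estimates the chance that a packet is lost on every path when an edge server pulls the same stream from several parents, assuming losses on the disjoint routes are independent, which real networks only approximate. Function names are hypothetical.

```python
import math


def nodes_needed(peak_tbps: float, per_node_gbps: float) -> int:
    """Nodes required to carry a peak load, with no headroom added."""
    if peak_tbps <= 0 or per_node_gbps <= 0:
        raise ValueError("both rates must be positive")
    return math.ceil(peak_tbps * 1000 / per_node_gbps)


def loss_on_all_paths(path_loss: float, parents: int) -> float:
    """Probability a packet is lost on every one of `parents` independent paths."""
    if not 0 <= path_loss <= 1 or parents < 1:
        raise ValueError("path_loss must be in [0, 1] and parents >= 1")
    return path_loss ** parents


if __name__ == "__main__":
    print(nodes_needed(2, 10))     # inauguration-scale peak at an assumed 10 Gbps per node
    print(nodes_needed(100, 10))   # projected 100 Tbps peak at the same assumed rate
    for parents in (1, 2, 3):
        # Assumed 1% loss on each individual path
        print(parents, loss_on_all_paths(0.01, parents))
```

Under these assumptions a 100 Tbps event needs on the order of ten thousand nodes, and each extra parent cuts the residual loss by roughly two orders of magnitude.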
High Performance Application Delivery
- Path optimization (between the origin and edge servers) helps to reduce latency. Performance data from Akamai’s mapping system is used to decide the path(s).
- Packet loss reduction is also achieved by path optimization.
- TCP is tailored for high performance data transfer between Akamai servers (persistent connections, optimal TCP window size, intelligent retransmission).
- Application-level optimizations such as prefetching the resources referenced by an HTML page while the main page is being served, and compressing resource content.
- Application protocols supported include HTTP, SSH, FTP, RDP, SSL VPN etc.
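The path optimization idea above can be sketched as a choice between the direct route and a detour through one intermediate server, based on measured latency. This is only an illustration under my own assumptions: the node names, the latency table and the one-relay limit are hypothetical, and the paper does not describe the actual selection algorithm at this level.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical round-trip measurements in milliseconds between node pairs.
Latency = Dict[Tuple[str, str], float]


def best_path(edge: str, origin: str, relays: List[str],
              rtt: Latency) -> Tuple[List[str], float]:
    """Return the lowest-latency path: direct, or through a single relay."""
    best_nodes = [edge, origin]
    best_total = rtt.get((edge, origin), math.inf)
    for relay in relays:
        first = rtt.get((edge, relay))
        second = rtt.get((relay, origin))
        if first is None or second is None:
            continue  # no measurement for this relay, so skip it
        if first + second < best_total:
            best_nodes = [edge, relay, origin]
            best_total = first + second
    return best_nodes, best_total


if __name__ == "__main__":
    measurements: Latency = {
        ("edge-sg", "origin-ny"): 320,      # default internet route
        ("edge-sg", "relay-tokyo"): 70,
        ("relay-tokyo", "origin-ny"): 180,
        ("edge-sg", "relay-london"): 190,
        ("relay-london", "origin-ny"): 80,
    }
    print(best_path("edge-sg", "origin-ny",
                    ["relay-tokyo", "relay-london"], measurements))
```

Keeping the direct route as the starting candidate means the detour is only taken when the measurements show it is actually faster.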
Distributing applications to the Edge
- Running the applications themselves in the edge servers would offer the ultimate boost in performance.
- While highly transactional applications that need to chat with origin databases may not be the best choices for edge computing, applications that do content aggregation/transformation, that deal with relatively static databases, or that only collect data are ideal candidates.
- Even applications that are quite complex but which can minimize the DB interactions by means of caching can be pushed to the edge.
Platform Components
- The edge server platform is highly configurable via the metadata configuration mechanism. Configurable capabilities include: origin server URL, cache control parameters, cache indexing (case sensitive, query parameters etc.), authentication/authorization, origin server failure response, edge computing, performance optimization etc.
- The mapping system uses real-time as well as historic data about the health of the internet to decide the edge clusters for end users. Within an edge cluster, the end user is mapped to a specific edge server based on factors including the likelihood of a cache hit for the requested content on that machine. Hardware and network faults in edge servers are monitored, and failing servers are suspended until the problem is fixed. The mapping system itself is a fault-tolerant distributed platform that can survive multiple data center failures.
- Communication and Control systems take care of real time distribution of status and control information, RPC and web services, dynamic configuration updates, key management and software and machine configuration management.
- Data collection and analysis system takes care of log collection, real time data collection, monitoring, analytics and reporting.
- Akamai has a global deployment of highly available and fault-tolerant authoritative DNS servers. These servers are primarily used for mapping end user IPs to edge clusters that can best handle the requests. The DNS servers can also serve customer zone records; this is achieved by fetching the customer DNS zone records in a secure fashion.
- Monitoring agents are deployed globally to monitor network and website performance. Monitoring tests are configured both by the mapping system (for real-time network analysis) and by customers (to analyze site performance).
- Global Traffic Manager (GTM) is a DNS mapping service provided to customers who have origin servers deployed at multiple geographies. An agent runs at customer origin servers to analyze internet performance parameters and feed into GTM, based on which end users as well as Akamai edge servers are routed to the most suitable origin server.
- A high availability storage system takes care of storing the many content types (static files, media etc.).
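The in-cluster mapping step described above (send a request to the server most likely to have the content cached, and route around suspended servers) can be illustrated with rendezvous hashing. This is a technique I am swapping in for illustration; the paper only says that cache-hit likelihood is one of the factors, and the server names and function names here are hypothetical.

```python
import hashlib
from typing import AbstractSet, Iterable


def _score(server: str, url: str) -> int:
    """Deterministic pseudo-random weight for a (server, url) pair."""
    digest = hashlib.sha256(f"{server}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_server(url: str, servers: Iterable[str],
                suspended: AbstractSet[str] = frozenset()) -> str:
    """Pick the healthy server with the highest score for this URL."""
    healthy = [s for s in servers if s not in suspended]
    if not healthy:
        raise RuntimeError("no healthy servers left in the cluster")
    return max(healthy, key=lambda s: _score(s, url))


if __name__ == "__main__":
    cluster = ["edge-a", "edge-b", "edge-c", "edge-d"]
    url = "https://example.com/video/segment-42.ts"
    primary = pick_server(url, cluster)
    print("primary:", primary)
    # Suspending the primary moves only the content that was mapped to it.
    print("fallback:", pick_server(url, cluster, suspended={primary}))
```

Because every request for the same URL lands on the same healthy server, that server's cache stays warm for it, and suspending one machine does not reshuffle content held by the others.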
Tidbits
- Content that is less than 4.2 KB in size doesn’t benefit from compression, since it is already small enough to fit into 3 data packets, which is the default size of the initial TCP congestion window. Such content can be sent in one burst without waiting for any TCP ACKs.
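The size threshold in the note above is just the initial congestion window multiplied by the payload each packet can carry. Here is a small sketch of that arithmetic, assuming a 1460-byte maximum segment size (1500-byte Ethernet MTU minus 40 bytes of IP and TCP headers). That MSS is my assumption; the exact cutoff shifts with the real MSS and with the space taken by response headers, so it lands near the quoted figure rather than exactly on it.

```python
import math

MSS_BYTES = 1460            # assumed payload per TCP segment
INITIAL_CWND_SEGMENTS = 3   # initial congestion window quoted in the note


def initial_window_bytes(mss: int = MSS_BYTES,
                         segments: int = INITIAL_CWND_SEGMENTS) -> int:
    """Bytes that can be sent before the first ACK has to arrive."""
    return mss * segments


def segments_needed(size_bytes: int, mss: int = MSS_BYTES) -> int:
    """Number of TCP segments needed to carry a payload of this size."""
    return math.ceil(size_bytes / mss)


def fits_in_initial_window(size_bytes: int) -> bool:
    """True if the payload goes out in the first burst, with or without compression."""
    return segments_needed(size_bytes) <= INITIAL_CWND_SEGMENTS


if __name__ == "__main__":
    print(initial_window_bytes())          # 4380 bytes with the assumed MSS
    print(fits_in_initial_window(4000))    # True: already fits in one burst
    print(fits_in_initial_window(20000))   # False: compression could save round trips
```

If the uncompressed response already fits in the first burst, shrinking it does not remove a round trip, so the CPU spent on compression buys nothing.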