What Web3 could learn from Twitter’s timeline architecture
Introduction
On November 13th, Elon Musk apologized for Twitter making more than 1,000 RPCs to render a user’s home timeline. At first glance, such a high number of RPCs seems absurd. Today, Twitter serves 260 million monthly active users and does so in near real-time. To reach mass adoption at sub-second latency, Twitter pioneered a number of solutions, including Apache Storm, Heron, DistributedLog, and Aurora; it has also been a major contributor to the Scala ecosystem (including the Finagle RPC framework) and has driven innovations like the lambda architecture, Snowflake IDs, and Segcache. So why would an innovative, global company such as Twitter need so many calls to get a user’s timeline data?
The issues that Twitter faces remind us of the current growing pains in Web3: developers are often forced to call many APIs serially, one by one, to gather the data needed to assemble their business logic. This results in unreliable and unpredictable performance, even for the simplest use cases, such as getting a user’s transaction history. And in terms of growth, transactions across the top 10 chains have multiplied roughly 100x in two years. Figure 1 compares tweets per second (2006–2013, blue) with Web3 transactions per second (2017–2022, red, excluding non-user transactions). If Web3 continues on the trajectory depicted in the figure, most of today’s Web3 data infrastructure solutions will be ill-equipped to handle the growth.
Figure 1: Early-days write traffic (QPS) comparison between tweets and the top 10 Web3 chains.
In this blog post, we highlight what Web3 can learn from Twitter’s solution to scale. Specifically, we discuss the following:
We outline Twitter’s timeline infrastructure journey, argue that their current architecture does indeed make sense for particular use cases, and conclude that some critiques (such as Elon Musk’s recent tweet apologizing for the numerous RPCs to render the home timeline) may be misplaced.
We dive deeper into the technical similarities between Twitter and Web3, and explore how solutions in the former may benefit solutions in the latter.
We analyze current Web3 growth trends, coupled with the lack of existing, performant data infrastructure solutions, and conclude that significant upgrades are needed to support real-time Web3 data access. We also show how ZettaBlock’s solution can help developers reduce development time by 70% and boost performance by 10x; a demo can be found here [website, video].
Twitter’s data infrastructure journey
In the beginning, Twitter started with vanilla MySQL. This quickly became a problem as the number of tweets increased 10x year-over-year for the first few years. From 2007 to 2012, Twitter’s monthly active users grew from a few thousand to more than 138 million. The conventional wisdom of horizontal and vertical sharding could not give Twitter the performance needed to handle the traffic the platform attracted, especially for rendering the home timeline.
The timeline is one of Twitter’s primary platform features. Generally, Twitter’s timeline has two main operations, which are enumerated below:
Write path: This path is for when a user posts a Tweet. In 2012, Twitter handled on average 4.6k write requests per second, or 12k RPS during peak hours.
Read path: This path is for users requesting their timelines. In 2012, Twitter handled around 300k read requests per second.
To better understand how Twitter renders the timeline, let’s dive deeper into the rendering flow, depicted in Figure 2. When a Twitter user posts a tweet today, Twitter first writes it to Manhattan, a distributed key-value database that stores user tweets, direct messages, account details, and more. The tweet is then fanned out to the timeline caches of all of that user’s followers. Although this increases the write amplification from 4.6k requests per second to 345k requests per second, it also greatly decreases the read latency for users: instead of joining a followers table with a tweets table, timeline rendering simply fetches tweets from a single table in the cache. These fan-out operations usually complete in less than 5 seconds. By distributing the data at write time, the system removes the table join and survives hypergrowth. As a result, read latencies improve to a few hundred milliseconds at the 99th percentile.
Figure 2: Twitter’s timeline rendering flow. Note that each tweet in a timeline requires at least one RPC.
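To make the trade-off concrete, here is a minimal sketch of the fan-out-on-write pattern described above. In-memory dictionaries stand in for Manhattan and the timeline cache, and the class and method names are our own illustrative assumptions, not Twitter’s actual code.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from time import time

TIMELINE_LIMIT = 800  # keep only the most recent entries per user (assumed cap)


@dataclass
class Tweet:
    tweet_id: int
    author_id: int
    text: str
    created_at: float = field(default_factory=time)


class FanOutTimeline:
    """Toy model of fan-out-on-write: a tweet store plus per-follower timeline caches."""

    def __init__(self):
        self.tweet_store = {}                     # stands in for Manhattan (tweet_id -> Tweet)
        self.followers = defaultdict(set)         # author_id -> set of follower ids
        self.timeline_cache = defaultdict(deque)  # user_id -> deque of tweet_ids, newest first

    def follow(self, follower_id: int, author_id: int) -> None:
        self.followers[author_id].add(follower_id)

    def post_tweet(self, tweet: Tweet) -> None:
        # Write path: one durable write, then N cache inserts (the write amplification).
        self.tweet_store[tweet.tweet_id] = tweet
        for follower_id in self.followers[tweet.author_id]:
            cache = self.timeline_cache[follower_id]
            cache.appendleft(tweet.tweet_id)
            while len(cache) > TIMELINE_LIMIT:
                cache.pop()  # drop the oldest cached entry

    def home_timeline(self, user_id: int, count: int = 100) -> list[Tweet]:
        # Read path: no join between the follow graph and tweets, just cache lookups.
        ids = list(self.timeline_cache[user_id])[:count]
        return [self.tweet_store[i] for i in ids]


if __name__ == "__main__":
    tl = FanOutTimeline()
    tl.follow(follower_id=2, author_id=1)
    tl.post_tweet(Tweet(tweet_id=100, author_id=1, text="hello timeline"))
    print([t.text for t in tl.home_timeline(2)])  # ['hello timeline']
```

The write path does more work so that the read path can stay a simple, predictable lookup, which is exactly the trade-off described above.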
The aforementioned rendering flow may be sufficient for the vast majority of users (with a write amplification of <100 in most cases), but what about “superhub users”? Superhub users are users with many followers (tens to hundreds of millions). For such users, the described fan-out pattern could be amplified more than 120 million times! That is why, during the early days of Twitter, there were dedicated racks of servers just for Justin Bieber. To accommodate superhub users, a special service called Earlybird is used: with Earlybird, tweets from superhub users are fetched differently and separately from those of normal users. This process is depicted in Figure 3 below.
Figure 3: The left-hand side depicts an abstract illustration of the hybrid timeline for Twitter users, and the right-hand side depicts the corresponding read SQL.
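Continuing the toy model above, a hybrid read could look roughly like the following: normal users’ tweets come from the fanned-out cache, while superhub users’ tweets are pulled at read time and merged in, mirroring the hybrid timeline in Figure 3. The threshold and the linear scan are simplifying assumptions; a real system would use an index such as Earlybird for the pull side.

```python
SUPERHUB_FOLLOWER_THRESHOLD = 1_000_000  # assumed cutoff; Twitter's real threshold differs


class HybridTimeline(FanOutTimeline):
    """Extends the toy model: skip fan-out for superhub authors, merge them in at read time."""

    def is_superhub(self, author_id: int) -> bool:
        return len(self.followers[author_id]) >= SUPERHUB_FOLLOWER_THRESHOLD

    def post_tweet(self, tweet: Tweet) -> None:
        if self.is_superhub(tweet.author_id):
            # Superhub path: store the tweet but skip fan-out; followers pull it on read.
            self.tweet_store[tweet.tweet_id] = tweet
            return
        super().post_tweet(tweet)  # normal path: store and fan out to follower caches

    def home_timeline(self, user_id: int, count: int = 100) -> list[Tweet]:
        cached = super().home_timeline(user_id, count)
        # Read-time fetch of superhub tweets this user follows (a stand-in for Earlybird).
        pulled = [
            t for t in self.tweet_store.values()
            if self.is_superhub(t.author_id) and user_id in self.followers[t.author_id]
        ]
        merged = sorted(cached + pulled, key=lambda t: t.created_at, reverse=True)
        return merged[:count]
```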
Now that we have described the complexities behind providing real-time tweet timelines, it becomes clear why rendering a single timeline requires many RPCs. For a timeline of just 100 tweets, the RPC count can easily exceed 1,000, because several RPCs are needed just to fetch one tweet. The solution might not be intuitive at first glance, but it is a deliberate trade-off that provides optimized and predictable read performance for end users.
The final result was very positive: a 99th-percentile latency of only a few hundred milliseconds. This infrastructure has proved reliable enough to handle the hypergrowth of Twitter’s traffic over the last 10 years without major changes.
Note that we left out some other aspects of the Twitter timeline, including scoring, ranking, and hydration. For more details on this, see the references listed at the end of this post.
The similarities between Web3 and Twitter data
Figure 4: The similarities between Twitter and Web3 data
There are many similarities between Twitter and the Web3 ecosystem:
Web3 is naturally a social graph: tweets are similar to transactions, and replies are similar to logs. This is depicted in Figure 4, which compares sequential timeline renderings with sequential blockchain blocks.
Superhub effects exist for both Web3 protocols and Twitter. The most popular NFT platform has 1,000x more trades than the 10th most popular.
Both Web3 and Twitter are open platforms, visible to all users, and allow certain levels of API access.
If we zoom in a bit, there are more data access pattern similarities between Twitter and Web3:
Read-heavy, but each record is very small. The average size of logs and transactions is just a few KB on EVM chains.
The most recent data is viewed most often; the majority of views come within the first few hours after publishing.
Data becomes immutable after a short period of time. On-chain data can be reverted by a reorg only for the latest blocks; similarly, a Twitter user can now edit a tweet for a limited time right after posting.
What Web3 Could Learn from Twitter’s Architecture
The transaction volumes of the top 10 chains have already increased by almost 100x compared to early 2020. The status quo of Web3 data infrastructure resembles Twitter’s early days around 2008, when the majority of traffic depended on horizontally sharded databases from different providers. Existing Web3 data infrastructure will therefore have a very difficult time providing performant access to data as Web3 continues to grow.
Twitter’s fan-out service is about putting related data in the same place (such as memory) at the same time. When a request comes in, the system can find all the related data in one place, already preprocessed and ready to use. This makes the system scalable, with predictable performance.
Web3 applications following the current status quo are missing a component that aggregates the relevant data efficiently. Specifically, developers must call APIs one by one to get the data they need. This results in unreliable and unpredictable performance, even for the simplest use cases, such as getting a user’s transaction history (depicted in Figure 5).
Figure 5: How current Web3 applications need to call many different APIs serially, even for simple transaction aggregations.
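As a rough illustration of the status quo in Figure 5, a developer assembling a wallet’s transaction history today might chain many independent JSON-RPC round trips. The endpoint URL below is a placeholder, not a real provider, but the `eth_getBlockByNumber` and `eth_getTransactionReceipt` methods are standard Ethereum JSON-RPC calls.

```python
import json
from urllib.request import Request, urlopen

RPC_URL = "https://rpc.example.com"  # placeholder node endpoint


def rpc_call(method: str, params: list) -> dict:
    """One JSON-RPC round trip; every extra call adds its own network latency."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1, "method": method, "params": params}).encode()
    req = Request(RPC_URL, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)["result"]


def transaction_history(address: str, from_block: int, to_block: int) -> list[dict]:
    history = []
    # Serial pattern: scan blocks one by one, then fetch each matching receipt separately.
    for block_number in range(from_block, to_block + 1):
        block = rpc_call("eth_getBlockByNumber", [hex(block_number), True])
        for tx in block["transactions"]:
            if address.lower() in {str(tx.get("from", "")).lower(), str(tx.get("to", "")).lower()}:
                receipt = rpc_call("eth_getTransactionReceipt", [tx["hash"]])  # yet another RPC
                history.append({"tx": tx, "receipt": receipt})
    return history
```

Every block and every receipt is a separate network round trip, so latency grows with the scan range and is only as predictable as the slowest call.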
Since all Web3 data is publicly available, ZettaBlock has built state-of-the-art data infrastructure to handle the fan-out step for all Web3 developers. An app developer only needs to specify, via a single API, which related data they want, and ZettaBlock aggregates all of the relevant data. This is depicted in Figure 6. By using ZettaBlock, development time and API latency are reduced by 70% and 90%, respectively. Check out our demo at https://demo.zettablock.dev/. More technical details will be shared in the future.
Figure 6: In contrast to Figure 5, ZettaBlock abstracts multiple Web3 datasets into a single, user-friendly, and efficient API.
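In contrast, an aggregated-query approach pushes the joining and filtering into the data platform so that the application makes a single call. The sketch below only conveys the shape of that interaction: the endpoint, table names, and response format are assumptions for illustration, not ZettaBlock’s documented API.

```python
import json
from urllib.request import Request, urlopen

QUERY_URL = "https://api.example-data-platform.com/v1/query"  # placeholder aggregated-query endpoint

# One declarative query replaces the block-by-block scan and per-transaction receipt calls;
# the schema and column names here are hypothetical.
QUERY = """
SELECT t.hash, t.block_number, t.value, l.topics
FROM ethereum.transactions t
LEFT JOIN ethereum.logs l ON l.transaction_hash = t.hash
WHERE t.from_address = :address OR t.to_address = :address
ORDER BY t.block_number DESC
LIMIT 100
"""


def transaction_history(address: str) -> list[dict]:
    payload = json.dumps({"sql": QUERY, "params": {"address": address}}).encode()
    req = Request(QUERY_URL, data=payload, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:  # a single round trip instead of many
        return json.load(resp)["rows"]
```

Because the data is already aggregated on the platform side, the application’s latency no longer depends on how many blocks or receipts are involved.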
Conclusion
In this blog post, we’ve dissected Twitter’s architecture and compared its data models to Web3’s, finding many similarities. If there is one message to take away, it is that many existing Web3 data infrastructure solutions, much like early Twitter, will not be able to keep up with the coming data demands.
This is why we’ve built ZettaBlock. ZettaBlock is a full-stack Web3 data infrastructure platform that provides real-time, reliable APIs and analytics to power your apps in minutes. The aforementioned fan-out process (putting related data in the same place at the same time) is just one of many features available to developers and enterprises on ZettaBlock. We are trusted by leading Web3 companies such as Polygon, Crypto.com, and Circle. Our vision is to be the go-to platform for Web3 data infrastructure.
Check out our demo / video for details.
Acknowledgment
I would like to take this opportunity to express my heartfelt gratitude to all the people who have helped me with this post. Special thanks to Kevin Ros, Chi Zhang, Maria Adamjee, Raphael Serrano, Zhenzhong Xu, Paul Tluczek, Tianzhou Chen, Hemanth Soni, Nitish Sharma, Ryan Kim, Alex Xu, Vivek Gopalan, Nazih Kalo, Nirmal Krishnan, Timothy Chen, Min Hao, Bo Yang
References
Timelines at Scale: https://www.infoq.com/presentations/Twitter-Timeline-Scalability/
How Twitter uses redis to scale 105TB RAM: http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html
What Database does Twitter use? https://scaleyourapp.com/what-database-does-twitter-use-a-deep-dive/
Twitter Data Storage and Processing: https://ankush-chavan.medium.com/twitter-data-storage-and-processing-dd13fd0fdb30#:~:text=That%20equals%20to%20the%2084,time%20the%20request%20is%20made