
Rate Limits / Throughput

What is Throughput?

Throughput is the maximum number of requests an application can make each second. This metric, also known as the application's "rate limit," is crucial for understanding an application's performance capabilities.

When many requests are sent at the same time, an application can reach its maximum throughput. Some systems, such as Alchemy's elastic throughput model, guarantee a specific throughput limit, expressed in compute units per second. In practice, the throughput actually available often exceeds this guaranteed limit.
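
Because the limit is expressed in compute units per second rather than requests per second, how many requests fit into it depends on how many compute units each request costs. The sketch below illustrates that arithmetic. The budget (`CU_PER_SECOND`), the method names, and their costs (`CU_COST`) are made-up values for illustration, not actual plan limits or pricing.

```python
# Hypothetical values for illustration only: real budgets and per-method
# costs depend on the provider and the plan.
CU_PER_SECOND = 300  # assumed guaranteed throughput, in compute units per second

# Assumed compute-unit cost of two hypothetical request types.
CU_COST = {
    "cheap_call": 10,
    "expensive_call": 30,
}


def max_requests_per_second(method: str, cu_per_second: int = CU_PER_SECOND) -> int:
    """How many calls of a single method fit into one second's compute-unit budget."""
    return cu_per_second // CU_COST[method]


def fits_in_budget(calls: dict[str, int], cu_per_second: int = CU_PER_SECOND) -> bool:
    """Whether a mix of calls sent within the same second stays within the budget."""
    total = sum(CU_COST[method] * count for method, count in calls.items())
    return total <= cu_per_second


print(max_requests_per_second("cheap_call"))      # 30
print(max_requests_per_second("expensive_call"))  # 10
print(fits_in_budget({"cheap_call": 20, "expensive_call": 3}))  # True  (290 CU)
print(fits_in_budget({"cheap_call": 20, "expensive_call": 4}))  # False (320 CU)
```

The same budget therefore allows three times as many cheap calls as expensive ones, which is why a compute-unit limit does not translate into a single fixed request count.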

Reaching the throughput limit usually doesn't degrade the user experience, provided the application handles it. It's advisable to build retry mechanisms into your application so that requests rejected for exceeding the limit are automatically retried in the next second. As a rule of thumb, retries are an adequate strategy as long as fewer than 30% of requests are rate-limited.
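
A minimal sketch of such a retry mechanism, assuming the client signals a rate-limited response by raising a hypothetical `RateLimitedError` (for an HTTP API this would typically correspond to a `429 Too Many Requests` status). It uses exponential backoff with random jitter, a common choice rather than a prescribed one; `call_with_retries` and `retries_are_sufficient` are illustrative names, not part of any SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised when a request is rejected for exceeding the rate limit."""


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff and jitter when rate-limited."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitedError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Wait about 1 s, then 2 s, 4 s, ... (capped at max_delay), plus random
            # jitter so that many clients do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


def retries_are_sufficient(rate_limited: int, total: int, threshold: float = 0.30) -> bool:
    """Rule of thumb from the text: retries suffice while < 30% of requests are rate-limited."""
    return total > 0 and rate_limited / total < threshold


# Usage: a fake request that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky_request() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitedError()
    return "ok"


result = call_with_retries(flaky_request, sleep=lambda seconds: None)  # skip real waiting
print(result, attempts["n"])  # ok 3
```

The injectable `sleep` parameter lets the function be exercised without actually waiting; in normal use the default `time.sleep` applies. If `retries_are_sufficient` starts returning `False`, retries alone are unlikely to be enough.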

What are API Requests?


API requests are individual calls made to a web-based API. Each request seeks to perform an action, like retrieving data or executing a function. The nature and complexity of these requests can vary significantly, depending on the API's design and the specific operation being requested.

The Token Bucket Algorithm

Understanding the Token Bucket Algorithm

We use the token bucket algorithm to rate-limit API requests. This algorithm is a flexible method for enforcing rate limits and ensuring fair access to resources.

How It Works

  • Token Allocation: Tokens are added to the bucket at a fixed rate. Each token represents permission to send a certain amount of data or a certain number of requests.
  • Token Consumption: When a request is made, it consumes tokens from the bucket. If enough tokens are available, the request is processed; otherwise, it's either rejected or queued until enough tokens are available.
  • Burst Capacity: The bucket has a maximum capacity, and tokens that arrive while it is full are discarded. Unused tokens therefore accumulate only up to this limit, so a full bucket can absorb a burst of requests up to its capacity, providing flexibility in handling sudden increases in demand.
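
The steps above can be sketched as a small Python class. This is an illustrative single-process implementation under stated assumptions, not the production rate limiter: `TokenBucket`, its parameters, and the `FakeClock` helper are hypothetical, and it rejects requests that lack tokens rather than queuing them. The `cost` argument lets one request consume several tokens, which is one way to model requests of different sizes.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second (the average rate limit)
        self.capacity = capacity  # maximum tokens stored (the largest allowed burst)
        self.tokens = capacity    # start full, so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add tokens for the elapsed time; anything above capacity is discarded.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)

    def try_consume(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available and return True; otherwise reject (False)."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FakeClock:
    """Hypothetical manually advanced clock, so the example runs without waiting."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
bucket = TokenBucket(rate=5, capacity=10, clock=clock)

burst = [bucket.try_consume() for _ in range(12)]
print(burst.count(True))  # 10: the full bucket absorbs a burst of 10, then rejects

clock.t += 1.0            # one second later, 5 new tokens have been added
after = [bucket.try_consume() for _ in range(6)]
print(after.count(True))  # 5
```

Starting the bucket full is what permits the initial burst of 10; after that, accepted traffic settles at the refill rate of 5 requests per second.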

Benefits

  • Flexibility: The token bucket algorithm allows for short bursts of traffic beyond the average rate limit, offering flexibility without compromising overall system stability.
  • Fairness: This method ensures users are granted access to resources at a consistent rate, preventing any single user from monopolizing the service.
  • Predictability: It provides a predictable method for understanding how many requests can be made and when, aiding in efficient application design and resource management.
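
The flexibility and predictability properties follow from a simple bound. Writing r for the refill rate (tokens per second) and b for the bucket capacity, and assuming each request costs one token, the number of requests N(T) accepted in any time window of length T can be at most the b tokens already stored plus the rT tokens added during the window. The numbers in the example are hypothetical.

```latex
N(T) \le b + rT

% Example with hypothetical values r = 5 tokens/s, b = 10, T = 60 s:
N(60) \le 10 + 5 \cdot 60 = 310

% Long-run average rate: the burst allowance b is amortized away
\frac{N(T)}{T} \le r + \frac{b}{T} \;\longrightarrow\; r \quad (T \to \infty)
```

Short windows can therefore exceed the average rate by up to b requests, while over longer periods the accepted rate converges to r.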

Implementing the token bucket algorithm helps to maintain a balanced load on our systems, ensuring high availability and a consistent user experience.