The core mechanism of the Binance API rate limit is the Weight Bucket: each IP can consume a maximum of 6000 weight per minute (Spot). Each endpoint costs between 1 and 100 weight points depending on its complexity. Exceeding the limit returns a 429 Too Many Requests error, and repeated violations can escalate to a 418 error, which means an IP ban lasting from 2 minutes to 3 days. This article covers weight calculation, response header monitoring, client-side rate limiting, and WebSocket alternatives, so that your strategy runs stably without triggering risk controls. Users who do not yet have an API key should first complete KYC on the Binance official website.
1. The Three Dimensions of Rate Limiting
Binance imposes three independent limits on the same IP / API Key:
| Dimension | Spot Limit | Futures Limit | Violation Response |
|---|---|---|---|
| Request Weight (REQUEST_WEIGHT) | 6000 / minute | 2400 / minute | 429 |
| Order Count (ORDERS) | 100 / 10s; 200,000 / day | 300 / 10s; 1200 / minute | 429 |
| Raw Request Count (RAW_REQUESTS) | 61,000 / 5 minutes | 61,000 / 5 minutes | 429 |
| IP Ban | Repeated violations | Repeated violations | 418 |
Key Insight: Although the weight of an ordering endpoint is only 1, it simultaneously consumes the ORDERS bucket. Hitting the limit of either bucket will trigger a rate limit.
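The two-bucket check can be sketched as follows. This is a minimal illustration, not Binance's own accounting: `OrderWindow` and `may_place_order` are hypothetical names, and the sliding 10-second window is an assumption chosen because it is at least as strict as any fixed 10-second interval.

```python
from collections import deque


class OrderWindow:
    """Sliding-window counter for the ORDERS bucket (default: 100 orders per 10 s)."""

    def __init__(self, limit=100, window_s=10.0):
        self.limit = limit
        self.window_s = window_s
        self.sent = deque()  # timestamps of recent orders, oldest first

    def _expire(self, now):
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()

    def can_send(self, now):
        self._expire(now)
        return len(self.sent) < self.limit

    def record(self, now):
        self.sent.append(now)


def may_place_order(used_weight, orders, now, max_weight=6000, order_weight=1):
    """Return True only if BOTH the weight bucket and the order bucket have room."""
    has_weight = used_weight + order_weight <= max_weight
    return has_weight and orders.can_send(now)
```

Call `orders.record(now)` after each order that is actually sent, and pass the latest value read from the weight header as `used_weight`.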
2. Querying Current Weight Quotas
You can retrieve the limits currently configured by the exchange via the rateLimits field of GET /api/v3/exchangeInfo (your actual usage is reported separately, in the response headers):
```shell
curl -s "https://api.binance.com/api/v3/exchangeInfo" | \
  jq '.rateLimits'
```
Returns:
```json
[
  {"rateLimitType": "REQUEST_WEIGHT", "interval": "MINUTE", "intervalNum": 1, "limit": 6000},
  {"rateLimitType": "ORDERS", "interval": "SECOND", "intervalNum": 10, "limit": 100},
  {"rateLimitType": "ORDERS", "interval": "DAY", "intervalNum": 1, "limit": 200000},
  {"rateLimitType": "RAW_REQUESTS", "interval": "MINUTE", "intervalNum": 5, "limit": 61000}
]
```
Accounts with higher VIP levels can apply for higher weight limits, but the standard 6000 weight is sufficient for 90% of strategies.
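To avoid hard-coding 6000 in client code, the rateLimits array can be indexed once at startup. The sketch below works on the sample response shown in this section; `index_rate_limits` is a hypothetical helper name, and in a real client the list would come from the exchangeInfo response instead of a literal string.

```python
import json


def index_rate_limits(rate_limits):
    """Map (rateLimitType, interval, intervalNum) -> limit."""
    return {
        (r["rateLimitType"], r["interval"], r["intervalNum"]): r["limit"]
        for r in rate_limits
    }


# Sample payload copied from the response shown in this section.
sample = json.loads("""
[
  {"rateLimitType": "REQUEST_WEIGHT", "interval": "MINUTE", "intervalNum": 1, "limit": 6000},
  {"rateLimitType": "ORDERS", "interval": "SECOND", "intervalNum": 10, "limit": 100},
  {"rateLimitType": "ORDERS", "interval": "DAY", "intervalNum": 1, "limit": 200000},
  {"rateLimitType": "RAW_REQUESTS", "interval": "MINUTE", "intervalNum": 5, "limit": 61000}
]
""")

limits = index_rate_limits(sample)
weight_per_minute = limits[("REQUEST_WEIGHT", "MINUTE", 1)]
```

The resulting `weight_per_minute` can then be passed to whatever limiter the client uses, so a changed quota is picked up without a code change.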
3. Common Endpoint Weight Comparison Table
| Endpoint | Weight | Description |
|---|---|---|
| GET /api/v3/ping | 1 | Connectivity test |
| GET /api/v3/time | 1 | Server time |
| GET /api/v3/exchangeInfo | 20 | Trading rules (cache for 1 hour) |
| GET /api/v3/ticker/price (single symbol) | 1 | Single symbol price |
| GET /api/v3/ticker/price (all) | 4 | Fetch all prices at once |
| GET /api/v3/ticker/24hr (single symbol) | 1 | Single symbol 24h stats |
| GET /api/v3/ticker/24hr (all) | 80 | Stats for all pairs |
| GET /api/v3/depth limit=5/10/20/50/100 | 1 | Order book depth |
| GET /api/v3/depth limit=500 | 5 | Order book depth |
| GET /api/v3/depth limit=1000 | 10 | Order book depth |
| GET /api/v3/depth limit=5000 | 50 | Order book depth (use with caution) |
| GET /api/v3/klines | 2 | K-lines (candlesticks) |
| GET /api/v3/historicalTrades | 5 | Historical trade data |
| GET /api/v3/account | 20 | Account balances |
| GET /api/v3/openOrders (single symbol) | 6 | Current open orders |
| GET /api/v3/openOrders (all) | 80 | All open orders |
| GET /api/v3/allOrders | 20 | Historical orders |
| POST /api/v3/order | 1 | Create order |
| DELETE /api/v3/order | 1 | Cancel order |
| DELETE /api/v3/openOrders | 1 | Cancel all open orders |
Performance Trap: Avoid looping calls to ticker/24hr for individual symbols. Use the parameterless endpoint to fetch all at once, reducing weight from 300 symbols × 1 = 300 down to 80.
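The break-even point follows directly from the table: per-symbol calls are cheaper only while their total stays at or below 80. A small sketch of that arithmetic, using the weights listed above (the function name `ticker_24hr_plan` is illustrative):

```python
SINGLE_TICKER_WEIGHT = 1  # per-symbol ticker/24hr call, as listed in the table
ALL_TICKERS_WEIGHT = 80   # parameterless ticker/24hr call, as listed in the table


def ticker_24hr_plan(n_symbols):
    """Return (strategy, total_weight) for fetching 24h stats of n_symbols."""
    per_symbol_total = n_symbols * SINGLE_TICKER_WEIGHT
    if per_symbol_total <= ALL_TICKERS_WEIGHT:
        return ("per-symbol", per_symbol_total)
    return ("all-at-once", ALL_TICKERS_WEIGHT)
```

With these weights, `ticker_24hr_plan(300)` picks the single batch call at 80 weight, while `ticker_24hr_plan(10)` keeps the per-symbol calls at 10 weight.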
4. Reading Response Headers for Adaptive Rate Limiting
Every REST request response includes the currently used weight. Clients should read this to adjust dynamically:
```python
import time

import requests

BASE_URL = "https://api.binance.com"


class RateLimiter:
    def __init__(self, max_weight=6000, safety_ratio=0.8):
        self.max_weight = max_weight
        self.safety = safety_ratio  # Use only 80% to prevent cold-start errors
        self.used_weight = 0

    def update_from_headers(self, headers):
        # requests exposes headers case-insensitively, so the exact casing does not matter
        used = headers.get("X-MBX-USED-WEIGHT-1m")
        if used:
            self.used_weight = int(used)

    def should_wait(self) -> float:
        """Returns recommended wait seconds; 0 means ready to call."""
        threshold = self.max_weight * self.safety
        if self.used_weight >= threshold:
            # Estimate seconds until the next minute boundary, when the counter resets
            return float(60 - (int(time.time()) % 60))
        return 0.0


limiter = RateLimiter()


def safe_get(path, params=None):
    wait = limiter.should_wait()
    if wait > 0:
        print(f"[Rate Limit] {limiter.used_weight} weight used, pausing for {wait}s")
        time.sleep(wait)
    r = requests.get(f"{BASE_URL}{path}", params=params, timeout=10)
    # Refresh the counter from the server's own accounting after every call
    limiter.update_from_headers(r.headers)
    return r.json()


# Usage
data = safe_get("/api/v3/ticker/24hr")
print(f"Current weight usage: {limiter.used_weight}/{limiter.max_weight}")
```
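The same idea extends to every usage counter the server reports. The sketch below collects all headers of the form X-MBX-USED-WEIGHT-(intervalNum)(intervalLetter) and X-MBX-ORDER-COUNT-(intervalNum)(intervalLetter) into one dictionary. It assumes those naming patterns (the order-count headers are not mentioned in this article), and `parse_usage_headers` is a hypothetical helper.

```python
import re

# Matches e.g. "X-MBX-USED-WEIGHT-1m" or "x-mbx-order-count-10s"; the
# unsuffixed legacy header "X-MBX-USED-WEIGHT" is deliberately not matched.
_USAGE_HEADER = re.compile(
    r"^x-mbx-(used-weight|order-count)-(\d+)([smhd])$", re.IGNORECASE
)


def parse_usage_headers(headers):
    """Return {(kind, interval_num, interval_letter): used_count}."""
    usage = {}
    for name, value in headers.items():
        m = _USAGE_HEADER.match(name)
        if m:
            key = (m.group(1).lower(), int(m.group(2)), m.group(3).lower())
            usage[key] = int(value)
    return usage
```

Keeping the interval in the key lets one limiter track the 1-minute weight window and the 10-second order window side by side.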
5. Token Bucket Rate Limiting (Proactive Client Control)
A more reliable approach than passively watching headers is a Client-side Token Bucket:
```python
import threading
import time

import requests

BASE_URL = "https://api.binance.com"


class TokenBucket:
    def __init__(self, capacity=6000, refill_per_sec=100):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_sec  # 6000/60 = 100 per second
        self.last = time.time()
        self.lock = threading.Lock()

    def acquire(self, cost=1):
        with self.lock:
            now = time.time()
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
            self.last = now
            if self.tokens < cost:
                # Sleep until the shortfall has refilled (other threads queue on the lock)
                wait = (cost - self.tokens) / self.refill
                time.sleep(wait)
                # The tokens earned during the sleep were just spent, so advance
                # the clock; otherwise the next call would count them twice
                self.last = time.time()
                self.tokens = 0
            else:
                self.tokens -= cost


bucket = TokenBucket(capacity=6000, refill_per_sec=100)


def call(path, weight):
    bucket.acquire(weight)
    return requests.get(f"{BASE_URL}{path}", timeout=10).json()


# Pass each endpoint's weight from the table in section 3.
# /api/v3/account is a signed endpoint; signing is omitted here for brevity.
call("/api/v3/ticker/price", 1)
call("/api/v3/account", 20)
```
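One possible variation, sketched under stated assumptions rather than taken from the original design: compute the delay while holding the lock but sleep outside it, so that one thread waiting for tokens does not stall threads that only need to reserve. `ReservingBucket` and its injectable `clock` parameter are hypothetical; the clock makes the logic testable without real waiting.

```python
import threading
import time


class ReservingBucket:
    def __init__(self, capacity=6000, refill_per_sec=100.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()
        self.lock = threading.Lock()

    def reserve(self, cost=1):
        """Deduct `cost` immediately and return the seconds the caller must wait."""
        with self.lock:
            now = self.clock()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
            self.last = now
            self.tokens -= cost  # may go negative: a negative balance is a debt
            if self.tokens >= 0:
                return 0.0
            return -self.tokens / self.refill

    def acquire(self, cost=1):
        wait = self.reserve(cost)
        if wait > 0:
            time.sleep(wait)  # sleep outside the lock so other threads can still reserve
```

Because the balance is allowed to go negative, later callers are automatically queued behind earlier ones: each is told to wait until the accumulated debt, including its own cost, has refilled.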
6. Correct Handling of 429 and 418 Errors
1. Receiving a 429
```python
import time

import requests


def request_with_retry(method, url, **kwargs):
    for attempt in range(3):
        r = requests.request(method, url, **kwargs)
        if r.status_code == 429:
            # Fall back to 60 s if the server did not send Retry-After
            retry_after = int(r.headers.get("Retry-After", 60))
            print(f"Rate limit triggered, sleeping for {retry_after}s")
            time.sleep(retry_after)
            continue
        if r.status_code == 418:
            print("IP banned, application must stop!")
            raise SystemExit(1)
        return r
    raise Exception("Max retries exceeded")
```
The Retry-After header provides the exact number of seconds to wait; simply sleep for that duration.
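If the header is missing or cannot be parsed, a capped exponential backoff is a common fallback. The sketch below is an assumption about reasonable client behavior, not documented Binance guidance; `retry_delay`, `base`, and `cap` are illustrative names. It takes a plain dict, so the key casing must match (the headers object from requests is case-insensitive and works too).

```python
def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a plain number of seconds; fall through to backoff
    # No usable header: 1 s, 2 s, 4 s, ... capped at `cap`
    return min(cap, base * (2 ** attempt))
```

For example, `retry_delay({"Retry-After": "7"}, 0)` returns the server's 7 seconds, while `retry_delay({}, 3)` falls back to 8 seconds.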
2. Receiving a 418
A 418 error is a severe warning: it means you kept sending requests after receiving 429 responses, and your IP is now banned. Bans start at 2 minutes and can escalate to 3 days. Once you receive a 418, stop all requests immediately and wait at least the duration specified by Retry-After before resuming.
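Stopping all requests is easiest to enforce with a single shared gate that every call site consults before sending. The following is a minimal sketch of such a gate; `BanGuard` is a hypothetical class, and the caller is assumed to pass in the Retry-After value it read from the response.

```python
import time


class BanGuard:
    """Remembers until when the client must stay silent after a 429 or 418."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.blocked_until = 0.0

    def note_response(self, status_code, retry_after_s):
        if status_code in (418, 429):
            # Never shorten an existing block; only extend it.
            self.blocked_until = max(self.blocked_until, self.clock() + retry_after_s)

    def seconds_blocked(self):
        return max(0.0, self.blocked_until - self.clock())

    def allow(self):
        return self.seconds_blocked() == 0.0
```

Each request path would check `guard.allow()` first and call `guard.note_response(...)` after every response, so one thread hitting a 418 silences all the others as well.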
7. WebSocket Alternative: Near-Zero Weight Consumption
REST polling consumes weight with every request. WebSocket subscriptions only consume weight once during the initial connection; real-time pushes do not count toward weight:
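```python
import json

import websocket  # provided by the websocket-client package


def on_message(ws, message):
    # Combined streams wrap each event as {"stream": "...", "data": {...}},
    # so the ticker fields live under the "data" key
    data = json.loads(message)["data"]
    print(f"{data['s']} Price {data['c']}, 24h Vol {data['v']}")


ws = websocket.WebSocketApp(
    "wss://stream.binance.com:9443/stream?streams=btcusdt@ticker/ethusdt@ticker",
    on_message=on_message,
)
ws.run_forever()
```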
Cost Comparison: Polling real-time tickers for 10 pairs via REST every second = 600 calls/minute × 1 weight = 600 weight. WebSocket subscription = 0 weight.
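The same arithmetic can be parameterized to estimate how much of the budget a polling loop would consume. This is a small sketch using the figures from this article; the function names are illustrative.

```python
def rest_polling_weight_per_minute(n_symbols, interval_s, weight_per_call=1):
    """Weight consumed per minute when polling each symbol every `interval_s` seconds."""
    calls_per_minute = n_symbols * (60.0 / interval_s)
    return calls_per_minute * weight_per_call


def share_of_limit(weight_per_minute, limit=6000):
    """Fraction of the per-minute weight limit that this load uses."""
    return weight_per_minute / limit
```

Ten pairs polled every second come to 600 weight per minute, a tenth of the 6000 limit, before any order or account calls are counted.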
8. Practical Weight Optimization Tips
- Cache exchangeInfo: Updating once per hour is sufficient; more frequent refreshes just waste 20 weight per call.
- Prioritize Batch Queries: Fetching all tickers at once with a parameterless /ticker/24hr call costs 80 weight; for 300 symbols that is roughly 73% less than looping over single-symbol calls.
- Only Fetch Necessary Depth: limit=20 (weight 1) is enough for most limit orders; don't pull 5000 levels.
- Use userDataStream for Order Status: Pushed updates consume no request weight, unlike periodic GET /order polling.
- Use Batch Cancel for All Orders: DELETE /openOrders?symbol=BTCUSDT costs 1 weight and replaces a series of individual cancels.
- Time-based Limiting: Weight consumption tends to spike around settlement windows (UTC 00:00, 08:00, 16:00); schedule heavy queries away from these peaks.
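The first tip, caching exchangeInfo for an hour, can be sketched with a small time-to-live cache. `TTLCache` is a hypothetical helper; the `fetch` callable stands in for whatever function performs the actual GET /api/v3/exchangeInfo request.

```python
import time


class TTLCache:
    """Calls `fetch` at most once per `ttl_s` seconds and reuses the result."""

    def __init__(self, fetch, ttl_s=3600.0, clock=time.monotonic):
        self.fetch = fetch
        self.ttl_s = ttl_s
        self.clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self.clock()
        if self._fetched_at is None or now - self._fetched_at >= self.ttl_s:
            # Cache is empty or stale: pay the endpoint's weight once and store the result
            self._value = self.fetch()
            self._fetched_at = now
        return self._value
```

Every part of the strategy can then call `cache.get()` freely; only the first call in each hour reaches the exchange.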
9. FAQ
Q1: Is the weight calculated by IP or API Key?
A: Primarily by IP. Multiple keys under the same IP share the weight bucket. Switching IPs can bypass IP-level weight limits, but order count limits are still tied to the account. Using proxies to rotate IPs is identifiable by Binance and not recommended.
Q2: What is the difference between X-MBX-USED-WEIGHT and X-MBX-USED-WEIGHT-1m in response headers?
A: X-MBX-USED-WEIGHT-1m is the officially recommended header, explicitly denoting the 1-minute window. X-MBX-USED-WEIGHT is a legacy field with the same value. Use the one with the -1m suffix.
Q3: How long will I be banned after a 418?
A: The first violation is usually 2 minutes, escalating to 5/15/60 minutes, and up to 3 days in extreme cases. The Retry-After header gives the precise time. Optimize your code immediately upon recovery to prevent escalating bans.
Q4: How do I calculate weight for concurrent calls in a multi-threaded app?
A: Use a process-wide global token bucket (see section 5) or a distributed Redis-based token bucket for multi-machine deployments. A threading.Lock is sufficient for a single machine.
Q5: Are the weight rules the same on the testnet (testnet.binance.vision)?
A: Testnet weight limits are generally more relaxed (often 10x the mainnet), but the rule structure is identical. Do not estimate mainnet performance based on testnet consumption; always validate on the mainnet with low traffic before going live.