Rate Limiting

djangosdk supports token-based rate limiting — rate limits based on the number of tokens consumed, not the number of requests. This is more accurate for AI workloads where a single request can consume vastly different amounts of tokens.

Configuration

AI_SDK = {
    "RATE_LIMITING": {
        "ENABLED": True,
        "BACKEND": "django_cache",
        "PER_USER_TOKENS_PER_MINUTE": 50000,
        "PER_USER_TOKENS_PER_DAY": 500000,
    },
}

The @ai_rate_limit Decorator

Apply rate limiting to a view or function:

from django.http import JsonResponse

from djangosdk.ratelimit.decorators import ai_rate_limit

@ai_rate_limit(tokens_per_minute=10000, tokens_per_day=100000)
def chat_view(request):
    agent = SupportAgent()  # defined elsewhere in your project
    response = agent.handle(request.POST["message"])
    return JsonResponse({"text": response.text})

If the rate limit is exceeded, the decorator raises RateLimitExceeded (HTTP 429).

Rate Limit Backend

The default backend (django_cache) uses Django's cache framework to track token usage. Any Django cache backend works (Redis, Memcached, database, etc.).
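For example, to back the django_cache backend with Redis, point Django's default cache at a Redis instance. The snippet below uses Django's built-in Redis cache backend (available in Django 4.0+); the LOCATION URL is a placeholder for your own Redis server.

```python
# settings.py — assumes Django 4.0+ and a Redis instance at the placeholder URL
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    },
}
```

With this in place, token usage counters are stored in Redis, so limits are shared across all application processes.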

Custom Backend

Implement AbstractRateLimitBackend to use a custom store:
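As a rough illustration of what such a backend has to do, here is a minimal in-memory sketch. The real base class lives in djangosdk (AbstractRateLimitBackend) and its exact method names and signatures are not shown here — the check_and_consume signature below is an assumption for illustration only; adapt it to the actual interface.

```python
import time
from collections import defaultdict

# In-memory sketch of a rate-limit backend. In practice this would subclass
# djangosdk's AbstractRateLimitBackend; the method name and signature here
# are assumptions, not the library's actual interface.
class InMemoryRateLimitBackend:
    def __init__(self):
        # key -> list of (timestamp, tokens) usage records
        self._usage = defaultdict(list)

    def check_and_consume(self, key, tokens, limit, window_seconds):
        """Record `tokens` against `key` and return True if the sliding-window
        total stays within `limit`; return False (and record nothing) otherwise."""
        now = time.time()
        records = self._usage[key]
        # Drop records that have aged out of the window.
        records[:] = [(ts, t) for ts, t in records if now - ts < window_seconds]
        used = sum(t for _, t in records)
        if used + tokens > limit:
            return False
        records.append((now, tokens))
        return True
```

A per-process dict like this is only suitable for development; a production backend would use a shared store such as Redis so that all workers see the same counters.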

Per-User Limits

Rate limits are tracked per user. The default key is request.user.id. Override the key function for custom scoping:
