Rate Limits

Per-endpoint request throttling with distributed sliding-window rate limiting.

Overview

GaaS enforces rate limits to prevent abuse, ensure fair resource allocation, and maintain system stability. Rate limits are applied per API key (organization) and vary by endpoint.

When you exceed a rate limit, the API returns an HTTP 429 Too Many Requests response with a Retry-After header indicating when you can retry.

Distributed Rate Limiting: In production (when GAAS_DATABASE_URL is set), GaaS uses Postgres-backed distributed rate limiting with atomic sliding-window counters. This ensures accurate rate limiting across multiple API server instances.
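
To make the sliding-window idea concrete, here is a minimal in-memory sketch. It is not GaaS's Postgres-backed implementation; the class name `SlidingWindowLimiter`, the `check` method, and the key `org_123` are hypothetical, and a single-process deque stands in for the atomic database counters.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sliding-window log: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits = {}  # key -> deque of accepted request timestamps

    def check(self, key: str, now: float):
        """Return (allowed, retry_after_seconds) for a request made at time `now`."""
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True, 0.0
        # Blocked until the oldest accepted request leaves the window.
        return False, hits[0] + self.window - now


# 21 requests, one per second, against a 20-per-minute limit.
limiter = SlidingWindowLimiter(limit=20, window=60.0)
results = [limiter.check("org_123", now=float(t)) for t in range(21)]
```

In this sketch the 21st request is rejected with a retry delay of 40 seconds, because the first accepted request (at t=0) does not leave the 60-second window until t=60.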

Per-Endpoint Rate Limits

Different endpoints have different rate limits based on their resource intensity:

| Endpoint Category | Limit | Window | Examples |
| --- | --- | --- | --- |
| Intent Submission | 20 requests | Per minute | POST /v1/intents, POST /v1/intents/batch |
| Backtest | 5 requests | Per minute | POST /v1/learning/backtest |
| Onboarding | 10 requests | Per minute | POST /v1/onboarding/quickstart, POST /v1/onboarding/intake |
| Authentication (IP-based) | 5 requests | Per minute | Dashboard login, signup, MFA verification |
| General Endpoints | 120 requests | Per minute | All other API endpoints (decisions, escalations, audit, webhooks, etc.) |

Chat Rate Limit: The dashboard conversational chat interface is limited to 10 requests per minute per authenticated user (tracked by session).
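
One way to stay under these limits is to space requests evenly on the client side. The sketch below derives the minimum spacing from the per-minute limits listed above; the category keys and the function name `min_interval_seconds` are hypothetical labels, not API identifiers.

```python
# Requests per minute, copied from the per-endpoint limits above.
# The dictionary keys are illustrative names, not API identifiers.
LIMITS_PER_MINUTE = {
    "intent_submission": 20,
    "backtest": 5,
    "onboarding": 10,
    "general": 120,
}


def min_interval_seconds(category: str) -> float:
    """Smallest even spacing between requests that stays within the limit."""
    return 60.0 / LIMITS_PER_MINUTE[category]
```

For example, intent submissions spaced at least 3 seconds apart (60 / 20) never exceed 20 requests in any 60-second window, assuming no other service shares the same API key.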

429 Response Format

When you exceed a rate limit, the API returns a 429 Too Many Requests response:

POST https://api.gaas.is/v1/intents
X-API-Key: your_api_key

# Response (when rate limit exceeded):
HTTP/1.1 429 Too Many Requests
Retry-After: 42
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1707824460
Content-Type: application/json

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded for POST /v1/intents. Limit: 20 requests per minute. Retry after 42 seconds.",
    "retry_after_seconds": 42
  }
}
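
If you call the API without an SDK, the wait time can be read from either the Retry-After header or the `retry_after_seconds` body field shown above. The following is a sketch only: the function name `retry_delay_from_429` is hypothetical, and it assumes Retry-After always carries a number of seconds, as in the example.

```python
import json
from typing import Mapping, Optional


def retry_delay_from_429(status: int, headers: Mapping[str, str], body: str) -> Optional[float]:
    """Return the seconds to wait if this is a 429 response, else None."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in lowered:
        return float(lowered["retry-after"])
    # Fall back to the JSON body; raises KeyError if the field is absent.
    error = json.loads(body).get("error", {})
    return float(error["retry_after_seconds"])
```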

Rate Limit Headers

All successful responses (2xx) include rate limit headers (the X-RateLimit-* headers shown in the 429 example above).

On 429 responses, the Retry-After header indicates the number of seconds to wait before retrying.
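
As an illustration, the headers from the 429 example above can be parsed into a small structure. This sketch assumes that X-RateLimit-Reset is a Unix timestamp in seconds (the example value 1707824460 is consistent with that, but the format is not stated here); `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RateLimitStatus:
    limit: int        # X-RateLimit-Limit
    remaining: int    # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset, assumed to be Unix time in seconds

    def seconds_until_reset(self, now_epoch: float) -> float:
        """Seconds left until the reset time, never negative."""
        return max(0.0, self.reset_epoch - now_epoch)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
    )


# Header values taken from the 429 example response.
status = parse_rate_limit_headers({
    "X-RateLimit-Limit": "20",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1707824460",
})
```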


Rate Limit Scoping

Per API Key (org_id)

Most rate limits are scoped to your API key (organization). This means all requests from your organization count toward the same limit, regardless of which agent or service is making the request.

Per IP Address

Authentication endpoints (login, signup, MFA verification) are rate-limited per IP address (not per API key) to prevent brute-force attacks. This applies to the dashboard authentication routes only.


Handling Rate Limits

Python (gaas_sdk)

import asyncio
from gaas_sdk import GaaSClient
from gaas_sdk.exceptions import RateLimitError

async def submit_with_retry(api_key, intent):
    async with GaaSClient("https://api.gaas.is", headers={"X-API-Key": api_key}) as client:
        try:
            return await client.submit_intent(intent)
        except RateLimitError as e:
            retry_after = e.retry_after_seconds
            print(f"Rate limited. Retrying after {retry_after} seconds...")
            # asyncio.sleep waits without blocking the event loop (time.sleep would block it)
            await asyncio.sleep(retry_after)
            return await client.submit_intent(intent)  # Retry once

TypeScript (@gaas/sdk)

import { GaaSClient, RateLimitError } from '@gaas/sdk';

const client = new GaaSClient({ baseUrl: 'https://api.gaas.is', headers: { ... } });

// Declared outside try/catch so the result is usable afterwards
let response;
try {
  response = await client.submitIntent(intent);
} catch (error) {
  if (error instanceof RateLimitError) {
    const retryAfter = error.retryAfterSeconds;
    console.log(`Rate limited. Retrying after ${retryAfter}s...`);
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    response = await client.submitIntent(intent);  // Retry once
  } else {
    throw error;  // Not a rate-limit error; let the caller handle it
  }
}

Java (com.gaas.sdk)

import com.gaas.sdk.GaaSClient;
import com.gaas.sdk.exceptions.RateLimitException;

// Declared outside try/catch so the result is usable afterwards
GaaSResponse response;
try {
    response = client.submitIntent(intent);
} catch (RateLimitException e) {
    int retryAfter = e.getRetryAfterSeconds();
    System.out.println("Rate limited. Retrying after " + retryAfter + " seconds...");
    // Thread.sleep takes milliseconds and throws InterruptedException,
    // so the enclosing method must declare or handle it
    Thread.sleep(retryAfter * 1000L);
    response = client.submitIntent(intent);  // Retry once
}

Best Practices

1. Implement Exponential Backoff

When a request is rate-limited, wait for the duration specified in Retry-After, then retry. If you hit the limit again, double the wait time with each subsequent retry:

import asyncio
from gaas_sdk.exceptions import RateLimitError

async def submit_with_backoff(client, intent, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await client.submit_intent(intent)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # Final attempt failed
            # First retry waits Retry-After; each later retry doubles the multiplier
            retry_delay = e.retry_after_seconds * (2 ** attempt)
            await asyncio.sleep(retry_delay)
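
To see what the doubling rule produces in practice, the delay schedule can be computed on its own. This is a sketch under two assumptions that are not part of the documented behavior: the server reports the same Retry-After value on every attempt, and delays are capped (the 300-second `cap` is an arbitrary illustrative choice). The function name `backoff_delays` is hypothetical.

```python
def backoff_delays(retry_after: float, max_retries: int = 3, cap: float = 300.0):
    """Delays slept before each retry: retry_after * 2**attempt, capped at `cap`."""
    # The final attempt re-raises instead of sleeping, so there are
    # max_retries - 1 delays in total.
    return [min(cap, retry_after * (2 ** attempt)) for attempt in range(max_retries - 1)]
```

With Retry-After at 42 seconds and the default of three attempts, the waits are 42 and 84 seconds. A cap keeps later retries from growing without bound when more attempts are allowed.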

2. Batch Requests Where Possible

Use the bulk intent submission endpoint (POST /v1/intents/batch) to submit up to 50 intents in a single request. This counts as 1 request toward the rate limit instead of 50.
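
A large list of intents can be split into batches that respect the 50-intent cap before submission. The helper below is a generic sketch; `chunk_intents` is a hypothetical name, and the actual call that sends each batch is left out because the SDK's batch method is not described here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH_SIZE = 50  # per-request cap stated above


def chunk_intents(intents: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` intents."""
    for start in range(0, len(intents), size):
        yield list(intents[start:start + size])


# 120 intents become three requests instead of 120.
batches = list(chunk_intents(list(range(120))))
```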

3. Monitor Rate Limit Headers

Check X-RateLimit-Remaining on every response. If it's low relative to the endpoint's limit (e.g., <10 on the 120 requests/minute general limit), slow down your request rate proactively to avoid hitting the limit.

import time

remaining = int(response.headers['X-RateLimit-Remaining'])
if remaining < 10:
    print("Approaching rate limit. Slowing down requests...")
    time.sleep(5)  # Pause before next request
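
Instead of a fixed pause, the delay can be scaled to the remaining budget. This sketch spreads the remaining requests over the time left before the reset; it assumes you can derive that time from X-RateLimit-Reset, and both `pacing_delay` and the `floor` threshold are hypothetical.

```python
def pacing_delay(remaining: int, seconds_until_reset: float, floor: int = 10) -> float:
    """Seconds to pause before the next request; 0 while plenty of budget remains."""
    if remaining >= floor:
        return 0.0
    if remaining <= 0:
        return seconds_until_reset  # Budget exhausted: wait for the reset
    # Spread the remaining requests evenly over the time left.
    return seconds_until_reset / remaining
```

For an endpoint whose limit is below 10 (such as Backtest at 5 requests per minute), pass a smaller `floor`, otherwise every request is paced.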

4. Use Shadow Mode for Load Testing

Shadow mode intent submissions (?mode=shadow) are still subject to rate limits. When load testing, ensure your test traffic doesn't exceed the 20 req/min intent submission limit. Consider using multiple API keys to distribute load.

5. Cache Read-Only Data

Responses for read-only endpoints (e.g., GET /v1/intents/{intent_id}/decision) include ETag headers. Cache these responses locally and use If-None-Match headers to get 304 Not Modified responses, which don't count toward rate limits.
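
The conditional-request flow can be sketched independently of any particular HTTP library. Here `fetch` is a hypothetical stand-in for your HTTP client (it takes a URL and request headers and returns status, response headers, and body), and `ETagCache` is an illustrative name rather than part of an SDK.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple

# fetch(url, headers) -> (status, response_headers, body)
Fetch = Callable[[str, Mapping[str, str]], Tuple[int, Mapping[str, str], Optional[str]]]


class ETagCache:
    """Caches response bodies by URL and revalidates them with If-None-Match."""

    def __init__(self, fetch: Fetch):
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, str]] = {}  # url -> (etag, body)

    def get(self, url: str) -> Optional[str]:
        cached = self._entries.get(url)
        request_headers = {"If-None-Match": cached[0]} if cached else {}
        status, headers, body = self._fetch(url, request_headers)
        if status == 304 and cached:
            return cached[1]  # Unchanged on the server: serve the cached body
        # Assumes the client exposes the header under the exact name "ETag".
        etag = headers.get("ETag")
        if etag is not None and body is not None:
            self._entries[url] = (etag, body)
        return body
```

The first `get` for a URL sends no validator and stores the returned ETag; later calls send it back and reuse the cached body whenever the server answers 304.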


Upgrading Rate Limits

Rate limits are not tied to pricing tiers—all organizations (including Developer tier) have the same rate limits. This is intentional to ensure fair access.

If you need higher rate limits for high-volume production use cases, contact sales@gaas.is to discuss custom Enterprise rate limits.


Troubleshooting

Getting 429 Errors Despite Low Traffic

Cause: Multiple services or agents sharing the same API key are collectively exceeding the limit.

Solution: Monitor X-RateLimit-Remaining headers. Consider using separate API keys for different services (requires separate organizations).

Rate Limit Not Resetting

Cause: Sliding window means the limit resets gradually as old requests age out of the 1-minute window, not all at once.

Solution: Wait at least 60 seconds from your last successful request before retrying at full speed.
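
The gradual recovery can be worked through with a small calculation. This sketch assumes a request stops counting exactly 60 seconds after it was accepted; `available_slots` is a hypothetical helper, not an API call.

```python
def available_slots(request_times, now, limit=20, window=60.0):
    """Requests still allowed at `now`, given timestamps of earlier accepted requests."""
    in_window = [t for t in request_times if now - t < window]
    return max(0, limit - len(in_window))


# 20 intent submissions sent one per second at t = 0..19 seconds.
sent = [float(t) for t in range(20)]
capacity = {t: available_slots(sent, now=float(t)) for t in (20, 60, 65, 79)}
```

In this example no capacity is available at t=20, one slot returns at t=60, six are free at t=65, and the full 20 are only available at t=79, which is 60 seconds after the last request.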

Auth Endpoints Blocked by IP Rate Limit

Cause: Dashboard authentication endpoints (login, signup, MFA) are limited to 5 req/min per IP address.

Solution: If multiple users share a single outbound IP (e.g., corporate NAT), requests may be rate-limited collectively. Contact support for IP allowlisting (Enterprise only).

