Client Rate Limiting in .NET, done right
Sooner or later you will need to connect to and consume a web API, and with it comes a familiar tension: you want to maximise throughput without overloading the server. Maybe you already have a client wired up, yet you keep getting hit with annoying 429s.
Rate limits aren’t suggestions
Vendors set a tenant-wide budget. If you thought of spinning up five handlers with five limiters: congratulations, you just quintupled your “budget” - or so you think. In reality you’ll get 429s, throttling, and sad dashboards. The fix is boring and effective: centralize the budget.
You’re wasting throughput
If you frequently run into 429s even with a generous RPM, you’re probably also leaving capacity on the table. The usual suspects:
- Per-client “budgets”: Multiple handlers, each with their own limiter, do not multiply capacity. Use one singleton limiter per upstream/tenant.
- Bursty traffic: Spiky producers burn through the minute limit quickly, then sit idle. A sliding window with segments gives you more consistent throughput.
- Double-spend on retries: If retries skip your limiter, they only make things worse. Keep retries inside the same pipeline so they still spend from your budget, and be picky about what you choose to retry.
- Ignoring Retry-After: When you do see 429s, back off exactly as told. Add a 429 policy that reads Retry-After and delays as instructed (a sketch follows the wiring code below).
- Infinite queues & ambitious timeouts: Giant queues do not make an overload go away - they just hide it. Cap the queue and fail fast at the edge that can actually shed load.
- Background jobs that you forgot about: Batch jobs, webhooks, and “just a small sync” must go through the same limiter, or they’ll eat into the shared budget (see the sketch right after this list).
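To make that last bullet concrete, here is a minimal sketch of a background job drawing from the same budget. NightlySync, VendorSdkClient, and its methods are made up for illustration; the real point is that the job injects the exact RateLimiter singleton registered in the next section instead of rolling its own.

// Hypothetical nightly export that talks to the same upstream outside of the
// typed HTTP client (e.g. through a vendor SDK). It injects the same singleton
// RateLimiter, so every call still spends from the one shared budget.
public sealed class NightlySync(RateLimiter limiter, VendorSdkClient sdk)
{
    public async Task RunAsync(CancellationToken ct)
    {
        foreach (var page in await sdk.ListPagesAsync(ct)) // hypothetical SDK call
        {
            using RateLimitLease lease = await limiter.AcquireAsync(1, ct);
            if (!lease.IsAcquired)
                throw new InvalidOperationException("Budget spent and queue full - shed load here.");

            await sdk.ExportPageAsync(page, ct); // hypothetical SDK call
        }
    }
}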
A single rate limiter
Register a singleton RateLimiter in DI (SlidingWindowRateLimiter and friends live in System.Threading.RateLimiting). I like a sliding window because it smooths bursts and gives you more consistent throughput.
private static Func<IServiceProvider, RateLimiter> CreateRateLimiter()
{
    return sp =>
    {
        var opts = sp.GetRequiredService<IOptions<ApiOptions>>().Value;
        return new SlidingWindowRateLimiter(new SlidingWindowRateLimiterOptions
        {
            // Global budget per minute - ask your vendor what the real limit is
            PermitLimit = opts.MaxRequestsPerMinute,
            Window = TimeSpan.FromMinutes(1),
            // Segments smooth the budget across the window: 6 segments per minute
            // means permits replenish every 10 seconds instead of all at once
            SegmentsPerWindow = 6,
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            // Cap how many requests may wait for a permit; anything beyond fails fast
            QueueLimit = opts.QueueLimit,
            AutoReplenishment = true
        });
    };
}
Wire it up
Wire it into the pipeline along with a retry policy so everything plays nice. I’ve used Polly here, via Microsoft.Extensions.Http.Resilience.
services.AddSingleton<RateLimiter>(CreateRateLimiter());

services
    .AddHttpClient<IApiClient, ApiClient>()
    .ConfigureHttpClient((sp, c) =>
    {
        var opts = sp.GetRequiredService<IOptions<ApiOptions>>().Value;
        c.BaseAddress = new(opts.BaseUrl);
        c.Timeout = opts.Timeout;
    })
    .AddResilienceHandler("rl", (b, ctx) =>
    {
        var opts = ctx.ServiceProvider.GetRequiredService<IOptions<ApiOptions>>().Value;
        var limiter = ctx.ServiceProvider.GetRequiredService<RateLimiter>();
        var logger = ctx.ServiceProvider.GetRequiredService<ILogger<ApiClient>>();

        // 1) Only retry what deserves it - ask your vendor what is safe to retry.
        //    Retry is added first, so it wraps the limiter below: every attempt,
        //    retries included, spends from the shared budget.
        b.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = opts.MaxRetryAttempts,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = args => new ValueTask<bool>(
                args.Outcome.Result?.StatusCode == HttpStatusCode.TooManyRequests),
            OnRetry = args =>
            {
                logger.LogInformation(
                    "Retry status: {Status}",
                    args.Outcome.Result?.StatusCode);
                return ValueTask.CompletedTask;
            }
        });

        // 2) Everyone asks for a permit before hitting the wire
        b.AddRateLimiter(new HttpRateLimiterStrategyOptions
        {
            RateLimiter = args => limiter.AcquireAsync(1, args.Context.CancellationToken)
        });
    });
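One of the bullets above asked for a policy that actually reads Retry-After. Here is a hedged sketch of how I’d extend the AddRetry block inside the resilience handler: a DelayGenerator that prefers whatever the server asked for and returns null otherwise, which tells Polly to fall back to the exponential delay configured on the strategy. Depending on your package version, HttpRetryStrategyOptions may already honour Retry-After on its own; spelling it out keeps the behaviour explicit.

// Sketch: same retry strategy as above, extended so 429 responses are delayed
// by whatever Retry-After says instead of the fixed exponential backoff.
b.AddRetry(new HttpRetryStrategyOptions
{
    MaxRetryAttempts = opts.MaxRetryAttempts,
    Delay = TimeSpan.FromSeconds(1),
    BackoffType = DelayBackoffType.Exponential,
    ShouldHandle = args => new ValueTask<bool>(
        args.Outcome.Result?.StatusCode == HttpStatusCode.TooManyRequests),
    DelayGenerator = args =>
    {
        var retryAfter = args.Outcome.Result?.Headers.RetryAfter;

        // "Retry-After: 30" arrives as Delta, "Retry-After: <http-date>" as Date
        TimeSpan? delay = retryAfter?.Delta;
        if (delay is null && retryAfter?.Date is { } date)
            delay = date - DateTimeOffset.UtcNow;
        if (delay is { } d && d < TimeSpan.Zero)
            delay = TimeSpan.Zero;

        // null => fall back to the exponential Delay/BackoffType above
        return new ValueTask<TimeSpan?>(delay);
    }
});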
Quick sanity check
Fire 200 requests at a harmless endpoint. You should see a flat ceiling at your configured RPM, growing latency as the queue fills, and zero 429s. Tweak SegmentsPerWindow for burstiness; tweak QueueLimit for fairness vs. latency.
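If you’d rather see that check as code, here is a rough sketch - PingAsync stands in for whatever harmless call your IApiClient exposes:

// Rough smoke test: 200 concurrent calls against a cheap endpoint. Expect
// throughput capped at your RPM, latency growing as the queue fills, no 429s.
var results = await Task.WhenAll(Enumerable.Range(0, 200).Select(async i =>
{
    var sw = Stopwatch.StartNew();
    var status = await client.PingAsync(); // hypothetical harmless endpoint
    return (Index: i, Status: status, Elapsed: sw.Elapsed);
}));

foreach (var r in results)
    Console.WriteLine($"#{r.Index}: {r.Status} after {r.Elapsed.TotalMilliseconds:F0} ms");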
Ship it. Your upstream’s SREs will sleep better — and so will you.