MCP Server Security: Auth, Rate Limits, Audit Logs (2026)

How to secure a Model Context Protocol server in production: OAuth 2.1, RFC 8707 audience binding, rate limiting, audit logs, and the mistakes I keep seeing.

By Muhammad Moeed · May 9, 2026 · 16 min read

Most of the MCP servers I have looked at in the last six months would not pass a basic security review. They ship with a static API key in an environment variable, no audience binding on tokens, no rate limits, no audit log, and a tool surface that lets the agent reach anything on the public internet. They work. Right up until the day they do not.

The reason this matters more for MCP than for a regular API is the blast radius. An MCP server is not just an API — it is an API that an autonomous agent can call hundreds of times in a row, sometimes with arguments the agent invented. The threat model is closer to a bot than to a human user. If your auth, rate limits, or logging are weak, you find out the expensive way.

This post is the security guide I keep sending to clients. It covers the threat model in plain English, the pieces of the OAuth 2.1 flow that MCP actually requires, what RFC 8707 audience binding is and why every server needs it, how to think about rate limits when the caller is an agent, and what to put in an audit log so an incident is not a guessing game.

TL;DR. For any remote MCP server, do five things: implement OAuth 2.1 authorization with PKCE, bind tokens to your server with RFC 8707 resource indicators, validate the audience claim on every request, rate-limit per client and per tool, and log every tool call with a hash of its arguments instead of the raw values. Skip any of these and your security posture has a hole the size of an LLM.

Why MCP server security is its own topic

A normal web API has one caller per session — a human, or a job that behaves like one. An MCP server has an LLM as a caller. That changes three things.

The caller is creative. The agent will compose tool calls in ways you did not predict. Defensive design has to assume “any combination of tools, in any order”.
The caller is fast. Agents loop. A small bug — a tool that triggers a retry, or a prompt that spins — can turn into thousands of calls in a minute.
The caller is steerable from outside. Anyone who can put text in front of the agent can in principle steer the agent’s tool calls. That is prompt injection, and on an authenticated MCP server it is a privilege escalation problem, not a content problem.

The takeaway is that MCP server security is mostly the same as good API security, but you cannot get away with the corners that web APIs get away with. No static keys. No “the audience claim is optional, no one will check”. No “we’ll add rate limits later”. Later is too late.

A short threat model for MCP servers

Before writing any code, it helps to have the threats sorted in your head. These are the five that actually show up.

Threat	What it looks like	Defence
Token passthrough	A token issued for service A is replayed against your MCP server	Audience binding (RFC 8707), strict `aud` validation
Prompt-injected tool calls	A document the agent reads tells it to call `delete_account`	Least-privilege scopes, tool-level rate limits, audit logs
Static key leakage	An `MCP_API_KEY` env var ends up in a screenshot or log	OAuth flow with rotating tokens, never log secrets
Runaway agent loops	One bug in the agent causes 10k tool calls in a minute	Per-client + per-tool rate limits
Silent compromise	Something is wrong but no log shows it	Structured audit logs, including denied calls

Notice that three of the five are auth issues, one is a rate-limit issue, and one is a logging issue. That is not a coincidence — those are the three things you have to design in from day one. Everything else can be added later.

Step 1: pick the right transport, then auth follows

MCP supports two transports, and your auth story depends on which one you pick.

STDIO transport (local servers)

If your MCP server runs as a local subprocess of the client (the most common Claude Code setup), auth is simpler. The server can read environment variables, the OS keychain, or a credentials file the user already has. There is no remote attacker, the client is trusted, and a lot of OAuth machinery is overkill.

The mistake people make here is bundling secrets into the server itself (hardcoded keys, baked-in tokens). Don’t. Read them at startup from the environment.
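A minimal Python sketch of that rule, assuming a hypothetical variable name `UPSTREAM_API_KEY`: read the secret once at startup, fail fast with an error that names the variable but never the value, and keep anything you print safe to log.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised at startup when a required credential is absent."""


def load_secret(name: str) -> str:
    """Read a required secret from the environment, once, at startup."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Name the variable, never the value, so the error is safe to log.
        raise MissingSecretError(f"required environment variable {name} is not set")
    return value


def redacted(value: str) -> str:
    """A log-safe stand-in that reveals only the secret's length."""
    return f"<redacted:{len(value)} chars>"
```

Call `load_secret("UPSTREAM_API_KEY")` once when the server starts. On the STDIO transport, stdout carries the protocol messages, so any diagnostics (including `redacted(...)` output) belong on stderr.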

Streamable HTTP transport (remote servers)

The moment your server is reachable over HTTP, you are in OAuth territory. The MCP authorization spec for HTTP transports is built on OAuth 2.1, and it requires PKCE for the authorization code flow. Static API keys are explicitly not enough. From the official MCP authorization docs:

Stop shipping static API keys. If your MCP server uses only environment variables as authentication, every security audit will flag it.

The rest of this post focuses on remote servers. That is where almost all of the real risk lives.

Step 2: implement OAuth 2.1 the way MCP expects

The MCP authorization spec describes a standard OAuth 2.1 flow with a few MCP-specific details. The full mechanics are in the MCP authorization tutorial, but here is the shape you actually need.

1. Client hits MCP server without a token.
2. Server returns 401 with WWW-Authenticate pointing to a
   Protected Resource Metadata document at a /.well-known path.
3. Client fetches that metadata to discover the authorization
   server, supported scopes, and resource URI.
4. Client either uses Dynamic Client Registration (RFC 7591)
   or pre-registered credentials.
5. Client runs the OAuth 2.1 authorization code flow with PKCE.
6. Client receives an access token and retries the request,
   sending `Authorization: Bearer <token>` on every call.
7. Server validates the token, checks the audience, and serves
   the request if everything matches.

The reason this looks heavier than a static API key is that every step is doing something an API key cannot. The metadata discovery makes the server self-describing. PKCE prevents authorization-code interception. The audience claim makes the token unusable against any other service. None of these are optional.

A practical 401 response looks like this — note the resource_metadata parameter, which is what makes auto-discovery work:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer realm="mcp",
  resource_metadata="https://mcp.example.com/.well-known/oauth-protected-resource"

And the metadata it points at:

{
  "resource": "https://mcp.example.com/mcp",
  "authorization_servers": ["https://auth.example.com"],
  "scopes_supported": ["mcp:tools:read", "mcp:tools:write"]
}

That is enough for any compliant MCP client to discover and complete the auth flow without you writing a single line of client code.
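If you serve those two responses yourself, the server side can be sketched as follows. The URLs mirror the example above and are placeholders. The one subtle detail is that the `WWW-Authenticate` value goes out as a single line: the break in the example is only for readability, and HTTP/1.1 senders must not fold header lines.

```python
import json

# Placeholder values matching the example above; substitute your own.
RESOURCE = "https://mcp.example.com/mcp"
METADATA_URL = "https://mcp.example.com/.well-known/oauth-protected-resource"
AUTH_SERVERS = ["https://auth.example.com"]


def unauthorized_headers() -> dict[str, str]:
    """Headers for the 401 that starts discovery (one physical header line)."""
    return {
        "WWW-Authenticate": f'Bearer realm="mcp", resource_metadata="{METADATA_URL}"',
    }


def protected_resource_metadata() -> bytes:
    """JSON body served at METADATA_URL (RFC 9728 Protected Resource Metadata)."""
    doc = {
        "resource": RESOURCE,
        "authorization_servers": AUTH_SERVERS,
        "scopes_supported": ["mcp:tools:read", "mcp:tools:write"],
    }
    return json.dumps(doc).encode("utf-8")
```

Serve the metadata with `Content-Type: application/json`; everything else a client needs (authorization and token endpoints) it discovers from the authorization server listed there.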

Step 3: bind tokens to your server with RFC 8707

This is the single biggest change in the 2026 MCP spec, and it is the one most existing servers still get wrong.

RFC 8707 resource indicators require the client to include a resource parameter in the authorization and token requests, naming the specific MCP server the token is for. The authorization server then puts that resource into the token’s aud (audience) claim. Your MCP server validates that audience on every request.

The point: a token issued for mcp.example.com cannot be replayed against mcp.other.com, even if the same authorization server signed both. This kills the token-passthrough class of bug, where a stolen token from a low-value service is used against a high-value one.
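On the client side, the `resource` parameter therefore appears twice: on the authorization request and again on the token request. The sketch below builds both with `urllib.parse.urlencode`. The endpoint URL and client values are hypothetical (real clients discover the endpoints from the authorization server's metadata), and `code_challenge` and `code_verifier` come from the PKCE step.

```python
from urllib.parse import urlencode

# Hypothetical values; a real client discovers AUTHORIZE_ENDPOINT from metadata.
AUTHORIZE_ENDPOINT = "https://auth.example.com/authorize"
RESOURCE = "https://mcp.example.com/mcp"


def authorization_url(client_id: str, redirect_uri: str, code_challenge: str,
                      state: str, scope: str = "mcp:tools:read") -> str:
    """Authorization-code request with PKCE and an RFC 8707 resource indicator."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        "resource": RESOURCE,  # names the one server this token is for
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


def token_request_body(client_id: str, redirect_uri: str, code: str,
                       code_verifier: str) -> str:
    """Form-encoded body to POST to the token endpoint."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "code_verifier": code_verifier,
        "resource": RESOURCE,  # same value again on the token request
    })
```

The authorization server is expected to use that `resource` value to set the token's audience, which is exactly what your server checks on every request.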

The MCP spec is unambiguous on this:

MCP servers MUST NOT accept any tokens that were not explicitly issued for the MCP server. Token passthrough bypasses security controls like rate limiting and request validation.

In practice, your server’s token validation needs four checks:

The token is signed by your trusted authorization server.
The token is not expired (exp claim).
The token is active — usually verified via token introspection (RFC 7662) so revoked tokens fail closed.
The token’s aud claim matches your server’s URL. Reject if missing, reject if different.

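Here is the audience check in Python, adapted from the MCP reference implementation:

def _validate_resource(self, token_data: dict) -> bool:
    """Accept the token only if its aud claim names this server."""
    aud = token_data.get("aud")
    if aud is None:
        return False  # no audience: reject rather than assume
    if isinstance(aud, list):
        # Several audiences are allowed, but one of them must be this server.
        return any(self._is_valid_resource(a) for a in aud)
    if isinstance(aud, str):
        return self._is_valid_resource(aud)
    return False  # any other type is malformed: fail closed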

A handful of lines of code. They are the difference between a token-replay attack working and not.
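The audience check is one of the four listed above. Here is a sketch of the remaining post-validation checks, assuming `claims` is an RFC 7662 introspection response (which carries `active`, `exp`, and `aud`); it returns a rejection reason or `None`. Signature verification is deliberately absent: that belongs to a vetted JWT library or to the introspection endpoint itself. `EXPECTED_AUDIENCE` is a placeholder, and this version uses an exact string match rather than any looser matching.

```python
import time
from typing import Any, Optional

EXPECTED_AUDIENCE = "https://mcp.example.com/mcp"  # placeholder: this server's resource URI


def rejection_reason(claims: dict[str, Any], now: Optional[float] = None) -> Optional[str]:
    """Return why the token must be rejected, or None if every check passes."""
    now = time.time() if now is None else now
    if claims.get("active") is not True:
        return "inactive"  # revoked or unknown tokens fail closed
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return "expired"  # a missing exp also fails closed
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, list):
        audiences = aud
    else:
        audiences = []  # missing or malformed aud
    if EXPECTED_AUDIENCE not in audiences:
        return "wrong_audience"
    return None
```

Returning a reason rather than a bare boolean makes the `denied_auth` audit record more useful without ever logging the token itself.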

Step 4: scope your tools, not just your server

OAuth scopes are how you express permissions in tokens. The temptation, when wiring up an MCP server, is to define one scope — mcp:tools — and call it a day. Don’t.

Splitting scopes by capability gives you two things. First, least privilege: a client that only needs to read does not get write access. Second, blast radius control: if a token leaks, the damage is bounded by what that scope allows.

A reasonable scope split for a typical MCP server looks like this:

Scope	What it grants
`mcp:tools:read`	List tools, call read-only tools
`mcp:tools:write`	Call mutating tools
`mcp:tools:admin`	Tools that touch user data or settings
`mcp:resources:read`	Read MCP resources

Then enforce it per tool. Every tool handler checks the required scope, denies on mismatch, and logs the denial. The log part matters: denials are exactly the events you want visible when something is wrong.
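A minimal sketch of per-tool enforcement: a table maps each tool to the exact scope it needs, unknown tools are denied by default, and every denial goes to the audit logger before the error is raised. The tool names and the `REQUIRED_SCOPE` table are hypothetical; `token_scopes` would come from the validated token's space-separated `scope` claim, for example `set(claims["scope"].split())`.

```python
import logging
from typing import Any, Callable

audit_log = logging.getLogger("mcp.audit")

# Hypothetical tool-to-scope table; tools without an entry are denied.
REQUIRED_SCOPE: dict[str, str] = {
    "list_users": "mcp:tools:read",
    "update_user": "mcp:tools:write",
    "delete_user": "mcp:tools:admin",
}


class ScopeDenied(PermissionError):
    """Raised when the token lacks the scope a tool requires."""


def call_tool(name: str, arguments: dict[str, Any], token_scopes: set[str],
              client_id: str, handlers: dict[str, Callable[..., Any]]) -> Any:
    required = REQUIRED_SCOPE.get(name)
    if required is None or required not in token_scopes:
        # Log every denial before raising; these are the events you want later.
        audit_log.warning("denied_auth tool=%s client=%s required_scope=%s",
                          name, client_id, required)
        raise ScopeDenied(f"tool {name!r} requires scope {required!r}")
    return handlers[name](**arguments)
```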

Step 5: rate limit like the caller is a bot

Rate limits on a normal API target abuse. On an MCP server they target the agent itself, including the well-behaved ones. A perfectly innocent agent can loop. A bug in a prompt can turn one user request into a thousand tool calls. Rate limits are the seatbelt.

A workable rate-limit shape for an MCP server has three layers.

Layer	Example limit	Why
Per client (token)	600 requests / minute	Stops one bad token from drowning the service
Per tool	60 calls / minute for `delete_*` tools, 600 for `list_*`	Tools have different blast radii, so limits should too
Per organization	60,000 requests / minute	Final cap so one customer cannot starve another

The numbers will be specific to your workload — the structure is what matters. One global rate limit is not enough. You want different ceilings on read_user and delete_user, and you want a per-token cap so a leaked token cannot drain your quota.

When a request hits a limit, return 429 Too Many Requests with a Retry-After header. Compliant clients will back off. Non-compliant clients are now on a list you can investigate.
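One way to get all three layers in a single check is sketched below as in-process token buckets. Everything here is an assumption for illustration: the class names, the prefix-based tool limits taken from the table, and treating the per-tool limit as applying per client. A multi-instance deployment would keep the counters in a shared store instead of a Python dict.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bucket:
    """Token bucket: bursts up to `capacity`, refilled at `per_sec` calls/second."""
    capacity: float
    per_sec: float
    tokens: float
    updated: float

    def refill(self, now: float) -> None:
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.per_sec)
        self.updated = now

    def wait(self) -> float:
        """Seconds until one call is available (0.0 if available now)."""
        return 0.0 if self.tokens >= 1 else (1 - self.tokens) / self.per_sec


class LayeredLimiter:
    """Per-client, per-tool (per client) and per-organization limits, per minute."""

    def __init__(self, per_client: int, per_org: int,
                 tool_prefix_limits: dict[str, int], default_tool: int) -> None:
        self.per_client = per_client
        self.per_org = per_org
        self.tool_prefix_limits = tool_prefix_limits
        self.default_tool = default_tool
        self.buckets: dict[tuple[str, ...], Bucket] = {}

    def _tool_limit(self, tool: str) -> int:
        for prefix, limit in self.tool_prefix_limits.items():
            if tool.startswith(prefix):
                return limit
        return self.default_tool

    def _bucket(self, key: tuple[str, ...], per_minute: int, now: float) -> Bucket:
        if key not in self.buckets:  # new buckets start full
            self.buckets[key] = Bucket(per_minute, per_minute / 60, per_minute, now)
        bucket = self.buckets[key]
        bucket.refill(now)
        return bucket

    def check(self, org: str, client: str, tool: str,
              now: Optional[float] = None) -> int:
        """Return 0 if the call may proceed, else a Retry-After value in seconds."""
        now = time.monotonic() if now is None else now
        layers = [
            self._bucket(("client", client), self.per_client, now),
            self._bucket(("tool", client, tool), self._tool_limit(tool), now),
            self._bucket(("org", org), self.per_org, now),
        ]
        worst = max(b.wait() for b in layers)
        if worst > 0:
            return max(1, math.ceil(worst))  # deny without spending any layer's budget
        for b in layers:
            b.tokens -= 1
        return 0
```

A call is only charged if every layer has budget, so a request refused by the `delete_*` limit does not also eat into the client's overall quota. A non-zero return value becomes the 429 response's `Retry-After` header.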

Step 6: audit logs that survive an incident

The right time to design your audit log is before the incident, not during. The wrong time to discover that you do not log denied tool calls is the morning after the breach.

Here is the minimum I recommend logging for every MCP request:

Field	Why
`timestamp` (ISO 8601, UTC)	Correlate across systems
`request_id`	Tie request to response and to downstream logs
`client_id`	Who is calling — from the validated token
`subject` (`sub` claim)	Which end-user authorised this
`tool_name`	What tool was called
`arguments_hash` (SHA-256)	Reconstruct an arg pattern without storing the args
`scopes`	The scopes the token presented
`result`	`success`, `denied_auth`, `denied_rate`, `error`
`latency_ms`	Spot pathological calls

Two things deserve emphasis.

Hash the arguments, do not log them raw. The args might contain customer data. Hashing lets you confirm a pattern (“the same argument was sent 200 times”) without becoming a secondary data store.

Log the denials, not just the successes. A successful tool call is interesting once. A run of 200 denied tool calls from the same client is a security event. You will only see the second one if your logging captures both.
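A sketch of a record builder covering the fields in the table above. The function names are illustrative. The key detail is hashing a canonical JSON encoding (sorted keys, fixed separators), so the same arguments always produce the same hash regardless of key order, and the raw arguments never reach the record.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

RESULTS = {"success", "denied_auth", "denied_rate", "error"}


def arguments_hash(arguments: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON, so equal arguments always hash equally."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(*, client_id: str, subject: str, tool_name: str,
                 arguments: dict[str, Any], scopes: list[str], result: str,
                 latency_ms: float, request_id: Optional[str] = None) -> str:
    """One JSON line per MCP request, denials included."""
    if result not in RESULTS:
        raise ValueError(f"unknown result {result!r}")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id or str(uuid.uuid4()),
        "client_id": client_id,
        "subject": subject,
        "tool_name": tool_name,
        "arguments_hash": arguments_hash(arguments),  # never the raw arguments
        "scopes": scopes,
        "result": result,
        "latency_ms": round(latency_ms, 1),
    }
    return json.dumps(record)
```

One caveat: a plain SHA-256 of a low-entropy argument such as an email address can be reversed by hashing likely candidates. If that matters for your data, a keyed hash (`hmac.new(key, canonical_bytes, hashlib.sha256)` with a server-side key) keeps the “same argument 200 times” property while resisting that guessing.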

A small but important rule: never log the Authorization header, never log raw tokens, never log secrets. Scrub them in your structured logger. If you ever have to debug auth in production, do it with token IDs (the JWT jti claim), not the tokens themselves.
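Scrubbing can live in a `logging.Filter`, sketched below. The regex and the set of sensitive keys are assumptions to adapt: they catch `Bearer <token>` strings inside messages and redact common secret-bearing attributes passed through `extra`.

```python
import logging
import re

# Assumed patterns: bearer tokens in text, plus attribute names that hold secrets.
_BEARER = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")
_SENSITIVE_KEYS = {"authorization", "access_token", "refresh_token",
                   "client_secret", "password"}


class RedactSecrets(logging.Filter):
    """Scrub bearer tokens from messages and secret-bearing `extra` fields."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so tokens passed as %-args are scrubbed too.
        record.msg = _BEARER.sub("Bearer [REDACTED]", record.getMessage())
        record.args = ()  # the message is already fully formatted
        for key in list(vars(record)):
            if key.lower() in _SENSITIVE_KEYS:
                setattr(record, key, "[REDACTED]")
        return True  # keep the record, just scrubbed
```

Attach it to each handler (`handler.addFilter(RedactSecrets())`) rather than to a logger: a logger's filters do not run for records propagated up from its child loggers.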

Step 7: secrets management is the boring half

Most MCP server breaches I have seen did not come from a clever attack. They came from a credential ending up somewhere it should not. The discipline is dull, but it is the discipline that keeps you out of the news.

Three rules.

No secrets in source control. Not even encrypted ones. Use a real secret manager — Vault, AWS Secrets Manager, GCP Secret Manager, your platform’s keystore.
No reusing the MCP server’s client secret for end-user flows. The server’s credentials authenticate the server. End users go through the OAuth flow with their own short-lived access tokens.
Rotation, not forever. Tokens should be short-lived. Client secrets should rotate. Long-lived static credentials are a bug, not a feature.

Almost every “production MCP” tutorial on the open web ignores one or more of these. Do not take the tutorial as the standard.

A pre-prod checklist you can copy

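Before any remote MCP server goes live, run through this.

Unauthenticated requests get a 401 whose WWW-Authenticate header points at your Protected Resource Metadata.
The metadata document lists your resource URI, authorization server, and supported scopes.
Clients complete the OAuth 2.1 authorization code flow with PKCE; no static API keys in production.
Token requests carry an RFC 8707 resource parameter, and every request is checked for signature, expiry, active status, and a matching aud claim.
Scopes are split by capability and enforced per tool, with every denial logged.
Rate limits exist per client, per tool, and per organization, and over-limit calls get a 429 with Retry-After.
Every tool call, including denied ones, writes an audit record with hashed arguments.
No Authorization headers, raw tokens, or secrets reach the logs.
Secrets live in a secret manager, access tokens are short-lived, and client secrets rotate.
Destructive tools require a confirmation step.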

This list is a couple of days of work, not a couple of weeks. Front-loading it is much cheaper than retrofitting it after launch.

Common questions

Do I really need OAuth 2.1 if my MCP server is internal? For STDIO local servers, no — environment-based credentials are fine. For any HTTP-reachable server, yes, even on a private network. “Internal” perimeter security is not a real defence in 2026.

What is the difference between the audience claim and the scope claim? Audience answers “which server is this token for?” Scope answers “what is this token allowed to do at that server?” You need both. Audience without scope is a token that opens every door of one building. Scope without audience is a token that opens one door of every building.

Is API-key auth ever acceptable? For development against a server you control, sure. For anything multi-tenant or production, no. Static keys do not rotate, do not bind to a resource, and almost always end up logged or committed by accident.

How short should my token TTL be? Short enough that a leaked token is mostly worthless, long enough that you are not refreshing on every request. 10–60 minutes for access tokens is a reasonable starting point, with refresh tokens used to extend sessions.

Should I implement my own token validation? No, and the MCP authorization docs are explicit about this. Use a well-tested library — your language’s standard JWT library or your authorization server’s introspection endpoint. Hand-rolled validation is one of the easiest places to get security wrong.

How do I defend against prompt-injected tool calls? This is hard, and there is no single fix. Layered defences: least-privilege scopes per tool, rate limits per tool, human-in-the-loop confirmation for destructive tools, and audit logs you actually read. Treat any tool that touches money, user data, or external state as needing a confirmation step.

Where do I start if my MCP server is already running and none of this is in place? In this order: turn on audit logging first (so you have visibility), add per-tool rate limits (so a runaway agent cannot do unbounded damage), then plan the OAuth migration. Doing OAuth without logging means you cannot tell if the migration is working.

Where to go next

MCP security is moving fast, and the spec is going to keep tightening. The pieces above are the stable, recommended baseline as of mid-2026. If you are building a new server, the order I would suggest is: get OAuth 2.1 + audience binding correct first, then add scopes, then rate limits, then logs. Each layer is independently useful. Skipping any of them leaves a hole that future-you, or a future incident, will pay for.

If you are at the start of the MCP journey, the first MCP server walkthrough is a good prerequisite — get the basics working, then come back here and harden it. The best MCP servers for 2026 post is also useful for seeing what serious production servers do, and the MCP authorization docs are the canonical spec.

If you want a deeper architectural view of how MCP fits with skills, subagents, and hooks inside Claude Code, the Claude Code feature comparison and the hooks guide are the next two posts to read. For the government-grade baseline that enterprise procurement teams now expect, the NSA MCP security guidance translated for developers covers the four named controls and what to ship this week. And once your agent is shipping, the Microsoft RAMPART hands-on guide covers the pytest-native safety tests that catch the failure modes the NSA controls leave open.

Security work is rarely the most exciting thing on the roadmap. It is, however, the thing that decides whether your MCP server is a tool you ship or a story you tell. Do this part early.