8.1 MCP Client IP Forwarding
The Risk
fastapi-mcp makes internal HTTP calls to your FastAPI endpoints. Without configuration, every MCP query appears to come from the server's own IP, so per-IP rate limiting and concurrency limits treat all MCP users as a single client: either everyone is rate-limited together or nobody is.
The Solution
Configure the MCP server to forward the original client's IP address when it makes internal calls to your API, by passing through headers such as cf-connecting-ip and x-forwarded-for. CRITICAL: the base_url MUST be localhost. If you use the external domain, the internal request re-enters through Cloudflare, which overwrites cf-connecting-ip with your server's own IP and destroys the real client IP.
The Fix
import httpx
from fastapi_mcp import FastApiMCP

mcp = FastApiMCP(
    app,
    headers=["authorization", "cf-connecting-ip",
             "x-forwarded-for", "x-real-ip"],
    http_client=httpx.AsyncClient(
        # MUST be localhost - external URL goes through Cloudflare
        # and overwrites cf-connecting-ip with server's own IP
        base_url="http://localhost:8000",
        timeout=60.0,
    ),
)

This is the #1 MCP misconfiguration. Using the public URL (https://app.example.com) instead of localhost looks correct but silently breaks per-user rate limiting for all MCP clients.
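Once the headers are forwarded, the rate limiter still has to read them. The sketch below shows one way to pick the client IP from the forwarded headers. The function name `resolve_client_ip`, the header priority order, and the rule of trusting these headers only when the direct peer is loopback are all assumptions made for illustration, not part of fastapi-mcp.

```python
from collections.abc import Mapping

# Same header names that are forwarded above; the priority order is an assumption.
FORWARDED_IP_HEADERS = ("cf-connecting-ip", "x-real-ip", "x-forwarded-for")

# Assumption: only loopback peers (the internal MCP client, a local proxy)
# are allowed to assert a client IP through headers.
TRUSTED_PEERS = frozenset({"127.0.0.1", "::1"})


def resolve_client_ip(
    headers: Mapping[str, str],
    peer_ip: str,
    trusted_peers: frozenset = TRUSTED_PEERS,
) -> str:
    """Return the best-guess original client IP for per-IP rate limiting.

    `headers` is expected to use lowercase keys; `peer_ip` is the address
    of the socket peer that actually connected.
    """
    if peer_ip not in trusted_peers:
        # An untrusted peer could spoof these headers, so ignore them.
        return peer_ip
    for name in FORWARDED_IP_HEADERS:
        value = headers.get(name, "").strip()
        if value:
            # x-forwarded-for may hold "client, proxy1, proxy2"; keep the first hop.
            return value.split(",")[0].strip()
    return peer_ip
```

With this shape, a request arriving from 127.0.0.1 with cf-connecting-ip set is keyed on the forwarded address, while a request from any other peer is keyed on its own socket address regardless of what headers it sends.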
8.2 Open vs Secured MCP Endpoints
The Risk
A single MCP endpoint forces a choice: open to all (insecure) or locked down (unusable for demos). Running two endpoints side by side lets you serve both use cases, with all security layers (rate limiting, SQL validation, concurrency) active on both.
The Solution
Run two separate MCP endpoints: one open (no login required) for demos and testing, and one secured with authentication (OAuth + JWT + email whitelist) for production use. Both endpoints still get all the other protections - rate limiting, SQL validation, concurrency limits. The difference is just who's allowed to connect.
The Fix
# Two MCP endpoints:
/mcp        - open, no auth (demo/testing)
/mcp-secure - Auth0 OAuth + JWT + email whitelist
# Both get: rate limiting, SQL validation, concurrency
# /mcp-secure adds: identity verification via OAuth

Auth setup (Auth0/OAuth, JWT verification, email whitelists) is covered in detail in Section 10 (Authentication & Authorization) of this checklist. For a working implementation with both open and secured MCP endpoints, see the example repos in the Live Examples section.
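As a small illustration of the email whitelist step on the secured endpoint, the sketch below checks the claims of a token that has already been verified. The function name `is_email_allowed` and the use of an `email` claim are assumptions; the claim your identity provider actually issues may be named differently, and signature verification must happen before this check.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def is_email_allowed(claims: Mapping[str, Any], allowed_emails: Iterable[str]) -> bool:
    """Check an already-verified token's email claim against an allow-list.

    Comparison ignores case and surrounding whitespace. A missing or
    non-string email claim is rejected.
    """
    email = claims.get("email")
    if not isinstance(email, str) or not email.strip():
        return False
    allowed = {entry.strip().casefold() for entry in allowed_emails}
    return email.strip().casefold() in allowed
```

Only the secured endpoint would call a check like this; the open endpoint skips it and relies on the shared protections alone.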
8.3 MCP Transport Compatibility
The Risk
fastapi-mcp 0.4.0 uses SSE transport. Some newer MCP clients (like Claude Code) use Streamable HTTP, which sends a POST to /mcp - and your SSE endpoint returns 405 Method Not Allowed. This silently breaks connectivity for newer clients.
The Solution
Be aware that the MCP protocol has two transport methods - SSE (older) and Streamable HTTP (newer). If your MCP library only supports SSE, newer clients that expect Streamable HTTP will get errors when connecting. Keep your MCP library updated and watch for releases that add support for the newer transport method.
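The MCP specification's backwards-compatibility guidance describes how a client can cope with both transports: POST the initialize request first, and if the server answers with a 4xx status such as 405 or 404, fall back to opening the older SSE stream with a GET. The sketch below captures only that decision. The function name `pick_transport` and the returned labels are hypothetical and do not belong to any MCP library.

```python
def pick_transport(initialize_post_status: int) -> str:
    """Decide which transport to use from the status of the initial POST.

    2xx: the server accepted the POST, so it speaks Streamable HTTP.
    4xx: (for example 405 Method Not Allowed) treat it as an SSE-only
         server and fall back to a GET that opens the event stream.
    Anything else is treated as a real failure, not a transport mismatch.
    """
    if 200 <= initialize_post_status < 300:
        return "streamable-http"
    if 400 <= initialize_post_status < 500:
        return "sse"
    raise ValueError(f"unexpected status {initialize_post_status}")
```

A client without this fallback simply surfaces the 405, which is the silent failure described above.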
The Fix
# Monitor fastapi-mcp releases for Streamable HTTP support
# Current workaround: clients must use SSE transport
# Future: upgrade fastapi-mcp when POST transport is supported

This is a library limitation, not a configuration issue. Keep fastapi-mcp updated.

A separate trap on the same streaming transport: observability or logging middleware built on Starlette's BaseHTTPMiddleware silently breaks SSE / MCP responses. It buffers the response body to measure it, which fails on a live stream and also drops that request from your logs, so the streaming endpoint quietly goes unmonitored. On a backend that streams, keep anything that wraps responses (logging, metrics) on pure ASGI middleware that observes only the status line and forwards every chunk untouched. Never use BaseHTTPMiddleware there.
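A minimal sketch of the pure ASGI approach follows. It wraps only the `send` callable, logs the status from the response start message, and passes every message through unchanged. The class name `StatusLogMiddleware`, the logger name, and the fake streaming app used for the demo are illustrative assumptions.

```python
import asyncio
import logging

logger = logging.getLogger("access")


class StatusLogMiddleware:
    """Pure ASGI middleware: records the response status, never reads the body."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic passes straight through.
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                logger.info("%s %s -> %s", scope["method"], scope["path"], message["status"])
            # Forward every message (including each streamed chunk) untouched.
            await send(message)

        await self.app(scope, receive, send_wrapper)


async def fake_sse_app(scope, receive, send):
    """Stand-in for a streaming endpoint: one start message, two body chunks."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"data: 1\n\n", "more_body": True})
    await send({"type": "http.response.body", "body": b"data: 2\n\n", "more_body": False})


def run_demo():
    """Drive the wrapped app once and return every message the client would receive."""
    received = []

    async def send(message):
        received.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    scope = {"type": "http", "method": "GET", "path": "/mcp"}
    asyncio.run(StatusLogMiddleware(fake_sse_app)(scope, receive, send))
    return received
```

Because the class takes the wrapped app as its first constructor argument, it can be registered on a FastAPI or Starlette application with `app.add_middleware(StatusLogMiddleware)`.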
8.4 Failed-Auth Rate Limiting on Secured Endpoint
The Risk
Your secured MCP endpoint validates JWT tokens on every connection. Without rate limiting on failed attempts, an attacker can send thousands of fake tokens probing for valid ones, timing differences, or error message leaks. Your application-level rate limits (SlowAPI) protect query endpoints but not the SSE connection handshake where JWT validation happens.
The Solution
Track failed auth attempts per IP in memory. After a configurable threshold, block that IP from further attempts with a 429 response. Choose thresholds based on your use case - a public API with many users might allow 20 failures per hour, while a private internal tool might allow only 3 failures with a 24-hour block. This stops brute-force probing without affecting legitimate users who authenticate successfully on first try.
The Fix
import os
import time

from fastapi import HTTPException, Request
# JWTError: import from your JWT library (python-jose, for example, exposes jose.JWTError)
# get_client_ip: your own helper that resolves the real client IP

# In-memory failed-auth tracker
_auth_failures: dict[str, list[float]] = {}

# Tune these to your use case:
# Public API with many users: AUTH_FAIL_MAX=20, WINDOW=3600 (1hr)
# Internal tool, few users: AUTH_FAIL_MAX=3, WINDOW=86400 (24hr)
# General-purpose default: AUTH_FAIL_MAX=5, WINDOW=86400 (24hr)
AUTH_FAIL_MAX = int(os.getenv("AUTH_FAIL_MAX", "5"))
AUTH_FAIL_WINDOW = int(os.getenv("AUTH_FAIL_WINDOW", "86400"))
AUTH_FAIL_MAX_IPS = 1000  # cap tracked IPs to bound memory

def _is_auth_blocked(ip: str) -> bool:
    if ip not in _auth_failures:
        return False
    now = time.time()
    cutoff = now - AUTH_FAIL_WINDOW
    # Prune expired entries
    _auth_failures[ip] = [t for t in _auth_failures[ip] if t > cutoff]
    if not _auth_failures[ip]:
        del _auth_failures[ip]
        return False
    return len(_auth_failures[ip]) >= AUTH_FAIL_MAX

def _record_auth_failure(ip: str):
    now = time.time()
    _auth_failures.setdefault(ip, []).append(now)
    # Periodic cleanup: evict expired IPs when dict grows too large
    if len(_auth_failures) > AUTH_FAIL_MAX_IPS:
        cutoff = now - AUTH_FAIL_WINDOW
        expired = [k for k, v in _auth_failures.items()
                   if not v or v[-1] < cutoff]
        for k in expired:
            del _auth_failures[k]

async def verify_oauth_token(request: Request):
    client_ip = get_client_ip(request)
    if _is_auth_blocked(client_ip):
        raise HTTPException(429, "Too many failed attempts")
    try:
        # ... validate JWT ...
        pass
    except JWTError:
        _record_auth_failure(client_ip)
        raise HTTPException(401, "Invalid token")

SEPARATE FROM SLOWAPI: SlowAPI limits total requests per IP; this specifically targets failed authentication only. A user who authenticates successfully is never affected.

MEMORY BOUNDED: Cap the dictionary size (e.g. 1000 IPs) and evict expired entries on overflow. Otherwise a distributed attack from thousands of IPs could grow the dict without bound.

MAKE IT CONFIGURABLE: Load thresholds from env vars so you can tighten or loosen them without a code change.

WINDOW STRATEGY: Short windows (5-15 min) with moderate limits suit high-traffic public APIs. Long windows (24hr) with low limits suit private or internal tools where legitimate users rarely fail auth.
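One gap worth noting in the memory bound: evicting only expired entries does nothing when thousands of IPs have all failed recently. The sketch below is a variant, not the tracker above, that enforces a hard cap by dropping the IP whose most recent failure is oldest. The class name `FailedAuthTracker`, its parameters, and the injectable `clock` (added so the behaviour can be tested without waiting) are assumptions for illustration.

```python
import time
from collections import OrderedDict


class FailedAuthTracker:
    """Sliding-window failed-auth counter with a hard cap on tracked IPs."""

    def __init__(self, max_failures=5, window_seconds=86400.0, max_ips=1000, clock=time.time):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.max_ips = max_ips
        self.clock = clock
        # Ordered by most recent failure: oldest at the front, newest at the end.
        self._failures = OrderedDict()

    def _fresh(self, ip, now):
        """Return the IP's unexpired timestamps, pruning stored state as a side effect."""
        cutoff = now - self.window_seconds
        stamps = [t for t in self._failures.get(ip, []) if t > cutoff]
        if stamps:
            self._failures[ip] = stamps
        else:
            self._failures.pop(ip, None)
        return stamps

    def is_blocked(self, ip):
        return len(self._fresh(ip, self.clock())) >= self.max_failures

    def record_failure(self, ip):
        now = self.clock()
        stamps = self._fresh(ip, now)
        stamps.append(now)
        self._failures[ip] = stamps
        self._failures.move_to_end(ip)
        # Hard cap: drop the IPs whose latest failure is oldest.
        while len(self._failures) > self.max_ips:
            self._failures.popitem(last=False)
```

The trade-off is that a large enough flood of distinct IPs can push a blocked IP out of the table early, which unblocks it; in exchange, memory use stays bounded no matter how many addresses are involved.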