Designing API endpoints and headers to prevent abuse

After an attacker sent 10,000+ phishing emails through our system, we rebuilt how we think about API security. The patterns we learned: database-based rate limiting, tenant validation on every request, and why Redis alone is not enough.

Summary

  • API endpoint abuse prevention is about layers - Headers, rate limits, tenant validation, and webhook batching work together, and no single mechanism stops determined attackers. This is our personal, candid experience building it at Tallyfy, not theory
  • Database-based rate limiting beats Redis for accountability - Redis is fast but ephemeral. When you need to prove what happened during an incident, database records win. The 30-day rolling window matters for billing and audit trails
  • Multi-tenant validation must happen on every request - The exists:table,id,deleted_at,NULL,tenant_where validation pattern saved us from cross-tenant data leaks. IDOR vulnerabilities hide where you least expect them
  • Webhook batching prevents flooding your integrations - Each event as a separate webhook call creates a scale exploit. See our webhooks documentation for how batching works today

We thought we had API security figured out. We had authentication. We had authorization. We had the usual rate limiting middleware.

Then someone figured out they could use our comment functionality to send thousands of phishing emails through Tallyfy infrastructure. The attack exploited a gap between our security layers - a place where the protections we assumed were there simply were not.

This post documents what we learned and how we rebuilt our approach to API endpoint security.

The incident that changed everything

In 2023, we discovered an attacker had exploited our comment API. The scope was alarming:

“Attacker exploited comment API to bypass ALL guest creation limits and send 10,000+ phishing emails”

The mechanism was clever. Our guest creation had rate limits. Our notification system had rate limits. But comments? Comments were just a feature. Nobody thinks of comments as a security surface until someone weaponizes them.

The deeper we investigated, the worse it looked:

“Comment-based guest creation has NO limits regardless of account status”

Our paid customers had generous limits because they were paying customers. Trial accounts had tighter restrictions. But the comment system had carved out its own path through the authorization layer - one that bypassed all of it.

The contrast was stark:

“Trial accounts: Limited to 30 notifications/hour - Paid accounts: ZERO LIMITS on guest creation”

Zero limits. Not high limits. Zero. For years, this had been fine because nobody thought to abuse it. Then someone did.

Why Redis alone is not enough

The obvious fix after an incident like this is throwing Redis at the problem. Redis is fast. Redis can count things. Redis can expire things automatically. Problem solved?

Not quite.

Our security review after the incident led to a different conclusion. We needed database-based rate limiting alongside the Redis layer:

“Rate limiting solution with database-based tracking”

The reasoning comes down to three things: accountability, durability, and billing.

Accountability - When someone claims they did not send those emails, you need records. Redis is ephemeral by design. The data disappears. A database record persists. During the incident investigation, we wished we had better audit trails of exactly who created exactly which guests at exactly what times.

Durability - Redis restarts happen. Failovers happen. When your rate limiting resets accidentally, attackers get another 10,000 attempts. Database-backed limits survive infrastructure hiccups.

Billing - Our rate limits tie to account tiers. The 30-day rolling window for notification quotas needs to survive across Redis instances. Running SELECT COUNT(*) FROM notifications WHERE created_at > DATE_SUB(NOW(), INTERVAL 30 DAY) AND user_id = ? is slower than incrementing a Redis counter, but it is correct.

The pattern we landed on uses both:

  • Redis for hot path protection (burst limiting, per-second caps)
  • Database for accountability (rolling windows, audit trails, billing)
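To make that concrete, here is a rough Laravel-flavored sketch of how the two layers can combine for guest creation. This is illustrative, not our production code - the table name, Redis key format, and limits are stand-ins:

<?php

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Redis;

// Sketch: Redis handles the hot path (per-minute burst cap), the database
// handles the durable 30-day rolling window used for accountability and billing.
function canCreateGuest(int $userId): bool
{
    // Layer 1: Redis burst limiting - cheap, fast, resets on its own.
    $burstKey = "guest-create:{$userId}:" . now()->format('YmdHi');
    $attempts = Redis::incr($burstKey);
    Redis::expire($burstKey, 120);

    if ($attempts > 10) {           // illustrative per-minute cap
        return false;
    }

    // Layer 2: database rolling window - slower, but durable and auditable.
    $lastThirtyDays = DB::table('guest_creations')   // hypothetical table
        ->where('user_id', $userId)
        ->where('created_at', '>', now()->subDays(30))
        ->count();

    return $lastThirtyDays < 3000;  // roughly 100/day per user
}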

The specification explicitly called this out after the review: “Rate limits are high enough for legitimate use (100 guests/day per user = 3,000/month)” - but now every creation was logged and traceable.

The X-Tallyfy-Client header pattern

One lesson from the security audit: know who is calling your API.

We introduced a required header that identifies the calling application:

“X-Tallyfy-Client: APIClient”

Every request must declare what type of client is making the call. Web application, mobile app, API integration, webhook callback. This sounds simple, but it enables important patterns:

Different rate limits by client type - An automated integration might legitimately need higher throughput than a human clicking buttons. But it should be explicitly identified as automation.

Abuse pattern detection - When you see 10,000 requests from something claiming to be a web browser, you know something is wrong. Legitimate web users do not make 10,000 API calls per minute.

Incident forensics - During the phishing investigation, knowing which client type was making the calls helped narrow down the attack vector.

The header is not security through obscurity. Anyone can set any header. But it adds a layer of information that makes anomalies visible.
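A minimal sketch of the middleware that enforces the header might look like this. The client names and per-type limits are illustrative, not our real configuration:

<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

// Sketch: reject requests that do not declare a known client type, and tag
// the request so downstream throttling and logging can key off it.
class RequireTallyfyClientHeader
{
    private const LIMITS = [
        'WebApp'    => 120,   // requests/minute - illustrative numbers
        'MobileApp' => 120,
        'APIClient' => 600,
    ];

    public function handle(Request $request, Closure $next)
    {
        $client = $request->header('X-Tallyfy-Client');

        if (!isset(self::LIMITS[$client])) {
            return response()->json(
                ['error' => 'Unknown or missing X-Tallyfy-Client header'],
                400
            );
        }

        $request->attributes->set('client_type', $client);
        $request->attributes->set('client_rate_limit', self::LIMITS[$client]);

        return $next($request);
    }
}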

Multi-tenant validation on every request

This one surprised us during the security audit. We thought we had tenant isolation. We had checked the obvious places. Then the audit found:

“OrganizationUsersPictureController IDOR - Uses findByIDOrUsername() without tenant verification”

IDOR - Insecure Direct Object Reference. The classic vulnerability where you can access resources by guessing IDs. In a multi-tenant system, IDOR means accessing another organization’s data.

The fix required a systematic approach. Every database query that retrieved resources by ID needed tenant verification:

'field' => 'exists:table,id,deleted_at,NULL,tenant_where'

That validation pattern appears hundreds of times in our codebase now. It checks:

  1. The record exists in the table
  2. It has the specified ID
  3. It is not soft-deleted
  4. It belongs to the current tenant

The tenant_where part is the critical addition. Every existence check includes organization context. Every resource lookup verifies ownership.
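In practice the rule sits inside a form request. Here is a sketch with stand-in names - tasks, organization_id, and the request class itself are illustrative, and tenant resolution will differ in a real codebase:

<?php

namespace App\Http\Requests;

use Illuminate\Foundation\Http\FormRequest;

// Sketch: every ID the client supplies is checked for existence, soft-delete
// status, and ownership by the caller's organization in a single rule.
class AttachGuestRequest extends FormRequest
{
    public function rules(): array
    {
        $orgId = $this->user()->organization_id;  // current tenant

        return [
            'task_id' => [
                'required',
                // exists + correct ID + not soft-deleted + owned by this tenant
                "exists:tasks,id,deleted_at,NULL,organization_id,{$orgId}",
            ],
        ];
    }
}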

The audit numbers were sobering:

“76 models analyzed, 50 repositories audited, 100+ controllers reviewed”

And the results:

“72 security vulnerabilities found in comprehensive audit”

Most were not critical. Many were minor information disclosures or theoretical attack vectors. But the IDOR issues? Those needed immediate attention. A user in Organization A should never see data from Organization B, period.

The webhook flooding problem

Webhooks create a different kind of abuse vector. Not abuse by attackers, but abuse by scale.

We discovered this the hard way when integrations started failing:

“Webhook Flooding Without Batching - Each guest = separate webhook = scale exploit. Result: 10,000+ webhook calls in rapid succession”

A customer set up a Zapier automation to trigger when guests were added to their processes. Reasonable. Then they ran a process that added hundreds of guests at once. Each guest addition fired a separate webhook. Zapier received thousands of calls in seconds. Their integration broke.

This was not malicious. It was legitimate usage hitting an architectural limitation. But the pattern - one event equals one webhook equals one HTTP call - does not scale.

The solution required rethinking how webhooks work.

Instead of firing immediately, webhook events now queue for a short window. Multiple events batch into single payloads. Per-URL rate limiting prevents overwhelming any single endpoint. Exponential backoff handles failures gracefully.
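A simplified sketch of the batching side, assuming events are buffered in a per-URL Redis list as they occur and flushed by a queued job. The key names, batch size, and retry policy are illustrative:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Redis;

// Sketch: elsewhere, each event does Redis::rpush($key, json_encode($event))
// instead of firing an HTTP call immediately; this job drains the buffer and
// delivers one batched payload per URL.
class FlushWebhookBatch implements ShouldQueue
{
    use Queueable;

    public function __construct(private string $url) {}

    public function handle(): void
    {
        $key = 'webhook-batch:' . md5($this->url);

        // Take up to 100 queued events in one batch.
        $events = Redis::lrange($key, 0, 99);
        if (empty($events)) {
            return;
        }
        Redis::ltrim($key, count($events), -1);

        // Simple fixed-delay retries here; production would back off exponentially.
        Http::retry(3, 1000)->post($this->url, [
            'events' => array_map('json_decode', $events),
        ]);
    }
}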

Our webhooks documentation describes the current behavior, but the key insight was recognizing webhooks as a potential amplification vector.

Secrets that should never leak

During the comprehensive audit, we found something that made us uncomfortable:

“SAML private key exposed in API responses”

The SAML integration was returning configuration data that included sensitive cryptographic material. Not in error messages - in normal responses. The private key was just… there.

This led to a systematic review of what data appears in API responses. The principle: assume every API response will eventually be logged, cached, or displayed somewhere inappropriate.

The specification that came out of this was explicit:

“Client secrets MUST be stored securely and MUST NOT be transmitted to clients after initial creation… Show secret ONLY on creation”

Once you have shown a user their API key, you never show it again. They can generate a new one. They can revoke the old one. But the system never transmits secrets after the initial creation response.
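The shape of that rule in code looks something like the sketch below - a hypothetical ApiClient model and controller, not Tallyfy's actual implementation:

<?php

namespace App\Http\Controllers;

use App\Models\ApiClient;   // hypothetical model
use Illuminate\Http\Request;
use Illuminate\Support\Str;

class ApiClientController extends Controller
{
    // Sketch: store only a hash of the secret; return the plaintext exactly
    // once, in the creation response.
    public function store(Request $request)
    {
        $plainSecret = Str::random(40);

        $client = ApiClient::create([
            'name'        => $request->input('name'),
            'secret_hash' => hash('sha256', $plainSecret),
        ]);

        // The only place the plaintext secret ever appears in a response.
        return response()->json([
            'id'     => $client->id,
            'secret' => $plainSecret,
        ], 201);
    }

    // Subsequent reads expose metadata only - never the secret.
    public function show(ApiClient $client)
    {
        return response()->json($client->only(['id', 'name', 'created_at']));
    }
}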

This applies to:

  • API keys and tokens
  • OAuth client secrets
  • SAML private keys
  • Webhook signing secrets
  • Any cryptographic material

The validation chain

One thing that emerged from the audit was a clear validation sequence for incoming requests. The Open API documentation describes the happy path, but internally we think about it as layers of rejection:

Layer 1: Format validation - Does the request parse? Are required headers present? Is the JSON well-formed?

Layer 2: Authentication - Is the token valid? Is it expired? Is it the right type for this endpoint?

Layer 3: Authorization - Does this user have permission for this operation? Are they in the right organization? Is their account in good standing?

Layer 4: Rate limiting - Have they exceeded their quotas? Are they showing suspicious patterns?

Layer 5: Business validation - Does this operation make sense? Is the target resource in a valid state? Are dependencies satisfied?

The key insight: fail at the earliest possible layer. If the JSON is malformed, do not bother checking authentication. If authentication fails, do not bother checking authorization. Each layer that passes is more work for the system and more information potentially leaked to attackers.
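In Laravel terms, the ordering roughly maps to a middleware stack. This is a sketch with hypothetical middleware aliases (json.wellformed, tenant.member) and a hypothetical controller, not our actual route definitions:

<?php

use App\Http\Controllers\GuestController;   // hypothetical controller
use Illuminate\Support\Facades\Route;

// Each layer rejects as early as possible, so the more expensive checks
// further down never run for a request that already failed.
Route::middleware([
    'json.wellformed',          // Layer 1: format - parses, required headers present
    'auth:api',                 // Layer 2: authentication - valid, unexpired token
    'tenant.member',            // Layer 3: authorization - right org, account in good standing
    'throttle:guest-creation',  // Layer 4: rate limiting - quotas and burst caps
])->post('/tasks/{task}/guests', [GuestController::class, 'store']);

// Layer 5 (business validation) lives in the controller and form request,
// where the target resource's state and dependencies are checked.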

The replay protection pattern

After implementing JWT tokens for email actions, we added replay protection:

“Store token for replay protection… Check rate limiting… Validate task assignment”

The scenario: someone receives an email with a one-click action link. They click it. The action completes. Then someone finds that email in a forwarded thread and clicks it again. Should the action happen twice?

Usually no. So each action token gets logged when used. Reusing it returns a friendly error rather than performing the action again.

This creates a database table that grows indefinitely, so it needs maintenance. Tokens older than their expiry plus a safety margin can be purged. The token itself contains the expiry, so the purge logic is straightforward.
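A sketch of the shape of that check, assuming a used-token table keyed by the token's unique ID - the table and column names here are illustrative:

<?php

use Illuminate\Support\Facades\DB;

// Sketch: log each email-action token on first use; a reused token returns
// false so the caller can show a friendly "already done" message instead of
// performing the action again.
function consumeActionToken(string $tokenId, \DateTimeInterface $expiresAt): bool
{
    $alreadyUsed = DB::table('used_action_tokens')
        ->where('token_id', $tokenId)
        ->exists();

    if ($alreadyUsed) {
        return false;   // replay attempt
    }

    DB::table('used_action_tokens')->insert([
        'token_id'   => $tokenId,
        'expires_at' => $expiresAt,
        'used_at'    => now(),
    ]);

    return true;
}

// Maintenance: anything past its expiry plus a safety margin can be purged.
DB::table('used_action_tokens')
    ->where('expires_at', '<', now()->subDays(7))
    ->delete();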

What we left out

Several patterns we considered but ultimately rejected:

IP-based rate limiting - Too many legitimate use cases involve shared IPs. Corporate proxies, mobile carriers, VPN services. Blocking or limiting by IP hits innocent users more often than it stops attackers.

CAPTCHA on API endpoints - This breaks automation. The whole point of an API is programmatic access. Adding human verification defeats the purpose.

Request signing - We explored requiring HMAC signatures on all requests. The complexity cost outweighed the benefit for our use case. For financial APIs or high-value operations, this makes sense. For workflow management, it was overkill.

Geographic restrictions - Briefly considered blocking requests from certain regions. Immediately rejected as both ineffective (VPNs exist) and potentially discriminatory.

The decisions you do not make matter as much as the ones you do. Security theater - things that look protective but are not - wastes engineering time and frustrates legitimate users.

The ongoing work

API security is not a project you complete. It is a practice you maintain.

Every new endpoint needs the same scrutiny:

  • What rate limits apply?
  • What tenant validation is required?
  • What data appears in responses?
  • How could this be weaponized?

The phishing incident hurt. We had users who trusted us receive malicious emails that appeared to come from our infrastructure. That trust violation matters more than the technical fixes.

But it also taught us that security surfaces hide in unexpected places. Comments are a feature until they are an attack vector. Webhooks are helpful until they are an amplifier. Guest creation is a workflow capability until it is a phishing pipeline.

The patterns in this post - layered rate limiting, mandatory headers, tenant validation on every query, webhook batching - they all came from incidents. They came from watching attackers find the gaps we did not know existed.

That is probably the most honest thing I can say about API security: you do not really understand your attack surface until someone exploits it.

About the Author

Amit is the CEO of Tallyfy. He is a workflow expert and specializes in process automation and the next generation of business process management in the post-flowchart age. He has decades of consulting experience in task and workflow automation, continuous improvement (all the flavors) and AI-driven workflows for small and large companies. Amit did a Computer Science degree at the University of Bath and moved from the UK to St. Louis, MO in 2014. He loves watching American robins and their nesting behaviors!

Follow Amit on his website, LinkedIn, Facebook, Reddit, X (Twitter) or YouTube.
