
AI Code Review Tools Are Making Code Worse

Automated PR review tools help teams ship buggy code faster while creating the illusion of thorough review. Here's why they're making the problem worse.


AI code review tools promise to catch bugs before they hit production. In practice, they're creating a false sense of security while making it easier to ship bad code.

The problem isn't that AI code review doesn't work at all. It's that it works just well enough to be dangerous.

False Security

When an AI tool flags 20 issues in a PR and 18 of them are noise, developers learn to ignore all of them. The two real issues drown with the rest. This is worse than no automated review at all.
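The arithmetic of alert fatigue is stark. A rough sketch, using the 20/18 numbers from the example above (the "developer reads only the first few flags" model is a simplifying assumption, not data):

```python
# Rough model of why noisy review bots lose real bugs.
# Numbers follow the example above: 20 flags per PR, 18 of them noise.
flags_per_pr = 20
real_issues = 2

precision = real_issues / flags_per_pr
print(f"precision: {precision:.0%}")  # only 1 in 10 flags matters

# Simplifying assumption: a fatigued developer skims the first three
# flags before tuning the bot out, and flag order is effectively random.
flags_actually_read = 3
p_flag_read = flags_actually_read / flags_per_pr
expected_real_issues_seen = real_issues * p_flag_read
print(f"expected real issues seen: {expected_real_issues_seen:.1f}")
```

Under those assumptions the developer sees 0.3 of the 2 real issues per PR on average: most real findings are never read at all.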

Traditional code review works because there's accountability. A human reviewer stakes their reputation on approving code. They know if they approve something that breaks production, it reflects on them. AI tools have no such incentive.

The result: developers treat AI code review as a checkbox. "The bot approved it" becomes justification for merging without actual human review.

What AI Code Review Actually Catches

  • Linting issues - things your IDE already flagged

  • Style violations - whitespace, formatting, naming conventions

  • Simple pattern matching - detecting banned functions or obvious anti-patterns

  • Surface-level type errors - things TypeScript/mypy would catch anyway

What it doesn't catch:

  • Logic errors - off-by-one errors, incorrect conditionals, race conditions

  • Security vulnerabilities - SQL injection, XSS, authentication bypasses (unless they match exact training patterns)

  • Architecture issues - this function shouldn't exist, wrong abstraction, tight coupling

  • Business logic bugs - the code does what it says, but what it says is wrong

  • Context-dependent problems - this change breaks an assumption made elsewhere

The entire value of code review comes from catching the second category. AI tools are optimised for the first.
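For a concrete sense of that second category, here is a hypothetical off-by-one (the function is illustrative, not from any real project) that a linter, a type checker, and pattern matching all pass over, because nothing about it is syntactically wrong:

```python
def last_n_orders(orders: list[str], n: int) -> list[str]:
    """Return the most recent n orders (list is oldest-first)."""
    # Bug: orders[-n:] returns the WHOLE list when n == 0,
    # because orders[-0:] is orders[0:]. Type-safe, lint-clean, wrong.
    return orders[-n:]

print(last_n_orders(["a", "b", "c"], 2))  # ['b', 'c'] — looks fine
print(last_n_orders(["a", "b", "c"], 0))  # ['a', 'b', 'c'] — should be []
```

A human reviewer who asks "what happens when n is zero?" catches this in seconds. A tool matching surface patterns has no reason to ask.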

Training Data

AI code review tools are trained on existing codebases. Existing codebases are full of bugs. The model learns to accept buggy code because that's what it was trained on.

The model has no way to know if code it was trained on later caused production incidents. It treats "code that was merged" as "good code" when in reality it just means "code that someone approved."

This is the same fundamental flaw as LLM model collapse. The training data is contaminated with the exact problems the tool is supposed to prevent.

Alert Fatigue

I've seen AI code review tools flag the following as "security issues":

  • Using JSON.parse() (flagged as potential RCE)

  • Any SQL query (flagged as potential injection, even with parameterised queries)

  • eval() in a test file

  • setTimeout with a variable delay (flagged as timing attack)

Every single one was a false positive. When developers see 15 false positives per PR, they stop reading the AI's feedback. The one real SQL injection vulnerability gets merged because nobody takes the bot seriously anymore.
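The SQL case is the clearest. A parameterised query is the standard defence against injection, yet a tool that pattern-matches on "SQL string present" flags both versions below equally. A minimal sketch using Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "alice' OR '1'='1"

# Unsafe: string interpolation lets the input rewrite the query.
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()
print(unsafe)  # [('alice',)] — the injected OR clause matched every row

# Safe: the driver passes the value as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
print(safe)  # [] — no user is literally named "alice' OR '1'='1"
```

The second query is exactly what a tool flagging "any SQL query" trains developers to ignore, which is how the one genuinely concatenated query slips through.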

Speed

AI code review tools are marketed on speed. "Get feedback in seconds, not hours!" This optimises for the wrong metric.

Code review isn't slow because humans type slowly. It's slow because understanding context takes time. Reading the related code, understanding the business logic, thinking through edge cases - none of this can be rushed.

AI tools optimise for fast feedback. Developers optimise for getting code merged. The combination produces code that passes automated checks but doesn't actually work correctly.

Economics

Companies buy AI code review tools for two reasons:

  1. To reduce headcount - fewer senior engineers needed if AI does code review

  2. To ship faster - remove the bottleneck of waiting for human reviewers

Both reasons directly conflict with code quality. You're replacing experienced judgment with pattern matching, and adding pressure to merge quickly.

The incentives are clear: vendors sell speed and cost reduction, companies buy those metrics, and code quality suffers. Nobody's incentive is aligned with "catch more bugs."

What Works

The solution isn't better AI code review. It's better human code review.

  • Pair programming - real-time review catches issues before they're even committed

  • Small PRs - easier to review thoroughly, less likely to hide bugs

  • Domain expertise - reviewers who understand the system, not just the syntax

  • Accountability - reviewers whose names are attached to what they approve

  • Time - allowing reviewers to actually think through the changes

None of these scale as well as AI code review. That's the point. Code review shouldn't scale. If you're merging so much code that human review is a bottleneck, you have a process problem, not a tooling problem.

The Truth

AI code review tools exist because companies want to ship faster with fewer experienced engineers. The tools work well enough to justify the purchase, but not well enough to actually improve code quality.

What you get is:

  • Junior developers who think the AI caught all the issues

  • Senior developers who ignore the AI because of alert fatigue

  • Managers who see "100% of PRs reviewed by AI" as a quality metric

  • Production incidents that would have been caught by a human reviewer

The feedback loop ensures this gets worse over time. AI-approved code trains the next version of the AI. Bad code becomes the baseline.

What I Do

I don't use AI code review tools on projects I care about. I use:

  • Linters - for style and simple pattern matching (what AI code review actually does well)

  • Static analysis - language-specific tools that understand semantics, not just syntax

  • Human reviewers - people who understand the system and business logic

  • Tests - including the weird edge cases AI tools never think to check
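The last point is where human judgment still beats pattern matching: deciding which edge cases matter. A hypothetical sketch (the pagination helper and its cases are illustrative, not from any real project):

```python
def paginate(items: list, page: int, per_page: int) -> list:
    """Return the given 1-indexed page of items."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    start = (page - 1) * per_page
    return items[start:start + per_page]

# The edge cases a reviewer who understands the system asks about.
assert paginate([1, 2, 3, 4, 5], page=1, per_page=2) == [1, 2]
assert paginate([1, 2, 3, 4, 5], page=3, per_page=2) == [5]   # partial last page
assert paginate([1, 2, 3, 4, 5], page=4, per_page=2) == []    # past the end
assert paginate([], page=1, per_page=10) == []                # empty input
try:
    paginate([1], page=0, per_page=2)
    raise AssertionError("page=0 should be rejected")
except ValueError:
    pass
```

None of these cases come from the function's syntax; they come from knowing how pagination is actually used. That knowledge is what the tools listed above provide and AI review does not.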

Is it slower? Yes. Does it catch more bugs? Absolutely.

The industry has spent the last decade optimising for development speed. We've gotten very fast at shipping bugs to production. Maybe it's time to optimise for correctness instead.

Final Thoughts

AI code review isn't useless. It's worse than useless - it creates the illusion of thorough review while making it easier to ship bad code.

The problem isn't the technology. It's that the technology is being used to replace judgment with pattern matching, and accountability with automation.

Companies don't want to admit they're using AI code review to cut costs and ship faster. They frame it as "augmenting" human reviewers. In practice, it's replacing them. And the code quality shows it.

The fix requires admitting that code review is valuable because it's slow and thoughtful, not despite it. AI tools optimised for speed and cost will never provide the same value as a senior engineer who actually understands the system.

But that requires paying senior engineers and accepting slower ship times. So instead, we'll keep using AI code review, keep shipping bugs, and keep wondering why code quality is declining.

The tools aren't making code better. They're just making it easier to pretend we reviewed it.