AI Code Review Tools Are Making Code Worse
Automated PR review tools help teams ship buggy code faster while creating the illusion of thorough review. Here's why they're making the problem worse.
AI code review tools promise to catch bugs before they hit production. In practice, they're creating a false sense of security while making it easier to ship bad code.
The problem isn't that AI code review doesn't work at all. It's that it works just well enough to be dangerous.
False Security
When an AI tool flags 20 issues in a PR and 18 of them are noise, developers learn to ignore all of them. The two real issues get buried with the rest. This is worse than no automated review at all.
Traditional code review works because there's accountability. A human reviewer stakes their reputation on approving code. They know if they approve something that breaks production, it reflects on them. AI tools have no such incentive.
The result: developers treat AI code review as a checkbox. "The bot approved it" becomes justification for merging without actual human review.
What AI Code Review Actually Catches
Linting issues - things your IDE already flagged
Style violations - whitespace, formatting, naming conventions
Simple pattern matching - detecting banned functions or obvious anti-patterns
Surface-level type errors - things TypeScript/mypy would catch anyway
What it doesn't catch:
Logic errors - off-by-one errors, incorrect conditionals, race conditions
Security vulnerabilities - SQL injection, XSS, authentication bypasses (unless they match exact training patterns)
Architecture issues - this function shouldn't exist, wrong abstraction, tight coupling
Business logic bugs - the code does what it says, but what it says is wrong
Context-dependent problems - this change breaks an assumption made elsewhere
The entire value of code review comes from catching the second category. AI tools are optimised for the first.
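A hypothetical example of the second category: the function below is lint-clean, fully typed, and matches no banned pattern, so an automated reviewer has nothing to say about it. The names and scenario are invented for illustration; the bug is a classic truncating-division error that only someone thinking about the actual requirement would catch.

```python
def pages_needed_buggy(item_count: int, page_size: int) -> int:
    # Passes linting and type checking, but truncating division
    # silently drops the final partial page: 25 items at 10 per
    # page reports 2 pages, losing the last 5 items.
    return item_count // page_size

def pages_needed(item_count: int, page_size: int) -> int:
    # Ceiling division via negation -- the kind of fix a reviewer
    # who reasons about edge cases makes, and a pattern matcher
    # has no signal to suggest.
    return -(-item_count // page_size)

print(pages_needed_buggy(25, 10))  # 2 -- silently wrong
print(pages_needed(25, 10))        # 3
```

Both versions look equally "clean" to a tool scoring style and surface patterns; only the second one is correct.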
Training Data
AI code review tools are trained on existing codebases. Existing codebases are full of bugs. The model learns to accept buggy code because that's what it was trained on.
The model has no way to know if code it was trained on later caused production incidents. It treats "code that was merged" as "good code" when in reality it just means "code that someone approved."
This is the same fundamental flaw as LLM model collapse. The training data is contaminated with the exact problems the tool is supposed to prevent.
Alert Fatigue
I've seen AI code review tools flag the following as "security issues":
Using JSON.parse() (flagged as potential RCE)
Any SQL query (flagged as potential injection, even with parameterised queries)
eval() in a test file
setTimeout with a variable delay (flagged as a timing attack)
Every single one was a false positive. When developers see 15 false positives per PR, they stop reading the AI's feedback. The one real SQL injection vulnerability gets merged because nobody takes the bot seriously anymore.
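To make the parameterised-query case concrete, here's a minimal sketch using Python's sqlite3 module (the table and input string are invented for illustration). The driver binds the user input as data, never as SQL, so injection is impossible here, yet a tool pattern-matching on query strings flags it anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# A hostile-looking input that a naive reviewer assumes is dangerous.
user_input = "alice'; DROP TABLE users; --"

# The ? placeholder means the driver binds user_input as a value,
# not as SQL text -- the "attack" is just a name that matches nothing.
rows = conn.execute(
    "SELECT id FROM users WHERE name = ?", (user_input,)
).fetchall()

print(rows)  # [] -- no match, and the users table is untouched
```

Flagging this as "potential SQL injection" punishes exactly the pattern developers should be using, which is how real vulnerabilities in string-concatenated queries end up ignored alongside the noise.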
Speed
AI code review tools are marketed on speed. "Get feedback in seconds, not hours!" This optimises for the wrong metric.
Code review isn't slow because humans type slowly. It's slow because understanding context takes time. Reading the related code, understanding the business logic, thinking through edge cases - none of this can be rushed.
AI tools optimise for fast feedback. Developers optimise for getting code merged. The combination produces code that passes automated checks but doesn't actually work correctly.
Economics
Companies buy AI code review tools for two reasons:
To reduce headcount - fewer senior engineers needed if AI does code review
To ship faster - remove the bottleneck of waiting for human reviewers
Both reasons directly conflict with code quality. You're replacing experienced judgment with pattern matching, and adding pressure to merge quickly.
The incentives are clear: vendors sell speed and cost reduction, companies buy those metrics, and code quality suffers. Nobody's incentive is aligned with "catch more bugs."
What Works
The solution isn't better AI code review. It's better human code review.
Pair programming - real-time review catches issues before they're even committed
Small PRs - easier to review thoroughly, less likely to hide bugs
Domain expertise - reviewers who understand the system, not just the syntax
Accountability - reviewers whose names are attached to what they approve
Time - allowing reviewers to actually think through the changes
None of these scale as well as AI code review. That's the point. Code review shouldn't scale. If you're merging so much code that human review is a bottleneck, you have a process problem, not a tooling problem.
The Truth
AI code review tools exist because companies want to ship faster with fewer experienced engineers. The tools work well enough to justify the purchase, but not well enough to actually improve code quality.
What you get is:
Junior developers who think the AI caught all the issues
Senior developers who ignore the AI because of alert fatigue
Management who sees "100% of PRs reviewed by AI" as a quality metric
Production incidents that would have been caught by a human reviewer
The feedback loop ensures this gets worse over time. AI-approved code trains the next version of the AI. Bad code becomes the baseline.
What I Do
I don't use AI code review tools on projects I care about. I use:
Linters - for style and simple pattern matching (what AI code review actually does well)
Static analysis - language-specific tools that understand semantics, not just syntax
Human reviewers - people who understand the system and business logic
Tests - including the weird edge cases AI tools never think to check
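As a sketch of what "weird edge cases" means in practice, here is a hypothetical helper with the tests a human adds from experience: empty input, whitespace-only input, repeated whitespace, non-ASCII text. The function and cases are invented; the point is that these tests come from thinking about how the code will be misused, not from matching patterns.

```python
def slug(title: str) -> str:
    # Hypothetical helper under test: lowercase the title and
    # collapse runs of whitespace into single hyphens.
    return "-".join(title.lower().split())

# The edge cases a reviewer asks about and a bot rarely suggests:
def test_empty_string():
    assert slug("") == ""

def test_whitespace_only():
    assert slug("   ") == ""

def test_repeated_internal_whitespace():
    assert slug("Hello   World") == "hello-world"

def test_unicode_title():
    assert slug("Café  Reviews") == "café-reviews"
```

Run with pytest or any test runner; each case pins down behaviour the happy-path test suite would miss.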
Is it slower? Yes. Does it catch more bugs? Absolutely.
The industry has spent the last decade optimising for development speed. We've gotten very fast at shipping bugs to production. Maybe it's time to optimise for correctness instead.
Final Thoughts
AI code review isn't useless. It's worse than useless - it creates the illusion of thorough review while making it easier to ship bad code.
The problem isn't the technology. It's that the technology is being used to replace judgment with pattern matching, and accountability with automation.
Companies don't want to admit they're using AI code review to cut costs and ship faster. They frame it as "augmenting" human reviewers. In practice, it's replacing them. And the code quality shows it.
The fix requires admitting that code review is valuable because it's slow and thoughtful, not despite it. AI tools optimised for speed and cost will never provide the same value as a senior engineer who actually understands the system.
But that requires paying senior engineers and accepting slower ship times. So instead, we'll keep using AI code review, keep shipping bugs, and keep wondering why code quality is declining.
The tools aren't making code better. They're just making it easier to pretend we reviewed it.