When I took over as Tech Lead at Pranshtech Solutions, code review was theater. PRs went up, a teammate would click through the diff, comment 'LGTM', and merge. We were shipping 30–40 PRs a week. We were also debugging production incidents every few days.
Three months after I rebuilt the review process, production incidents dropped by roughly 40%. Not because the developers got better; they were already good. Because the process gave them the right information at the right moment, and made the friction go in the right direction.
The Before State
Diagnosing the problem came first. The PRs we had looked like this:
- Average diff size: 520 lines changed per PR
- Average review time: 4 minutes (I checked the GitHub timestamps)
- Review comments per PR: 0.8 (most PRs had zero comments)
- Most common review comment: 'LGTM' or 'Looks good, merge'
- Merge-to-incident correlation: impossible to trace, because no one linked commits to incidents
The underlying issue wasn't laziness. Large diffs are genuinely impossible to review properly. You can't hold 500 lines of context in your head and spot the off-by-one in the pagination logic and the missing null check in the auth middleware at the same time. People defaulted to LGTM because there was no other rational option.
Step 1: The PR Template
I added .github/pull_request_template.md. Every new PR now auto-populates with this structure:
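A sketch of the kind of structure it enforces is below. The 'How to test' section is the one discussed in a moment; the other headings are illustrative, not a reproduction of the exact file:

```markdown
<!-- .github/pull_request_template.md — illustrative sketch, not the exact file -->
## What does this change?
One or two sentences: the problem and the approach taken.

## Why?
Link the ticket or issue, plus any context the reviewer needs.

## How to test
Exact steps a reviewer can run locally: commands, test data, expected result.

## Risk notes
Migrations, feature flags, config changes, anything that needs a careful deploy.
```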
The template does two things. It forces the author to think through the change before requesting review. And it gives the reviewer a map: they know what they're supposed to be verifying before they read a single line of code.
The 'How to test' section alone cut our review time significantly: reviewers stopped asking clarifying questions in Slack and started just running the code. The template moved that context from chat history into the PR itself.
Step 2: The Review Checklist
I wrote a review checklist and shared it in our engineering Notion. Not mandatory for every review, but a reference to check against when something feels off:
- Security: Is user input sanitized before going into a query? Are API endpoints authenticated? Are secrets in env vars, not in code?
- Edge cases: What happens with empty arrays, null values, network timeouts, concurrent requests?
- Performance: Any N+1 queries? Any operations in a loop that should be batched? Any missing database indexes?
- Error handling: Are errors caught and logged? Does the user get a meaningful message or a raw stack trace?
- Readability: Would a developer unfamiliar with this code understand it in 5 minutes? Are variable names descriptive?
- Tests: Do the tests actually test the behavior, or just the happy path?
I didn't ask reviewers to explicitly check every box on every PR. The list exists to jog memory and to create a shared vocabulary. When I leave a comment about an N+1 query, everyone on the team knows what that means and why it matters.
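As a concrete example of the N+1 item, here is a minimal sketch. The Repo interface and its methods are hypothetical stand-ins for whatever data-access layer the codebase actually uses:

```typescript
// Hypothetical data-access layer: Repo and its methods stand in for whatever
// ORM or query builder the codebase actually uses.
interface User { id: string; name: string }
interface Order { id: string; userId: string }

interface Repo {
  findOrders(): Promise<Order[]>;
  findUserById(id: string): Promise<User>;
  findUsersByIds(ids: string[]): Promise<User[]>; // e.g. WHERE id IN (...)
}

// N+1: one query for the orders, then one additional query per order.
// With 200 orders, that's 201 round trips to the database.
async function getOrderOwnersSlow(repo: Repo): Promise<User[]> {
  const orders = await repo.findOrders();
  return Promise.all(orders.map((o) => repo.findUserById(o.userId)));
}

// Batched: collect the IDs up front and fetch every user in a single query.
// Two round trips total, no matter how many orders there are.
async function getOrderOwnersFast(repo: Repo): Promise<User[]> {
  const orders = await repo.findOrders();
  const userIds = [...new Set(orders.map((o) => o.userId))];
  return repo.findUsersByIds(userIds);
}
```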
Step 3: The Size Limit
This was the most controversial change: no PR over 200 lines of changed code without a written justification in the PR description.
The resistance was immediate. 'Migrations are 600 lines, I can't split that.' 'This refactor touches 40 files, there's no way to split it.' Fair enough, so the rule isn't a hard block; it's a conversation trigger. A PR over 200 lines needs a comment explaining why it can't be smaller. That comment forces the author to think about whether they've actually tried to split it.
In practice, about 70% of large PRs could be split. The other 30% were legitimately large (schema migrations, dependency upgrades) and got labeled large-pr, which meant they needed two reviewers instead of one and had dedicated review time blocked off rather than going through the async queue.
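The rule itself is a convention, but CI can help surface it. Here's a rough sketch of a check that warns on oversized diffs; the script, the BASE_REF variable, and the warn-only behavior are illustrative assumptions, not a description of our actual setup:

```typescript
// check-pr-size.ts: a hypothetical CI nudge, not the team's actual tooling.
// Assumes the runner has git history for the base branch; BASE_REF is an
// illustrative environment variable, not a standard one.
import { execSync } from "node:child_process";

const LIMIT = 200;
const base = process.env.BASE_REF ?? "origin/main";

// `git diff --shortstat` prints something like:
// " 12 files changed, 340 insertions(+), 95 deletions(-)"
const stat = execSync(`git diff --shortstat ${base}...HEAD`).toString();
const insertions = Number(/(\d+) insertion/.exec(stat)?.[1] ?? 0);
const deletions = Number(/(\d+) deletion/.exec(stat)?.[1] ?? 0);
const changed = insertions + deletions;

if (changed > LIMIT) {
  // Warn rather than fail: the rule is a conversation trigger, not a hard block.
  console.warn(
    `This PR changes ${changed} lines (limit: ${LIMIT}). ` +
      `Explain in the description why it can't be smaller and apply the large-pr label, or split it.`
  );
}
```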
Step 4: Reviews as Mentoring
This is the change that had the biggest long-term effect, and it's the hardest to automate. I changed how I wrote review comments.
Before: 'Change this to use Promise.allSettled instead.'
After: 'This uses Promise.all, which will reject the entire batch if any single request fails. In a user-facing API, that means one bad user ID causes everyone else's data to disappear. Promise.allSettled processes all items and lets you handle successes and failures individually; see MDN for examples. Worth switching here.'
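In code, the difference that comment describes looks roughly like this; fetchProfile is a made-up helper, and any per-item async call behaves the same way:

```typescript
type Profile = { id: string; name: string };

// Hypothetical helper; stands in for any per-item async call.
declare function fetchProfile(userId: string): Promise<Profile>;

// Promise.all: one rejected fetch rejects the whole batch, so a single bad
// user ID makes everyone else's data disappear from the response.
async function getProfilesAllOrNothing(ids: string[]): Promise<Profile[]> {
  return Promise.all(ids.map(fetchProfile));
}

// Promise.allSettled: every fetch runs to completion, and successes and
// failures can be separated and handled individually.
async function getProfiles(ids: string[]) {
  const results = await Promise.allSettled(ids.map(fetchProfile));
  const profiles = results
    .filter((r): r is PromiseFulfilledResult<Profile> => r.status === "fulfilled")
    .map((r) => r.value);
  const failed = results.length - profiles.length;
  return { profiles, failed };
}
```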
The longer comment takes 2 more minutes to write. But the developer now understands why, and they won't reach for Promise.all in this situation again. Over 6 months, each developer on the team absorbs dozens of these explanations. That compounds.
Distinguish blocking from non-blocking comments. I prefix non-blocking suggestions with nit: or optional:. This lets the author merge without addressing minor style preferences, while still seeing the feedback. Conflating 'must fix' and 'would be nice' is how review threads become adversarial.
Step 5: Measuring It
The 40% reduction is a real number, but measuring it required agreeing on a definition first.
A 'production bug' in our tracking: any issue that required a hotfix deploy, caused user-visible errors (captured in Sentry), or triggered a customer support ticket referencing broken behavior. We excluded performance degradation, missing features, and UX improvements; those aren't bugs in the relevant sense.
- Month before process change: 23 production bugs across 8 deployments
- Month 1 after: 19 bugs (a smaller drop; the team was still learning the new process)
- Month 2 after: 16 bugs (trend confirmed)
- Month 3 after: 14 bugs (roughly a 39% reduction vs. baseline)
Causation is hard to prove. We also onboarded a new monitoring tool around the same time, and caught some latent bugs that way. But the timing correlation is strong, and qualitatively the team started catching the same categories of issues (missing null checks, unhandled promise rejections, N+1 queries) in review rather than in production.
Handling the Resistance
The argument you'll hear: 'Code review slows us down.' The correct response is data, not philosophy.
A production incident at Pranshtech typically costs 2–4 hours: the alert, the diagnosis, the hotfix, the deploy, the post-mortem. A thorough code review takes 30–45 minutes. One prevented incident pays for 4–8 review sessions. The math isn't close.
The real slowdown isn't reviews; it's large PRs and unclear PR descriptions, which make reviews take longer and require async Q&A. Both of those are fixed by the process changes above, not by doing fewer reviews.
What NOT To Do
- Don't fight style wars in reviews: that's what ESLint and Prettier are for. Configure them once, enforce them in CI, and never comment on indentation or quote style again (a minimal config sketch follows this list).
- Don't block on non-blocking issues: if a comment is optional, say so explicitly. Using the nit: or optional: prefix reduces friction significantly.
- Don't review code you don't understand: if a PR is in a domain you're unfamiliar with, ask the author for a 10-minute walkthrough before reviewing async. A confused LGTM is worse than no review.
- Don't use reviews to score points: the goal is to ship good software, not to demonstrate your knowledge. A comment that says 'change X to Y' with no explanation serves your ego, not the team.
- Don't make reviews feel adversarial: start comments with 'I think' or 'one option might be' for suggestions. Reserve direct imperatives for actual bugs or security issues.
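On the first point, a minimal sketch of the 'configure once' setup, assuming ESLint's flat config format and the eslint-config-prettier package; CI then just runs eslint . and prettier --check . on every PR, so formatting never reaches a human reviewer:

```js
// eslint.config.js: a minimal sketch, assuming ESLint 9+ flat config and the
// eslint-config-prettier package; adapt to whatever plugins your stack needs.
import js from "@eslint/js";
import eslintConfigPrettier from "eslint-config-prettier";

export default [
  js.configs.recommended,  // baseline correctness rules
  eslintConfigPrettier,    // turns off stylistic rules that Prettier already handles
];
```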
Something I tell my team: “The best code review comment is one that the author reads, thinks 'oh, I didn't consider that,' and then never needs to see again because it became part of how they think about code.”