AI Coding Best Practices for Software Design and Development

Practical standards and playbooks for using AI assistants across architecture, coding, testing, security, and delivery. Move faster without sacrificing quality, safety, or long-term maintainability.


Core Principles

Effective AI-assisted development rests on six foundational principles that guide all decisions about when, how, and why to use AI tools:

1. AI as Collaborator, Not Authority

Treat every AI output as a draft until verified by human judgment. The AI generates candidates; humans decide what ships. This mindset prevents false confidence and maintains accountability.

What This Means

You're not delegating judgment to the AI. You're asking it to generate options faster so your team can make better decisions with more time to think strategically.

2. Context Quality Controls Output Quality

The quality of AI-generated code is directly proportional to the clarity and completeness of your prompts. Vague requests produce vague outputs; precise constraints produce useful outputs.

3. Optimize for Maintainability Over Cleverness

Prefer code that is easy to understand and modify six months from now, even if it's slightly less elegant. Clever code written by AI is clever code you didn't write—and may not fully understand.

✓ Prefer

Explicit error handling with named variables and clear control flow

✗ Avoid

Terse functional chains or advanced language features without explanation
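
To make the preference concrete, here is the same calculation written both ways in a small TypeScript sketch (the `Order` type and the null-handling rule are invented for illustration):

```typescript
// Illustrative only: the same calculation written two ways.
interface Order {
  id: string;
  amount: number | null;
}

// Terse: correct, but the null-handling decision is easy to miss in review.
const totalTerse = (orders: Order[]): number =>
  orders.reduce((sum, o) => sum + (o.amount ?? 0), 0);

// Explicit: named variables, clear control flow, and a visible decision
// for the null case. Easier to modify six months from now.
function totalExplicit(orders: Order[]): number {
  let total = 0;
  for (const order of orders) {
    if (order.amount === null) {
      // Business decision made visible: null amounts count as zero.
      continue;
    }
    total += order.amount;
  }
  return total;
}
```

Both compute the same result; the explicit version makes the edge-case policy reviewable at a glance.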

4. The 80/20 Rule: AI Handles Boilerplate, Humans Handle Critical Logic

AI excels at generating scaffolding, repetitive patterns, and syntactic correctness. Humans should focus on the 20% of code that embodies core business logic, error recovery, and system guarantees.

| Category | Best Generated By AI | Requires Human Review/Design |
| --- | --- | --- |
| CRUD handlers | ✓ Yes, with templates | Validation rules, data contracts |
| Test scaffolding | ✓ Yes, structure and setup | Meaningful assertions, edge cases |
| Data transformations | ✓ Initial version | Performance, null handling, rollback |
| Concurrency & timing | ✗ Rarely, needs expert review | Always human-designed with proofs |
| Security decisions | ✗ Never | 100% human-designed and verified |

5. Humans Remain Accountable for All Merged Code

Code review isn't a formality when AI is involved—it's the enforcement mechanism that keeps humans accountable. The author and reviewers must understand every line before it ships.

6. Prompt Engineering Is Software Engineering

Treat your prompts like production code: version them, test them, improve them iteratively, and share the best ones with your team.

Prompt Lifecycle
  • Write: Start with clear, structured instructions
  • Test: Use it 3–5 times, refine based on output quality
  • Document: Explain why each constraint exists
  • Share: Add to team prompt library with context
  • Iterate: Update quarterly based on defect reviews

Code Example: Good vs. Bad Prompting

❌ Bad: Vague, Low Constraints

Write a function to validate user input.

✓ Good: Clear, Constrained, Specific

/*
 * Task: Implement validateUserInput() in src/validation/users.ts
 *
 * Requirements:
 * - Reuse ValidationError from src/validation/errors.ts
 * - Follow the error-handling pattern in existingValidator() (same file)
 * - Accept email (string), name (string), age (number)
 * - Return { valid: boolean, errors: ValidationError[] }
 * - Handle edge cases: empty strings, negative age, invalid email format
 * - Must NOT add new dependencies
 * - Must NOT access database or external services
 *
 * Test: Generate unit tests for success + 3 failure cases
 *
 * Example call:
 * validateUserInput("alice@example.com", "Alice", 25)
 * // => { valid: true, errors: [] }
 */
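
For illustration, here is one plausible implementation a good response to the prompt above might produce. `ValidationError` is defined inline so the sketch is self-contained; the prompt itself asks the AI to reuse the existing class from src/validation/errors.ts. The email regex is deliberately simple:

```typescript
// Sketch of a response to the prompt above — not the real codebase's class.
class ValidationError {
  constructor(public field: string, public message: string) {}
}

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // deliberately simple check

function validateUserInput(
  email: string,
  name: string,
  age: number
): { valid: boolean; errors: ValidationError[] } {
  const errors: ValidationError[] = [];
  if (email.trim() === "" || !EMAIL_RE.test(email)) {
    errors.push(new ValidationError("email", "Invalid email format"));
  }
  if (name.trim() === "") {
    errors.push(new ValidationError("name", "Name must not be empty"));
  }
  if (!Number.isInteger(age) || age < 0) {
    errors.push(new ValidationError("age", "Age must be a non-negative integer"));
  }
  // No database access, no new dependencies — exactly as constrained.
  return { valid: errors.length === 0, errors };
}
```

Note how every constraint in the prompt maps to a checkable property of the output.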

Development Workflow

Effective AI-assisted development follows a structured four-step workflow that balances speed with verification. Each step has specific practices and checkpoints.

Step 1: Define the Task Clearly

A clear task definition dramatically improves output quality and reduces iteration cycles. Spend 5–10 minutes on this step before invoking AI.

Define Task Template

// OBJECTIVE
What specific behavior or artifact are you building?

// CONSTRAINTS
- File location: src/features/auth/handlers.ts
- Existing patterns: Use service/handlers.ts as a template
- Dependencies: No new npm packages
- Error handling: Use AppError from lib/errors.ts

// ACCEPTANCE CRITERIA
- Function accepts (email, password) and returns { token, expiresAt }
- Validates input; returns error for invalid email format
- Handles 3+ failure paths: user not found, password incorrect, account locked
- Includes unit tests for success + each failure path
- No hardcoded values; reads from environment

// ASSUMPTIONS
- PostgreSQL is available at DATABASE_URL
- bcrypt is already in package.json
- JWT_SECRET is available via getEnv("JWT_SECRET")

Key Practices for Step 1

Step 2: Generate Incrementally

Request one focused component or function at a time rather than entire modules. This allows for tighter feedback loops and easier validation.

Incremental Generation Checklist

Anti-Pattern: Copy-Paste Without Review

⚠ Don't Do This

Requesting 500 lines of generated code, pasting it directly into your codebase without running tests or linting. This compounds risk and makes debugging impossible.

Step 3: Verify Aggressively

Verification is the most important step. Before considering code ready, it must pass multiple levels of automated and human checks.

Verification Pipeline

// 1. AUTOMATED CHECKS (run immediately)
npm run lint      // Syntax, naming conventions
npm run type-check // Type safety
npm run test       // Unit & integration tests
npm run security   // Dependency scanning

// 2. MANUAL CHECKS (before merge)
- Code review (design, readability, assumptions)
- Edge case testing (null, empty, timeout)
- Integration testing (does it work with the rest of the system?)
- Performance validation (does it meet latency budget?)
- Observability check (are we logging appropriately?)
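
The timeout item in the manual checks can be scripted. A minimal sketch of a deadline wrapper (`withTimeout` is a hypothetical helper, not part of the pipeline above):

```typescript
// Wrap any promise with a deadline so timeout behavior is testable.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Clear the timer so a fast result doesn't leave a pending timeout behind.
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}
```

In an edge-case test, wrap the call under test and assert that the slow path rejects with the timeout error.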

Specific Verification Patterns

Step 4: Review and Harden

Code review for AI-generated changes must be rigorous. Reviewers must understand the code and sign off on correctness, not just style.

AI-Generated Code Review Checklist

Common AI Mistakes and Fixes

| Mistake | Why It Happens | Fix |
| --- | --- | --- |
| Incomplete error handling | AI doesn't know all edge cases in your domain | Provide error examples in the prompt; review manually |
| Overly generic code | AI errs toward flexibility without constraints | Constrain the prompt: "Use the pattern from X file exactly" |
| Silent failures | Missing logging, metrics, or alerting | Require observability code in the prompt |
| N+1 queries or inefficiency | AI doesn't understand system bottlenecks | Benchmark and profile; ask AI to optimize a specific path |
| Race conditions | Timing bugs are hard to spot without deep analysis | Have a concurrency expert review; add stress tests |
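
The N+1 row is easiest to see in code. A sketch with a hypothetical `Db` interface — both functions return the same data; only the number of round trips differs:

```typescript
// Hypothetical repository interface, for illustration only.
interface Db {
  getUser(id: string): Promise<{ id: string; name: string }>;
  getUsers(ids: string[]): Promise<{ id: string; name: string }[]>;
}

// ✗ N+1: one query per id — a pattern AI readily generates, slow at scale.
async function loadUsersNPlusOne(db: Db, ids: string[]) {
  const users: { id: string; name: string }[] = [];
  for (const id of ids) {
    users.push(await db.getUser(id)); // one round trip per user
  }
  return users;
}

// ✓ Batched: a single query for all ids.
async function loadUsersBatched(db: Db, ids: string[]) {
  return db.getUsers(ids); // one round trip total
}
```

A benchmark against a real database makes the difference obvious; the AI will rarely flag it unprompted.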

Architecture & Design

AI can help with architecture design, but humans must make the final tradeoff decisions. Use AI for exploration; use humans for judgment.

System Design Prompt Template

When asking AI to help design a system, provide context about scale, reliability, and constraints:

// SYSTEM DESIGN REQUEST
Service: Real-time notification delivery

Requirements:
- Scale: 100K events/sec, 10M active users
- Latency SLA: 99th percentile < 100ms delivery
- Data: Event metadata (user ID, type, timestamp)
- Retention: 30 days
- Critical path: User sends event → notification delivered

Constraints:
- Budget: $5K/month infrastructure
- Team: 3 engineers, no ML specialization
- Existing tech: AWS, PostgreSQL, Node.js

Request:
Propose 2-3 architectures with tradeoffs. For each:
1. Component diagram in ASCII
2. Data flow during normal load
3. Failure scenarios and recovery
4. Cost estimate
5. Operational complexity (1-10 scale)

Architecture Decisions: When AI Is Good, When It's Not

| Decision Type | AI Usefulness | Notes |
| --- | --- | --- |
| Component decomposition | ★★★★☆ Good | AI helps explore partitions; humans decide based on team skills |
| API contract design | ★★★★☆ Good | AI generates versioning strategies; humans refine based on domain |
| Database schema | ★★★☆☆ Mixed | AI useful for CRUD patterns; needs expert review for queries/indexes |
| Caching strategy | ★★☆☆☆ Limited | AI suggests patterns; humans must validate cache invalidation logic |
| Failover & recovery | ★★☆☆☆ Limited | AI can outline approaches; must be verified by SRE/ops expert |
| Security model | ★☆☆☆☆ Limited | AI can describe frameworks; humans must design and audit |

Guardrails for Architecture Quality

1. Separate Business Logic from Transport & Storage

AI can accidentally couple these layers; keep business rules free of HTTP types and SQL so each layer can change independently.
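
A minimal sketch of the separation, with illustrative names (`UserRepo`, `canRegister`, `registerHandler` are not from any real codebase). The business rule never sees HTTP, and the handler never touches storage directly:

```typescript
// Storage layer: knows about persistence, nothing about HTTP.
interface UserRepo {
  findByEmail(email: string): { id: string; email: string } | undefined;
}

// Business logic: a pure decision — no request objects, no SQL.
function canRegister(repo: UserRepo, email: string): boolean {
  return repo.findByEmail(email) === undefined;
}

// Transport layer: translates HTTP to domain calls and back.
function registerHandler(repo: UserRepo, body: { email: string }) {
  if (!canRegister(repo, body.email)) {
    return { status: 409, error: "Email already registered" };
  }
  return { status: 201 };
}
```

With this shape, the business rule is testable with an in-memory repo and survives a swap of web framework or database.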

2. Prefer Explicit Boundaries and Typed Contracts

3. Plan Observability from Day One

Don't bolt on observability later; design logging, metrics, and tracing in from day one.
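
A sketch of what designed-in observability looks like, assuming simple `Logger` and `Metrics` interfaces (both hypothetical):

```typescript
interface Logger {
  info(msg: string, fields?: Record<string, unknown>): void;
}
interface Metrics {
  increment(name: string): void;
  timing(name: string, ms: number): void;
}

// Observability is a parameter of the design, not an afterthought.
function processEvent(
  event: { id: string },
  log: Logger,
  metrics: Metrics
): void {
  const start = Date.now();
  log.info("event.received", { eventId: event.id });
  // ... business logic would go here ...
  metrics.increment("events.processed");
  metrics.timing("events.latency_ms", Date.now() - start);
  log.info("event.processed", { eventId: event.id });
}
```

Because the hooks are injected, tests can assert that the right signals are emitted, not just that the logic ran.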

4. Design for Change: Feature Flags, Migrations, Backward Compatibility

Anti-Pattern: Over-Engineering Early

⚠ Watch Out

AI often proposes microservices, event sourcing, or complex architectures too early. Start with a modular monolith with clear boundaries. Refactor to microservices only when you have concrete scaling problems, not theoretical ones.

Modular Monolith vs. Microservices Decision Framework

Use this matrix to decide when to decompose:

| Factor | Monolith Is Fine | Consider Microservices |
| --- | --- | --- |
| Team size | < 10 engineers | > 10 engineers, multiple teams |
| Traffic | < 10K req/sec | > 100K req/sec with uneven load across components |
| Deployment frequency | Once a day is fine | Need independent deploy cycles |
| Language/stack diversity | One primary language | Multiple languages/frameworks per service |
| Operational maturity | Basic CI/CD | Advanced observability, chaos engineering, runbooks |

Coding Standards

Establish explicit coding standards before generating code. This dramatically reduces review iterations and ensures consistency.

Constraining Generated Code

Be specific in your prompts about style, dependencies, and patterns:

// GOOD: Specific constraints in prompt
"Implement the validateEmail() function in src/validation/users.ts:
- Use the ValidationError class from src/validation/errors.ts
- Follow the same error-handling pattern as validatePhoneNumber() in the same file
- Do NOT add new npm dependencies
- Error messages must come from constants defined in src/messages.ts
- Return { valid: boolean, error?: ValidationError }
- Include unit tests in __tests__/validation/users.test.ts"

Naming Conventions Enforcement

Provide examples of your naming conventions in the prompt:

Naming Example Prompt

// Reference these files for naming style:
- src/services/user_service.ts (functions: camelCase, max 3 args)
- src/validators/email_validator.ts (classes: PascalCase, validate* methods)
- src/constants.ts (constants: UPPER_SNAKE_CASE)

// Follow this exact pattern for your implementation

Error Handling Patterns

Define error handling in your prompt with a concrete example:

class ValidationError extends Error {
  constructor(
    public code: "INVALID_EMAIL" | "INVALID_PHONE" | "INVALID_LENGTH",
    public message: string,
    public field: string
  ) {
    super(message);
  }
}

// Usage in generated code:
try {
  validateEmail(email);
} catch (err) {
  if (err instanceof ValidationError) {
    return { valid: false, error: err };
  }
  throw new InternalError("Unexpected error", err);
}

Dependency Management Rules

Specify what's allowed before generating code:

✓ Allowed Dependencies

  • Already in package.json
  • Internal packages from your monorepo
  • Reviewed & approved by architecture

✗ Forbidden Dependencies

  • New npm packages without approval
  • Heavy frameworks (unless already in use)
  • Deprecated or unmaintained packages

Code Review Checklist for AI-Generated Code

Testing Strategy

AI can generate test scaffolding and cases, but humans must ensure tests are meaningful. AI-generated tests without meaningful assertions are worse than no tests.

Test Matrix Generation

Ask AI to generate the test structure first; manually define assertions:

// PROMPT: Generate test matrix for validateEmail()
const testMatrix = [
  {
    name: "valid email",
    input: "alice@example.com",
    expected: { valid: true, error: null }
  },
  {
    name: "missing @ symbol",
    input: "alice.example.com",
    expected: { valid: false, error: "INVALID_EMAIL" }
  },
  {
    name: "empty string",
    input: "",
    expected: { valid: false, error: "INVALID_LENGTH" }
  },
  {
    name: "null/undefined",
    input: null,
    expected: "throws TypeError"
  }
];
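
The matrix can drive a framework-agnostic runner. Both `validateEmail` and the runner below are illustrative sketches, not the real implementation:

```typescript
type Result = { valid: boolean; error: string | null };

// Toy implementation matching the matrix's error codes.
function validateEmail(input: string): Result {
  if (typeof input !== "string") throw new TypeError("input must be a string");
  if (input.length === 0) return { valid: false, error: "INVALID_LENGTH" };
  if (!input.includes("@")) return { valid: false, error: "INVALID_EMAIL" };
  return { valid: true, error: null };
}

// Run the string cases from the matrix; throw on the first mismatch.
const cases = [
  { name: "valid email", input: "alice@example.com", valid: true },
  { name: "missing @ symbol", input: "alice.example.com", valid: false },
  { name: "empty string", input: "", valid: false },
];

for (const c of cases) {
  const result = validateEmail(c.input);
  if (result.valid !== c.valid) {
    throw new Error(`case failed: ${c.name}`);
  }
}
```

The "throws TypeError" row is the kind of assertion a human should add by hand; AI-generated matrices often omit the non-returning cases.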

Testing Pyramid with AI

| Test Level | Share | AI Usefulness | Best Practice |
| --- | --- | --- | --- |
| Unit tests | 60% | ★★★★★ Excellent | AI generates structure; you write assertions and edge cases |
| Integration tests | 25% | ★★★☆☆ Good | AI scaffolds; you define contract expectations and data |
| Contract tests | 10% | ★★☆☆☆ Limited | Must be designed by humans; AI assists with implementation |
| E2E tests | 5% | ★★☆☆☆ Limited | Must be designed by humans; focus on critical user paths |

Property-Based Testing with AI

Use AI to generate property-based tests with frameworks such as fast-check (TypeScript), QuickCheck (Haskell), or Hypothesis (Python):

// Ask AI to generate property tests
import fc from "fast-check";

// Property: validateEmail never crashes on any string input
test("validateEmail never crashes", () => {
  fc.assert(
    fc.property(fc.string(), (input) => {
      const result = validateEmail(input);
      expect(result).toHaveProperty("valid");
      expect(result).toHaveProperty("error");
    })
  );
});

Regression Testing for AI-Generated Changes

Test Coverage Targets

AI-Generated Test Scaffold Example

// Request from AI: "Generate unit test scaffold for validateEmail"
describe("validateEmail", () => {
  // AI generates structure; you add assertions

  describe("valid inputs", () => {
    test("accepts valid email", () => {
      const result = validateEmail("alice@example.com");
      // YOU WRITE: expect what?
      expect(result.valid).toBe(true);
      expect(result.error).toBeNull();
    });
  });

  describe("invalid inputs", () => {
    test("rejects no @", () => {
      const result = validateEmail("alice.example.com");
      expect(result.valid).toBe(false);
      expect(result.error?.code).toBe("INVALID_EMAIL");
    });
  });

  describe("edge cases", () => {
    test("handles empty string", () => {
      const result = validateEmail("");
      expect(result.valid).toBe(false);
    });
  });
});

Security & Privacy

Treat AI tools the way you treat third-party libraries: verify the security of generated code before using it. Never delegate security decisions to the AI.

Data Classification and Tool Selection

| Data Class | Examples | AI Tool Policy |
| --- | --- | --- |
| Public | Marketing website, blog posts, sample data | ✓ All AI tools permitted; no special handling |
| Internal | Engineering practices, internal runbooks, architecture docs | ☑ Company-approved tools only (e.g., Claude, internal LLM) |
| Confidential | Customer data, financial data, API keys, credentials | ✗ Never paste into AI tools; use synthetic/sanitized data only |
| Restricted | PII, health data, payment info, audit logs | ✗ Absolutely forbidden; generate test data only |

Secret Management Rules

✓ Do This

  • Use environment variables
  • Rotate secrets regularly
  • Use a secrets manager (Vault, AWS Secrets Manager)
  • Audit secret access logs

✗ Never Do This

  • Hardcode secrets in code
  • Paste secrets in prompts to AI
  • Commit secrets to git (use .gitignore)
  • Share secrets in Slack or email
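
A fail-fast helper makes the environment-variable rule enforceable. This `getEnv` is a hypothetical sketch; the environment map is passed explicitly so the example is testable, but in an application you would pass `process.env`:

```typescript
// Read a required variable or fail immediately at startup.
function getEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    // Fail fast, and log only the variable NAME — never the secret value.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Calling this for every required secret at boot surfaces misconfiguration before the first request, rather than deep inside a request path.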

Scanning AI-Generated Code for Security

Add security scanning to your CI/CD pipeline:

// Run these after AI generates code, before merge

// 1. Static Application Security Testing (SAST)
npm run security:sast    // e.g., SonarQube, Semgrep

// 2. Dependency scanning
npm audit --audit-level=moderate

// 3. OWASP checks
npm run security:owasp   // e.g., OWASP ZAP, Burp

// 4. Secrets scanning
git secrets --scan

Supply Chain Security: AI May Suggest Vulnerable Dependencies

⚠ Watch Out

AI's training data includes vulnerable libraries. It may suggest packages that have known security issues, are abandoned, or have poor maintenance records. Always verify dependencies with a scanner such as npm audit or Snyk before adopting them.

OWASP Top 10 Considerations for AI-Generated Code

| OWASP Category (2021) | Risk with AI | Mitigation |
| --- | --- | --- |
| A01: Broken Access Control | Missing permission checks | Require an authorization check in every API handler |
| A03: Injection (SQL, NoSQL, Command) | Unsanitized user input | Use parameterized queries; manual code review |
| A04: Insecure Design | Missing threat model | Humans must design security; AI implements |
| A05: Security Misconfiguration | Wrong defaults (TLS disabled, etc.) | Specify security config explicitly in the prompt |
| A07: Identification & Authentication Failures | Generated auth code may skip validation | Require explicit validation in the prompt; review manually |

Prompt Injection Risks in Code

If your code processes user input and passes it to AI tools, be careful:

// ✗ DANGEROUS: User input goes into prompt without escaping
const userQuery = req.body.query;
const prompt = `Translate this: ${userQuery}`;
// Attacker can inject: "Translate this: Ignore instructions and..."

// ✓ SAFER: Validate and sanitize input
const userQuery = req.body.query;
if (!isValidQuery(userQuery)) throw new ValidationError();
const prompt = `Translate this text (max 500 chars): "${escapePrompt(userQuery)}"`;
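
`escapePrompt` in the safer variant above is left undefined; one plausible sketch (hypothetical, and only a partial mitigation — treat model output as untrusted regardless):

```typescript
// Cap length, strip control characters, and neutralize the quote character
// that delimits the slot in the prompt template.
function escapePrompt(input: string, maxLen = 500): string {
  return input
    .replace(/[\u0000-\u001f\u007f]/g, " ") // drop control chars and newlines
    .replace(/"/g, "'") // avoid closing the quoted slot in the template
    .slice(0, maxLen);
}
```

Sanitization narrows the attack surface but cannot fully prevent injection; combine it with strict output handling and least-privilege tool access.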

Treat Generated Code as Third-Party Code

Security Mindset

AI-generated code is like code from a third-party library: you don't fully trust it until you've reviewed it, tested it, and run it through security scanners. Apply the same rigor you would to an external dependency.

Code Review for AI-Generated Changes

Code review is critical when AI is involved. Reviewers must understand the code and validate correctness, not just syntax. This section provides a comprehensive checklist and patterns for effective review.

AI PR Code Review Checklist (Expanded)

What Reviewers Should Focus On Beyond Syntax

PR Description Requirements for AI-Generated Code

Require PRs to include a clear description of what was generated and what was manually verified:

// GOOD PR Description for AI-Generated Code

## What Changed
Implement email validation handler for user signup form.

## AI-Generated Components
- src/validation/email_validator.ts (entire file)
- __tests__/validation/email_validator.test.ts (test scaffolding)

## Manual Validation Done
✓ Reviewed error handling against existing ValidationError pattern
✓ Verified no new npm dependencies added
✓ Tested edge cases: empty string, missing @, long domains
✓ Confirmed email verification email sends correctly
✓ Ran security scan: no vulnerabilities detected
✓ Performance: <2ms validation time per email

## Rollback Plan
Revert to previous email validation in src/config/validation.ts if needed.
No data migration required.

## Questions for Reviewers
- Should we add rate limiting on email validation attempts?
- Is the error message clear for users?

Approval Gates for AI-Generated Critical Path Code

For critical services, require multiple reviewers:

| Code Category | Minimum Reviewers | Who Should Review |
| --- | --- | --- |
| Non-critical feature | 1 | Any senior engineer |
| Critical business logic | 2 | One domain expert, one security reviewer |
| Infrastructure/ops | 2 | One SRE/ops, one security |
| Security module | 3 | Security team, domain expert, infrastructure |
| Payment/billing code | 3 | Finance, security, backend engineer |

Code Example: Good vs. Bad PR Descriptions

❌ Bad PR Description

AI generated this code for the new user API.

Files changed:
- src/api/users.ts
- tests.ts

✓ Good PR Description

## What This Does
Implements POST /api/users endpoint to create new user accounts.
Validates email format, enforces unique constraint, sends welcome email.

## Generated vs Manual
AI Generated:
  - src/api/users.ts (handler + validation)
  - tests/__tests__/api/users.test.ts (test structure)

Manually Created/Reviewed:
  - Email template (src/templates/welcome.html)
  - Database migration (migrations/20240315_users_table.sql)
  - Error handling and logging (verified against AppError pattern)

## Testing Done
✓ Unit tests: 8/8 passing
✓ Integration test: Created new user in test DB, verified email sent
✓ Edge cases: Duplicate email, invalid format, SQL injection attempts
✓ Load test: 100 req/sec, p99 latency 45ms

## Security Review
✓ No hardcoded secrets
✓ Input sanitized against injection
✓ Password hashed with bcrypt
✓ Rate limiting applied to signup endpoint
✓ OWASP scan: 0 findings

## Deployment Notes
- No data migration
- No config changes needed
- Rollback: Disable endpoint in API router, previous version still works
- Monitoring: Watch signup_latency and email_send_failures metrics

Prompt Engineering for Developers

Prompt engineering is a skill. Good prompts produce usable code in one iteration; bad prompts require 5+ cycles. Treat prompts like production code: version them, test them, improve them.

Prompt Templates for Common Tasks

Template 1: Bug Fix

// TASK: Fix bug in validateEmail()

## Current Behavior
validateEmail("user+tag@example.com") returns invalid, but should be valid.

## Expected Behavior
Plus signs (+) are valid in the local part of an email per RFC 5321.

## Constraints
- File: src/validation/email_validator.ts
- Use the same ValidationError pattern (in the same file)
- No new dependencies
- Add test case to __tests__/validation/email_validator.test.ts

## Steps
1. Show me the minimal change needed (diff format)
2. Explain what RFC rule was violated
3. Suggest any other email formats we might be rejecting incorrectly

Template 2: New Feature Implementation

// TASK: Implement user profile API endpoint

## Requirements
- Endpoint: GET /api/v1/users/:userId
- Returns: { id, email, name, createdAt, lastLoginAt }
- Authentication: Requires JWT token
- Authorization: Users can only see their own profile; admins see all
- Errors: Return 404 if user not found, 401 if unauthorized

## Context
- Use Express.js (already in codebase)
- Database: PostgreSQL via knex (see src/db/index.ts for usage)
- Error handler: Use AppError from src/errors.ts
- Middleware: Auth middleware at src/middleware/auth.ts

## Success Criteria
- Handler implemented in src/api/users/get.ts
- Tests: src/__tests__/api/users/get.test.ts (success, 404, unauthorized)
- No new npm dependencies
- Follows error handling pattern from src/api/posts/get.ts

## Steps
1. Summarize your approach (3–4 bullet points)
2. Show the complete handler code
3. Show the test scaffold
4. List all assumptions you're making

Template 3: Refactoring

// TASK: Refactor user service to reduce duplication

## Current Problem
src/services/user_service.ts has 3 similar functions:
- getUserById(id)
- getMultipleUsers(ids)
- getUserByEmail(email)

All do similar: query DB, map result, handle errors identically.

## Goal
Extract common pattern into internal helper; keep public API unchanged.

## Constraints
- Public function signatures must NOT change (backward compatibility)
- Keep the same error handling and logging
- No new npm dependencies
- File: src/services/user_service.ts only

## Steps
1. Design the helper function signature
2. Show refactored versions of all 3 functions
3. Show tests unchanged (they should still pass as-is)
4. Estimate cyclomatic complexity reduction

Template 4: Data Migration

// TASK: Generate database migration to add new user column

## Change
Add optional "bio" column (text, max 500 chars) to users table.
New column should be optional (default NULL).

## Context
- Current schema: migrations/20240301_users_table.sql
- Using Knex migrations (see examples in migrations/)
- Database: PostgreSQL 14+

## Requirements
- Create UP migration: add column, no data changes
- Create DOWN migration: remove column (safe to roll back)
- Must be idempotent (safe to run multiple times)
- No data loss on rollback

## Steps
1. Generate migrations/20240315_add_user_bio.js (both up & down)
2. Show manual testing steps (verify column added, rollback works)
3. Note any performance impact

Context Window Management

AI models have context limits. Be strategic about what you include:

What To Include

  • The target file, plus the nearest existing pattern or function to imitate
  • Type definitions, interfaces, and signatures the code must conform to
  • Explicit constraints: allowed dependencies, error handling, naming rules

What To Leave Out

  • Unrelated modules and large files that don't affect the task
  • Secrets, credentials, and real customer data (use synthetic data)
  • Entire dependency trees; a one-line summary of available libraries is enough

Multi-Turn Conversation Patterns

Pattern 1: Ask for Plan First, Then Implement

Turn 1: "Propose a refactoring to reduce UserService complexity. Show 3 options with tradeoffs."
Turn 2: "I like option 2. Now implement it in src/services/user_service.ts."
Turn 3: "Show me the tests that validate the refactoring."

Pattern 2: Iterative Refinement

Turn 1: "Implement email validation handler."
Turn 2: "Add support for plus-addressed emails (user+tag@example.com)."
Turn 3: "Now add internationalized domain names (IDN) support."
Turn 4: "Generate tests for these new cases."

Pattern 3: Decompose Large Tasks

Turn 1: "Summarize the architecture for the new auth system."
Turn 2: "Implement the JWT generation logic in src/auth/jwt.ts."
Turn 3: "Implement the JWT verification middleware."
Turn 4: "Generate tests for both JWT functions."

Prompt Strategies by Task Type

| Task Type | Best Approach | Key Phrase |
| --- | --- | --- |
| Bug fix | Specific error + reproduction steps | "The bug is X. Expected behavior is Y." |
| New feature | Clear spec + examples + constraints | "Implement X with requirements [list]" |
| Refactoring | Show duplication + goal + constraints | "Extract the common pattern while keeping the public API unchanged" |
| Architecture | Constraints + tradeoffs + ask for options | "Propose architectures with tradeoffs" |
| Testing | Test matrix first, then assertions | "Generate the test scaffold, then I'll add assertions" |
| Migration | Current schema + desired change + rollback | "Generate UP and DOWN migrations" |

System Prompts for Coding Assistants

If your AI tool supports custom system prompts, use one like this:

// System Prompt Template

You are an expert software engineer helping a development team.

CORE RULES:
1. Code is for production. It must be correct, secure, and maintainable.
2. Always ask for clarification if requirements are ambiguous.
3. Show your assumptions explicitly.
4. Prefer clear, simple code over clever code.
5. Error handling is mandatory; don't skip failure paths.
6. Security is non-negotiable; never suggest hardcoded secrets.

WHEN WRITING CODE:
- Match the existing codebase style (naming, patterns, file structure)
- Include unit tests that verify behavior, not just coverage
- Suggest observability (logging, metrics) for production code
- Point out risky assumptions or edge cases

WHEN EXPLAINING:
- Show code before explanation
- Explain tradeoffs for significant decisions
- Link to relevant docs or examples

WHEN IN DOUBT:
- Ask for more context
- Suggest options with pros/cons
- Flag risky changes that need human review

Team Operating Model for AI-Driven Development

AI is most effective when used consistently across a team. Establish shared practices, shared prompts, and metrics to track quality.

Shared Prompt Library Management

Build a team library of tested, proven prompts. Version them like code:

// Directory structure
prompts/
  ├── templates/
  │   ├── bugfix.md (v2)
  │   ├── new_feature.md (v3)
  │   ├── refactor.md (v1)
  │   └── testing.md (v2)
  ├── examples/
  │   ├── email_validation_bugfix.md
  │   └── user_api_feature.md
  └── README.md (usage guidelines)

Prompt Template Format

// prompts/templates/new_feature.md

# New Feature Implementation Template

Version: 3 (Updated 2024-03-15)
Author: @engineering-team
Used for: REST API endpoints, services, handlers

## Key Improvements in v3
- Added rollback strategy requirement
- Clarified test scaffold expectations
- Added performance benchmark requirement

## Template
[Copy the template from earlier sections]

## What Works Well
- Produces working code on first iteration ~80% of the time
- Generates meaningful tests automatically
- Handles edge cases when examples are specific

## What Needs Refinement
- Sometimes over-engineers error handling
- May miss rate limiting considerations

## Examples
- User API feature (see examples/user_api_feature.md)
- Email handler (see examples/...)

Standardized Review Checklist

Use the same checklist for all AI-generated PRs. Document it in your repo:

# CODE REVIEW CHECKLIST FOR AI-GENERATED CODE
See docs/code-review-ai.md for full checklist

Required items:
☐ AI usage documented in PR description
☐ Logic correct; tests pass
☐ Error handling for all documented failure paths
☐ No security issues detected
☐ Dependencies justified and scanned
☐ Code style consistent with codebase
☐ Test coverage >80% for changed code
☐ Performance impact measured (if critical path)

Metrics to Track Quality and Velocity

| Metric | What It Measures | Target |
| --- | --- | --- |
| Lead time | Days from feature start to production | 30% reduction with AI |
| Code review time | Hours from PR to approval | Should not increase; reviewers focus on logic, not syntax |
| Test coverage | % of code covered by tests | >80% overall; >95% for critical path |
| Escaped defects | Bugs found after merge to main | Should not increase from pre-AI baseline |
| Rollback rate | % of deployments requiring rollback | Should not increase; if it does, revisit training and templates |
| Security findings | Vulnerabilities found in review/scanning | Should not increase from pre-AI baseline |
| Prompt quality score | % of first-draft code needing zero revisions | >70% for mature templates |

AI Output Quality Retrospectives

Monthly review of what worked and what didn't:

Retrospective Checklist
  • Review metrics from last month (lead time, escapes, rollbacks)
  • Analyze PRs marked "AI-generated": what caused review delays?
  • Discuss prompt failures: why did the AI miss cases?
  • Update prompt templates based on lessons learned
  • Celebrate successful patterns; create templates from them
  • Identify tool gaps: do we need better linting, testing, or scanning?
  • Plan training: do team members need coaching on prompting?

Onboarding Developers to AI Tools

Role-Based AI Usage Guidelines

| Role | AI Tool Access | Best Use Cases | Restrictions |
| --- | --- | --- | --- |
| Junior engineer | ✓ Full access | Learning by generating boilerplate and tests | All code must be reviewed; focus on non-critical path |
| Senior engineer | ✓ Full access | Generating architecture; refactoring; creating templates | Must review own generated code; responsible for template quality |
| Security engineer | ✓ Full access | Generating security tests; scanning generated code | Must review all security-critical code before merge |
| Product manager | ☑ Limited | Writing detailed specs to help engineers use AI | Does not merge code; does not approve PRs |

CI/CD Integration: AI in the Pipeline

Use AI not just for code generation, but also to automate checks in your CI/CD pipeline. This reduces friction and catches issues early.

AI-Powered Pipeline Stages

| Stage | What AI Can Do | Example Tool / Action |
| --- | --- | --- |
| Pre-commit | Auto-format code; check naming conventions | Prettier, Black, AI linter with custom rules |
| Lint & type check | Find style violations, type errors, suspicious patterns | ESLint, TypeScript, SonarQube |
| Security scan | Find secrets, vulnerable dependencies, OWASP issues | Snyk, TruffleHog, OWASP ZAP, Semgrep |
| Test execution | Run unit, integration, contract tests | Jest, pytest, your test runner + coverage checks |
| Coverage check | Ensure test coverage meets minimum threshold | Coverage.py, nyc, Codecov |
| Regression detection | Compare latency, memory, error rates vs. baseline | Custom benchmark scripts; Datadog, Prometheus |
| PR review bot | Automatically comment on code style, suggestions | GitHub Actions + AI API (Claude, ChatGPT) |
| Changelog generation | Auto-generate release notes from commits | conventional-changelog + AI to polish |

Quality Gates for AI-Generated Code

Define stricter gates when code is known to be AI-generated:

// GitHub Actions example: stricter checks for AI-generated code

name: AI Code Quality Gate
on:
  pull_request:
    paths:
      - src/**
      - tests/**

jobs:
  quality-checks:
    runs-on: ubuntu-latest
    steps:
      - name: Detect AI-Generated Code
        env:
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          # Check the PR description for "AI-generated" markers
          if echo "$PR_BODY" | grep -qiE "AI[- ]generated|Claude|ChatGPT"; then
            echo "AI_GENERATED=true" >> "$GITHUB_ENV"
          fi

      - name: Run Linting
        run: npm run lint

      - name: Type Checking
        run: npm run type-check

      - name: Run Tests (AI-Generated)
        if: env.AI_GENERATED == 'true'
        run: npm run test -- --coverage --coverageThreshold='{"global":{"lines":85}}'

      - name: Security Scan (AI-Generated)
        if: env.AI_GENERATED == 'true'
        run: npm audit --audit-level=moderate

      - name: SAST Analysis
        if: env.AI_GENERATED == 'true'
        run: npm run security:sast

      - name: Comment on PR
        if: failure() && env.AI_GENERATED == 'true'
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '⚠️ AI-Generated Code Quality Check Failed. Please review the logs.'
            })

Automated Review Bots

Use an AI-powered review bot to leave comments on PRs with suggestions:

# Example: GitHub Actions + Claude API for PR review

name: AI PR Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Fetch PR Diff
        run: |
          gh pr diff ${{ github.event.number }} > pr.patch
        env:
          GH_TOKEN: ${{ github.token }}

      - name: AI Code Review
        run: |
          # Build the request body with jq so the patch is safely JSON-escaped,
          # then call the Claude Messages API and save the review text
          jq -n --arg patch "$(cat pr.patch)" '{
            model: "claude-opus-4-1",
            max_tokens: 1024,
            messages: [{
              role: "user",
              content: ("Review this code patch for style, security, and performance issues. Keep feedback brief and actionable.\n\n" + $patch)
            }]
          }' > request.json

          curl -sS https://api.anthropic.com/v1/messages \
            -H "x-api-key: ${{ secrets.CLAUDE_API_KEY }}" \
            -H "anthropic-version: 2023-06-01" \
            -H "content-type: application/json" \
            -d @request.json \
            | jq -r '.content[0].text' > review.md

      - name: Post Review Comment
        uses: actions/github-script@v6
        with:
          script: |
            // Post the review saved by the previous step as a PR comment
            const review = require('fs').readFileSync('review.md', 'utf8');
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '## AI Review\n\n' + review
            })

Deployment Safety with AI Changes

Ship AI-assisted changes behind the same safety rails as any other risky change: deploy to staging first, monitor key metrics, then promote gradually with a rollback path ready.

GitHub Actions Workflow Example (Complete)

name: CI Pipeline with AI Integration
on: [push, pull_request]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install Dependencies
        run: npm ci

      - name: Lint
        run: npm run lint

      - name: Type Check
        run: npm run type-check

      - name: Run Tests
        run: npm run test -- --coverage

      - name: Upload Coverage
        uses: codecov/codecov-action@v3

      - name: Security Audit
        run: npm audit --audit-level=high

      - name: SAST Scan
        run: npm run security:sast

  deploy:
    needs: lint-and-test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Staging
        run: |
          # Deploy to staging with monitoring enabled
          ./scripts/deploy.sh staging

      - name: Monitor (5 min)
        run: |
          # Check metrics: latency, error rate, etc.
          ./scripts/monitor.sh 300

      - name: Promote to Production
        run: |
          # Canary: 1% → 25% → 100%
          ./scripts/deploy.sh production --canary

Production Readiness Checklist

Before shipping any code—whether AI-generated or not—verify it meets these production standards. Use this as your final gate before merge and deploy.

Code Quality

Security

Testing & Observability

Performance & Reliability

Documentation & Deployment
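
Many of the automated items in the checklist can be enforced mechanically. A small gate runner, sketched here with illustrative npm commands, announces each check and stops the merge on the first failure:

```shell
#!/bin/sh
# run_gate NAME CMD...: announce a gate, run the command, report pass/fail
run_gate() {
  name="$1"; shift
  echo "==> $name"
  if "$@"; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    return 1
  fi
}

# Example wiring (commands are assumptions; match them to your pipeline):
#   run_gate "Lint"           npm run lint                  || exit 1
#   run_gate "Type check"     npm run type-check            || exit 1
#   run_gate "Tests"          npm run test -- --coverage    || exit 1
#   run_gate "Security audit" npm audit --audit-level=high  || exit 1
```

Running the same script locally and in CI keeps the pre-merge gate identical in both places.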

Final Sign-Off

Before Merging to Main
  • ☑ All checklist items above verified
  • ☑ Code reviewed by at least one senior engineer (or three for critical-path changes)
  • ☑ If AI-generated: human understands all logic and can explain it
  • ☑ Product owner has validated feature works as intended
  • ☑ No regressions in existing tests

Engineering Standards Philosophy

Remember

AI accelerates delivery most when teams enforce strong engineering standards. Speed without rigor creates future drag—technical debt, debugging burden, and maintenance cost. Using AI means you're shipping more code, faster. That makes quality discipline non-negotiable.