AI Coding Best Practices for Software Design and Development
Practical standards and playbooks for using AI assistants across architecture, coding, testing, security, and delivery. Move faster without sacrificing quality, safety, or long-term maintainability.
Core Principles
Effective AI-assisted development rests on six foundational principles that guide all decisions about when, how, and why to use AI tools:
1. AI as Collaborator, Not Authority
Treat every AI output as a draft until verified by human judgment. The AI generates candidates; humans decide what ships. This mindset prevents false confidence and maintains accountability.
You're not delegating judgment to the AI. You're asking it to generate options faster so your team can make better decisions with more time to think strategically.
2. Context Quality Controls Output Quality
The quality of AI-generated code is directly proportional to the clarity and completeness of your prompts. Vague requests produce vague outputs; precise constraints produce useful outputs.
- Explicit constraints: "No new dependencies", "Use the existing error handler"
- Acceptance criteria: "Must handle timeouts and partial failures"
- Concrete examples: "Follow the pattern in service/handlers.ts"
- Environmental context: "This runs in a containerized environment with 512MB RAM"
3. Optimize for Maintainability Over Cleverness
Prefer code that is easy to understand and modify six months from now, even if it's slightly less elegant. Clever code written by AI is clever code you didn't write—and may not fully understand.
Prefer: Explicit error handling with named variables and clear control flow
Avoid: Terse functional chains or advanced language features without explanation
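A hypothetical side-by-side: both functions compute the same total, but only the first is easy to audit and modify six months later.

```typescript
type Order = { status: string; total: number };

// Prefer: explicit control flow a reviewer can step through
function sumShippedTotals(orders: Order[]): number {
  let total = 0;
  for (const order of orders) {
    if (order.status === "shipped") {
      total += order.total;
    }
  }
  return total;
}

// Avoid: a terse chain that buries the filtering rule mid-expression
const sumShippedTerse = (orders: Order[]) =>
  orders.reduce((t, o) => (o.status === "shipped" ? t + o.total : t), 0);
```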
4. The 80/20 Rule: AI Handles Boilerplate, Humans Handle Critical Logic
AI excels at generating scaffolding, repetitive patterns, and syntactic correctness. Humans should focus on the 20% of code that embodies core business logic, error recovery, and system guarantees.
| Category | Best Generated By AI | Requires Human Review/Design |
|---|---|---|
| CRUD handlers | ✓ Yes, with templates | Validation rules, data contracts |
| Test scaffolding | ✓ Yes, structure and setup | Meaningful assertions, edge cases |
| Data transformations | ✓ Initial version | Performance, null handling, rollback |
| Concurrency & timing | ✗ Rarely, needs expert review | Always human-designed with proofs |
| Security decisions | ✗ Never | 100% human-designed and verified |
5. Humans Remain Accountable for All Merged Code
Code review isn't a formality when AI is involved—it's the enforcement mechanism that keeps humans accountable. The author and reviewers must understand every line before it ships.
6. Prompt Engineering Is Software Engineering
Treat your prompts like production code: version them, test them, improve them iteratively, and share the best ones with your team.
- Write: Start with clear, structured instructions
- Test: Use it 3–5 times, refine based on output quality
- Document: Explain why each constraint exists
- Share: Add to team prompt library with context
- Iterate: Update quarterly based on defect reviews
Code Example: Good vs. Bad Prompting
❌ Bad: Vague, Low Constraints
Write a function to validate user input.
✓ Good: Clear, Constrained, Specific
/*
* Task: Implement validateUserInput() in src/validation/users.ts
*
* Requirements:
* - Reuse ValidationError from src/validation/errors.ts
* - Follow the error-handling pattern in existingValidator() (same file)
* - Accept email (string), name (string), age (number)
* - Return { valid: boolean, errors: ValidationError[] }
* - Handle edge cases: empty strings, negative age, invalid email format
* - Must NOT add new dependencies
* - Must NOT access database or external services
*
* Test: Generate unit tests for success + 3 failure cases
*
* Example call:
* validateUserInput("alice@example.com", "Alice", 25)
* // => { valid: true, errors: [] }
*/
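A response satisfying that prompt might look like the sketch below. ValidationError is inlined here so the example is self-contained (the real prompt requires importing it from src/validation/errors.ts), and the email regex is a deliberately simplified stand-in, not a full RFC-compliant check.

```typescript
// Inlined for self-containment; in the real codebase this comes from
// src/validation/errors.ts as the prompt requires.
class ValidationError extends Error {
  constructor(public field: string, message: string) {
    super(message);
  }
}

// Simplified format check: something@something.something
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateUserInput(
  email: string,
  name: string,
  age: number
): { valid: boolean; errors: ValidationError[] } {
  const errors: ValidationError[] = [];
  if (email.trim() === "" || !EMAIL_RE.test(email)) {
    errors.push(new ValidationError("email", "Invalid email format"));
  }
  if (name.trim() === "") {
    errors.push(new ValidationError("name", "Name must not be empty"));
  }
  if (!Number.isInteger(age) || age < 0) {
    errors.push(new ValidationError("age", "Age must be a non-negative integer"));
  }
  return { valid: errors.length === 0, errors };
}
```

Note how every requirement in the prompt maps to a visible line of code, which is exactly what makes the output easy to review.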
Development Workflow
Effective AI-assisted development follows a structured four-step workflow that balances speed with verification. Each step has specific practices and checkpoints.
Step 1: Define the Task Clearly
A clear task definition dramatically improves output quality and reduces iteration cycles. Spend 5–10 minutes on this step before invoking AI.
Define Task Template
// OBJECTIVE
What specific behavior or artifact are you building?
// CONSTRAINTS
- File location: src/features/auth/handlers.ts
- Existing patterns: Use service/handlers.ts as a template
- Dependencies: No new npm packages
- Error handling: Use AppError from lib/errors.ts
// ACCEPTANCE CRITERIA
- Function accepts (email, password) and returns { token, expiresAt }
- Validates input; returns error for invalid email format
- Handles 3+ failure paths: user not found, password incorrect, account locked
- Includes unit tests for success + each failure path
- No hardcoded values; reads from environment
// ASSUMPTIONS
- PostgreSQL is available at DATABASE_URL
- bcrypt is already in package.json
- JWT_SECRET is available via getEnv("JWT_SECRET")
Key Practices for Step 1
- State the objective first: "Implement email validation handler" before requesting code
- Provide file context: Show existing code patterns to follow
- Specify constraints: Dependencies, error handling style, naming conventions
- Ask for a plan first: For complex tasks, request a summary before code: "Summarize your approach in 5 bullet points, then implement"
- Define success explicitly: "This is done when unit tests pass AND we can manually test on staging"
Step 2: Generate Incrementally
Request one focused component or function at a time rather than entire modules. This allows for tighter feedback loops and easier validation.
Incremental Generation Checklist
- One responsibility per request: Generate the handler, then tests, then integration separately
- Request assumptions explicitly: Ask the AI to list what it assumes: "List all assumptions you're making about the environment"
- Ask for a diff/patch format: "Show changes as a patch or git diff" makes it easier to review
- Validate one piece before continuing: Run tests, lint, review before requesting the next piece
- Refine iteratively: "This looks good, now add retry logic with exponential backoff" is faster than asking for everything upfront
Anti-Pattern: Copy-Paste Without Review
Requesting 500 lines of generated code, pasting it directly into your codebase without running tests or linting. This compounds risk and makes debugging impossible.
Step 3: Verify Aggressively
Verification is the most important step. Before considering code ready, it must pass multiple levels of automated and human checks.
Verification Pipeline
// 1. AUTOMATED CHECKS (run immediately)
npm run lint // Syntax, naming conventions
npm run type-check // Type safety
npm run test // Unit & integration tests
npm run security // Dependency scanning
// 2. MANUAL CHECKS (before merge)
- Code review (design, readability, assumptions)
- Edge case testing (null, empty, timeout)
- Integration testing (does it work with the rest of the system?)
- Performance validation (does it meet latency budget?)
- Observability check (are we logging appropriately?)
Specific Verification Patterns
- Error paths: Manually test the failure cases. Does error handling work?
- Concurrency: If the code involves async/await or threading, run stress tests
- Dependencies: Verify no unexpected packages were added; check for security advisories
- Performance: Benchmark changes to hot paths; compare before/after
- Observability: Can you trace execution in production? Are logs meaningful?
Step 4: Review and Harden
Code review for AI-generated changes must be especially rigorous. Reviewers must understand the code and sign off on correctness, not just style.
AI-Generated Code Review Checklist
- ☐ I understand the code logic and can explain it in plain English
- ☐ Error handling covers documented failure cases
- ☐ No hardcoded values; configuration comes from environment or config files
- ☐ Logging is sufficient to debug production issues
- ☐ No security shortcuts (credentials, permissions, data validation)
- ☐ Test coverage includes success path + edge cases
- ☐ Rollback strategy is documented if this is a data migration
- ☐ Dependencies are justified and scanned for security advisories
- ☐ Code style matches existing codebase (naming, file structure)
- ☐ Comments explain the "why", not the "what"
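As an illustration of the last item, the contrast looks like this (the backoff function and the 30s/60s figures are hypothetical, invented for the example):

```typescript
// ✗ "What" comment — restates what the code already says:
//   // increment retries by one
//   retries += 1;

// ✓ "Why" comment — records reasoning a reviewer cannot see in the code
function backoffMs(attempt: number): number {
  // Cap at 30s: our (hypothetical) load balancer drops idle connections
  // at 60s, so waits longer than this would never complete anyway.
  const cap = 30_000;
  return Math.min(cap, 2 ** attempt * 100);
}
```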
Common AI Mistakes and Fixes
| Mistake | Why It Happens | Fix |
|---|---|---|
| Incomplete error handling | AI doesn't know all edge cases in your domain | Provide error examples in the prompt; review manually |
| Overly generic code | AI errs toward flexibility without constraints | Constrain the prompt: "Use the pattern from X file exactly" |
| Silent failures | Missing logging, metrics, or alerting | Require observability code in the prompt |
| N+1 queries or inefficiency | AI doesn't understand system bottlenecks | Benchmark and profile; ask AI to optimize specific path |
| Race conditions | Timing bugs are hard to spot without deep analysis | Have a concurrency expert review; add stress tests |
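The N+1 row deserves a concrete picture. Below is a sketch using a hypothetical UserRepo interface rather than any real ORM; the point is the shape of the fix, not the API names.

```typescript
// Hypothetical repository interface, for illustration only.
interface UserRepo {
  getUser(id: number): Promise<{ id: number; name: string }>;
  getUsers(ids: number[]): Promise<{ id: number; name: string }[]>;
}

// ✗ N+1: one query per ID — a pattern AI frequently generates
async function loadAuthorsNPlusOne(repo: UserRepo, ids: number[]) {
  const users = [];
  for (const id of ids) {
    users.push(await repo.getUser(id)); // one round trip per iteration
  }
  return users;
}

// ✓ Batched: a single query fetches all IDs at once
async function loadAuthorsBatched(repo: UserRepo, ids: number[]) {
  return repo.getUsers(ids); // one round trip total
}
```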
Architecture & Design
AI can help with architecture design, but humans must make the final tradeoff decisions. Use AI for exploration; use humans for judgment.
System Design Prompt Template
When asking AI to help design a system, provide context about scale, reliability, and constraints:
// SYSTEM DESIGN REQUEST
Service: Real-time notification delivery
Requirements:
- Scale: 100K events/sec, 10M active users
- Latency SLA: 99th percentile < 100ms delivery
- Data: Event metadata (user ID, type, timestamp)
- Retention: 30 days
- Critical path: User sends event → notification delivered
Constraints:
- Budget: $5K/month infrastructure
- Team: 3 engineers, no ML specialization
- Existing tech: AWS, PostgreSQL, Node.js
Request:
Propose 2-3 architectures with tradeoffs. For each:
1. Component diagram in ASCII
2. Data flow during normal load
3. Failure scenarios and recovery
4. Cost estimate
5. Operational complexity (1-10 scale)
Architecture Decisions: When AI Is Good, When It's Not
| Decision Type | AI Usefulness | Notes |
|---|---|---|
| Component decomposition | ★★★★☆ Good | AI helps explore partitions; humans decide based on team skills |
| API contract design | ★★★★☆ Good | AI generates versioning strategies; humans refine based on domain |
| Database schema | ★★★☆☆ Mixed | AI useful for CRUD patterns; needs expert review for queries/indexes |
| Caching strategy | ★★☆☆☆ Limited | AI suggests patterns; humans must validate cache invalidation logic |
| Failover & recovery | ★★☆☆☆ Limited | AI can outline approaches; must be verified by SRE/ops expert |
| Security model | ★☆☆☆☆ Limited | AI can describe frameworks; humans must design and audit |
Guardrails for Architecture Quality
1. Separate Business Logic from Transport & Storage
AI can accidentally couple these layers. Ensure clear boundaries:
- Domain models live in a core package, independent of HTTP or database
- Use dependency injection or ports-and-adapters pattern
- Write domain logic tests that don't touch the network or database
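A minimal ports-and-adapters sketch, with illustrative names (OrderStore, isLargeOrder, InMemoryOrderStore are not from any real codebase): the domain logic never imports HTTP or SQL, so its tests need neither.

```typescript
// Port: the domain declares what it needs, not how it's stored
interface OrderStore {
  findTotal(orderId: string): Promise<number | null>;
}

// Domain logic depends only on the port — no HTTP, no database driver
async function isLargeOrder(store: OrderStore, orderId: string): Promise<boolean> {
  const total = await store.findTotal(orderId);
  if (total === null) throw new Error(`Order ${orderId} not found`);
  return total >= 1000;
}

// Adapter: an in-memory store for tests, swappable for a real DB adapter
class InMemoryOrderStore implements OrderStore {
  constructor(private totals: Map<string, number>) {}
  async findTotal(orderId: string) {
    return this.totals.get(orderId) ?? null;
  }
}
```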
2. Prefer Explicit Boundaries and Typed Contracts
- Define interfaces/contracts upfront, before AI generates implementations
- Use strong typing (TypeScript, Rust, Java) to catch integration errors
- Document assumptions in code comments
3. Plan Observability from Day One
Don't bolt on observability later. Include it in the design:
- Traces: Every request flows through the system; can we trace it?
- Metrics: Can we see latency, throughput, error rates per component?
- Logs: Are failure scenarios logged with enough context to debug?
- Alerts: What conditions should wake up on-call?
4. Design for Change: Feature Flags, Migrations, Backward Compatibility
- API versioning: New endpoints or request/response versions?
- Data migrations: How do you roll out schema changes without downtime?
- Feature flags: Can you deploy and enable incrementally?
- Rollback: If something breaks, how fast can you revert?
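The feature-flag item can be as simple as a guarded code path. A sketch, assuming flags arrive as a plain object (a real system would read them from a flag service such as LaunchDarkly or Unleash; the flag name here is invented):

```typescript
type Flags = Record<string, boolean>;

function isEnabled(flags: Flags, name: string): boolean {
  return flags[name] ?? false; // unknown flags default to off
}

function renderCheckout(flags: Flags): string {
  // The new path ships dark; enabling the flag turns it on incrementally,
  // and disabling the flag is the rollback.
  return isEnabled(flags, "new_checkout") ? "checkout-v2" : "checkout-v1";
}
```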
Anti-Pattern: Over-Engineering Early
AI often proposes microservices, event sourcing, or complex architectures too early. Start with a modular monolith with clear boundaries. Refactor to microservices only when you have concrete scaling problems, not theoretical ones.
Modular Monolith vs. Microservices Decision Framework
Use this matrix to decide when to decompose:
| Factor | Monolith Is Fine | Consider Microservices |
|---|---|---|
| Team size | < 10 engineers | > 10 engineers, multiple teams |
| Traffic | < 10K req/sec | > 100K req/sec with different components |
| Deployment frequency | Once a day is fine | Need independent deploy cycles |
| Language/stack diversity | One primary language | Multiple languages / frameworks per service |
| Operational maturity | Basic CI/CD | Advanced observability, chaos engineering, runbooks |
Coding Standards
Establish explicit coding standards before generating code. This dramatically reduces review iterations and ensures consistency.
Constraining Generated Code
Be specific in your prompts about style, dependencies, and patterns:
// GOOD: Specific constraints in prompt
"Implement the validateEmail() function in src/validation/users.ts:
- Use the ValidationError class from src/validation/errors.ts
- Follow the same error-handling pattern as validatePhoneNumber() in the same file
- Do NOT add new npm dependencies
- Error messages must come from constants defined in src/messages.ts
- Return { valid: boolean, error?: ValidationError }
- Include unit tests in __tests__/validation/users.test.ts"
Naming Conventions Enforcement
Provide examples of your naming conventions in the prompt:
- Files: snake_case for utilities (user_service.ts), PascalCase for classes (UserValidator.ts)
- Functions: camelCase, verb-noun pattern (validateEmail, buildResponse, handleError)
- Classes: PascalCase nouns (UserService, EmailValidator, ErrorHandler)
- Constants: UPPER_SNAKE_CASE (MAX_RETRIES, DEFAULT_TIMEOUT)
- Private members: Prefix with underscore or use # for class fields
Naming Example Prompt
// Reference these files for naming style:
- src/services/user_service.ts (functions: camelCase, max 3 args)
- src/validators/email_validator.ts (classes: PascalCase, validate* methods)
- src/constants.ts (constants: UPPER_SNAKE_CASE)
// Follow this exact pattern for your implementation
Error Handling Patterns
Define error handling in your prompt with a concrete example:
class ValidationError extends Error {
constructor(
public code: "INVALID_EMAIL" | "INVALID_PHONE" | "INVALID_LENGTH",
public message: string,
public field: string
) {
super(message);
}
}
// Usage in generated code:
try {
validateEmail(email);
} catch (err) {
if (err instanceof ValidationError) {
return { valid: false, error: err };
}
throw new InternalError("Unexpected error", err); // app-defined wrapper for unexpected errors
}
Dependency Management Rules
Specify what's allowed before generating code:
✓ Allowed Dependencies
- Already in package.json
- Internal packages from your monorepo
- Reviewed & approved by architecture
✗ Forbidden Dependencies
- New npm packages without approval
- Heavy frameworks (unless already in use)
- Deprecated or unmaintained packages
Testing Strategy
AI can generate test scaffolding and cases, but humans must ensure tests are meaningful. AI-generated tests without meaningful assertions are worse than no tests.
Test Matrix Generation
Ask AI to generate the test structure first; manually define assertions:
// PROMPT: Generate test matrix for validateEmail()
const testMatrix = [
{
name: "valid email",
input: "alice@example.com",
expected: { valid: true, error: null }
},
{
name: "missing @ symbol",
input: "alice.example.com",
expected: { valid: false, error: "INVALID_EMAIL" }
},
{
name: "empty string",
input: "",
expected: { valid: false, error: "INVALID_LENGTH" }
},
{
name: "null/undefined",
input: null,
expected: "throws TypeError"
}
];
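Such a matrix can then drive a table-driven runner. The sketch below uses a plain loop so it stays framework-neutral (a jest project would use test.each instead); the validateEmail implementation here is a stand-in consistent with the matrix, not the real validator.

```typescript
type Case = { name: string; input: string | null; expected: unknown };

// Stand-in implementation so the runner has something to exercise.
function validateEmail(input: string | null) {
  if (input === null) throw new TypeError("input must be a string");
  if (input.length === 0) return { valid: false, error: "INVALID_LENGTH" };
  if (!input.includes("@")) return { valid: false, error: "INVALID_EMAIL" };
  return { valid: true, error: null };
}

// Table-driven runner: each row becomes one labeled check.
function runMatrix(cases: Case[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    try {
      const result = validateEmail(c.input);
      if (JSON.stringify(result) !== JSON.stringify(c.expected)) {
        failures.push(c.name);
      }
    } catch (err) {
      if (c.expected !== "throws TypeError" || !(err instanceof TypeError)) {
        failures.push(c.name);
      }
    }
  }
  return failures;
}
```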
Testing Pyramid with AI
| Test Level | AI Usefulness | Best Practice |
|---|---|---|
| Unit Tests (60% of tests) | ★★★★★ Excellent | AI generates structure; you write assertions and edge cases |
| Integration Tests (25% of tests) | ★★★☆☆ Good | AI scaffolds; you define contract expectations and data |
| Contract Tests (10% of tests) | ★★☆☆☆ Limited | Must be designed by humans; AI assists with implementation |
| E2E Tests (5% of tests) | ★★☆☆☆ Limited | Must be designed by humans; focus on critical user paths |
Property-Based Testing with AI
Use AI to help write property-based tests with frameworks like fast-check, QuickCheck, or Hypothesis:
// Ask AI to generate property tests
import fc from "fast-check";
// Property: validateEmail never crashes on any string input
test("validateEmail never crashes", () => {
fc.assert(
fc.property(fc.string(), (input) => {
const result = validateEmail(input);
expect(result).toHaveProperty("valid");
expect(result).toHaveProperty("error");
})
);
});
Regression Testing for AI-Generated Changes
- Run full test suite: Ensure no existing tests regress
- Run against real production data: Use anonymized production datasets
- Performance benchmarks: Compare before/after latency and memory
- Canary deployments: Deploy to 1% of traffic first; monitor
Test Coverage Targets
- Critical path logic: 95%+ coverage required
- Error handling: 90%+ coverage required
- Business logic: 80%+ coverage target
- Utility functions: 70%+ coverage target
- Infrastructure code: 30%+ coverage (hard to test fully)
AI-Generated Test Scaffold Example
// Request from AI: "Generate unit test scaffold for validateEmail"
describe("validateEmail", () => {
// AI generates structure; you add assertions
describe("valid inputs", () => {
test("accepts valid email", () => {
const result = validateEmail("alice@example.com");
// YOU WRITE: expect what?
expect(result.valid).toBe(true);
expect(result.error).toBeNull();
});
});
describe("invalid inputs", () => {
test("rejects no @", () => {
const result = validateEmail("alice.example.com");
expect(result.valid).toBe(false);
expect(result.error?.code).toBe("INVALID_EMAIL");
});
});
describe("edge cases", () => {
test("handles empty string", () => {
const result = validateEmail("");
expect(result.valid).toBe(false);
});
});
});
Security & Privacy
Treat AI tools like you treat third-party libraries: verify security before using generated code. Never let security be delegated to the AI.
Data Classification and Tool Selection
| Data Class | Examples | AI Tool Policy |
|---|---|---|
| Public | Marketing website, blog posts, sample data | ✓ All AI tools permitted; no special handling |
| Internal | Engineering practices, internal runbooks, architecture docs | ☑ Company-approved tools only (e.g., Claude, internal LLM) |
| Confidential | Customer data, financial data, API keys, credentials | ✗ Never paste into AI tools; use synthetic/sanitized data only |
| Restricted | PII, health data, payment info, audit logs | ✗ Absolutely forbidden; generate test data only |
Secret Management Rules
✓ Do This
- Use environment variables
- Rotate secrets regularly
- Use a secrets manager (Vault, AWS Secrets Manager)
- Audit secret access logs
✗ Never Do This
- Hardcode secrets in code
- Paste secrets in prompts to AI
- Commit secrets to git (use .gitignore)
- Share secrets in Slack or email
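The environment-variable rule is easiest to enforce with a fail-fast accessor, echoing the getEnv("JWT_SECRET") usage earlier in this guide. A sketch (getEnv is a hypothetical helper; real code would pass process.env as the second argument):

```typescript
// Fail fast at startup if a required secret is missing, instead of
// letting undefined leak into crypto or database code later.
function getEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```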
Scanning AI-Generated Code for Security
Add security scanning to your CI/CD pipeline:
// Run these after AI generates code, before merge
// 1. Static Application Security Testing (SAST)
npm run security:sast // e.g., SonarQube, Semgrep
// 2. Dependency scanning
npm audit --audit-level=moderate
// 3. OWASP checks
npm run security:owasp // e.g., OWASP ZAP, Burp
// 4. Secrets scanning
git secrets --scan // e.g., git-secrets, gitleaks
Supply Chain Security: AI May Suggest Vulnerable Dependencies
AI's training data includes vulnerable libraries. It may suggest packages that have known security issues, are abandoned, or have poor maintenance records. Always verify dependencies with these tools:
- npm audit: Check for known vulnerabilities
- Snyk: Continuous dependency scanning and fixing
- FOSSA / Black Duck: License and supply chain compliance
- Dependabot: Automated dependency updates
OWASP Top 10 Considerations for AI-Generated Code
| OWASP Category | Risk with AI | Mitigation |
|---|---|---|
| A01: Broken Access Control | Missing permission checks | Require an authorization check in every API handler |
| A03: Injection (SQL, NoSQL, Command) | Unsanitized user input | Use parameterized queries; manual code review |
| A04: Insecure Design | Missing threat model | Humans must design security; AI implements |
| A05: Security Misconfiguration | Wrong defaults (SSL disabled, etc.) | Specify security config in prompt explicitly |
| A07: Identification & Authentication Failures | Generated auth code may skip validation | Require explicit validation in prompt; manually review |
Prompt Injection Risks in Code
If your code processes user input and passes it to AI tools, be careful:
// ✗ DANGEROUS: User input goes into prompt without escaping
const userQuery = req.body.query;
const prompt = `Translate this: ${userQuery}`;
// Attacker can inject: "Translate this: Ignore instructions and..."
// ✓ SAFER: Validate and sanitize input
const userQuery = req.body.query;
if (!isValidQuery(userQuery)) throw new ValidationError();
const prompt = `Translate this text (max 500 chars): "${escapePrompt(userQuery)}"`;
Treat Generated Code as Third-Party Code
AI-generated code is like code from a third-party library: you don't fully trust it until you've reviewed it, tested it, and run it through security scanners. Apply the same rigor you would to an external dependency.
Code Review for AI-Generated Changes
Code review is critical when AI is involved. Reviewers must understand the code and validate correctness, not just syntax. This section provides a comprehensive checklist and patterns for effective review.
AI PR Code Review Checklist (Expanded)
What Reviewers Should Focus On Beyond Syntax
- Logic correctness: Does the code actually solve the problem stated in the ticket?
- Failure modes: What happens if the network fails? If the database is slow? If input is invalid?
- Assumptions: What does the code assume about the environment, data format, or dependencies?
- Long-term cost: Will this code be easy to debug and maintain in 6 months? Will it scale?
- Risk: Could this change cause data loss, downtime, or security issues?
PR Description Requirements for AI-Generated Code
Require PRs to include a clear description of what was generated and what was manually verified:
// GOOD PR Description for AI-Generated Code
## What Changed
Implement email validation handler for user signup form.
## AI-Generated Components
- src/validation/email_validator.ts (entire file)
- __tests__/validation/email_validator.test.ts (test scaffolding)
## Manual Validation Done
✓ Reviewed error handling against existing ValidationError pattern
✓ Verified no new npm dependencies added
✓ Tested edge cases: empty string, missing @, long domains
✓ Confirmed email verification email sends correctly
✓ Ran security scan: no vulnerabilities detected
✓ Performance: <2ms validation time per email
## Rollback Plan
Revert to previous email validation in src/config/validation.ts if needed.
No data migration required.
## Questions for Reviewers
- Should we add rate limiting on email validation attempts?
- Is the error message clear for users?
Approval Gates for AI-Generated Critical Path Code
For critical services, require multiple reviewers:
| Code Category | Minimum Reviewers | Who Should Review |
|---|---|---|
| Non-critical feature | 1 | Any senior engineer |
| Critical business logic | 2 | One domain expert, one security reviewer |
| Infrastructure/ops | 2 | One SRE/ops, one security |
| Security module | 3 | Security team, domain expert, infrastructure |
| Payment/billing code | 3 | Finance, security, backend engineer |
Code Example: Good vs. Bad PR Descriptions
❌ Bad PR Description
AI generated this code for the new user API.
Files changed:
- src/api/users.ts
- tests.ts
✓ Good PR Description
## What This Does
Implements POST /api/users endpoint to create new user accounts.
Validates email format, enforces unique constraint, sends welcome email.
## Generated vs Manual
AI Generated:
- src/api/users.ts (handler + validation)
- tests/__tests__/api/users.test.ts (test structure)
Manually Created/Reviewed:
- Email template (src/templates/welcome.html)
- Database migration (migrations/20240315_users_table.sql)
- Error handling and logging (verified against AppError pattern)
## Testing Done
✓ Unit tests: 8/8 passing
✓ Integration test: Created new user in test DB, verified email sent
✓ Edge cases: Duplicate email, invalid format, SQL injection attempts
✓ Load test: 100 req/sec, p99 latency 45ms
## Security Review
✓ No hardcoded secrets
✓ Input sanitized against injection
✓ Password hashed with bcrypt
✓ Rate limiting applied to signup endpoint
✓ OWASP scan: 0 findings
## Deployment Notes
- No data migration
- No config changes needed
- Rollback: Disable endpoint in API router, previous version still works
- Monitoring: Watch signup_latency and email_send_failures metrics
Prompt Engineering for Developers
Prompt engineering is a skill. Good prompts produce usable code in one iteration; bad prompts require 5+ cycles. Treat prompts like production code: version them, test them, improve them.
Prompt Templates for Common Tasks
Template 1: Bug Fix
// TASK: Fix bug in validateEmail()
## Current Behavior
validateEmail("user+tag@example.com") returns invalid, but should be valid.
## Expected Behavior
Plus signs (+) are valid in the local part of an email per RFC 5321.
## Constraints
- File: src/validation/email_validator.ts
- Use the same ValidationError pattern (in the same file)
- No new dependencies
- Add test case to __tests__/validation/email_validator.test.ts
## Steps
1. Show me the minimal change needed (diff format)
2. Explain what RFC rule was violated
3. Suggest any other email formats we might be rejecting incorrectly
Template 2: New Feature Implementation
// TASK: Implement user profile API endpoint
## Requirements
- Endpoint: GET /api/v1/users/:userId
- Returns: { id, email, name, createdAt, lastLoginAt }
- Authentication: Requires JWT token
- Authorization: Users can only see their own profile; admins see all
- Errors: Return 404 if user not found, 401 if unauthorized
## Context
- Use Express.js (already in codebase)
- Database: PostgreSQL via knex (see src/db/index.ts for usage)
- Error handler: Use AppError from src/errors.ts
- Middleware: Auth middleware at src/middleware/auth.ts
## Success Criteria
- Handler implemented in src/api/users/get.ts
- Tests: src/__tests__/api/users/get.test.ts (success, 404, unauthorized)
- No new npm dependencies
- Follows error handling pattern from src/api/posts/get.ts
## Steps
1. Summarize your approach (3–4 bullet points)
2. Show the complete handler code
3. Show the test scaffold
4. List all assumptions you're making
Template 3: Refactoring
// TASK: Refactor user service to reduce duplication
## Current Problem
src/services/user_service.ts has 3 similar functions:
- getUserById(id)
- getMultipleUsers(ids)
- getUserByEmail(email)
All do similar: query DB, map result, handle errors identically.
## Goal
Extract common pattern into internal helper; keep public API unchanged.
## Constraints
- Public function signatures must NOT change (backward compatibility)
- Keep the same error handling and logging
- No new npm dependencies
- File: src/services/user_service.ts only
## Steps
1. Design the helper function signature
2. Show refactored versions of all 3 functions
3. Show tests unchanged (they should still pass as-is)
4. Estimate cyclomatic complexity reduction
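The kind of output step 2 should produce might look like the sketch below. The data source is inlined as an array so the example is self-contained; the real service would keep its database queries inside the helper, and the public signatures would not take the extra rows parameter.

```typescript
type User = { id: number; email: string };

// Internal helper: the shared query-map-handle pattern extracted from
// the three public functions (filtering stands in for the DB query).
async function queryUsers(where: Partial<User>, rows: User[]): Promise<User[]> {
  return rows.filter((u) =>
    Object.entries(where).every(([k, v]) => (u as any)[k] === v)
  );
}

// Public API unchanged in shape — each function is now a thin wrapper.
async function getUserById(id: number, rows: User[]): Promise<User | null> {
  const [user] = await queryUsers({ id }, rows);
  return user ?? null;
}

async function getUserByEmail(email: string, rows: User[]): Promise<User | null> {
  const [user] = await queryUsers({ email }, rows);
  return user ?? null;
}

async function getMultipleUsers(ids: number[], rows: User[]): Promise<User[]> {
  const results = await Promise.all(ids.map((id) => getUserById(id, rows)));
  return results.filter((u): u is User => u !== null);
}
```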
Template 4: Data Migration
// TASK: Generate database migration to add new user column
## Change
Add optional "bio" column (text, max 500 chars) to users table.
New column should be optional (default NULL).
## Context
- Current schema: migrations/20240301_users_table.sql
- Using Knex migrations (see examples in migrations/)
- Database: PostgreSQL 14+
## Requirements
- Create UP migration: add column, no data changes
- Create DOWN migration: remove column (safe to roll back)
- Must be idempotent (safe to run multiple times)
- No data loss on rollback
## Steps
1. Generate migrations/20240315_add_user_bio.js (both up & down)
2. Show manual testing steps (verify column added, rollback works)
3. Note any performance impact
Context Window Management
AI models have context limits. Be strategic about what you include:
What To Include
- Exact file paths: "src/validation/users.ts"
- Key patterns: Show 1–2 examples from existing code
- Constraints: Dependencies, error handling, naming
- Acceptance criteria: Tests that must pass
- Assumptions: Runtime environment, database, external services
What To Leave Out
- Full codebase: Just reference the parts that matter
- Unrelated files: "For context, here are 5000 lines of unrelated code" adds noise
- Comments and docs: Summarize the pattern instead of pasting
- Large data files: Use representative examples (1–2 rows, not 10,000)
- Secrets, PII, or confidential data: Never, ever paste real data
Multi-Turn Conversation Patterns
Pattern 1: Ask for Plan First, Then Implement
Turn 1: "Propose a refactoring to reduce UserService complexity. Show 3 options with tradeoffs."
Turn 2: "I like option 2. Now implement it in src/services/user_service.ts."
Turn 3: "Show me the tests that validate the refactoring."
Pattern 2: Iterative Refinement
Turn 1: "Implement email validation handler."
Turn 2: "Add support for plus-addressed emails (user+tag@example.com)."
Turn 3: "Now add internationalized domain names (IDN) support."
Turn 4: "Generate tests for these new cases."
Pattern 3: Decompose Large Tasks
Turn 1: "Summarize the architecture for the new auth system."
Turn 2: "Implement the JWT generation logic in src/auth/jwt.ts."
Turn 3: "Implement the JWT verification middleware."
Turn 4: "Generate tests for both JWT functions."
Prompt Strategies by Task Type
| Task Type | Best Approach | Key Phrase |
|---|---|---|
| Bug Fix | Specific error + reproduction steps | "The bug is X. Expected behavior is Y." |
| New Feature | Clear spec + examples + constraints | "Implement X with requirements [list]" |
| Refactoring | Show duplication + goal + constraints | "Extract common pattern while keeping public API unchanged" |
| Architecture | Constraints + tradeoffs + ask for options | "Propose architectures with tradeoffs" |
| Testing | Test matrix first, then assertions | "Generate test scaffold, then I'll add assertions" |
| Migration | Current schema + desired change + rollback | "Generate UP and DOWN migrations" |
System Prompts for Coding Assistants
If your AI tool supports custom system prompts, use one like this:
// System Prompt Template
You are an expert software engineer helping a development team.
CORE RULES:
1. Code is for production. It must be correct, secure, and maintainable.
2. Always ask for clarification if requirements are ambiguous.
3. Show your assumptions explicitly.
4. Prefer clear, simple code over clever code.
5. Error handling is mandatory; don't skip failure paths.
6. Security is non-negotiable; never suggest hardcoded secrets.
WHEN WRITING CODE:
- Match the existing codebase style (naming, patterns, file structure)
- Include unit tests that verify behavior, not just coverage
- Suggest observability (logging, metrics) for production code
- Point out risky assumptions or edge cases
WHEN EXPLAINING:
- Show code before explanation
- Explain tradeoffs for significant decisions
- Link to relevant docs or examples
WHEN IN DOUBT:
- Ask for more context
- Suggest options with pros/cons
- Flag risky changes that need human review
Team Operating Model for AI-Driven Development
AI is most effective when used consistently across a team. Establish shared practices, shared prompts, and metrics to track quality.
Shared Prompt Library Management
Build a team library of tested, proven prompts. Version them like code:
// Directory structure
prompts/
├── templates/
│ ├── bugfix.md (v2)
│ ├── new_feature.md (v3)
│ ├── refactor.md (v1)
│ └── testing.md (v2)
├── examples/
│ ├── email_validation_bugfix.md
│ └── user_api_feature.md
└── README.md (usage guidelines)
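"Version them like code" also means lint them like code. A small check script can fail CI when a template is missing the metadata the format below relies on; this is a sketch, and the header names simply mirror the example template (`Version:`, `Author:`, `Used for:`):

```python
from pathlib import Path

# Metadata headers every prompt template is expected to carry
REQUIRED_HEADERS = ("Version:", "Author:", "Used for:")

def lint_template(text: str) -> list[str]:
    """Return the metadata headers missing from one template's text."""
    return [h for h in REQUIRED_HEADERS if h not in text]

def lint_library(root: str) -> dict[str, list[str]]:
    """Lint every file under prompts/templates/; map path -> missing headers."""
    problems = {}
    for path in Path(root).glob("templates/*.md"):
        missing = lint_template(path.read_text())
        if missing:
            problems[str(path)] = missing
    return problems
```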
Prompt Template Format
// prompts/templates/new_feature.md
# New Feature Implementation Template
Version: 3 (Updated 2024-03-15)
Author: @engineering-team
Used for: REST API endpoints, services, handlers
## Key Improvements in v3
- Added rollback strategy requirement
- Clarified test scaffold expectations
- Added performance benchmark requirement
## Template
[Copy the template from earlier sections]
## What Works Well
- Produces working code on first iteration ~80% of the time
- Generates meaningful tests automatically
- Handles edge cases when examples are specific
## What Needs Refinement
- Sometimes over-engineers error handling
- May miss rate limiting considerations
## Examples
- User API feature (see examples/user_api_feature.md)
- Email handler (see examples/...)
Standardized Review Checklist
Use the same checklist for all AI-generated PRs. Document it in your repo:
# CODE REVIEW CHECKLIST FOR AI-GENERATED CODE
See docs/code-review-ai.md for full checklist
Required items:
☐ AI usage documented in PR description
☐ Logic correct; tests pass
☐ Error handling for all documented failure paths
☐ No security issues detected
☐ Dependencies justified and scanned
☐ Code style consistent with codebase
☐ Test coverage >80% for changed code
☐ Performance impact measured (if critical path)
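The checklist can also be enforced mechanically: parse the PR description and block merge while any box is unchecked. A sketch, assuming the PR body uses the same ☐/☑ symbols as the list above:

```python
CHECKED = "☑"
UNCHECKED = "☐"

def unchecked_items(pr_body: str) -> list[str]:
    """Return checklist lines in a PR description that are still unchecked."""
    return [
        line.strip().lstrip(UNCHECKED).strip()
        for line in pr_body.splitlines()
        if line.strip().startswith(UNCHECKED)
    ]

def gate(pr_body: str) -> bool:
    """True when every checklist item has been ticked off."""
    return not unchecked_items(pr_body)
```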
Metrics to Track Quality and Velocity
| Metric | What It Measures | Target |
|---|---|---|
| Lead Time | Days from feature start to production | 30% reduction with AI |
| Code Review Time | Hours from PR to approval | Should NOT increase; reviewers focus on logic, not style |
| Test Coverage | % of code covered by tests | >80% overall; >95% for critical path |
| Escaped Defects | Bugs found after merge to main | Should NOT increase from pre-AI baseline |
| Rollback Rate | % of deployments that require rollback | Should NOT increase; if it does, revisit prompts and training |
| Security Findings | Vulnerabilities found in code review / scanning | Should NOT increase from pre-AI baseline |
| Prompt Quality Score | % of first-draft code that needed 0 revisions | >70% for mature templates |
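The Prompt Quality Score is straightforward to compute from review data. A minimal sketch, assuming each AI-assisted PR record carries a count of revisions requested before approval:

```python
def prompt_quality_score(revision_counts: list[int]) -> float:
    """Percentage of AI-assisted PRs whose first draft needed zero revisions."""
    if not revision_counts:
        return 0.0  # no data yet; report zero rather than divide by zero
    zero_revision = sum(1 for n in revision_counts if n == 0)
    return 100.0 * zero_revision / len(revision_counts)
```

For example, four PRs of which three needed no revisions scores 75.0, below the >70% target only once templates mature.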
AI Output Quality Retrospectives
Monthly review of what worked and what didn't:
- Review metrics from last month (lead time, escapes, rollbacks)
- Analyze PRs marked "AI-generated": what caused review delays?
- Discuss prompt failures: why did the AI miss cases?
- Update prompt templates based on lessons learned
- Celebrate successful patterns; create templates from them
- Identify tool gaps: do we need better linting, testing, or scanning?
- Plan training: do team members need coaching on prompting?
Onboarding Developers to AI Tools
1. Foundations (30 min): Read this guide; understand core principles
2. Tool Basics (1 hour): Try the AI tool on a small task; see what it produces
3. Your First PR (2 hours): Generate code for a small feature; get it reviewed
4. Prompting Practice (1 week): Use templates, refine prompts, see improvement
5. Code Review Practice (ongoing): Review AI-generated PRs from teammates; learn patterns
Role-Based AI Usage Guidelines
| Role | AI Tool Access | Best Use Cases | Restrictions |
|---|---|---|---|
| Junior Engineer | ✓ Full access | Learning by generating boilerplate and tests | All code must be reviewed; focus on non-critical path |
| Senior Engineer | ✓ Full access | Generating architecture; refactoring; creating templates | Must review own generated code; responsible for template quality |
| Security Engineer | ✓ Full access | Generating security tests; scanning generated code | MUST review all security-critical code before merge |
| Product Manager | ☑ Limited | Writing detailed specs to help engineers use AI | Do not merge code; do not approve PRs |
CI/CD Integration: AI in the Pipeline
Use AI not just for code generation, but also to automate checks in your CI/CD pipeline. This reduces friction and catches issues early.
AI-Powered Pipeline Stages
| Stage | What AI Can Do | Example Tool / Action |
|---|---|---|
| Pre-Commit | Auto-format code; check naming conventions | Prettier, Black, AI linter with custom rules |
| Lint & Type Check | Find style violations; type errors; suspicious patterns | ESLint, TypeScript, SonarQube |
| Security Scan | Find secrets, vulnerable dependencies, OWASP issues | Snyk, TruffleHog, OWASP ZAP, Semgrep |
| Test Execution | Run unit, integration, contract tests | Jest, pytest, your test runner + coverage checks |
| Coverage Check | Ensure test coverage meets minimum threshold | Coverage.py, nyc, codecov.io |
| Regression Detection | Compare latency, memory, error rates vs. baseline | Custom benchmarking scripts; datadog, prometheus |
| PR Review Bot | Automatically comment on code style, suggestions | GitHub Actions + AI API (Claude, ChatGPT) |
| Changelog Generation | Auto-generate release notes from commits | conventional-changelog + AI to polish |
Quality Gates for AI-Generated Code
Define stricter gates when code is known to be AI-generated:
// GitHub Actions example: stricter checks for AI-generated code
name: AI Code Quality Gate
on:
pull_request:
paths:
- src/**
- tests/**
jobs:
quality-checks:
runs-on: ubuntu-latest
steps:
      - name: Detect AI-Generated Code
        env:
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          # Check the PR description for "AI-generated" markers
          if grep -qiE "ai[- ]generated|claude|chatgpt" <<< "$PR_BODY"; then
            echo "AI_GENERATED=true" >> "$GITHUB_ENV"
          fi
- name: Run Linting
run: npm run lint
- name: Type Checking
run: npm run type-check
- name: Run Tests (AI-Generated)
if: env.AI_GENERATED == 'true'
        run: npm run test -- --coverage --coverageThreshold='{"global":{"lines":85}}'
- name: Security Scan (AI-Generated)
if: env.AI_GENERATED == 'true'
run: npm audit --audit-level=moderate
- name: SAST Analysis
if: env.AI_GENERATED == 'true'
run: npm run security:sast
- name: Comment on PR
if: failure() && env.AI_GENERATED == 'true'
uses: actions/github-script@v6
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
body: '⚠️ AI-Generated Code Quality Check Failed. Please review the logs.'
})
Automated Review Bots
Use an AI-powered review bot to leave comments on PRs with suggestions:
// Example: GitHub Actions + Claude API for PR review
name: AI PR Review
on: [pull_request]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Fetch PR Diff
run: |
gh pr diff ${{ github.event.number }} > pr.patch
env:
GH_TOKEN: ${{ github.token }}
      - name: AI Code Review
        run: |
          # Build the request with jq so the patch is safely JSON-escaped
          jq -n --rawfile patch pr.patch '{
            model: "claude-opus-4-1",
            max_tokens: 1024,
            messages: [{role: "user",
              content: ("Review this code patch for style, security, and performance issues. Keep feedback brief and actionable.\n\n" + $patch)}]
          }' > request.json
          # Call the Claude API and capture the review text for the next step
          curl -s -X POST https://api.anthropic.com/v1/messages \
            -H "x-api-key: ${{ secrets.CLAUDE_API_KEY }}" \
            -H "anthropic-version: 2023-06-01" \
            -H "content-type: application/json" \
            -d @request.json | jq -r '.content[0].text' > review.md
      - name: Post Review Comment
        uses: actions/github-script@v6
        with:
          script: |
            // Post the captured AI review as a PR comment
            const fs = require('fs');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              body: '## AI Review\n\n' + fs.readFileSync('review.md', 'utf8')
            });
Deployment Safety with AI Changes
- Canary deployment: Deploy to 1% of traffic first; monitor error rates
- Feature flags: Use flags to enable/disable AI-generated features independently
- Gradual rollout: Increase traffic 10% → 25% → 50% → 100% over hours
- Monitoring: Alert on latency p99, error rate, and custom business metrics
- Rollback ready: Ensure rollback can happen in < 5 minutes without manual intervention
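The gradual-rollout steps above can be sketched as a controller loop. This is a sketch only: `set_traffic`, `error_rate`, and `rollback` stand in for whatever hooks your deployment platform exposes, and the error budget is an example value:

```python
import time
from typing import Callable

def gradual_rollout(
    set_traffic: Callable[[int], None],   # route N% of traffic to the new version
    error_rate: Callable[[], float],      # current error rate from your metrics store
    rollback: Callable[[], None],         # revert to the previous version
    steps: tuple[int, ...] = (10, 25, 50, 100),
    max_error_rate: float = 0.01,
    soak_seconds: int = 0,
) -> bool:
    """Step traffic up; roll back and return False if errors exceed the budget."""
    for pct in steps:
        set_traffic(pct)
        time.sleep(soak_seconds)  # let metrics settle before judging this step
        if error_rate() > max_error_rate:
            rollback()
            return False
    return True
```

Keeping the metric reads and rollback behind plain callables makes the policy itself trivially unit-testable with fakes.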
GitHub Actions Workflow Example (Complete)
name: CI Pipeline with AI Integration
on: [push, pull_request]
jobs:
lint-and-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: '18'
- name: Install Dependencies
run: npm ci
- name: Lint
run: npm run lint
- name: Type Check
run: npm run type-check
- name: Run Tests
run: npm run test -- --coverage
- name: Upload Coverage
uses: codecov/codecov-action@v3
- name: Security Audit
run: npm audit --audit-level=high
- name: SAST Scan
run: npm run security:sast
deploy:
needs: lint-and-test
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- name: Deploy to Staging
run: |
# Deploy to staging with monitoring enabled
./scripts/deploy.sh staging
- name: Monitor (5 min)
run: |
# Check metrics: latency, error rate, etc.
./scripts/monitor.sh 300
- name: Promote to Production
run: |
# Canary: 1% → 25% → 100%
./scripts/deploy.sh production --canary
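`./scripts/monitor.sh 300` above is a placeholder. One way to implement that five-minute watch (a sketch; `fetch_error_rate` is an assumed hook into your metrics backend, and the clock/sleep parameters exist only to make the loop testable) is to poll until the window expires or the error budget is blown:

```python
import time
from typing import Callable

def monitor_window(
    fetch_error_rate: Callable[[], float],  # assumed hook into your metrics backend
    duration_seconds: int,
    interval_seconds: int = 15,
    max_error_rate: float = 0.01,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the error rate for the window; fail fast if it exceeds the budget."""
    deadline = clock() + duration_seconds
    while clock() < deadline:
        if fetch_error_rate() > max_error_rate:
            return False  # a non-zero exit here would block promotion in CI
        sleep(interval_seconds)
    return True
```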
Production Readiness Checklist
Before shipping any code—whether AI-generated or not—verify it meets these production standards. Use this as your final gate before merge and deploy.
The checklist covers five areas:
- Code Quality
- Security
- Testing & Observability
- Performance & Reliability
- Documentation & Deployment
Final Sign-Off
- ☑ All checklist items above verified
- ☑ Code reviewed by at least one senior engineer (three for critical-path changes)
- ☑ If AI-generated: human understands all logic and can explain it
- ☑ Product owner has validated feature works as intended
- ☑ No regressions in existing tests
Engineering Standards Philosophy
AI accelerates delivery most when teams enforce strong engineering standards. Speed without rigor creates future drag—technical debt, debugging burden, and maintenance cost. Using AI means you're shipping more code, faster. That makes quality discipline non-negotiable.