How to Generate Production Code with GPT-4o
Practical tutorial: Using GPT-4o for advanced code generation
How to Generate Production Code with GPT-4o
Table of Contents
- How to Generate Production Code with GPT-4o
- Create a virtual environment
- Install core dependencies
- code_generator.py
- Configure OpenAI [7] client
- Example usage
📺 Watch: Neural Networks Explained
Video by 3Blue1Brown
GPT-4o represents a significant advancement in large language models for code generation, offering improved reasoning capabilities and reduced latency compared to previous iterations. As of May 2026, this model has become a cornerstone tool for production software development, enabling developers to generate complex, multi-file applications with minimal manual intervention. In this tutorial, we'll build a production-grade microservice generator that creates complete, deployable Python services from natural language descriptions.
Understanding GPT-4o's Code Generation Architecture
Before diving into implementation, it's crucial to understand how GPT-4o processes code generation tasks differently from its predecessors. The model employs a mixture-of-experts architecture that separates reasoning from token generation, allowing it to maintain context over longer sequences while producing syntactically correct code.
The key architectural advantage lies in GPT-4o's ability to maintain structural coherence across multiple files. When generating a complete application, the model must track:
- Import dependencies across modules
- Consistent variable naming and type annotations
- Proper error handling patterns
- Configuration management
- Testing infrastructure
According to OpenAI's documentation, GPT-4o achieves approximately 87% accuracy on HumanEval benchmarks for Python code generation, representing a meaningful improvement over GPT-4's 67% on the same benchmark. This makes it suitable for production use cases where code correctness is critical.
Setting Up the Code Generation Pipeline
We'll build a system that takes natural language requirements and generates a complete FastAPI microservice with PostgreSQL integration, including tests, Docker configuration, and CI/CD pipelines.
Prerequisites and Environment Setup
First, install the required dependencies:
# Create a virtual environment
python -m venv gpt4o-codegen
source gpt4o-codegen/bin/activate
# Install core dependencies
pip install openai==1.35.0
pip install pydantic==2.7.0
pip install jinja2==3.1.4
pip install black==24.4.2
pip install pylint==3.2.0
pip install mypy==1.10.0
pip install pytest==8.2.0
pip install httpx==0.27.0
pip install structlog==24.1.0
Set up your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"
Core Implementation: The Code Generation Engine
We'll implement a modular code generation system that handles the complete lifecycle from requirements to deployable code. The system uses a multi-stage prompting strategy that breaks down complex generation tasks into manageable chunks.
# code_generator.py
import os
import json
import ast
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass, field
from pathlib import Path
import openai
from openai import OpenAI
from pydantic import BaseModel, Field
import structlog
logger = structlog.get_logger()
# Configure OpenAI client
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY"),
timeout=60.0, # Increased timeout for complex generations
max_retries=3
)
class ServiceRequirements(BaseModel):
"""Structured requirements for code generation."""
service_name: str = Field(.., description="Name of the microservice")
description: str = Field(.., description="Natural language description")
endpoints: List[Dict[str, str]] = Field(default_factory=list)
database_tables: List[Dict[str, str]] = Field(default_factory=list)
external_apis: List[str] = Field(default_factory=list)
authentication: bool = Field(default=False)
rate_limiting: bool = Field(default=False)
caching: bool = Field(default=False)
class GeneratedCode(BaseModel):
"""Container for generated code artifacts."""
main_app: str = Field(default="")
models: str = Field(default="")
schemas: str = Field(default="")
routes: str = Field(default="")
database: str = Field(default="")
tests: str = Field(default="")
dockerfile: str = Field(default="")
docker_compose: str = Field(default="")
requirements_txt: str = Field(default="")
config: str = Field(default="")
class CodeGenerator:
"""
Production-grade code generator using GPT-4o.
Implements multi-stage generation with validation and error recovery.
"""
def __init__(self, model: str = "gpt-4o", temperature: float = 0.2):
self.model = model
self.temperature = temperature
self.max_tokens = 4096 # Increased for complex generations
self.generation_history: List[Dict] = []
def _build_system_prompt(self) -> str:
"""Construct the system prompt for code generation."""
return """You are an expert Python developer specializing in FastAPI microservices.
Generate production-ready code following these strict requirements:
1. **Code Quality**:
- Use type hints for all function parameters and return values
- Include comprehensive docstrings following Google style
- Implement proper error handling with custom exceptions
- Use async/await patterns for I/O operations
- Follow PEP 8 style guidelines
2. **Architecture**:
- Implement repository pattern for database access
- Use dependency injection for service layer
- Include request/response validation with Pydantic
- Implement proper logging with structlog
- Add health check endpoints
3. **Security**:
- Implement input sanitization
- Use parameterized queries for SQL
- Add rate limiting headers
- Include CORS configuration
- Implement proper authentication middleware
4. **Testing**:
- Write pytest tests with fixtures
- Include unit tests for business logic
- Add integration tests for API endpoints
- Mock external dependencies
- Achieve minimum 80% code coverag [2]e
5. **Infrastructure**:
- Provide Dockerfile with multi-stage build
- Include docker-compose.yml for local development
- Add environment variable configuration
- Implement health check endpoints
- Include database migration scripts
Generate complete, working code without placeholders or TODO comments."""
def _build_generation_prompt(self, requirements: ServiceRequirements) -> str:
"""Build the specific generation prompt based on requirements."""
prompt_parts = [
f"Generate a complete FastAPI microservice called '{requirements.service_name}'.",
f"\nDescription: {requirements.description}",
]
if requirements.endpoints:
prompt_parts.append("\n\nAPI Endpoints:")
for endpoint in requirements.endpoints:
prompt_parts.append(f"- {endpoint.get('method', 'GET')} {endpoint.get('path', '/')}: {endpoint.get('description', '')}")
if requirements.database_tables:
prompt_parts.append("\n\nDatabase Tables:")
for table in requirements.database_tables:
prompt_parts.append(f"- {table.get('name', '')}: {table.get('columns', '')}")
if requirements.external_apis:
prompt_parts.append(f"\n\nExternal APIs to integrate: {', '.join(requirements.external_apis)}")
if requirements.authentication:
prompt_parts.append("\n\nInclude JWT-based authentication with refresh tokens.")
if requirements.rate_limiting:
prompt_parts.append("\n\nImplement rate limiting using Redis.")
if requirements.caching:
prompt_parts.append("\n\nImplement Redis caching for frequently accessed endpoints.")
prompt_parts.append("\n\nGenerate all files in separate code blocks with their filenames as headers.")
return "\n".join(prompt_parts)
def _parse_generated_code(self, response_text: str) -> GeneratedCode:
"""
Parse the GPT-4o response into structured code artifacts.
Handles multiple code block formats and error recovery.
"""
code_blocks = {}
current_file = None
current_content = []
lines = response_text.split('\n')
for line in lines:
if line.startswith('```') and not line.startswith('```python'):
# Save previous file content
if current_file and current_content:
code_blocks[current_file] = '\n'.join(current_content)
current_content = []
current_file = None
elif line.startswith('```python'):
# Extract filename from the line or previous context
current_file = self._extract_filename(line, current_file)
elif current_file and not line.startswith('```'):
current_content.append(line)
# Save last file
if current_file and current_content:
code_blocks[current_file] = '\n'.join(current_content)
# Map to GeneratedCode structure
generated = GeneratedCode()
file_mapping = {
'main.py': 'main_app',
'app/main.py': 'main_app',
'models.py': 'models',
'app/models.py': 'models',
'schemas.py': 'schemas',
'app/schemas.py': 'schemas',
'routes.py': 'routes',
'app/routes.py': 'routes',
'database.py': 'database',
'app/database.py': 'database',
'test_main.py': 'tests',
'tests/test_main.py': 'tests',
'Dockerfile': 'dockerfile',
'docker-compose.yml': 'docker_compose',
'requirements.txt': 'requirements_txt',
'config.py': 'config',
'app/config.py': 'config',
}
for filename, content in code_blocks.items():
if filename in file_mapping:
setattr(generated, file_mapping[filename], content)
return generated
def _extract_filename(self, line: str, default: Optional[str]) -> Optional[str]:
"""Extract filename from code block header."""
# Handle formats like ```python filename.py or ```python:filename.py
parts = line.replace('```python', '').strip()
if parts:
return parts.split(':').strip()
return default
def _validate_generated_code(self, code: GeneratedCode) -> Tuple]:
"""
Validate generated code for syntax errors and basic structure.
Returns (is_valid, error_messages).
"""
errors = []
# Validate Python files
python_files =
for name, content in python_files:
if content:
try:
ast.parse(content)
except SyntaxError as e:
errors.append(f"Syntax error in {name}: {str(e)}")
# Validate Dockerfile structure
if code.dockerfile:
if 'FROM' not in code.dockerfile:
errors.append("Dockerfile missing FROM instruction")
# Validate requirements.txt
if code.requirements_txt:
packages =
if not packages:
errors.append("requirements.txt is empty")
return len(errors) == 0, errors
def generate_service(self, requirements: ServiceRequirements) -> GeneratedCode:
"""
Generate complete microservice code from requirements.
Implements retry logic with exponential backoff.
"""
max_retries = 3
retry_delay = 2 # seconds
for attempt in range(max_retries):
try:
logger.info("generating_service",
service=requirements.service_name,
attempt=attempt + 1)
response = client.chat.completions.create(
model=self.model,
messages=,
temperature=self.temperature,
max_tokens=self.max_tokens,
response_format={"type": "text"}
)
generated_code = self._parse_generated_code(
response.choices[0].message.content
)
# Validate the generated code
is_valid, errors = self._validate_generated_code(generated_code)
if not is_valid:
logger.warning("validation_errors", errors=errors)
if attempt < max_retries - 1:
# Retry with error feedback
self._build_correction_prompt(errors)
continue
self.generation_history.append({
"requirements": requirements.model_dump(),
"generated": generated_code.model_dump(),
"validation_errors": errors
})
return generated_code
except openai.APIError as e:
logger.error("api_error", error=str(e))
if attempt < max_retries - 1:
import time
time.sleep(retry_delay * (attempt + 1))
else:
raise
except Exception as e:
logger.error("unexpected_error", error=str(e))
raise
def _build_correction_prompt(self, errors: List) -> str:
"""Build a correction prompt based on validation errors."""
return f"""
The previous code generation had the following errors:
{chr(10).join(f'- {error}' for error in errors)}
Please regenerate the code, fixing all these issues. Ensure:
1. All Python files are syntactically valid
2. All imports are correct and available
3. The Dockerfile follows best practices
4. All dependencies are listed in requirements.txt
"""
def save_generated_code(self, code: GeneratedCode, output_dir: str):
"""Save generated code to disk with proper directory structure."""
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)
# Create app directory
app_dir = output_path / "app"
app_dir.mkdir(exist_ok=True)
# Create tests directory
tests_dir = output_path / "tests"
tests_dir.mkdir(exist_ok=True)
# File mapping with proper paths
files = {
output_path / "main.py": code.main_app,
app_dir / "__init__.py": "",
app_dir / "models.py": code.models,
app_dir / "schemas.py": code.schemas,
app_dir / "routes.py": code.routes,
app_dir / "database.py": code.database,
app_dir / "config.py": code.config,
tests_dir / "__init__.py": "",
tests_dir / "test_main.py": code.tests,
output_path / "Dockerfile": code.dockerfile,
output_path / "docker-compose.yml": code.docker_compose,
output_path / "requirements.txt": code.requirements_txt,
}
for filepath, content in files.items():
if content: # Only write non-empty files
filepath.write_text(content)
logger.info("saved_file", path=str(filepath))
logger.info("code_saved", directory=str(output_path))
# Example usage
if __name__ == "__main__":
# Define requirements for a user management service
requirements = ServiceRequirements(
service_name="user-service",
description="A microservice for user management with registration, login, and profile management",
endpoints=,
database_tables=,
authentication=True,
rate_limiting=True,
caching=True
)
# Initialize generator
generator = CodeGenerator()
# Generate the service
generated = generator.generate_service(requirements)
# Save to disk
generator.save_generated_code(generated, "./generated-user-service")
Advanced Prompt Engineering for Code Generation
The effectiveness of GPT-4o for code generation heavily depends on prompt engineering. Our implementation uses several advanced techniques:
Structured Output Parsing
The _parse_generated_code method handles multiple output formats that GPT-4o might produce. This is critical because the model may vary its response structure based on context. We implement robust parsing that handles:
- Multiple code block formats (with and without language specifiers)
- Inline file path annotations
- Mixed markdown and code content
Validation Pipeline
The _validate_generated_code method performs static analysis on generated code before saving it. This catches common issues like:
- Syntax errors in Python files
- Missing Dockerfile instructions
- Empty requirements files
- Import errors
Error Recovery with Feedback
When validation fails, the system automatically retries with error-specific feedback. This iterative refinement approach significantly improves generation quality. The _build_correction_prompt method creates targeted prompts that address specific validation failures.
Production Considerations and Edge Cases
Handling API Rate Limits
When generating large codebases, you may encounter OpenAI's rate limits. Our implementation includes exponential backoff and retry logic:
import time
from functools import wraps
def retry_with_backoff(max_retries=3, base_delay=2):
"""Decorator for API calls with exponential backoff."""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except openai.RateLimitError:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt)
logger.warning("rate_limit_exceeded", retry_delay=delay)
time.sleep(delay)
return None
return wrapper
return decorator
Memory Management for Large Generations
GPT-4o has a context window of 128,000 tokens, but generating complete microservices can approach this limit. We implement chunking for very large codebases:
def chunk_generation(requirements: ServiceRequirements, chunk_size: int = 3) -> List:
"""Split large requirements into manageable chunks."""
chunks = []
endpoints = requirements.endpoints
for i in range(0, len(endpoints), chunk_size):
chunk = requirements.model_copy()
chunk.endpoints = endpoints
chunks.append(chunk)
return chunks
Handling Generated Code Quality
Not all generated code will be production-ready. We implement a quality gate that checks for common issues:
def quality_check(code: GeneratedCode) -> Dict:
"""Perform quality checks on generated code."""
scores = {}
# Check for type hints
if code.main_app:
type_hint_count = code.main_app.count(": ") # Rough estimate
scores = min(type_hint_count / 50, 1.0) # Normalize
# Check for docstrings
if code.main_app:
docstring_count = code.main_app.count('"""')
scores = min(docstring_count / 10, 1.0)
# Check for error handling
if code.routes:
try_count = code.routes.count("try:")
except_count = code.routes.count("except")
scores = min(min(try_count, except_count) / 5, 1.0)
# Check for tests
if code.tests:
test_functions = code.tests.count("def test_")
scores = min(test_functions / 5, 1.0)
return scores
Testing the Generated Code
After generation, we need to verify the code works correctly. Here's a comprehensive test suite:
# test_generated_service.py
import pytest
import subprocess
import sys
from pathlib import Path
@pytest.fixture
def generated_service(tmp_path):
"""Fixture that generates a test service."""
from code_generator import CodeGenerator, ServiceRequirements
generator = CodeGenerator()
requirements = ServiceRequirements(
service_name="test-service",
description="Simple health check service",
endpoints=
)
code = generator.generate_service(requirements)
generator.save_generated_code(code, str(tmp_path))
return tmp_path
def test_service_imports(generated_service):
"""Test that generated code can be imported without errors."""
sys.path.insert(0, str(generated_service))
try:
from app.main import app
assert app is not None
except ImportError as e:
pytest.fail(f"Import failed: {e}")
finally:
sys.path.pop(0)
def test_service_starts(generated_service):
"""Test that the FastAPI service starts correctly."""
import uvicorn
import asyncio
async def test_start():
config = uvicorn.Config(
"app.main:app",
host="127.0.0.1",
port=8000,
log_level="info"
)
server = uvicorn.Server(config)
# Start server in background
task = asyncio.create_task(server.serve())
await asyncio.sleep(2) # Wait for server to start
# Test health endpoint
import httpx
async with httpx.AsyncClient() as client:
response = await client.get("http://127.0.0.1:8000/health")
assert response.status_code == 200
# Shutdown
server.should_exit = True
await task
asyncio.run(test_start())
def test_dockerfile_valid(generated_service):
"""Test that Dockerfile is syntactically valid."""
dockerfile = generated_service / "Dockerfile"
assert dockerfile.exists()
# Check basic Dockerfile structure
content = dockerfile.read_text()
assert "FROM" in content
assert "WORKDIR" in content
assert "COPY" in content
assert "CMD" in content or "ENTRYPOINT" in content
def test_requirements_valid(generated_service):
"""Test that requirements.txt contains valid packages."""
requirements = generated_service / "requirements.txt"
assert requirements.exists()
content = requirements.read_text()
packages =
for package in packages:
# Check package format (name or name==version)
assert "=" in package or ">" in package or "<" in package or " " not in package
Performance Optimization and Best Practices
Caching Generated Code
For repeated generations with similar requirements, implement caching:
import hashlib
import json
from functools import lru_cache
class CachedCodeGenerator(CodeGenerator):
"""Code generator with disk-based caching."""
def __init__(self, cache_dir: str = "./code_cache", **kwargs):
super().__init__(**kwargs)
self.cache_dir = Path(cache_dir)
self.cache_dir.mkdir(exist_ok=True)
def _get_cache_key(self, requirements: ServiceRequirements) -> str:
"""Generate cache key from requirements."""
data = json.dumps(requirements.model_dump(), sort_keys=True)
return hashlib.sha256(data.encode()).hexdigest()
def generate_service(self, requirements: ServiceRequirements) -> GeneratedCode:
cache_key = self._get_cache_key(requirements)
cache_path = self.cache_dir / f"{cache_key}.json"
if cache_path.exists():
logger.info("cache_hit", key=cache_key)
return GeneratedCode(**json.loads(cache_path.read_text()))
result = super().generate_service(requirements)
# Cache the result
cache_path.write_text(json.dumps(result.model_dump()))
logger.info("cache_miss", key=cache_key)
return result
Parallel Generation for Large Projects
For enterprise-scale code generation, implement parallel processing:
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List
def generate_microservices(requirements_list: List,
max_workers: int = 3) -> List:
"""Generate multiple microservices in parallel."""
generator = CodeGenerator()
results = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
future_to_req = {
executor.submit(generator.generate_service, req): req
for req in requirements_list
}
for future in as_completed(future_to_req):
req = future_to_req
try:
result = future.result()
results.append(result)
logger.info("generation_complete", service=req.service_name)
except Exception as e:
logger.error("generation_failed",
service=req.service_name,
error=str(e))
return results
What's Next
GPT-4o's code generation capabilities continue to evolve. Based on current trends, we can expect:
- Improved Context Understanding: Future iterations will better understand project structure and coding conventions
- Better Error Recovery: Enhanced ability to self-correct based on compilation errors
- Multi-language Support: Improved generation for languages beyond Python
- Integration with Development Tools: Direct integration with IDEs and CI/CD pipelines
For production deployments, consider:
- Implementing human review gates for generated code
- Adding automated security scanning
- Setting up continuous integration with generated tests
- Monitoring API costs and optimizing prompt efficiency
The techniques presented here provide a foundation for building production-grade code generation systems. As GPT-4o and similar models mature, the gap between generated and hand-written code will continue to narrow, making automated code generation an increasingly valuable tool in the software development lifecycle.
Was this article helpful?
Let us know to improve our AI generation.
Related Articles
How to Analyze Security Logs with DeepSeek Locally
Practical tutorial: Analyze security logs with DeepSeek locally
How to Build a Multimodal App with Gemini 2.0 Vision API
Practical tutorial: Build a multimodal app with Gemini 2.0 Vision API
How to Build an AI Research Assistant with Perplexity API
Practical tutorial: Create an AI research assistant with Perplexity API