
How to Build a Prompt Management System with ChatGPT

Practical tutorial: build a platform for versioning, sharing, and discovering AI prompts.

IA Academy · May 15, 2026 · 11 min read · 2,111 words


The explosion of generative AI has created a new challenge: how do you effectively manage, version, and share the prompts that drive these powerful models? As of May 2026, ChatGPT, developed by OpenAI and originally released in November 2022, has become the most widely adopted conversational AI platform. The platform uses large language models, specifically generative pre-trained transformers (GPTs) [5], to generate text, speech, and images in response to user prompts [1]. However, as organizations scale their AI usage, the ad-hoc approach of storing prompts in text files or chat history becomes unsustainable.

In this tutorial, we'll build a production-ready prompt management system that addresses the real-world challenges of prompt engineering at scale. We'll leverage the open-source ecosystem around ChatGPT [1], including tools like WebChatGPT and ChatGPT Prompt Genius, while building a custom backend that handles versioning, discovery, and sharing. WebChatGPT augments ChatGPT prompts with relevant results from the web [9], while ChatGPT Prompt Genius enables users to discover, share, import, and use the best prompts for ChatGPT while saving chat history locally [13]. We'll integrate these concepts into a unified platform.

Understanding the Prompt Management Landscape

Before diving into code, let's examine why prompt management matters in production environments. The ChatGPT ecosystem [6] has evolved significantly since its launch in November 2022 [1]. With a freemium pricing model [3], ChatGPT has democratized access to advanced AI capabilities, but this accessibility creates organizational challenges:

  • Prompt Versioning: Teams iterate on prompts without tracking changes
  • Discovery: Good prompts get lost in chat histories
  • Sharing: No standardized way to distribute prompts across teams
  • Quality Control: No mechanism for rating or reviewing prompts

The open-source community has responded with tools like chatgpt-on-wechat, which has garnered 42,157 stars and 9,818 forks on GitHub as of the latest data [15][16]. Written in Python [17], this project demonstrates the demand for integrating ChatGPT into existing workflows. However, most solutions focus on individual use cases rather than enterprise-grade prompt management.

Architecture and Design Decisions

Our system will use a microservices architecture with the following components:

  1. FastAPI Backend: Handles CRUD operations, versioning, and search
  2. PostgreSQL Database: Stores prompts with full-text search capabilities
  3. Redis Cache: Improves response times for frequently accessed prompts
  4. Chrome Extension Integration: Leverages WebChatGPT's approach of augmenting prompts with web results

The key architectural decision is using PostgreSQL's JSONB columns for prompt metadata while maintaining relational integrity for versioning. This gives us the flexibility of NoSQL with the consistency of relational databases.

Prerequisites

# System requirements
python 3.11+
postgresql 15+
redis 7+

# Python packages
pip install fastapi uvicorn sqlalchemy asyncpg redis pydantic python-dotenv
pip install httpx beautifulsoup4 lxml  # For web augmentation
pip install alembic  # Database migrations
pip install gunicorn  # Production server (used in the Dockerfile below)

Building the Core Prompt Management System

Let's start with the database schema. We'll use SQLAlchemy with async support for production performance.

# models.py
from sqlalchemy import (
    Column, String, Text, DateTime, Float, Integer, Boolean, ForeignKey
)
from sqlalchemy.dialects.postgresql import UUID, JSONB
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import declarative_base, relationship
import uuid
from datetime import datetime

Base = declarative_base()

class Prompt(Base):
    __tablename__ = "prompts"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    title = Column(String(255), nullable=False, index=True)
    content = Column(Text, nullable=False)
    description = Column(Text)
    category = Column(String(100), index=True)
    tags = Column(JSONB, default=list)
    # 'metadata' is a reserved attribute on SQLAlchemy declarative models,
    # so we expose the column under the attribute name 'meta'
    meta = Column("metadata", JSONB, default=dict)
    rating = Column(Float, default=0.0)
    usage_count = Column(Integer, default=0)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    is_public = Column(Boolean, default=False)

    # Versioning
    current_version = Column(Integer, default=1)
    versions = relationship("PromptVersion", back_populates="prompt", 
                          cascade="all, delete-orphan")

    # User relationship (simplified for tutorial)
    author_id = Column(UUID(as_uuid=True), ForeignKey("users.id"))
    author = relationship("User", back_populates="prompts")

    def to_dict(self) -> dict:
        """Serialize the prompt for API responses (used by the search endpoint)."""
        return {
            "id": str(self.id),
            "title": self.title,
            "content": self.content,
            "description": self.description,
            "category": self.category,
            "tags": self.tags,
            "metadata": self.meta,
            "rating": self.rating,
            "usage_count": self.usage_count,
            "current_version": self.current_version,
        }

class PromptVersion(Base):
    __tablename__ = "prompt_versions"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    prompt_id = Column(UUID(as_uuid=True), ForeignKey("prompts.id"), nullable=False)
    version_number = Column(Integer, nullable=False)
    content = Column(Text, nullable=False)
    change_description = Column(Text)
    created_at = Column(DateTime, default=datetime.utcnow)

    prompt = relationship("Prompt", back_populates="versions")

The versioning system is critical for production use. Each time a prompt is updated, we create a new version record while keeping the main prompt table pointing to the latest version. This allows rollbacks and audit trails.
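
To make the flow concrete, here is a minimal sketch of the update path, assuming the models above; the create_new_version helper is our own illustrative name, not part of any library:

# versioning.py (illustrative helper)
async def create_new_version(
    db: AsyncSession,
    prompt: Prompt,
    new_content: str,
    change_description: str
) -> PromptVersion:
    """Bump the prompt to a new version and record the change."""
    prompt.current_version += 1
    version = PromptVersion(
        prompt_id=prompt.id,
        version_number=prompt.current_version,
        content=new_content,
        change_description=change_description
    )
    prompt.content = new_content  # main table always reflects the latest version
    db.add(version)
    await db.commit()
    return version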

Implementing the API Layer

Now let's build the FastAPI application with proper error handling and pagination:

# main.py
from fastapi import FastAPI, HTTPException, Depends, Query
from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy import select, func, or_
from typing import List, Optional
import redis.asyncio as redis
import json

from models import Prompt, PromptVersion
from validators import PromptCreate  # request schema, defined later in this tutorial
# PromptResponse, the matching response schema, is assumed to mirror
# PromptCreate plus server-generated fields (id, rating, timestamps)

app = FastAPI(title="Prompt Management API")

# CORS for Chrome extension integration.
# CORSMiddleware doesn't support wildcards inside origins, so use a regex.
app.add_middleware(
    CORSMiddleware,
    allow_origin_regex=r"chrome-extension://.*",
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Database engine (adjust the DSN for your environment)
engine = create_async_engine("postgresql+asyncpg://user:password@localhost/prompts")

# Redis connection for caching
redis_client = redis.from_url("redis://localhost:6379", decode_responses=True)

# Database dependency
async def get_db():
    async with AsyncSession(engine) as session:
        yield session

@app.post("/prompts/", response_model=PromptResponse)
async def create_prompt(
    prompt: PromptCreate,
    db: AsyncSession = Depends(get_db)
):
    """Create a new prompt with initial version."""

    # Validate prompt content
    if len(prompt.content) < 10:
        raise HTTPException(status_code=400, 
                          detail="Prompt content must be at least 10 characters")

    # Create the prompt
    new_prompt = Prompt(
        title=prompt.title,
        content=prompt.content,
        description=prompt.description,
        category=prompt.category,
        tags=prompt.tags,
        meta=prompt.metadata,  # stored in the JSONB "metadata" column
        author_id=prompt.author_id
    )

    db.add(new_prompt)
    await db.flush()  # Get the ID without committing

    # Create initial version
    initial_version = PromptVersion(
        prompt_id=new_prompt.id,
        version_number=1,
        content=prompt.content,
        change_description="Initial version"
    )

    db.add(initial_version)
    await db.commit()
    await db.refresh(new_prompt)

    # Invalidate cache for this category
    await redis_client.delete(f"category:{prompt.category}")

    return new_prompt

@app.get("/prompts/search/")
async def search_prompts(
    q: str = Query(..., min_length=2),
    category: Optional[str] = None,
    tags: Optional[List[str]] = Query(None),
    page: int = Query(1, ge=1),
    page_size: int = Query(20, ge=1, le=100),
    db: AsyncSession = Depends(get_db)
):
    """
    Full-text search across prompts with filtering.
    Implements the discovery aspect similar to ChatGPT Prompt Genius.
    """

    # Check cache first
    cache_key = f"search:{q}:{category}:{page}:{page_size}"
    cached = await redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Build query with full-text search
    query = select(Prompt).where(
        or_(
            Prompt.title.ilike(f"%{q}%"),
            Prompt.content.ilike(f"%{q}%"),
            Prompt.description.ilike(f"%{q}%")
        )
    )

    if category:
        query = query.where(Prompt.category == category)

    if tags:
        # PostgreSQL JSONB contains operator
        query = query.where(Prompt.tags.contains(tags))

    # Pagination
    total = await db.scalar(select(func.count()).select_from(query.subquery()))
    query = query.offset((page - 1) * page_size).limit(page_size)

    result = await db.execute(query)
    prompts = result.scalars().all()

    response = {
        "results": [p.to_dict() for p in prompts],
        "total": total,
        "page": page,
        "page_size": page_size,
        "total_pages": (total + page_size - 1) // page_size
    }

    # Cache for 5 minutes
    await redis_client.setex(cache_key, 300, json.dumps(response))

    return response

The search endpoint implements the discovery functionality that tools like ChatGPT Prompt Genius provide [13], but at scale. The caching layer is crucial for production—without it, every search query would hit the database, causing performance degradation under load.
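
One subtlety the endpoint above glosses over is invalidation: cached search results go stale when prompts change. A blunt but workable sketch, reusing the redis_client and the "search:" key prefix from above (the SCAN-based approach is our assumption, suitable for moderate key counts):

# cache_invalidation.py (illustrative)
async def invalidate_search_cache():
    """Delete every cached search result; call after create/update/delete."""
    async for key in redis_client.scan_iter(match="search:*"):
        await redis_client.delete(key)

Calling this alongside the per-category invalidation in create_prompt trades a cold cache after writes for guaranteed freshness.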

Web Augmentation Integration

One of the most powerful features of WebChatGPT is its ability to augment prompts with web results [9]. Let's implement a similar capability:

# web_augmenter.py
import httpx
from bs4 import BeautifulSoup
from typing import List, Dict
from urllib.parse import quote_plus

class WebAugmenter:
    """
    Augments prompts with relevant web content.
    Similar to WebChatGPT's approach of adding web results to prompts.
    """

    def __init__(self, max_results: int = 5, timeout: int = 10):
        self.max_results = max_results
        self.timeout = timeout
        self.client = httpx.AsyncClient(timeout=timeout)

    async def search_web(self, query: str) -> List[Dict[str, str]]:
        """
        Perform web search and extract relevant content.
        Uses a configurable search backend (DuckDuckGo, Bing, etc.)
        """
        # Note: In production, use a proper search API with terms of service
        # that permit programmatic access. This example scrapes DuckDuckGo's
        # HTML endpoint, whose .result markup may change without notice.
        search_url = f"https://html.duckduckgo.com/html/?q={quote_plus(query)}"

        try:
            response = await self.client.get(search_url)
            response.raise_for_status()

            soup = BeautifulSoup(response.text, 'lxml')
            results = []

            # Extract search results
            for result in soup.select('.result')[:self.max_results]:
                title_elem = result.select_one('.result__title')
                snippet_elem = result.select_one('.result__snippet')

                if title_elem and snippet_elem:
                    results.append({
                        'title': title_elem.get_text(strip=True),
                        'snippet': snippet_elem.get_text(strip=True),
                        'url': title_elem.find('a')['href'] if title_elem.find('a') else ''
                    })

            return results

        except httpx.HTTPError as e:
            # Log error but don't fail the request
            print(f"Web search failed: {e}")
            return []

    async def augment_prompt(self, prompt_content: str) -> str:
        """
        Augment a prompt with web context.
        Returns the original prompt with web results appended.
        """
        web_results = await self.search_web(prompt_content)

        if not web_results:
            return prompt_content

        # Format web results as context
        context = "\n\n--- Web Context ---\n"
        for i, result in enumerate(web_results, 1):
            context += f"\n{i}. {result['title']}\n"
            context += f"   {result['snippet']}\n"
            context += f"   Source: {result['url']}\n"

        return prompt_content + context

    async def close(self):
        await self.client.aclose()

This implementation handles edge cases like network failures gracefully—if the web search fails, we return the original prompt unchanged rather than breaking the user's workflow. The timeout parameter prevents slow searches from blocking the main application.
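
A quick usage sketch, with an illustrative query and settings:

# Example usage of WebAugmenter (query text is illustrative)
import asyncio

async def demo():
    augmenter = WebAugmenter(max_results=3, timeout=5)
    try:
        augmented = await augmenter.augment_prompt(
            "Summarize recent developments in prompt engineering"
        )
        print(augmented)
    finally:
        await augmenter.close()

asyncio.run(demo())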

Production Considerations and Edge Cases

Rate Limiting and API Protection

When integrating with ChatGPT's API, rate limiting becomes critical. According to OpenAI's documentation [7], the ChatGPT API has tiered rate limits based on usage. Here's a robust rate limiter:

# rate_limiter.py
import time
from collections import defaultdict
from typing import Dict, Tuple
import asyncio

class TokenBucketRateLimiter:
    """
    Token bucket algorithm for rate limiting.
    Prevents API abuse while allowing burst traffic.
    """

    def __init__(self, tokens_per_second: float, bucket_size: int):
        self.tokens_per_second = tokens_per_second
        self.bucket_size = bucket_size
        self.buckets: Dict[str, Dict] = defaultdict(
            lambda: {"tokens": bucket_size, "last_refill": time.time()}
        )
        self.lock = asyncio.Lock()

    async def acquire(self, key: str, tokens: int = 1) -> Tuple[bool, float]:
        """
        Try to acquire tokens for a request.
        Returns (success, wait_time) where wait_time is 0 if successful.
        """
        async with self.lock:
            bucket = self.buckets[key]
            now = time.time()

            # Refill tokens
            elapsed = now - bucket["last_refill"]
            bucket["tokens"] = min(
                self.bucket_size,
                bucket["tokens"] + elapsed * self.tokens_per_second
            )
            bucket["last_refill"] = now

            if bucket["tokens"] >= tokens:
                bucket["tokens"] -= tokens
                return True, 0.0
            else:
                wait_time = (tokens - bucket["tokens"]) / self.tokens_per_second
                return False, wait_time
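
Here is one way the limiter could be wired into the FastAPI app as a dependency; the rate values and the per-IP key are illustrative assumptions, not OpenAI's actual limits:

# Illustrative wiring: rate limiting as a FastAPI dependency
from fastapi import Request, HTTPException

limiter = TokenBucketRateLimiter(tokens_per_second=2.0, bucket_size=10)

async def enforce_rate_limit(request: Request):
    # Key by client IP for the demo; prefer an API key or user ID in production
    key = request.client.host if request.client else "anonymous"
    allowed, wait_time = await limiter.acquire(key)
    if not allowed:
        raise HTTPException(
            status_code=429,
            detail=f"Rate limit exceeded; retry in {wait_time:.1f}s",
            headers={"Retry-After": str(int(wait_time) + 1)}
        )

Attach it per route with dependencies=[Depends(enforce_rate_limit)] on the endpoints that call the ChatGPT API.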

Prompt Validation and Sanitization

User-submitted prompts can contain malicious content or personally identifiable information (PII). Here's a validation layer:

# validators.py
import re
from typing import Optional, List, Dict, Any
from uuid import UUID
from pydantic import BaseModel, validator

class PromptCreate(BaseModel):
    title: str
    content: str
    description: Optional[str] = None
    category: str
    tags: List[str] = []
    metadata: Dict[str, Any] = {}
    author_id: UUID

    @validator('content')
    def validate_content_length(cls, v):
        if len(v) < 10:
            raise ValueError('Prompt must be at least 10 characters')
        if len(v) > 10000:
            raise ValueError('Prompt exceeds maximum length of 10000 characters')
        return v

    @validator('content')
    def sanitize_content(cls, v):
        # Remove potential injection patterns
        v = re.sub(r'<script[^>]*>.*?</script>', '', v, flags=re.DOTALL)
        v = re.sub(r'javascript:', '', v, flags=re.IGNORECASE)
        return v

    @validator('tags')
    def validate_tags(cls, v):
        if len(v) > 10:
            raise ValueError('Maximum 10 tags allowed')
        for tag in v:
            if len(tag) > 50:
                raise ValueError('Tag exceeds maximum length of 50 characters')
            if not re.match(r'^[a-zA-Z0-9_-]+$', tag):
                raise ValueError('Tags can only contain alphanumeric characters, hyphens, and underscores')
        return v
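
These validators run automatically when FastAPI parses the request body, surfacing failures as 422 responses. A minimal sketch of the behavior in isolation (field values are illustrative):

# Illustrative check: the length validator rejects a 9-character prompt
from pydantic import ValidationError

try:
    PromptCreate(
        title="Summarizer",
        content="too short",  # fails validate_content_length
        category="writing",
        tags=["nlp"],
        author_id="00000000-0000-0000-0000-000000000001"
    )
except ValidationError as exc:
    print(exc)  # reports: Prompt must be at least 10 characters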

Deployment and Scaling

For production deployment, consider using Docker containers with orchestration:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Run with Gunicorn for production
CMD ["gunicorn", "main:app", "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8000", "--workers", "4", "--timeout", "120"]

The four workers handle concurrent requests efficiently, while the 120-second timeout accommodates long-running web augmentation tasks.
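
To build and run the image locally, something like the following works, assuming an arbitrary image name and a .env file for configuration (python-dotenv is already in our dependencies):

# Build and run the container (image name is arbitrary)
docker build -t prompt-manager .
docker run -p 8000:8000 --env-file .env prompt-manager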

What's Next

This prompt management system provides a solid foundation for enterprise AI workflows. To extend it further:

  1. Implement A/B testing for prompt variants using the versioning system
  2. Add analytics to track prompt performance (completion rates, user satisfaction)
  3. Integrate with ChatGPT Prompt Genius for seamless browser-based prompt management
  4. Build a recommendation engine using collaborative filtering based on usage patterns

The ecosystem around ChatGPT continues to evolve rapidly. As of May 2026, tools like chatgpt-on-wechat (42,157 GitHub stars [15]) demonstrate the community's appetite for integrated solutions. By building on open standards and focusing on production-grade architecture, you can create a prompt management system that scales with your organization's AI adoption.

Remember that prompt engineering is both an art and a science—the best prompts emerge from systematic experimentation and collaboration. Your management system should facilitate this process, not constrain it.


References

1. Wikipedia - Rag. Wikipedia.
2. Wikipedia - Transformers. Wikipedia.
3. Wikipedia - GPT. Wikipedia.
4. GitHub - Shubhamsaboo/awesome-llm-apps. GitHub.
5. GitHub - huggingface/transformers. GitHub.
6. GitHub - Significant-Gravitas/AutoGPT. GitHub.
7. GitHub - openai/openai-python. GitHub.