How to Build AI Prototypes with Google AI Studio
Google AI Studio has emerged as a critical tool for developers who need to rapidly prototype generative AI applications without the overhead of managing infrastructure. As of its December 2023 release alongside the Gemini API, Google AI Studio provides a web-based integrated development environment for prototyping applications using generative AI models, targeting both developers and non-technical users for testing prompts and building proof-of-concept applications [1].
In this tutorial, we'll build a production-grade document analysis prototype that extracts structured data from PDFs using Google AI Studio's Gemini models. You'll learn how to move from a simple prompt test to a reusable API endpoint, handle edge cases like rate limiting and token limits, and structure your code for maintainability.
Understanding the Architecture: From Prompt Testing to Production
Before writing code, it's essential to understand how Google AI Studio fits into a modern AI development workflow. The platform provides access to Google's Gemini family of models, which support multimodal inputs including text, images, audio, and video [1]. This makes it particularly powerful for document processing tasks where you need to extract information from scanned PDFs, images, or complex layouts.
The architecture we'll build follows a three-tier pattern:
- Prompt Engineering Layer: We'll use Google AI Studio's prompt gallery and testing interface to iterate on our extraction prompts
- API Integration Layer: We'll wrap our tested prompts in Python code using the google-generativeai library
- Service Layer: We'll expose our functionality through a FastAPI endpoint that handles authentication, rate limiting, and error recovery
This separation ensures that prompt changes don't require code deployments, and our API remains stable even as we iterate on extraction logic.
Prerequisites and Environment Setup
You'll need the following tools installed:
# Python 3.10+ required
python --version # Verify you have 3.10 or higher
# Create a virtual environment
python -m venv ai-studio-env
source ai-studio-env/bin/activate # On Windows: ai-studio-env\Scripts\activate
# Install required packages
pip install google-generativeai==0.8.3 fastapi==0.115.0 uvicorn==0.30.6 pypdf2==3.0.1 python-multipart==0.0.12 pydantic==2.9.2
You'll also need a Google AI Studio API key. Navigate to aistudio.google.com/app/apikey (formerly makersuite.google.com/app/apikey) and create a new key. Store it securely and never commit it to source control; we'll read it from an environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Important: The free tier of Google AI Studio is rate limited. This tutorial assumes the documented free-tier figures of 60 requests per minute for text-only prompts and 30 requests per minute for multimodal prompts; quotas differ by model and change over time, so check the current limits for your key. We'll implement retry logic to handle these limits gracefully.
Core Implementation: Building the Document Extraction Pipeline
Step 1: Initialize the Gemini Client
Create a file called document_extractor.py. We'll start by setting up our connection to Google AI Studio's API:
import os
import base64
import io
from typing import Optional, Dict, Any
from dataclasses import dataclass

import google.generativeai as genai
from PyPDF2 import PdfReader


@dataclass
class ExtractionConfig:
    """Configuration for document extraction behavior."""
    model_name: str = "gemini-1.5-pro"
    temperature: float = 0.1  # Low temperature for more deterministic extraction
    max_output_tokens: int = 8192
    top_p: float = 0.95
    top_k: int = 40


class DocumentExtractor:
    """Production-grade document extraction using Google AI Studio."""

    def __init__(self, api_key: Optional[str] = None, config: Optional[ExtractionConfig] = None):
        self.api_key = api_key or os.environ.get("GOOGLE_API_KEY")
        if not self.api_key:
            raise ValueError("GOOGLE_API_KEY must be provided or set as environment variable")
        genai.configure(api_key=self.api_key)
        self.config = config or ExtractionConfig()
        self.model = genai.GenerativeModel(
            model_name=self.config.model_name,
            generation_config={
                "temperature": self.config.temperature,
                "max_output_tokens": self.config.max_output_tokens,
                "top_p": self.config.top_p,
                "top_k": self.config.top_k,
            }
        )
Why this matters: We're using a dataclass for configuration rather than hardcoding values. This allows us to change model parameters without modifying core logic—critical when you need to switch between Gemini 1.5 Pro for complex documents and Gemini 1.5 Flash for high-throughput scenarios.
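As a rough illustration of that switch, the sketch below derives a Flash variant from the default configuration with dataclasses.replace. The dataclass is repeated so the snippet runs on its own; PRO_CONFIG, FLASH_CONFIG, pick_config, and the page-count threshold are hypothetical names and values chosen for this example, not part of the tutorial's code or of any library.

```python
from dataclasses import dataclass, replace


@dataclass
class ExtractionConfig:
    """Copy of the tutorial's config so this snippet is self-contained."""
    model_name: str = "gemini-1.5-pro"
    temperature: float = 0.1
    max_output_tokens: int = 8192
    top_p: float = 0.95
    top_k: int = 40


# Hypothetical presets: only the model name differs between them.
PRO_CONFIG = ExtractionConfig()
FLASH_CONFIG = replace(PRO_CONFIG, model_name="gemini-1.5-flash")


def pick_config(page_count: int, threshold: int = 20) -> ExtractionConfig:
    """Assumed routing rule: short documents go to Flash, longer ones to Pro.

    Page count is only a stand-in for document complexity here.
    """
    return FLASH_CONFIG if page_count <= threshold else PRO_CONFIG
```

Because DocumentExtractor takes the config as a constructor argument, a call such as DocumentExtractor(config=pick_config(3)) would select the model without touching the extraction logic.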
Step 2: Implement PDF Processing with Error Handling
Now we'll add PDF processing that handles common edge cases like encrypted files, corrupted PDFs, and large documents:
class DocumentExtractor:
    # ... (previous code remains)

    def extract_text_from_pdf(self, pdf_bytes: bytes) -> str:
        """
        Extract text from PDF bytes with comprehensive error handling.

        Args:
            pdf_bytes: Raw PDF file content

        Returns:
            Extracted text string

        Raises:
            ValueError: If PDF is encrypted or cannot be parsed
            MemoryError: If PDF is too large (>50MB)
        """
        # Check file size before processing
        if len(pdf_bytes) > 50 * 1024 * 1024:  # 50MB limit
            raise MemoryError("PDF exceeds maximum size of 50MB")

        try:
            pdf_file = io.BytesIO(pdf_bytes)
            reader = PdfReader(pdf_file)

            # Check for encryption
            if reader.is_encrypted:
                # Attempt decryption with an empty password (common for secured PDFs).
                # decrypt() returns a falsy value when the password is rejected,
                # so check the result instead of relying on an exception.
                try:
                    decrypted = reader.decrypt("")
                except Exception:
                    decrypted = False
                if not decrypted:
                    raise ValueError("PDF is encrypted and cannot be processed")

            text_parts = []
            total_pages = len(reader.pages)

            # Process pages one at a time so a single bad page does not abort the run
            for page_num in range(total_pages):
                try:
                    page = reader.pages[page_num]
                    page_text = page.extract_text()
                    # Skip empty pages
                    if page_text.strip():
                        text_parts.append(f"--- Page {page_num + 1} ---\n{page_text}")
                    # Drop the local reference to the page object
                    del page
                except Exception as e:
                    # Log page-specific errors but continue processing
                    print(f"Warning: Failed to extract page {page_num + 1}: {str(e)}")
                    continue

            full_text = "\n\n".join(text_parts)

            # Truncate if too long for model context window
            # Gemini 1.5 Pro supports up to 1M tokens, but we'll be conservative
            max_chars = 500000  # ~125K tokens
            if len(full_text) > max_chars:
                print(f"Warning: Text truncated from {len(full_text)} to {max_chars} characters")
                full_text = full_text[:max_chars]

            return full_text
        except MemoryError:
            raise
        except Exception as e:
            raise ValueError(f"Failed to parse PDF: {str(e)}") from e
Edge case handling: This code handles encrypted PDFs, corrupted pages, memory limits, and token window constraints. In production, you'd also want to add logging to a service like Cloud Logging or Datadog.
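As a first step toward that, the print calls can be swapped for the standard library's logging module. This is a minimal sketch: the logger name and the two helper functions are illustrative choices rather than anything the tutorial's code defines, and a handler for your logging service would be attached to the same logger.

```python
import logging

# Hypothetical logger name; pick whatever matches your module layout.
logger = logging.getLogger("document_extractor")


def configure_logging(level: int = logging.INFO) -> None:
    """Send log records to stderr with a timestamped format."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )


def log_page_failure(page_number: int, error: Exception) -> None:
    """Replacement for the per-page print() warning in extract_text_from_pdf."""
    logger.warning("Failed to extract page %d: %s", page_number, error)


def log_truncation(original_length: int, max_chars: int) -> None:
    """Replacement for the truncation print() warning."""
    logger.warning(
        "Text truncated from %d to %d characters", original_length, max_chars
    )
```

Passing the values as arguments instead of formatting an f-string lets the logging module skip the formatting work when the level is disabled.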
Step 3: Build the Prompt Template System
One of Google AI Studio's strengths is its prompt gallery, which provides templates for common tasks [1]. We'll create a structured prompt system that separates instructions from data:
class DocumentExtractor:
    # ... (previous code remains)

    def build_extraction_prompt(self, document_text: str, schema: Dict[str, str]) -> str:
        """
        Build a structured extraction prompt with clear instructions.

        Args:
            document_text: The extracted text from the document
            schema: Dictionary mapping field names to descriptions

        Returns:
            Formatted prompt string
        """
        fields_description = "\n".join([
            f"- {field_name}: {description}"
            for field_name, description in schema.items()
        ])
        prompt = f"""You are a precise document extraction system. Extract the following fields from the document text below.

Fields to extract:
{fields_description}

Rules:
1. Return ONLY valid JSON, no markdown formatting or code blocks
2. If a field value is not found, use null (not "N/A" or "Not found")
3. Preserve exact formatting for dates, amounts, and identifiers
4. For monetary values, include currency symbol if present
5. Do not infer or guess missing information

Document text:
{document_text}

Extracted JSON:"""
        return prompt

    def extract_structured_data(self, pdf_bytes: bytes, schema: Dict[str, str]) -> Dict[str, Any]:
        """
        Extract structured data from a PDF document.

        Args:
            pdf_bytes: Raw PDF content
            schema: Field definitions for extraction

        Returns:
            Dictionary with extracted fields
        """
        import json
        import re
        import time

        document_text = self.extract_text_from_pdf(pdf_bytes)
        prompt = self.build_extraction_prompt(document_text, schema)

        # Implement retry logic for rate limiting
        max_retries = 3
        retry_delay = 1.0  # seconds

        for attempt in range(max_retries):
            try:
                response = self.model.generate_content(prompt)
            except Exception as e:
                if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
                    # Rate limit hit - wait and retry
                    wait_time = retry_delay * (2 ** attempt)  # Exponential backoff
                    print(f"Rate limit hit, waiting {wait_time}s (attempt {attempt + 1}/{max_retries})")
                    time.sleep(wait_time)
                    continue
                # Non-retryable error
                raise

            # Parse the response as JSON (kept outside the retry handler so that
            # parse failures are never mistaken for rate-limit errors)
            try:
                return json.loads(response.text)
            except json.JSONDecodeError:
                # Sometimes the model wraps JSON in markdown
                # Try to extract JSON from code blocks
                json_match = re.search(r'```(?:json)?\s*(.*?)\s*```', response.text, re.DOTALL)
                if json_match:
                    return json.loads(json_match.group(1))
                raise ValueError(f"Failed to parse model response as JSON: {response.text[:200]}")

        raise RuntimeError(f"Failed after {max_retries} attempts due to rate limiting")
Why this matters: The prompt template system is crucial for production use. By separating instructions from data, you can:
- Version control your prompts independently
- A/B test different prompt strategies
- Update extraction logic without code changes
- Handle model response variations (markdown wrapping, JSON formatting)
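One way to get the version-control benefit is to keep each prompt in its own text file and render it at request time. The sketch below uses string.Template from the standard library; the file layout, the placeholder names ($fields_description, $document_text), and both helper functions are assumptions made for this example.

```python
from pathlib import Path
from string import Template
from typing import Dict


def load_prompt_template(path: Path) -> Template:
    """Read a prompt file that uses $fields_description and $document_text."""
    return Template(path.read_text(encoding="utf-8"))


def render_prompt(template: Template, document_text: str, schema: Dict[str, str]) -> str:
    """Fill the template with the schema description and the document text."""
    fields = "\n".join(
        f"- {name}: {description}" for name, description in schema.items()
    )
    # substitute() raises KeyError if the file references an unknown placeholder,
    # which surfaces template typos early. Substituted values are not re-parsed,
    # so a "$" inside the document text is safe.
    return template.substitute(fields_description=fields, document_text=document_text)
```

A literal dollar sign inside the template file itself has to be written as $$. With this layout, files such as prompts/invoice_v1.txt and prompts/invoice_v2.txt (hypothetical names) can be compared side by side without touching build_extraction_prompt.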
Step 4: Create the FastAPI Service Layer
Now we'll wrap our extractor in a FastAPI application that handles concurrent requests, authentication, and proper error responses:
# app.py
from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Depends
from fastapi.concurrency import run_in_threadpool
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel, Field, ValidationError
from typing import Dict, Any, Optional
import os

from document_extractor import DocumentExtractor, ExtractionConfig

app = FastAPI(
    title="Document Extraction API",
    description="Production-grade document extraction using Google AI Studio",
    version="1.0.0"
)

# Security
security = HTTPBearer(auto_error=False)

# Schema definitions for different document types
INVOICE_SCHEMA = {
    "invoice_number": "Invoice identifier (e.g., INV-2024-001)",
    "vendor_name": "Name of the vendor or supplier",
    "invoice_date": "Date of invoice in YYYY-MM-DD format",
    "total_amount": "Total amount including tax",
    "currency": "Currency code (e.g., USD, EUR)",
    "line_items": "Array of items with description, quantity, unit_price, and total"
}

CONTRACT_SCHEMA = {
    "contract_id": "Contract identifier",
    "parties": "Array of party names involved",
    "effective_date": "Contract start date in YYYY-MM-DD format",
    "expiration_date": "Contract end date in YYYY-MM-DD format",
    "contract_value": "Total contract value if specified",
    "governing_law": "Jurisdiction or governing law"
}


class ExtractionRequest(BaseModel):
    """Request model for document extraction."""
    document_type: str = Field(..., description="Type of document: 'invoice' or 'contract'")
    model_name: str = Field("gemini-1.5-pro", description="Gemini model to use")


class ExtractionResponse(BaseModel):
    """Response model for extraction results."""
    success: bool
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None
    model_used: str


def get_extractor(model_name: str = "gemini-1.5-pro") -> DocumentExtractor:
    """Factory that builds a DocumentExtractor for the requested model."""
    config = ExtractionConfig(model_name=model_name)
    return DocumentExtractor(config=config)


def verify_api_key(credentials: Optional[HTTPAuthorizationCredentials] = Depends(security)):
    """Simple API key verification (skipped when API_KEY is not set)."""
    api_key = os.environ.get("API_KEY")
    if api_key and (not credentials or credentials.credentials != api_key):
        raise HTTPException(status_code=403, detail="Invalid API key")
    return credentials


@app.post("/extract", response_model=ExtractionResponse)
async def extract_document(
    file: UploadFile = File(...),
    # A multipart upload cannot also carry a JSON body, so the request settings
    # arrive as a JSON string in a form field named "request".
    request: str = Form(...),
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(verify_api_key)
):
    """
    Extract structured data from a PDF document.

    Supports invoices and contracts with predefined extraction schemas.
    """
    # Validate file type
    if not (file.filename or "").lower().endswith('.pdf'):
        raise HTTPException(status_code=400, detail="Only PDF files are supported")

    # Parse and validate the JSON form field
    try:
        params = ExtractionRequest.model_validate_json(request)
    except ValidationError as e:
        raise HTTPException(status_code=422, detail=f"Invalid request field: {e}")

    # Select schema based on document type
    schema_map = {
        "invoice": INVOICE_SCHEMA,
        "contract": CONTRACT_SCHEMA
    }
    schema = schema_map.get(params.document_type)
    if not schema:
        raise HTTPException(
            status_code=400,
            detail=f"Unsupported document type: {params.document_type}. Supported types: {list(schema_map.keys())}"
        )

    try:
        # Read file content
        pdf_bytes = await file.read()
        # Initialize extractor and process; the extractor is blocking, so run it
        # in a worker thread to keep the event loop free for other requests
        extractor = get_extractor(params.model_name)
        result = await run_in_threadpool(extractor.extract_structured_data, pdf_bytes, schema)
        return ExtractionResponse(
            success=True,
            data=result,
            model_used=params.model_name
        )
    except MemoryError as e:
        raise HTTPException(status_code=413, detail=str(e))
    except ValueError as e:
        raise HTTPException(status_code=422, detail=str(e))
    except Exception as e:
        # Log the full error for debugging
        print(f"Extraction failed: {str(e)}")
        raise HTTPException(status_code=500, detail="Internal extraction error")
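The key check in verify_api_key compares strings with !=, which can leak timing information about how much of the key matched. A possible hardening, sketched here with the standard library's hmac.compare_digest, is shown below; api_key_matches is a hypothetical helper, and it keeps the tutorial's behavior of disabling authentication when no API_KEY is configured.

```python
import hmac
from typing import Optional


def api_key_matches(presented: Optional[str], expected: Optional[str]) -> bool:
    """Compare a presented bearer token against the configured key."""
    if not expected:
        # Mirrors the tutorial: with no API_KEY set, authentication is off.
        return True
    if not presented:
        return False
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Inside verify_api_key, the condition would then become a call along the lines of api_key_matches(credentials.credentials if credentials else None, api_key).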
Step 5: Running the Service
Create a main.py entry point:
# main.py
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app:app",
        host="0.0.0.0",
        port=8000,
        reload=True,  # Development only; disable in production
        workers=4,  # Adjust based on CPU cores; uvicorn ignores this while reload is enabled
        limit_concurrency=10  # Prevent overwhelming the API
    )
Run the service:
python main.py
Test with curl:
# Test invoice extraction
curl -X POST http://localhost:8000/extract \
-H "Authorization: Bearer your-api-key" \
-F "file=@invoice.pdf" \
-F "request={\"document_type\": \"invoice\", \"model_name\": \"gemini-1.5-pro\"}"
# Expected response:
# {
#   "success": true,
#   "data": {
#     "invoice_number": "INV-2024-001",
#     "vendor_name": "Acme Corp",
#     "invoice_date": "2024-03-15",
#     "total_amount": 1250.00,
#     "currency": "USD",
#     "line_items": [...]
#   },
#   "model_used": "gemini-1.5-pro"
# }
Production Considerations and Edge Cases
Rate Limiting and Cost Management
Google AI Studio's free tier has significant limitations for production use. According to available documentation, the paid tier through Google Cloud provides higher quotas and lower latency. For production deployments:
- Implement a token bucket rate limiter to stay within API limits
- Cache extraction results using a key-value store like Redis
- Monitor token usage with Cloud Monitoring or custom metrics
- Set up alerts when approaching quota limits
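The first item can be sketched in a few lines. This is a minimal in-process token bucket, assuming a single Python process; the class and its parameters are illustrative, and the clock argument is injectable only to make the limiter easy to test.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_minute`."""

    def __init__(self, rate_per_minute: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = float(capacity)
        self.tokens = float(capacity)       # start with a full bucket
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last update, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        with self.lock:
            self._refill()
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

    def seconds_until_available(self, cost: float = 1.0) -> float:
        """How long a caller would need to wait before `cost` tokens exist."""
        with self.lock:
            self._refill()
            return max(0.0, (cost - self.tokens) / self.rate)
```

A caller would check try_acquire() before each generate_content call and either sleep for seconds_until_available() or return HTTP 429 to its own client. With several worker processes, the bucket state would need to live in a shared store instead.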
Handling Large Documents
For documents exceeding the 50MB limit or 500K character threshold:
- Split documents into logical sections (by page or chapter)
- Process sections in parallel using async workers
- Merge results with a post-processing step
- Implement pagination for the API response
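The split and merge steps might look like the sketch below. It relies on the "--- Page N ---" markers that extract_text_from_pdf inserts; the function names and the merge policy (first non-null scalar wins, lists are concatenated) are assumptions for illustration, and a real merge step would likely need per-field rules.

```python
import re
from typing import Any, Dict, List

# Zero-width split point in front of every page marker line.
PAGE_MARKER = re.compile(r"(?=^--- Page \d+ ---$)", re.MULTILINE)


def split_into_chunks(full_text: str, max_chars: int) -> List[str]:
    """Greedily pack whole pages into chunks of at most max_chars characters.

    A single page longer than max_chars still becomes its own (oversized) chunk.
    """
    pages = [p.strip() for p in PAGE_MARKER.split(full_text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for page in pages:
        candidate = f"{current}\n\n{page}" if current else page
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = page
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def merge_results(partials: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine per-chunk extraction results into one dictionary."""
    merged: Dict[str, Any] = {}
    for partial in partials:
        for key, value in partial.items():
            if isinstance(value, list):
                existing = merged.get(key)
                merged[key] = (existing if isinstance(existing, list) else []) + value
            elif merged.get(key) is None:
                merged[key] = value
    return merged
```

Each chunk would be sent through the same prompt and schema, and the resulting dictionaries passed to merge_results in page order.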
Error Recovery Strategy
Our current implementation retries on rate limits, but production systems need more:
# Add to document_extractor.py as a module-level helper, then apply it
# as a decorator to the methods that call the API
import time
from functools import wraps


def retry_with_backoff(max_retries=3, base_delay=1.0):
    """Decorator for retrying API calls with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    # Note: this retries on every exception type, not only rate limits
                    if attempt == max_retries - 1:
                        raise
                    wait_time = base_delay * (2 ** attempt)
                    print(f"Retry {attempt + 1}/{max_retries} after {wait_time}s: {str(e)}")
                    time.sleep(wait_time)
            return None  # Only reached when max_retries is 0
        return wrapper
    return decorator
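Retrying on every exception also repeats calls that can never succeed, such as a malformed request. A variant that retries only on selected exception types and adds a little random jitter is sketched below. RateLimitError is a hypothetical marker exception defined for this example; in practice you would pass whichever exception classes your client library raises for quota errors.

```python
import random
import time
from functools import wraps
from typing import Callable, Tuple, Type


class RateLimitError(Exception):
    """Hypothetical exception standing in for a quota or 429 error."""


def retry_on(retryable: Tuple[Type[BaseException], ...],
             max_retries: int = 3,
             base_delay: float = 1.0,
             sleep: Callable[[float], None] = time.sleep):
    """Retry only on the given exception types, with backoff plus jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except retryable:
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    # Up to 10% jitter so concurrent workers do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
            return None  # Only reached when max_retries is 0
        return wrapper
    return decorator
```

Any exception outside the retryable tuple propagates immediately, so parsing and validation errors fail fast.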
Conclusion
Google AI Studio provides a powerful foundation for building AI-powered document processing systems. By combining its Gemini API with proper software engineering practices—error handling, rate limiting, structured prompts, and clean architecture—you can move from prototype to production with confidence.
The key takeaways from this tutorial:
- Start with Google AI Studio's prompt gallery to iterate on extraction logic before writing code
- Separate prompt templates from application logic for maintainability
- Implement comprehensive error handling for PDF processing, API limits, and model responses
- Use FastAPI's dependency injection to manage extractor instances and authentication
- Monitor and log everything in production to debug extraction failures
What's Next
To extend this prototype into a production system:
- Add document classification to automatically detect document types
- Implement a feedback loop where users can correct extraction errors
- Explore Gemini 1.5 Flash for high-throughput, lower-cost extraction
- Add support for image-based documents using Gemini's multimodal capabilities
- Integrate with Google Cloud Storage [1] for document archiving
For more advanced patterns, check out our guides on building RAG systems with Gemini and optimizing prompt engineering.