
How to Build AI Prototypes with Google AI Studio


BlogIA Academy | May 30, 2026 | 13 min read | 2,405 words


Google AI Studio has become a widely used tool for developers who need to prototype generative AI applications quickly without managing infrastructure. Released in December 2023 alongside the Gemini API, it provides a web-based development environment for prototyping with generative AI models, and it targets both developers and non-technical users who want to test prompts and build proof-of-concept applications [1].

In this tutorial, we'll build a production-grade document analysis prototype that extracts structured data from PDFs using Google AI Studio's Gemini models. You'll learn how to move from a simple prompt test to a reusable API endpoint, handle edge cases like rate limiting and token limits, and structure your code for maintainability.

Understanding the Architecture: From Prompt Testing to Production

Before writing code, it's essential to understand how Google AI Studio fits into a modern AI development workflow. The platform provides access to Google's Gemini family of models, which support multimodal inputs including text, images, audio, and video [1]. This makes it particularly powerful for document processing tasks where you need to extract information from scanned PDFs, images, or complex layouts.

The architecture we'll build follows a three-tier pattern:

  1. Prompt Engineering Layer: We'll use Google AI Studio's prompt gallery and testing interface to iterate on our extraction prompts
  2. API Integration Layer: We'll wrap our tested prompts in Python code using the google-generativeai library
  3. Service Layer: We'll expose our functionality through a FastAPI endpoint that handles authentication, rate limiting, and error recovery

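This separation keeps prompt iteration isolated from the service code. As long as the prompt text is stored outside the application (for example, in a versioned file), prompt changes need no code deployment, and the API stays stable while we iterate on extraction logic.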
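One way to keep prompts out of the deployed code is to load the template text from a file at runtime. The sketch below is an illustration under assumptions, not a Google AI Studio feature: the file name `invoice_v1.txt`, the helpers `load_prompt` and `render_prompt`, and the `$fields` and `$document` placeholders are all hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template


def load_prompt(path: Path) -> Template:
    """Read a prompt template from disk (hypothetical helper)."""
    return Template(path.read_text(encoding="utf-8"))


def render_prompt(template: Template, fields: str, document: str) -> str:
    """Fill the $fields and $document placeholders in the template."""
    return template.substitute(fields=fields, document=document)


# Demo: write a template to a temporary directory, then load and render it.
with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "invoice_v1.txt"  # hypothetical file name
    prompt_file.write_text(
        "Extract these fields:\n$fields\n\nDocument text:\n$document\n",
        encoding="utf-8",
    )
    rendered = render_prompt(
        load_prompt(prompt_file),
        fields="- invoice_number: Invoice identifier",
        document="Invoice INV-2024-001",
    )

print(rendered)
```

With `string.Template`, a literal dollar sign in the template file has to be written as `$$`; dollar signs inside the substituted document text are left untouched.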

Prerequisites and Environment Setup

You'll need the following tools installed:

# Python 3.10+ required
python --version  # Verify you have 3.10 or higher

# Create a virtual environment
python -m venv ai-studio-env
source ai-studio-env/bin/activate  # On Windows: ai-studio-env\Scripts\activate

# Install required packages
pip install google-generativeai==0.8.3 fastapi==0.115.0 uvicorn==0.30.6 pypdf2==3.0.1 python-multipart==0.0.12 pydantic==2.9.2

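You'll also need a Google AI Studio API key. Navigate to aistudio.google.com/app/apikey and create a new key (Google AI Studio was previously called MakerSuite, so older guides point to makersuite.google.com/app/apikey). Store the key securely; we'll use environment variables: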

export GOOGLE_API_KEY="your-api-key-here"

Important: The free tier of Google AI Studio has rate limits. According to available documentation, the Gemini API free tier allows 60 requests per minute for text-only prompts and 30 requests per minute for multimodal prompts. Actual quotas vary by model and change over time, so confirm them in the current Gemini API documentation before relying on these figures. We'll implement retry logic to handle these limits gracefully.

Core Implementation: Building the Document Extraction Pipeline

Step 1: Initialize the Gemini Client

Create a file called document_extractor.py. We'll start by setting up our connection to Google AI Studio's API:

import os
import base64
import io
from typing import Optional, Dict, Any
from dataclasses import dataclass

import google.generativeai as genai
from PyPDF2 import PdfReader

@dataclass
class ExtractionConfig:
    """Configuration for document extraction behavior."""
    model_name: str = "gemini-1.5-pro"
    temperature: float = 0.1  # Low temperature for deterministic extraction
    max_output_tokens: int = 8192
    top_p: float = 0.95
    top_k: int = 40

class DocumentExtractor:
    """Production-grade document extraction using Google AI Studio."""

    def __init__(self, api_key: Optional[str] = None, config: Optional[ExtractionConfig] = None):
        self.api_key = api_key or os.environ.get("GOOGLE_API_KEY")
        if not self.api_key:
            raise ValueError("GOOGLE_API_KEY must be provided or set as environment variable")

        genai.configure(api_key=self.api_key)
        self.config = config or ExtractionConfig()
        self.model = genai.GenerativeModel(
            model_name=self.config.model_name,
            generation_config={
                "temperature": self.config.temperature,
                "max_output_tokens": self.config.max_output_tokens,
                "top_p": self.config.top_p,
                "top_k": self.config.top_k,
            }
        )

Why this matters: We're using a dataclass for configuration rather than hardcoding values. This allows us to change model parameters without modifying core logic—critical when you need to switch between Gemini 1.5 Pro for complex documents and Gemini 1.5 Flash for high-throughput scenarios.
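As a small illustration of that flexibility, a second profile can be derived from the first with `dataclasses.replace`. This is a sketch: the dataclass is repeated so the snippet runs on its own, `frozen=True` is an addition not present in the tutorial's class, and the names `PRO_CONFIG` and `FLASH_CONFIG` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen is an assumption: it prevents accidental mutation
class ExtractionConfig:
    """Same fields as the tutorial's ExtractionConfig, repeated for a standalone demo."""
    model_name: str = "gemini-1.5-pro"
    temperature: float = 0.1
    max_output_tokens: int = 8192
    top_p: float = 0.95
    top_k: int = 40


# Default profile for complex documents.
PRO_CONFIG = ExtractionConfig()

# Derived profile for high-throughput work: only the model name changes.
FLASH_CONFIG = replace(PRO_CONFIG, model_name="gemini-1.5-flash")

print(PRO_CONFIG.model_name, FLASH_CONFIG.model_name)
```

Either object can then be passed as the `config` argument of `DocumentExtractor`.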

Step 2: Implement PDF Processing with Error Handling

Now we'll add PDF processing that handles common edge cases like encrypted files, corrupted PDFs, and large documents:

class DocumentExtractor:
    # ... (previous code remains)

    def extract_text_from_pdf(self, pdf_bytes: bytes) -> str:
        """
        Extract text from PDF bytes with comprehensive error handling.

        Args:
            pdf_bytes: Raw PDF file content

        Returns:
            Extracted text string

        Raises:
            ValueError: If PDF is encrypted or cannot be parsed
            MemoryError: If PDF is too large (>50MB)
        """
        # Check file size before processing
        if len(pdf_bytes) > 50 * 1024 * 1024:  # 50MB limit
            raise MemoryError("PDF exceeds maximum size of 50MB")

        try:
            pdf_file = io.BytesIO(pdf_bytes)
            reader = PdfReader(pdf_file)

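            # Check for encryption
            if reader.is_encrypted:
                # Try an empty password (common for PDFs with only an owner password).
                # decrypt() returns a falsy value when the password is rejected,
                # so check the result instead of waiting for an exception.
                if not reader.decrypt(""):
                    raise ValueError("PDF is encrypted and cannot be processed")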

            text_parts = []
            total_pages = len(reader.pages)

            # Process pages with memory management
            for page_num in range(total_pages):
                try:
                    page = reader.pages[page_num]
                    page_text = page.extract_text()

                    # Skip empty pages
                    if page_text.strip():
                        text_parts.append(f"--- Page {page_num + 1} ---\n{page_text}")

                    # Free memory by deleting page reference
                    del page

                except Exception as e:
                    # Log page-specific errors but continue processing
                    print(f"Warning: Failed to extract page {page_num + 1}: {str(e)}")
                    continue

            full_text = "\n\n".join(text_parts)

            # Truncate if too long for model context window
            # Gemini 1.5 Pro supports up to 1M tokens, but we'll be conservative
            max_chars = 500000  # ~125K tokens
            if len(full_text) > max_chars:
                print(f"Warning: Text truncated from {len(full_text)} to {max_chars} characters")
                full_text = full_text[:max_chars]

            return full_text

        except MemoryError:
            raise
        except Exception as e:
            raise ValueError(f"Failed to parse PDF: {str(e)}")

Edge case handling: This code handles encrypted PDFs, corrupted pages, memory limits, and token window constraints. In production, you'd also want to add logging to a service like Cloud Logging or Datadog.
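Before wiring up an external service, the `print` calls can be swapped for the standard `logging` module, which such services typically hook into through handlers. The sketch below is illustrative only: the logger name and the `collect_page_text` helper are hypothetical, and each callable stands in for a `page.extract_text` call.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("document_extractor")  # hypothetical logger name


def collect_page_text(pages: List[Callable[[], str]]) -> str:
    """Join text from page callables, logging failures instead of printing them.

    Each callable is a stand-in for PyPDF2's page.extract_text.
    """
    parts = []
    for number, extract in enumerate(pages, start=1):
        try:
            text = extract()
        except Exception as exc:
            # Lazy %-style arguments: the message is only built if it is emitted.
            logger.warning("Failed to extract page %d: %s", number, exc)
            continue
        if text and text.strip():  # skip empty pages
            parts.append(f"--- Page {number} ---\n{text}")
    return "\n\n".join(parts)


def broken_page() -> str:
    """Simulates a page whose content cannot be parsed."""
    raise ValueError("corrupt content stream")


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
result = collect_page_text([lambda: "Invoice INV-2024-001", broken_page, lambda: "   "])
print(result)
```

Only the first page contributes text here: the second page logs a warning and the third is skipped as empty.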

Step 3: Build the Prompt Template System

One of Google AI Studio's strengths is its prompt gallery, which provides templates for common tasks [1]. We'll create a structured prompt system that separates instructions from data:

class DocumentExtractor:
    # ... (previous code remains)

    def build_extraction_prompt(self, document_text: str, schema: Dict[str, str]) -> str:
        """
        Build a structured extraction prompt with clear instructions.

        Args:
            document_text: The extracted text from the document
            schema: Dictionary mapping field names to descriptions

        Returns:
            Formatted prompt string
        """
        fields_description = "\n".join([
            f"- {field_name}: {description}"
            for field_name, description in schema.items()
        ])

        prompt = f"""You are a precise document extraction system. Extract the following fields from the document text below.

Fields to extract:
{fields_description}

Rules:
1. Return ONLY valid JSON, no markdown formatting or code blocks
2. If a field value is not found, use null (not "N/A" or "Not found")
3. Preserve exact formatting for dates, amounts, and identifiers
4. For monetary values, include currency symbol if present
5. Do not infer or guess missing information

Document text:
{document_text}

Extracted JSON:"""

        return prompt

    def extract_structured_data(self, pdf_bytes: bytes, schema: Dict[str, str]) -> Dict[str, Any]:
        """
        Extract structured data from a PDF document.

        Args:
            pdf_bytes: Raw PDF content
            schema: Field definitions for extraction

        Returns:
            Dictionary with extracted fields
        """
        document_text = self.extract_text_from_pdf(pdf_bytes)
        prompt = self.build_extraction_prompt(document_text, schema)

        # Implement retry logic for rate limiting
        max_retries = 3
        retry_delay = 1.0  # seconds

        for attempt in range(max_retries):
            try:
                response = self.model.generate_content(prompt)

                # Parse the response as JSON
                import json
                try:
                    result = json.loads(response.text)
                    return result
                except json.JSONDecodeError:
                    # Sometimes the model wraps JSON in markdown
                    # Try to extract JSON from code blocks
                    import re
                    # DOTALL lets "." match newlines, since the JSON usually spans several lines
                    json_match = re.search(r'```(?:json)?\s*(.*?)\s*```', response.text, re.DOTALL)
                    if json_match:
                        result = json.loads(json_match.group(1))
                        return result
                    raise ValueError(f"Failed to parse model response as JSON: {response.text[:200]}")

            except Exception as e:
                if "429" in str(e) or "RESOURCE_EXHAUSTED" in str(e):
                    # Rate limit hit - wait and retry
                    import time
                    wait_time = retry_delay * (2 ** attempt)  # Exponential backoff
                    print(f"Rate limit hit, waiting {wait_time}s (attempt {attempt + 1}/{max_retries})")
                    time.sleep(wait_time)
                    continue
                else:
                    # Non-retryable error
                    raise

        raise RuntimeError(f"Failed after {max_retries} attempts due to rate limiting")

Why this matters: The prompt template system is crucial for production use. By separating instructions from data, you can:

  • Version control your prompts independently
  • A/B test different prompt strategies
  • Update extraction logic without code changes
  • Handle model response variations (markdown wrapping, JSON formatting)
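The last point, coping with response variations, can be isolated in a small helper that is easy to unit test. The following is a minimal sketch; `parse_model_json` is a hypothetical name, and the requirement that the top-level value be a JSON object is an assumption about what the extraction prompt returns.

```python
import json
import re
from typing import Any, Dict

# Three backticks, built programmatically so this listing can itself sit in a code fence.
FENCE = "`" * 3

# Optional "json" language tag after the opening fence; DOTALL lets "." span newlines.
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_model_json(raw: str) -> Dict[str, Any]:
    """Parse a model reply as a JSON object, tolerating a markdown code fence."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        match = _FENCE_RE.search(raw)
        if match is None:
            raise ValueError(f"Reply is not JSON: {raw[:200]!r}")
        parsed = json.loads(match.group(1))
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed


wrapped = FENCE + 'json\n{"invoice_number": "INV-2024-001", "total_amount": null}\n' + FENCE
print(parse_model_json(wrapped))
```

JSON `null` becomes Python `None`, which matches the prompt rule about missing fields.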

Step 4: Create the FastAPI Service Layer

Now we'll wrap our extractor in a FastAPI application that handles concurrent requests, authentication, and proper error responses:

# app.py
from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel, Field, ValidationError
from typing import Dict, Any, Optional
import os

from document_extractor import DocumentExtractor, ExtractionConfig

app = FastAPI(
    title="Document Extraction API",
    description="Production-grade document extraction using Google AI Studio",
    version="1.0.0"
)

# Security
security = HTTPBearer(auto_error=False)

# Schema definitions for different document types
INVOICE_SCHEMA = {
    "invoice_number": "Invoice identifier (e.g., INV-2024-001)",
    "vendor_name": "Name of the vendor or supplier",
    "invoice_date": "Date of invoice in YYYY-MM-DD format",
    "total_amount": "Total amount including tax",
    "currency": "Currency code (e.g., USD, EUR)",
    "line_items": "Array of items with description, quantity, unit_price, and total"
}

CONTRACT_SCHEMA = {
    "contract_id": "Contract identifier",
    "parties": "Array of party names involved",
    "effective_date": "Contract start date in YYYY-MM-DD format",
    "expiration_date": "Contract end date in YYYY-MM-DD format",
    "contract_value": "Total contract value if specified",
    "governing_law": "Jurisdiction or governing law"
}

class ExtractionRequest(BaseModel):
    """Request model for document extraction."""
    document_type: str = Field(..., description="Type of document: 'invoice' or 'contract'")
    model_name: str = Field("gemini-1.5-pro", description="Gemini model to use")

class ExtractionResponse(BaseModel):
    """Response model for extraction results."""
    success: bool
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None
    model_used: str

def get_extractor(model_name: str = "gemini-1.5-pro") -> DocumentExtractor:
    """Build a DocumentExtractor for the requested model."""
    config = ExtractionConfig(model_name=model_name)
    return DocumentExtractor(config=config)

def verify_api_key(credentials: Optional[HTTPAuthorizationCredentials] = Depends(security)):
    """Simple API key verification (skipped when API_KEY is not set)."""
    api_key = os.environ.get("API_KEY")
    if api_key and (not credentials or credentials.credentials != api_key):
        raise HTTPException(status_code=403, detail="Invalid API key")
    return credentials

def parse_extraction_request(request: str = Form(...)) -> ExtractionRequest:
    """Parse the JSON string sent in the multipart 'request' form field.

    A file upload uses multipart/form-data, so the request options arrive
    as a plain form field rather than as a JSON body.
    """
    try:
        return ExtractionRequest.model_validate_json(request)
    except ValidationError as e:
        raise HTTPException(status_code=422, detail=str(e))

@app.post("/extract", response_model=ExtractionResponse)
async def extract_document(
    file: UploadFile = File(...),
    request: ExtractionRequest = Depends(parse_extraction_request),
    credentials: Optional[HTTPAuthorizationCredentials] = Depends(verify_api_key)
):
    """
    Extract structured data from a PDF document.

    Supports invoices and contracts with predefined extraction schemas.
    """
    # Validate file type
    if not file.filename.lower().endswith('.pdf'):
        raise HTTPException(status_code=400, detail="Only PDF files are supported")

    # Select schema based on document type
    schema_map = {
        "invoice": INVOICE_SCHEMA,
        "contract": CONTRACT_SCHEMA
    }

    schema = schema_map.get(request.document_type)
    if not schema:
        raise HTTPException(
            status_code=400, 
            detail=f"Unsupported document type: {request.document_type}. Supported types: {list(schema_map.keys())}"
        )

    try:
        # Read file content
        pdf_bytes = await file.read()

        # Initialize extractor and process
        extractor = get_extractor(request.model_name)
        result = extractor.extract_structured_data(pdf_bytes, schema)

        return ExtractionResponse(
            success=True,
            data=result,
            model_used=request.model_name
        )

    except MemoryError as e:
        raise HTTPException(status_code=413, detail=str(e))
    except ValueError as e:
        raise HTTPException(status_code=422, detail=str(e))
    except Exception as e:
        # Log the full error for debugging
        print(f"Extraction failed: {str(e)}")
        raise HTTPException(status_code=500, detail="Internal extraction error")

Step 5: Running the Service

Create a main.py entry point:

# main.py
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app:app",
        host="0.0.0.0",
        port=8000,
        reload=True,  # Development only
        # For production, set reload=False and add workers=4 (adjust to CPU cores);
        # uvicorn does not combine auto-reload with multiple workers.
        limit_concurrency=10  # Prevent overwhelming the API
    )

Run the service:

python main.py

Test with curl:

# Test invoice extraction
curl -X POST http://localhost:8000/extract \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@invoice.pdf" \
  -F "request={\"document_type\": \"invoice\", \"model_name\": \"gemini-1.5-pro\"}"

# Expected response:
# {
#   "success": true,
#   "data": {
#     "invoice_number": "INV-2024-001",
#     "vendor_name": "Acme Corp",
#     "invoice_date": "2024-03-15",
#     "total_amount": 1250.00,
#     "currency": "USD",
#     "line_items": [..]
#   },
#   "model_used": "gemini-1.5-pro"
# }

Production Considerations and Edge Cases

Rate Limiting and Cost Management

Google AI Studio's free tier has significant limitations for production use. According to available documentation, the paid tier through Google Cloud provides higher quotas and lower latency. For production deployments:

  1. Implement a token bucket rate limiter to stay within API limits
  2. Cache extraction results using a key-value store like Redis
  3. Monitor token usage with Cloud Monitoring or custom metrics
  4. Set up alerts when approaching quota limits
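The first item, a token bucket, can be sketched in a few lines. This is a minimal single-process illustration, not a tuned implementation: the class name, the `rate` and `capacity` values, and the injectable `clock` parameter are all assumptions, and the class is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter: bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so the first burst is allowed
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False instead of blocking."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: fake_now[0])
first, second, third = bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire()
fake_now[0] = 1.0  # one second later, one token has been refilled
after_refill = bucket.try_acquire()
print(first, second, third, after_refill)
```

A caller would check `try_acquire()` before each `generate_content` call and wait or queue the request when it returns `False`. With several worker processes, each process holds its own bucket, so a shared store would be needed to enforce one global limit.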

Handling Large Documents

For documents exceeding the 50MB limit or 500K character threshold:

  1. Split documents into logical sections (by page or chapter)
  2. Process sections in parallel using async workers
  3. Merge results with a post-processing step
  4. Implement pagination for the API response
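The split and merge steps above can be sketched as two small functions. Both are illustrations under assumptions: `chunk_pages` and `merge_results` are hypothetical names, and the merge policy (first non-null scalar wins, lists are concatenated) is one possible choice rather than a rule from the Gemini API.

```python
from typing import Any, Dict, List

SEPARATOR = "\n\n"


def chunk_pages(pages: List[str], max_chars: int) -> List[str]:
    """Group consecutive page texts into chunks of at most max_chars characters.

    A single page longer than max_chars becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for page in pages:
        added = len(page) + (len(SEPARATOR) if current else 0)
        if current and size + added > max_chars:
            chunks.append(SEPARATOR.join(current))
            current, size = [], 0
            added = len(page)
        current.append(page)
        size += added
    if current:
        chunks.append(SEPARATOR.join(current))
    return chunks


def merge_results(partials: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-chunk results: first non-null scalar wins, lists are concatenated."""
    merged: Dict[str, Any] = {}
    for partial in partials:
        for key, value in partial.items():
            if isinstance(value, list):
                existing = merged.get(key)
                merged[key] = (existing if isinstance(existing, list) else []) + value
            elif merged.get(key) is None:
                merged[key] = value
    return merged


chunks = chunk_pages(["a" * 40, "b" * 40, "c" * 40], max_chars=100)
merged = merge_results([
    {"invoice_number": None, "line_items": [{"description": "A"}]},
    {"invoice_number": "INV-1", "line_items": [{"description": "B"}]},
])
print([len(chunk) for chunk in chunks], merged)
```

Each chunk would be sent through `extract_structured_data`-style logic separately, and the partial dictionaries merged afterwards.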

Error Recovery Strategy

Our current implementation retries on rate limits, but production systems need more:

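# Module-level helper (not a method): place it in document_extractor.py and
# decorate the functions that call the API. Note that it retries on every exception type.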
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=1.0):
    """Decorator for retrying API calls with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise
                    wait_time = base_delay * (2 ** attempt)
                    print(f"Retry {attempt + 1}/{max_retries} after {wait_time}s: {str(e)}")
                    time.sleep(wait_time)
            return None
        return wrapper
    return decorator
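A possible refinement is to retry only errors that look like rate limiting and to fail fast on everything else. The sketch below assumes the same string heuristic used earlier in this tutorial (`429` or `RESOURCE_EXHAUSTED` in the message); `retry_on`, `is_rate_limit_error`, and the injectable `sleep` parameter are hypothetical names introduced for illustration.

```python
import time
from functools import wraps
from typing import Callable


def is_rate_limit_error(exc: Exception) -> bool:
    """Heuristic: look for 429 or RESOURCE_EXHAUSTED in the error message."""
    message = str(exc)
    return "429" in message or "RESOURCE_EXHAUSTED" in message


def retry_on(should_retry: Callable[[Exception], bool], max_retries: int = 3,
             base_delay: float = 1.0, sleep: Callable[[float], None] = time.sleep):
    """Retry only when should_retry(exc) is true; re-raise anything else at once."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_retries - 1 or not should_retry(exc):
                        raise
                    sleep(base_delay * (2 ** attempt))  # exponential backoff
        return wrapper
    return decorator


# Demo with a fake sleep that records delays, so the example finishes instantly.
delays = []
calls = {"count": 0}


@retry_on(is_rate_limit_error, sleep=delays.append)
def flaky_call() -> str:
    """Fails twice with a simulated rate-limit error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    return "ok"


outcome = flaky_call()
print(outcome, delays)
```

Matching on message text is fragile; if the client library exposes a dedicated exception type for quota errors, checking that type in `should_retry` would be more reliable.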

Conclusion

Google AI Studio provides a powerful foundation for building AI-powered document processing systems. By combining its Gemini API with proper software engineering practices—error handling, rate limiting, structured prompts, and clean architecture—you can move from prototype to production with confidence.

The key takeaways from this tutorial:

  • Start with Google AI Studio's prompt gallery to iterate on extraction logic before writing code
  • Separate prompt templates from application logic for maintainability
  • Implement comprehensive error handling for PDF processing, API limits, and model responses
  • Use FastAPI's dependency injection to manage extractor instances and authentication
  • Monitor and log everything in production to debug extraction failures

What's Next

To extend this prototype into a production system:

  1. Add document classification to automatically detect document types
  2. Implement a feedback loop where users can correct extraction errors
  3. Explore Gemini 1.5 Flash for high-throughput, lower-cost extraction
  4. Add support for image-based documents using Gemini's multimodal capabilities
  5. Integrate with Google Cloud Storage for document archiving

For more advanced patterns, check out our guides on building RAG systems with Gemini and optimizing prompt engineering.


References

1. Wikipedia - Rag. Wikipedia.
2. Wikipedia - Gemini. Wikipedia.
3. GitHub - Shubhamsaboo/awesome-llm-apps. GitHub.
4. GitHub - google-gemini/gemini-cli. GitHub.
5. Google Gemini Pricing. Pricing.