How to Build a Telegram Bot with DeepSeek-R1 Reasoning
Practical tutorial: Build a Telegram bot with DeepSeek-R1 reasoning
Building a Telegram bot that leverages [1] advanced reasoning capabilities has become increasingly practical with the release of DeepSeek-R1, a large language model optimized for chain-of-thought reasoning. As of early 2026, DeepSeek-R1 has demonstrated competitive performance on mathematical and logical reasoning benchmarks, making it suitable for production applications that require structured problem-solving. In this tutorial, you will construct a Telegram bot that accepts user queries, processes them through DeepSeek-R1's reasoning pipeline, and returns coherent, step-by-step responses. The bot will handle rate limits, manage conversation state, and implement proper error handling for production reliability.
Real-World Use Case and Architecture
Why build a Telegram bot with DeepSeek-R1? In production environments, users increasingly expect AI assistants that can explain their reasoning rather than just provide answers. Examples include a customer support bot that walks through troubleshooting steps, a tutoring bot that shows mathematical derivations, and a code review bot that explains why a particular pattern is problematic. Because DeepSeek-R1 returns its reasoning tokens separately from its response tokens, you can expose or hide the reasoning chain depending on your use case.
The architecture for this tutorial consists of three components:
- Telegram Bot Client: Handles incoming messages, manages user sessions, and formats responses. We use python-telegram-bot version 21.x, which provides async handlers and webhook support.
- DeepSeek-R1 API Client: Interfaces with the DeepSeek API. According to DeepSeek's documentation, the R1 model uses a special reasoning_effort parameter that controls how many reasoning tokens the model generates before producing the final answer. This parameter accepts values from 0.0 to 1.0, where higher values produce more thorough reasoning at the cost of latency.
- Conversation State Manager: Maintains context across multiple messages. For production, you would use Redis or PostgreSQL, but for this tutorial we use an in-memory dictionary with a configurable maximum size to prevent memory leaks.
The data flow is straightforward: User sends message → Bot receives update → State manager retrieves or creates conversation → API client sends request with reasoning parameters → Response is parsed and sent back to user.
Prerequisites and Environment Setup
Before writing any code, ensure your environment has the following dependencies. This tutorial assumes Python 3.11 or later, as it uses asyncio features and type hints that are standard in modern Python.
# Create a virtual environment
python3.11 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install core dependencies
pip install python-telegram-bot==21.1.1
pip install httpx==0.27.2 # python-telegram-bot 21.1.1 was released against the httpx 0.27 series
pip install pydantic==2.9.2
pip install python-dotenv==1.0.1
# For production, consider installing redis for state management
# pip install redis==5.2.0
You will also need:
- A Telegram bot token from @BotFather
- A DeepSeek API key. As of May 2026, DeepSeek provides API access through its platform at platform.deepseek.com. The R1 model is available under the model ID deepseek-reasoner. Pricing is based on token usage; according to DeepSeek's published pricing page, input tokens cost $0.55 per million tokens and output tokens cost $2.19 per million tokens for the R1 model.
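To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the two prices quoted above. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes reasoning tokens are already included in the output token count reported by the API; check your own usage reports before relying on it.

```python
from decimal import Decimal

# Prices quoted in this tutorial, in USD per million tokens.
INPUT_USD_PER_MILLION = Decimal("0.55")
OUTPUT_USD_PER_MILLION = Decimal("2.19")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request from its token counts.

    Assumes reasoning tokens are counted as part of output_tokens.
    """
    million = Decimal(1_000_000)
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / million


if __name__ == "__main__":
    # A request with 1,500 input tokens and 800 output tokens
    print(estimate_cost_usd(1_500, 800))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when you aggregate them.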
Create a .env file in your project root:
TELEGRAM_BOT_TOKEN=your_telegram_bot_token_here
DEEPSEEK_API_KEY=your_deepseek_api_key_here
DEEPSEEK_API_BASE=https://api.deepseek.com/v1
Core Implementation: Building the Reasoning Bot
Step 1: Define Data Models and Configuration
We start by defining Pydantic models for our configuration and conversation state. This ensures type safety and makes the code self-documenting.
# config.py
from enum import Enum

from pydantic import BaseModel, Field


class ReasoningEffort(float, Enum):
    """Controls how much reasoning DeepSeek-R1 performs before answering.

    Values: 0.0 (minimal) to 1.0 (maximum reasoning)."""
    LOW = 0.3
    MEDIUM = 0.6
    HIGH = 0.9
    MAXIMUM = 1.0


class BotConfig(BaseModel):
    telegram_token: str
    deepseek_api_key: str
    deepseek_api_base: str = "https://api.deepseek.com/v1"
    model_name: str = "deepseek-reasoner"
    max_history: int = Field(default=10, ge=1, le=50)
    reasoning_effort: ReasoningEffort = ReasoningEffort.MEDIUM
    max_tokens: int = Field(default=2048, ge=256, le=8192)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)
    # Note: a plain BaseModel does not read .env files by itself.
    # bot.py loads .env with python-dotenv and passes the values in.
The ReasoningEffort enum maps to the reasoning_effort parameter that DeepSeek-R1 accepts. According to DeepSeek's API documentation, this parameter is a float between 0 and 1 that controls the number of reasoning tokens generated. At 0.0, the model skips reasoning entirely and jumps to the answer. At 1.0, it generates extensive reasoning before responding.
Step 2: Implement the DeepSeek API Client
The API client handles authentication, request formatting, and response parsing. We use httpx for async HTTP requests, which integrates well with python-telegram-bot's async handlers.
# deepseek_client.py
from typing import Optional

import httpx
from pydantic import BaseModel


class DeepSeekMessage(BaseModel):
    role: str  # "user", "assistant", or "system"
    content: str


class DeepSeekResponse(BaseModel):
    reasoning_content: Optional[str] = None
    content: str
    model: str
    usage: dict


class DeepSeekClient:
    def __init__(self, api_key: str, base_url: str = "https://api.deepseek.com/v1"):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")
        self.client = httpx.AsyncClient(
            base_url=self.base_url,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
            timeout=60.0,  # Reasoning can take time
        )

    async def chat_completion(
        self,
        messages: list[DeepSeekMessage],
        model: str = "deepseek-reasoner",
        reasoning_effort: float = 0.6,
        max_tokens: int = 2048,
        temperature: float = 0.7,
    ) -> DeepSeekResponse:
        """
        Send a chat completion request to DeepSeek-R1.

        The reasoning_effort parameter controls how many reasoning tokens
        the model generates. Higher values produce more thorough reasoning
        but increase latency and token usage.
        """
        payload = {
            "model": model,
            "messages": [msg.model_dump() for msg in messages],
            "reasoning_effort": reasoning_effort,
            "max_tokens": max_tokens,
            "temperature": temperature,
            "stream": False,  # For simplicity; streaming is possible
        }
        try:
            # No leading slash, so the path is appended to base_url (including /v1)
            response = await self.client.post("chat/completions", json=payload)
            response.raise_for_status()
            data = response.json()
            # DeepSeek-R1 returns reasoning_content as a separate field
            choice = data["choices"][0]
            message = choice["message"]
            return DeepSeekResponse(
                reasoning_content=message.get("reasoning_content"),
                content=message["content"],
                model=data["model"],
                usage=data["usage"],
            )
        except httpx.HTTPStatusError as e:
            # Map specific HTTP status codes to dedicated exception types
            if e.response.status_code == 429:
                raise RateLimitError("Rate limit exceeded. Please wait before retrying.") from e
            elif e.response.status_code == 401:
                raise AuthenticationError("Invalid API key. Check your DEEPSEEK_API_KEY.") from e
            else:
                raise DeepSeekAPIError(f"API error: {e.response.text}") from e
        except httpx.TimeoutException as e:
            raise DeepSeekAPIError("Request timed out. Consider reducing reasoning_effort.") from e

    async def close(self):
        await self.client.aclose()


class DeepSeekAPIError(Exception):
    pass


class RateLimitError(DeepSeekAPIError):
    pass


class AuthenticationError(DeepSeekAPIError):
    pass
Key implementation details:
- The reasoning_content field is unique to DeepSeek-R1. It contains the model's internal reasoning chain, which you can optionally expose to users. For a tutoring bot, you might show this reasoning. For a customer support bot, you might hide it and only show the final answer.
- We set a 60-second timeout because high reasoning_effort values can cause the model to generate many reasoning tokens before responding.
- Error handling distinguishes between rate limits (429), authentication errors (401), and general API errors. This allows the Telegram bot to respond with appropriate messages.
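Because rate limits surface as a dedicated exception, one option is to retry those requests with exponential backoff before giving up. The sketch below is one possible approach, not part of the client above: `call_with_backoff` is a hypothetical helper, the `RateLimitError` class is a local stand-in for the one in deepseek_client.py, and the delay values are illustrative assumptions.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for the RateLimitError defined in deepseek_client.py."""


async def call_with_backoff(make_request, max_attempts=4, base_delay=1.0,
                            sleep=asyncio.sleep):
    """Await make_request(), retrying on RateLimitError with growing delays.

    make_request is any zero-argument coroutine function. The sleep
    argument is injectable so the retry logic can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return await make_request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller handle it
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            # Add jitter so many clients do not retry in lockstep
            await sleep(delay + random.uniform(0, delay / 2))
```

In the bot, the wrapped call could look like `await call_with_backoff(lambda: deepseek.chat_completion(messages=messages))`. Only rate-limit errors are retried; authentication errors will not succeed on a second attempt, so they propagate immediately.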
Step 3: Implement the Conversation State Manager
For production, you would use Redis or PostgreSQL to persist conversation state across bot restarts. For this tutorial, we use an in-memory dictionary with a maximum size to prevent unbounded memory growth.
# state_manager.py
import time
from collections import OrderedDict

from pydantic import BaseModel, Field


class ConversationState(BaseModel):
    user_id: int
    chat_id: int
    messages: list[dict] = Field(default_factory=list)  # Dicts with "role" and "content"
    last_activity: float = 0.0
    reasoning_effort: float = 0.6  # Per-user configurable


class ConversationManager:
    def __init__(self, max_conversations: int = 1000, max_history: int = 10):
        self._conversations: OrderedDict[str, ConversationState] = OrderedDict()
        self.max_conversations = max_conversations
        self.max_history = max_history

    def _get_key(self, user_id: int, chat_id: int) -> str:
        return f"{user_id}:{chat_id}"

    def get_or_create(self, user_id: int, chat_id: int) -> ConversationState:
        key = self._get_key(user_id, chat_id)
        if key not in self._conversations:
            # Evict the least recently used conversation if at capacity
            if len(self._conversations) >= self.max_conversations:
                self._conversations.popitem(last=False)
            self._conversations[key] = ConversationState(
                user_id=user_id,
                chat_id=chat_id,
                last_activity=time.time(),
            )
        # Move to end (most recently used)
        self._conversations.move_to_end(key)
        return self._conversations[key]

    def add_message(self, user_id: int, chat_id: int, role: str, content: str):
        state = self.get_or_create(user_id, chat_id)
        state.messages.append({"role": role, "content": content})
        state.last_activity = time.time()
        # Trim history to max_history (keep the most recent)
        if len(state.messages) > self.max_history:
            # Keep system messages if present, then trim the oldest user/assistant messages
            system_msgs = [m for m in state.messages if m["role"] == "system"]
            other_msgs = [m for m in state.messages if m["role"] != "system"]
            # Keep at least one message; a slice of [-0:] would keep everything
            keep = max(self.max_history - len(system_msgs), 1)
            state.messages = system_msgs + other_msgs[-keep:]

    def clear_history(self, user_id: int, chat_id: int):
        key = self._get_key(user_id, chat_id)
        if key in self._conversations:
            self._conversations[key].messages = []

    def get_stats(self) -> dict:
        """Return statistics for monitoring"""
        return {
            "active_conversations": len(self._conversations),
            "max_capacity": self.max_conversations,
        }
The OrderedDict is used as an LRU (Least Recently Used) cache. When we access a conversation, we move it to the end. When we need to evict, we remove from the front (oldest accessed). This prevents memory leaks in long-running bots.
The max_history parameter limits context window usage. DeepSeek-R1 has a context window of 128K tokens according to available information, but sending the entire conversation history for every request increases latency and cost. Trimming to the last 10 messages is a reasonable default.
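Counting messages is a coarse proxy for context size, since one long message can outweigh ten short ones. As an alternative, the sketch below trims history against an approximate token budget. Both function names are hypothetical, and the "about 4 characters per token" ratio is a rough assumption rather than DeepSeek's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose estimated total fits in max_tokens.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    kept = []
    used = 0
    for msg in reversed(messages):  # Walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if kept and used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # Restore chronological order
    return kept
```

This could be applied to `state.messages` just before building the API request, in addition to or instead of the message-count limit.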
Step 4: Build the Telegram Bot Handlers
Now we wire everything together. The bot uses python-telegram-bot's Application class with async handlers.
# bot.py
import logging
import os

from dotenv import load_dotenv
from telegram import InlineKeyboardButton, InlineKeyboardMarkup, Update
from telegram.ext import (
    Application,
    CallbackQueryHandler,  # Needed for the inline keyboard callback in main()
    CommandHandler,
    ContextTypes,
    MessageHandler,
    filters,
)

from config import BotConfig, ReasoningEffort
from deepseek_client import DeepSeekClient, DeepSeekMessage, RateLimitError
from state_manager import ConversationManager

# Configure logging
logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)

# Load configuration from .env
load_dotenv()
config = BotConfig(
    telegram_token=os.getenv("TELEGRAM_BOT_TOKEN"),
    deepseek_api_key=os.getenv("DEEPSEEK_API_KEY"),
    deepseek_api_base=os.getenv("DEEPSEEK_API_BASE", "https://api.deepseek.com/v1"),
)

# Initialize components
deepseek = DeepSeekClient(
    api_key=config.deepseek_api_key,
    base_url=config.deepseek_api_base,
)
conversations = ConversationManager(max_history=config.max_history)
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle /start command"""
    user = update.effective_user
    await update.message.reply_text(
        f"Hello {user.first_name}! I'm a reasoning bot powered by DeepSeek-R1.\n\n"
        "Send me any question and I'll reason through it step by step.\n\n"
        "Commands:\n"
        "/reasoning - Adjust reasoning effort (low/medium/high/maximum)\n"
        "/clear - Clear conversation history\n"
        "/stats - Show bot statistics"
    )


async def clear(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Clear conversation history"""
    user_id = update.effective_user.id
    chat_id = update.effective_chat.id
    conversations.clear_history(user_id, chat_id)
    await update.message.reply_text("Conversation history cleared.")


async def stats(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Show bot statistics"""
    stats_data = conversations.get_stats()
    await update.message.reply_text(
        "Bot Statistics:\n"
        f"Active conversations: {stats_data['active_conversations']}\n"
        f"Max capacity: {stats_data['max_capacity']}"
    )
async def set_reasoning(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Adjust reasoning effort via inline keyboard"""
    keyboard = [
        [
            InlineKeyboardButton("Low", callback_data="reasoning_low"),
            InlineKeyboardButton("Medium", callback_data="reasoning_medium"),
        ],
        [
            InlineKeyboardButton("High", callback_data="reasoning_high"),
            InlineKeyboardButton("Maximum", callback_data="reasoning_maximum"),
        ],
    ]
    reply_markup = InlineKeyboardMarkup(keyboard)
    await update.message.reply_text(
        "Select reasoning effort level:",
        reply_markup=reply_markup,
    )


async def button_callback(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle inline keyboard button presses"""
    query = update.callback_query
    await query.answer()
    user_id = update.effective_user.id
    chat_id = update.effective_chat.id
    effort_map = {
        "reasoning_low": ReasoningEffort.LOW,
        "reasoning_medium": ReasoningEffort.MEDIUM,
        "reasoning_high": ReasoningEffort.HIGH,
        "reasoning_maximum": ReasoningEffort.MAXIMUM,
    }
    effort = effort_map.get(query.data)
    if effort is not None:
        state = conversations.get_or_create(user_id, chat_id)
        state.reasoning_effort = effort.value
        await query.edit_message_text(
            f"Reasoning effort set to {query.data.split('_')[1].title()} ({effort.value})"
        )
async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Process incoming messages through DeepSeek-R1"""
    # Imported here so this handler carries everything it needs
    from telegram.error import BadRequest

    user_id = update.effective_user.id
    chat_id = update.effective_chat.id
    user_message = update.message.text

    # Ignore empty messages
    if not user_message or not user_message.strip():
        await update.message.reply_text("Please send a non-empty message.")
        return

    # Send typing indicator while processing
    await context.bot.send_chat_action(chat_id=chat_id, action="typing")

    # Get or create conversation state
    state = conversations.get_or_create(user_id, chat_id)

    # Add user message to history
    conversations.add_message(user_id, chat_id, "user", user_message)

    # Prepare messages for API call, starting with the system prompt
    messages = [
        DeepSeekMessage(role="system", content=(
            "You are a helpful assistant that provides step-by-step reasoning. "
            "Always explain your thought process before giving the final answer. "
            "Be concise but thorough."
        ))
    ]
    # Add conversation history (the stored history holds only user/assistant turns)
    for msg in state.messages:
        messages.append(DeepSeekMessage(role=msg["role"], content=msg["content"]))

    async def send(text: str):
        """Send as Markdown; fall back to plain text if Telegram cannot parse it."""
        try:
            await update.message.reply_text(text, parse_mode="Markdown")
        except BadRequest:
            # Model output or a mid-entity split can leave unbalanced Markdown
            await update.message.reply_text(text)

    try:
        # Call DeepSeek-R1
        response = await deepseek.chat_completion(
            messages=messages,
            reasoning_effort=state.reasoning_effort,
            max_tokens=config.max_tokens,
            temperature=config.temperature,
        )
        # Add assistant response to history
        conversations.add_message(user_id, chat_id, "assistant", response.content)

        # Build response message
        reply = response.content
        # Optionally include reasoning (useful for debugging)
        if response.reasoning_content and state.reasoning_effort >= 0.6:
            # Truncate reasoning if too long
            reasoning = response.reasoning_content
            if len(reasoning) > 1000:
                reasoning = reasoning[:1000] + "...\n[Reasoning truncated]"
            reply = f"*Reasoning:*\n{reasoning}\n\n*Answer:*\n{response.content}"

        # Send response (split if too long for Telegram)
        max_length = 4096  # Telegram message limit
        if len(reply) > max_length:
            for i in range(0, len(reply), max_length):
                await send(reply[i:i + max_length])
        else:
            await send(reply)

        # Log token usage for monitoring
        logger.info(
            f"User {user_id}: {response.usage.get('total_tokens', 'unknown')} tokens used"
        )
    except RateLimitError:
        await update.message.reply_text(
            "⚠️ I'm receiving too many requests. Please wait a moment and try again."
        )
        logger.warning(f"Rate limit hit for user {user_id}")
    except Exception as e:
        logger.error(f"Error processing message from {user_id}: {str(e)}")
        await update.message.reply_text(
            "Sorry, I encountered an error processing your request. Please try again later."
        )
async def error_handler(update: object, context: ContextTypes.DEFAULT_TYPE):
    """Log errors caused by updates."""
    logger.error(f"Update {update} caused error {context.error}")


def main():
    """Start the bot"""
    # Create application
    application = Application.builder().token(config.telegram_token).build()

    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("clear", clear))
    application.add_handler(CommandHandler("stats", stats))
    application.add_handler(CommandHandler("reasoning", set_reasoning))
    application.add_handler(CallbackQueryHandler(button_callback))
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    application.add_error_handler(error_handler)

    # Start polling (for production, use webhooks)
    logger.info("Starting bot...")
    application.run_polling(allowed_updates=Update.ALL_TYPES)


if __name__ == "__main__":
    main()
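The message handler splits long replies at fixed 4096-character offsets, which can cut a sentence or a formatting marker in half. A gentler option is to break at the last newline that fits. The helper below is a hypothetical sketch (`split_message` is not part of python-telegram-bot), and it falls back to a hard cut when a chunk contains no newline.

```python
def split_message(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break at the last newline inside each chunk and falls
    back to a hard cut when no newline is available.
    """
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # No usable newline: hard cut at the limit
        chunks.append(text[:cut])
        # Drop the newline(s) at the break so chunks do not start blank
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Each returned chunk can then be sent with its own `reply_text` call in place of the fixed-offset loop.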
Step 5: Production Considerations and Edge Cases
Rate Limiting: DeepSeek's API has rate limits that vary by plan. According to their documentation, the free tier allows 60 requests per minute. The bot handles 429 errors gracefully, but for production you should implement a token bucket rate limiter. Here's a simple implementation:
# rate_limiter.py
import asyncio
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # tokens per second
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> bool:
        async with self._lock:
            # Refill based on the time elapsed since the last call
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


# Usage in handle_message:
# rate_limiter = TokenBucket(rate=1.0, capacity=60)  # 60 requests per minute
# if not await rate_limiter.acquire():
#     await update.message.reply_text("Rate limit exceeded. Please wait.")
#     return
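A single shared bucket lets one busy user exhaust the budget for everyone. One way to avoid that is to keep a separate bucket per user. The sketch below is a simplified, synchronous variant for illustration: `SimpleBucket` and `PerUserLimiter` are hypothetical names, and the clock is injectable so the refill logic can be exercised without waiting.

```python
import time
from collections import defaultdict


class SimpleBucket:
    """Minimal token bucket; `clock` is injectable so the logic is testable."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate  # tokens added per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last_refill = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerUserLimiter:
    """Creates one bucket per user ID on first use."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self._buckets = defaultdict(lambda: SimpleBucket(rate, capacity, clock))

    def allow(self, user_id: int) -> bool:
        return self._buckets[user_id].try_acquire()
```

Note that the bucket dictionary grows with the number of distinct users, so a long-running bot would want to evict idle entries, for example with the same LRU approach used by the conversation manager.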
Memory Management: The in-memory conversation manager works for small bots, but for production with thousands of users, you must use an external store. Here's a Redis-backed version:
# redis_state.py
import json

import redis.asyncio as redis


class RedisConversationManager:
    def __init__(self, redis_url: str = "redis://localhost:6379/0"):
        self.redis = redis.from_url(redis_url)
        self.ttl = 3600  # 1 hour timeout

    async def get_or_create(self, user_id: int, chat_id: int) -> dict:
        key = f"conv:{user_id}:{chat_id}"
        data = await self.redis.get(key)
        if data:
            return json.loads(data)
        return {"messages": [], "reasoning_effort": 0.6}

    async def save(self, user_id: int, chat_id: int, state: dict):
        key = f"conv:{user_id}:{chat_id}"
        # setex stores the value and refreshes the expiry in one call
        await self.redis.setex(key, self.ttl, json.dumps(state))
Token Usage Optimization: DeepSeek-R1's reasoning tokens count toward your token usage. If you're not showing reasoning to users, consider setting reasoning_effort to 0.0 for simple queries and only using higher values for complex questions. You can implement a simple classifier:
def estimate_complexity(query: str) -> float:
    """Estimate query complexity to adjust reasoning effort"""
    complex_keywords = ["explain", "why", "how", "compare", "analyze", "solve", "prove"]
    word_count = len(query.split())
    # Simple heuristic: longer queries with complex keywords get more reasoning
    complexity = min(1.0, word_count / 50)
    lowered = query.lower()
    for keyword in complex_keywords:
        if keyword in lowered:
            complexity = max(complexity, 0.6)
    return complexity
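The classifier returns an arbitrary value between 0 and 1, while the rest of the bot works with four named effort levels. A small helper can snap the score to the nearest level so logs and user-facing messages stay consistent. This is a sketch: `nearest_tier` is a hypothetical name, and the tier values mirror the ReasoningEffort enum from config.py.

```python
# Tier values mirror the ReasoningEffort enum in config.py
EFFORT_TIERS = {"low": 0.3, "medium": 0.6, "high": 0.9, "maximum": 1.0}


def nearest_tier(complexity: float) -> tuple[str, float]:
    """Clamp a complexity score to [0, 1] and snap it to the closest tier."""
    clamped = min(1.0, max(0.0, complexity))
    name = min(EFFORT_TIERS, key=lambda n: abs(EFFORT_TIERS[n] - clamped))
    return name, EFFORT_TIERS[name]


if __name__ == "__main__":
    print(nearest_tier(0.5))
```

With this in place, `nearest_tier(estimate_complexity(query))` yields a level name for logging and a value to pass as reasoning_effort.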
Testing and Deployment
Before deploying, test the bot locally:
# Run the bot
python bot.py
# In Telegram, send /start to your bot
# Then send a test message like "What is 15% of 200?"
Expected output (with medium reasoning effort):
*Reasoning:*
To find 15% of 200, I need to calculate 15/100 * 200.
15/100 = 0.15
0.15 * 200 = 30
Therefore, 15% of 200 is 30.
*Answer:*
15% of 200 is 30.
For production deployment, use webhooks instead of polling. Set up a reverse proxy (nginx) and use HTTPS:
# In main(), replace run_polling with:
# (run_webhook needs the optional extra: pip install "python-telegram-bot[webhooks]")
application.run_webhook(
    listen="0.0.0.0",
    port=8443,
    url_path=config.telegram_token,
    webhook_url=f"https://your-domain.com/{config.telegram_token}",
)
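The webhook configuration above uses the bot token as the URL path, which means the token can end up in proxy access logs. Telegram's Bot API also supports a separate secret token that it sends back in the X-Telegram-Bot-Api-Secret-Token header, and python-telegram-bot accepts it through the `secret_token` argument of `run_webhook`. The sketch below shows one way to generate and check such a value; the three function names are hypothetical, and the manual header check is only relevant if you handle webhook requests yourself rather than through the library.

```python
import hmac
import re
import secrets
from typing import Optional


def new_webhook_secret(nbytes: int = 32) -> str:
    """Generate a random secret using only URL-safe characters."""
    return secrets.token_urlsafe(nbytes)


def is_valid_secret_format(value: str) -> bool:
    """Telegram allows 1-256 characters from A-Z, a-z, 0-9, '_' and '-'."""
    return re.fullmatch(r"[A-Za-z0-9_-]{1,256}", value) is not None


def header_matches(expected: str, received: Optional[str]) -> bool:
    """Constant-time comparison of the received header with the secret."""
    if received is None:
        return False
    return hmac.compare_digest(expected.encode(), received.encode())
```

Generate the secret once, store it alongside the other values in .env, and use a non-sensitive `url_path` (for example a fixed string such as "telegram") instead of the token.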
What's Next
This tutorial covered building a production-ready Telegram bot with DeepSeek-R1 reasoning. You now have a bot that can:
- Maintain conversation context across multiple messages
- Adjust reasoning effort per user
- Handle rate limits and API errors gracefully
- Manage memory with LRU eviction
To extend this bot, consider:
- Streaming responses: DeepSeek-R1 supports streaming via server-sent events. This would allow you to show reasoning tokens as they're generated, providing a more interactive experience.
- Multi-modal support: As of 2026, DeepSeek has released vision capabilities for their models. You could extend the bot to accept images and reason about visual content.
- User authentication: For enterprise use, integrate with your existing auth system to track usage per user and enforce quotas.
- Analytics dashboard: Log all interactions to a database and build a dashboard showing token usage, common queries, and error rates.
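As a starting point for the streaming extension, the sketch below shows how server-sent event lines could be folded into separate reasoning and answer strings. It assumes an OpenAI-style chunk format in which each `data:` line carries a JSON object with `choices[0].delta` holding optional `reasoning_content` and `content` fields, and a final `data: [DONE]` marker; verify that shape against the API's actual streaming output. `accumulate_stream` is a hypothetical helper name.

```python
import json


def accumulate_stream(lines):
    """Collect reasoning and answer text from SSE 'data:' lines.

    Assumes each chunk looks like
    {"choices": [{"delta": {"reasoning_content": "...", "content": "..."}}]}
    and that the stream ends with 'data: [DONE]'.
    """
    reasoning_parts = []
    answer_parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue  # Some chunks may carry only usage information
        delta = choices[0].get("delta", {})
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)
```

In a live bot you would feed this logic from `httpx`'s `response.aiter_lines()` and edit the Telegram message periodically instead of waiting for the full result.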
The complete source code for this tutorial is available on GitHub. For more tutorials on building AI-powered applications, check out our guides on building chatbots with LangChain [5] and deploying LLMs in production.