How to Use the Claude API: A Complete Beginner Tutorial
The Claude API gives you programmatic access to Anthropic’s Claude models — the same AI that powers Cursor, Claude Code, and a growing list of developer tools. This tutorial covers everything from your first API call to advanced patterns like streaming, tool use, and vision.
We will use the official Anthropic Python and TypeScript SDKs with the latest models as of April 2026.
Current Claude Models and Pricing
Before writing code, know what you are working with:
| Model | Model ID | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6-20260401 | $5.00 | $25.00 | Complex reasoning, analysis, coding |
| Claude Sonnet 4.6 | claude-sonnet-4-6-20260401 | $3.00 | $15.00 | Balanced speed and quality (recommended default) |
| Claude Haiku 4.5 | claude-haiku-4-5-20250620 | $1.00 | $5.00 | Fast, simple tasks, high volume |
Key pricing notes:
- The full 1M token context window is included at standard pricing — no long-context surcharges.
- The Batch API gives you a flat 50% discount on all tokens by processing requests asynchronously within 24 hours.
- Prompt caching stores repeated context and charges cache reads at roughly 10% of the standard input rate.
For most development work, Claude Sonnet 4.6 is the right default. It offers strong reasoning and code quality at a moderate price. Use Haiku for high-volume or simple tasks, and Opus when you need maximum intelligence for complex problems.
Prerequisites
- Python 3.8+ or Node.js 18+
- A terminal
- An Anthropic account (free to create)
Step 1: Get Your API Key
- Go to console.anthropic.com
- Create an account or sign in
- Navigate to Settings > API Keys
- Click Create Key and copy it immediately — you will not be able to see it again
Store your API key as an environment variable. Never hardcode it in your source files and never commit it to version control.
macOS/Linux:
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
Add it to your shell profile (~/.bashrc, ~/.zshrc, or equivalent) so it persists across sessions:
echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc
Windows (PowerShell):
$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"
Alternatively, use a .env file with a library like python-dotenv or dotenv for Node.js. Just make sure .env is in your .gitignore.
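A small startup check can catch a missing or malformed key before the first request goes out. The sketch below is illustrative only: `load_api_key` is a hypothetical helper, and the `sk-ant-` prefix check simply mirrors the placeholder format shown above rather than any documented guarantee.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # Assumption: keys start with "sk-ant-", as in the placeholder above.
    if not key.startswith("sk-ant-"):
        raise RuntimeError("ANTHROPIC_API_KEY does not look like an Anthropic key")
    return key
```

Calling `load_api_key()` at startup fails fast with a readable message instead of a later authentication error.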
Step 2: Install the SDK
Python:
pip install anthropic
TypeScript/JavaScript:
npm install @anthropic-ai/sdk
The SDK automatically reads the ANTHROPIC_API_KEY environment variable, so you do not need to pass it explicitly in your code.
Step 3: Your First API Call
Python:
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Explain what a REST API is in two sentences."
        }
    ]
)

print(message.content[0].text)
TypeScript:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6-20260401",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Explain what a REST API is in two sentences.",
    },
  ],
});

// Content blocks are a union type, so narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}
Run it. You should see Claude’s response printed to your terminal.
Understanding the Response
The message object contains more than just text. Here are the fields you will use most often:
print(message.model) # "claude-sonnet-4-6-20260401"
print(message.role) # "assistant"
print(message.stop_reason) # "end_turn"
print(message.usage.input_tokens) # Number of input tokens used
print(message.usage.output_tokens) # Number of output tokens used
The usage field is critical for monitoring costs. Track input_tokens and output_tokens to calculate your spending.
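To turn those token counts into a dollar figure, a rough estimator might look like the following. It is a sketch that assumes the per-million-token prices from the table above; `PRICES` and `estimate_cost` are illustrative names, not part of the SDK.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table above.
PRICES = {
    "claude-opus-4-6-20260401": (5.00, 25.00),
    "claude-sonnet-4-6-20260401": (3.00, 15.00),
    "claude-haiku-4-5-20250620": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 1,200 input tokens and 350 output tokens on Sonnet
print(f"${estimate_cost('claude-sonnet-4-6-20260401', 1200, 350):.6f}")  # $0.008850
```

In practice you would pass `message.usage.input_tokens` and `message.usage.output_tokens` straight into `estimate_cost`.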
Step 4: System Prompts
System prompts define Claude’s behavior, personality, and constraints. They are passed as a separate system parameter, not as a message in the messages array.
message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    system="You are a senior Python developer. Give concise, practical answers with code examples. Always use type hints. Never explain what the user already knows.",
    messages=[
        {
            "role": "user",
            "content": "How do I read a CSV file and filter rows where the 'age' column is greater than 30?"
        }
    ]
)
Tips for effective system prompts:
- Be specific about the format you want (bullet points, code only, brief vs. detailed)
- Tell Claude what to skip (“don’t explain basic concepts”, “no preamble”)
- Define the persona precisely (“senior backend engineer” is better than “helpful assistant”)
- Include constraints (“respond in under 200 words”, “only suggest solutions using the standard library”)
System prompts count toward your input tokens, so keep them focused.
Step 5: Multi-Turn Conversations
Claude is stateless — each API call is independent. To maintain conversation context, you pass the full conversation history in the messages array:
conversation = [
    {
        "role": "user",
        "content": "What's the best Python web framework for a REST API?"
    },
    {
        "role": "assistant",
        "content": "For a REST API, I'd recommend FastAPI. It's async-native, generates OpenAPI docs automatically, and has built-in request validation via Pydantic."
    },
    {
        "role": "user",
        "content": "Show me a basic FastAPI setup with one GET and one POST endpoint."
    }
]

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=2048,
    messages=conversation
)

print(message.content[0].text)
Important: Every message in the history counts toward your input tokens. For long conversations, this adds up. Strategies to manage this:
- Summarize older messages periodically
- Only include messages relevant to the current question
- Use prompt caching (covered below) to reduce costs on repeated context
Messages must alternate between user and assistant roles. The first message must always be from user.
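One simple way to apply the "only recent messages" strategy while keeping the history valid is sketched below. It assumes plain-string message content, and both helper names are hypothetical. Dropping old turns loses their context, so this is a trade-off rather than a recommendation.

```python
from typing import Dict, List

Message = Dict[str, str]


def validate_roles(messages: List[Message]) -> None:
    """Raise ValueError unless roles alternate, starting with 'user'."""
    for index, message in enumerate(messages):
        expected = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError(f"message {index} should have role {expected!r}")


def trim_history(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep the most recent messages, dropping from the front.

    The result always starts with a user message so it stays valid.
    """
    trimmed = messages[-max_messages:] if max_messages > 0 else []
    # If the cut landed on an assistant turn, drop that turn as well.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed
```

Running `validate_roles(trim_history(conversation, 10))` before each request catches ordering mistakes locally instead of as a bad-request error.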
Step 6: Streaming Responses
Streaming returns the response token by token as it is generated, rather than waiting for the complete response. This is essential for any user-facing application where perceived latency matters.
Python:
with client.messages.stream(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python function to validate email addresses using regex, with tests."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
TypeScript:
const stream = client.messages.stream({
  model: "claude-sonnet-4-6-20260401",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content:
        "Write a Python function to validate email addresses using regex, with tests.",
    },
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
Handling Stream Events
The stream emits several event types. The most useful ones:
with client.messages.stream(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for event in stream:
        if event.type == "text":
            print(event.text, end="")
        elif event.type == "message_stop":
            print("\n--- Stream complete ---")

    # After the stream completes, get the final message
    final_message = stream.get_final_message()
    print(f"Tokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")
Step 7: Vision (Image Analysis)
Claude can analyze images — screenshots, diagrams, charts, documents, UI mockups, and photos. You can provide images as base64-encoded data or via URL.
Base64 Method (Python):
import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

# Read and encode the image
image_data = base64.standard_b64encode(
    Path("screenshot.png").read_bytes()
).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "What errors do you see in this screenshot? List them with line numbers."
                }
            ],
        }
    ],
)

print(message.content[0].text)
URL Method (Python):
message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/architecture-diagram.png",
                    },
                },
                {
                    "type": "text",
                    "text": "Explain this architecture diagram. What are the main components and how do they communicate?"
                }
            ],
        }
    ],
)
Supported formats: JPEG, PNG, GIF, and WebP. Images can be up to 8000x8000 pixels, with optimal performance at 1568 pixels or less on the longest edge. You can include up to 100 images per API request (600 for models with a 1M context window).
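Because the `media_type` must match the actual file format, it can help to derive it from the file extension instead of hardcoding `image/png`. The helper below is a sketch: `image_block` and `MEDIA_TYPES` are illustrative names, and the extension lookup trusts the filename without inspecting the file contents.

```python
import base64
from pathlib import Path

# Media types for the four supported formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block for a local file."""
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or path}")
    data = base64.standard_b64encode(file.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dictionary can be placed directly in a message's `content` list next to a text block, as in the base64 example above.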
Practical Vision Use Cases for Developers
- Bug reports: Paste a screenshot of an error and ask Claude to diagnose it
- UI review: Send a mockup or screenshot and ask for accessibility or design feedback
- Diagram analysis: Upload architecture diagrams, flowcharts, or ERDs and ask Claude to explain or critique them
- Document parsing: Send photos of whiteboards, handwritten notes, or printed documents for extraction
- Code review: Screenshot code from an unfamiliar editor or tool and ask Claude to analyze it
Step 8: Tool Use (Function Calling)
Tool use lets Claude call functions that you define. Claude decides when a tool is needed based on the user’s request, returns a structured tool call, and your code executes it. This is how you give Claude access to real-time data, external APIs, databases, or any capability beyond text generation.
Python Example — Weather Lookup:
import anthropic
import json

client = anthropic.Anthropic()

# Define the tools Claude can use
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city. Use this when the user asks about current weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'San Francisco' or 'London'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit preference"
                }
            },
            "required": ["city"]
        }
    }
]

# Your actual function that fetches weather
def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, this would call a real weather API
    return {
        "city": city,
        "temperature": 22,
        "units": units,
        "conditions": "partly cloudy",
        "humidity": 65
    }

# Step 1: Send the user message with tools
response = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo right now?"}
    ]
)

# Step 2: Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    # Find the tool use block
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )

    # Execute the function
    tool_name = tool_use_block.name
    tool_input = tool_use_block.input
    if tool_name == "get_weather":
        result = get_weather(**tool_input)

    # Step 3: Send the tool result back to Claude
    final_response = client.messages.create(
        model="claude-sonnet-4-6-20260401",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather like in Tokyo right now?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(result)
                    }
                ]
            }
        ]
    )

    print(final_response.content[0].text)
Building an Agentic Loop
For real applications, you want a loop that handles multiple tool calls in sequence:
def run_agent(user_message: str, tools: list, system: str = "") -> str:
    """Run an agentic loop that handles multiple tool calls."""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6-20260401",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages
        )

        # If Claude is not asking for a tool (end_turn, max_tokens, etc.), return the text.
        # Checking only for "end_turn" would loop forever on other stop reasons.
        if response.stop_reason != "tool_use":
            return next(
                (block.text for block in response.content if hasattr(block, "text")),
                ""
            )

        # Process all tool calls in this response
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Execute the tool (route to your functions)
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result)
                })
        messages.append({"role": "user", "content": tool_results})


def execute_tool(name: str, input_data: dict) -> dict:
    """Route tool calls to actual implementations."""
    tool_functions = {
        "get_weather": get_weather,
        # Register your own tools here once they are defined, for example:
        # "search_database": search_database,
        # "send_email": send_email,
    }
    fn = tool_functions.get(name)
    if fn is None:
        return {"error": f"Unknown tool: {name}"}
    return fn(**input_data)
This pattern is the foundation for building AI agents — applications where Claude can autonomously use external tools to accomplish complex tasks.
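In a loop like this, a tool that raises an exception would crash the whole agent. One option, sketched here, is to catch the failure and report it back in the tool result using the `is_error` field so Claude can react to it. `safe_tool_result` is a hypothetical helper name, and catching every `Exception` is a deliberately broad choice for illustration.

```python
import json
from typing import Any, Callable, Dict


def safe_tool_result(
    tool_use_id: str,
    fn: Callable[..., Any],
    tool_input: Dict[str, Any],
) -> Dict[str, Any]:
    """Run a tool and package its outcome as a tool_result block."""
    try:
        content = json.dumps(fn(**tool_input))
        is_error = False
    except Exception as exc:  # report the failure to the model instead of crashing
        content = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }
```

Inside the loop, `tool_results.append(safe_tool_result(block.id, fn, block.input))` would replace the manual dictionary construction.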
Step 9: Error Handling
Production applications need proper error handling. The Anthropic SDK raises specific exceptions for different error types:
import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-6-20260401",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)
except anthropic.AuthenticationError:
    # Invalid API key
    print("Error: Invalid API key. Check your ANTHROPIC_API_KEY environment variable.")
except anthropic.RateLimitError:
    # Too many requests: implement backoff
    print("Error: Rate limited. Wait and retry with exponential backoff.")
except anthropic.BadRequestError as e:
    # Malformed request (invalid messages, bad model ID, etc.)
    print(f"Error: Bad request: {e.message}")
except anthropic.APIConnectionError:
    # Network issues
    print("Error: Cannot connect to the Anthropic API. Check your network.")
except anthropic.APIStatusError as e:
    # Other API errors (500s, etc.)
    print(f"Error: API returned status {e.status_code}: {e.message}")
Retry with Exponential Backoff
For production systems, implement automatic retries for transient errors:
import time
import anthropic

client = anthropic.Anthropic()

def call_claude_with_retry(messages, max_retries=3, model="claude-sonnet-4-6-20260401"):
    """Call Claude API with exponential backoff on rate limits and connection errors."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = (2 ** attempt) + 1  # 2s, 3s, 5s, ...
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
        except anthropic.APIConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
The SDK also has built-in retry support. You can configure it when creating the client:
client = anthropic.Anthropic(
    max_retries=3,  # Default is 2
    timeout=60.0,   # Per-request timeout in seconds
)
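If you roll your own retries, capping the delay and adding jitter helps keep many clients from retrying in lockstep. The schedule below is a sketch of capped exponential backoff with optional "full jitter". The function name and default values are assumptions chosen for illustration, not SDK behavior.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each retry: base * 2**attempt, capped.

    With an rng, each wait is drawn uniformly from [0, capped delay]
    ("full jitter"); without one, the schedule is deterministic.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if rng is not None:
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Passing `rng=random.Random()` in production randomizes each wait, while omitting it keeps the schedule predictable for tests.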
Step 10: Cost Optimization Patterns
Prompt Caching
If your system prompt or conversation prefix stays the same across requests, prompt caching can reduce costs dramatically:
# First request: writes to cache
message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code review assistant. Here is the full project context: [large codebase context here]...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Review this pull request diff: ..."}]
)
# Subsequent requests reuse the cached system prompt at ~10% of normal input cost
Cache entries last for 5 minutes and are refreshed on each use. This is particularly effective for:
- Chat applications where the system prompt is reused across every message
- Code review tools where the codebase context is the same for multiple reviews
- RAG applications where the retrieved context is the same for follow-up questions
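To get a feel for when caching pays off, the arithmetic can be sketched as below. The 10% read rate comes from the pricing notes above. The 1.25x cache-write multiplier is an assumption for illustration, as is the idea that every request after the first is a cache hit, so check the official pricing page before relying on the numbers.

```python
def cached_prefix_cost(
    prefix_tokens: int,
    requests: int,
    input_price_per_mtok: float,
    write_multiplier: float = 1.25,  # assumption: cache writes carry a premium
    read_multiplier: float = 0.10,   # "roughly 10%" from the pricing notes
) -> dict:
    """Compare the cost of a shared prefix with and without caching.

    Assumes the first request writes the cache and all later ones hit it.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_request = prefix_tokens * input_price_per_mtok / 1_000_000
    uncached = per_request * requests
    cached = per_request * (write_multiplier + read_multiplier * (requests - 1))
    return {"uncached": uncached, "cached": cached, "saved": uncached - cached}


# A 50,000-token prefix reused across 20 Sonnet requests at $3 per 1M input tokens
print(cached_prefix_cost(50_000, 20, 3.00))
```

Under these assumptions a single request costs more with caching than without, so the benefit only appears once the prefix is actually reused.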
Batch API
For non-time-sensitive workloads, the Batch API processes requests within a 24-hour window at a 50% discount:
# Create a batch of requests (batches live under client.messages in the Python SDK)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-sonnet-4-6-20260401",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize this article: ..."}]
            }
        },
        {
            "custom_id": "request-2",
            "params": {
                "model": "claude-sonnet-4-6-20260401",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate this to Spanish: ..."}]
            }
        }
    ]
)

# Check batch status later
status = client.messages.batches.retrieve(batch.id)
print(status.processing_status)  # "in_progress", "canceling", or "ended"
Use batches for content generation pipelines, data processing, analytics, and any workflow where real-time response is not required.
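For larger jobs, the request list is usually generated instead of written by hand. A minimal sketch, assuming each prompt already has a unique identifier you can use as its `custom_id` (`build_batch_requests` is a hypothetical helper, not an SDK function):

```python
from typing import Dict, List


def build_batch_requests(
    prompts: Dict[str, str],
    model: str = "claude-sonnet-4-6-20260401",
    max_tokens: int = 1024,
) -> List[dict]:
    """Turn {custom_id: prompt} into a list of batch request entries."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]
```

Keying the input by `custom_id` keeps the identifiers unique, which matters because they are how you match results back to requests.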
Choosing the Right Model
Cost optimization starts with model selection. A rough guide:
| Task | Recommended Model | Why |
|---|---|---|
| Simple classification, extraction | Haiku 4.5 ($1/$5) | Fast, cheap, good enough |
| Code generation, debugging, writing | Sonnet 4.6 ($3/$15) | Best quality-to-cost ratio |
| Complex reasoning, architecture | Opus 4.6 ($5/$25) | Maximum intelligence |
| High-volume processing | Haiku 4.5 + Batch API | $0.50/$2.50 effective rate |
Complete Example: A Code Review Bot
Here is a practical, end-to-end example that combines a system prompt, structured JSON output, and token usage tracking:
import anthropic
import json
from pathlib import Path

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a senior code reviewer. Review code diffs for:
1. Bugs and logic errors
2. Security vulnerabilities
3. Performance issues
4. Style and readability

Respond with a JSON object containing:
- "summary": One-sentence overall assessment
- "issues": Array of {"severity": "critical|warning|info", "line": number, "message": string}
- "approved": boolean

Be honest. If the code is fine, say so. Don't invent issues."""

def review_code(diff: str) -> dict:
    """Submit a code diff for AI review."""
    message = client.messages.create(
        model="claude-sonnet-4-6-20260401",
        max_tokens=2048,
        system=SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": f"Review this diff:\n\n```diff\n{diff}\n```"
            }
        ]
    )
    response_text = message.content[0].text

    # Parse the JSON response
    # Claude may wrap it in markdown code fences, so strip those
    cleaned = response_text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1].rsplit("```", 1)[0]
    review = json.loads(cleaned)

    print(f"Summary: {review['summary']}")
    print(f"Approved: {'Yes' if review['approved'] else 'No'}")
    print(f"Issues found: {len(review['issues'])}")
    for issue in review["issues"]:
        icon = {"critical": "X", "warning": "!", "info": "i"}[issue["severity"]]
        print(f"  [{icon}] Line {issue['line']}: {issue['message']}")
    print(f"Tokens used: {message.usage.input_tokens} in, {message.usage.output_tokens} out")
    return review

# Usage
diff = """
- def process_payment(amount, user_id):
-     db.execute(f"UPDATE users SET balance = balance - {amount} WHERE id = {user_id}")
+ def process_payment(amount: float, user_id: int) -> bool:
+     if amount <= 0:
+         raise ValueError("Amount must be positive")
+     db.execute("UPDATE users SET balance = balance - %s WHERE id = %s", (amount, user_id))
+     return True
"""
review_code(diff)
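The fence-stripping step above breaks if the reply has any prose before or after the JSON. A slightly more forgiving approach, sketched here, is to slice from the first opening brace to the last closing brace. `extract_json` is a hypothetical helper, and it assumes the reply contains exactly one top-level JSON object.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the JSON object embedded in a model reply, fenced or not."""
    # Slice from the first "{" to the last "}" to skip fences and stray prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start : end + 1])
```

Malformed JSON still raises `json.JSONDecodeError`, so the caller can decide whether to retry the request or surface the failure.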
Accessing Claude Through Other Providers
The Claude API is also available through cloud providers, which can simplify billing if you are already using these platforms:
- AWS Bedrock — Access Claude models through your existing AWS account. Pricing may differ from direct Anthropic pricing.
- Google Vertex AI — Available in Google Cloud with Vertex AI integration.
- Microsoft Foundry — Access via Azure.
The SDKs support these providers with minimal code changes. Check the official documentation for provider-specific setup.
What to Build Next
Now that you have the fundamentals, here are practical project ideas in order of increasing complexity:
- CLI assistant — A terminal tool that answers coding questions with project context. (Uses: messages, system prompts)
- Code review bot — Automated PR review that runs on every push. (Uses: messages, structured output)
- Documentation generator — Reads your codebase and generates/updates docs. (Uses: vision for diagrams, large context)
- Support chatbot — Customer-facing chat with access to your knowledge base. (Uses: streaming, multi-turn, tool use for search)
- AI agent — An autonomous tool that can browse the web, query databases, and call APIs to complete complex tasks. (Uses: tool use, agentic loop)
Quick Reference
Official Resources:
- Anthropic API Documentation — Complete reference
- Python SDK on PyPI (pip install anthropic)
- TypeScript SDK on npm (npm install @anthropic-ai/sdk)
- Anthropic Console (API keys, usage, billing)
- Anthropic Cookbook — Example notebooks and patterns
Current Model IDs (April 2026):
- claude-opus-4-6-20260401
- claude-sonnet-4-6-20260401
- claude-haiku-4-5-20250620
Rate Limits: Rate limits depend on your usage tier, which increases automatically as you spend more. New accounts start at Tier 1. Check your current limits in the Anthropic Console.
Further Reading:
- Claude API Pricing — Official pricing page
- Tool Use Documentation — Complete tool use reference
- Vision Documentation — Image analysis capabilities
- Python SDK Reference — Full Python SDK docs
- CloudInsight: Claude API Integration Tutorial 2026 — Additional tutorial
- Claude AI Complete Guide 2026 — Model overview