How to Use the Claude API: A Complete Beginner Tutorial

Disclosure: This article may contain affiliate links. We only recommend products we believe in.

The Claude API gives you programmatic access to Anthropic’s Claude models — the same AI that powers Cursor, Claude Code, and a growing list of developer tools. This tutorial covers everything from your first API call to advanced patterns like streaming, tool use, and vision.

We will use the official Anthropic Python and TypeScript SDKs with the latest models as of April 2026.

Current Claude Models and Pricing

Before writing code, know what you are working with:

| Model | Model ID | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6-20260401 | $5.00 | $25.00 | Complex reasoning, analysis, coding |
| Claude Sonnet 4.6 | claude-sonnet-4-6-20260401 | $3.00 | $15.00 | Balanced speed and quality (recommended default) |
| Claude Haiku 4.5 | claude-haiku-4-5-20250620 | $1.00 | $5.00 | Fast, simple tasks, high volume |

Key pricing notes:

  • The full 1M token context window is included at standard pricing — no long-context surcharges.
  • The Batch API gives you a flat 50% discount on all tokens by processing requests asynchronously within 24 hours.
  • Prompt caching stores repeated context and charges cache reads at roughly 10% of the standard input rate.

For most development work, Claude Sonnet 4.6 is the right default. It offers strong reasoning and code quality at a moderate price. Use Haiku for high-volume or simple tasks, and Opus when you need maximum intelligence for complex problems.

Prerequisites

Step 1: Get Your API Key

  1. Go to console.anthropic.com
  2. Create an account or sign in
  3. Navigate to Settings > API Keys
  4. Click Create Key and copy it immediately — you will not be able to see it again

Store your API key as an environment variable. Never hardcode it in your source files and never commit it to version control.

macOS/Linux:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

Add it to your shell profile (~/.bashrc, ~/.zshrc, or equivalent) so it persists across sessions:

echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.zshrc
source ~/.zshrc

Windows (PowerShell):

$env:ANTHROPIC_API_KEY = "sk-ant-your-key-here"

Alternatively, use a .env file with a library like python-dotenv or dotenv for Node.js. Just make sure .env is in your .gitignore.
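A missing or empty key tends to surface as a confusing authentication error on the first request. The sketch below checks for the key up front and masks it for logging. It uses only the standard library; the helper names `require_api_key` and `mask_key` are hypothetical and not part of the Anthropic SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell or add it to .env."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Mask a key for logging so the full secret never reaches log files."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling `require_api_key()` with no arguments reads from `os.environ`; passing a dictionary instead makes the check easy to unit test.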

Step 2: Install the SDK

Python:

pip install anthropic

TypeScript/JavaScript:

npm install @anthropic-ai/sdk

The SDK automatically reads the ANTHROPIC_API_KEY environment variable, so you do not need to pass it explicitly in your code.

Step 3: Your First API Call

Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Explain what a REST API is in two sentences."
        }
    ]
)

print(message.content[0].text)

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6-20260401",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Explain what a REST API is in two sentences.",
    },
  ],
});

console.log(message.content[0].text);

Run it. You should see Claude’s response printed to your terminal.

Understanding the Response

The message object contains more than just text. These are the fields you will use most often:

print(message.model)          # "claude-sonnet-4-6-20260401"
print(message.role)           # "assistant"
print(message.stop_reason)    # "end_turn"
print(message.usage.input_tokens)   # Number of input tokens used
print(message.usage.output_tokens)  # Number of output tokens used

The usage field is critical for monitoring costs. Track input_tokens and output_tokens to calculate your spending.
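As a rough illustration of that calculation, the sketch below turns token counts into a dollar estimate using the per-million-token prices from the pricing table in this article. The short model labels and the `estimate_cost` helper are hypothetical names chosen for the example, and the estimate ignores batch and caching discounts.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table above.
# The dictionary keys are illustrative labels, not API model IDs.
PRICES_PER_MILLION = {
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token usage."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

With a real response you would pass `message.usage.input_tokens` and `message.usage.output_tokens` as the two counts.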

Step 4: System Prompts

System prompts define Claude’s behavior, personality, and constraints. They are passed as a separate system parameter, not as a message in the messages array.

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    system="You are a senior Python developer. Give concise, practical answers with code examples. Always use type hints. Never explain what the user already knows.",
    messages=[
        {
            "role": "user",
            "content": "How do I read a CSV file and filter rows where the 'age' column is greater than 30?"
        }
    ]
)

Tips for effective system prompts:

  • Be specific about the format you want (bullet points, code only, brief vs. detailed)
  • Tell Claude what to skip (“don’t explain basic concepts”, “no preamble”)
  • Define the persona precisely (“senior backend engineer” is better than “helpful assistant”)
  • Include constraints (“respond in under 200 words”, “only suggest solutions using the standard library”)

System prompts count toward your input tokens, so keep them focused.

Step 5: Multi-Turn Conversations

The API is stateless: each call is independent and Claude does not remember earlier requests. To maintain conversation context, pass the full conversation history in the messages array:

conversation = [
    {
        "role": "user",
        "content": "What's the best Python web framework for a REST API?"
    },
    {
        "role": "assistant",
        "content": "For a REST API, I'd recommend FastAPI. It's async-native, generates OpenAPI docs automatically, and has built-in request validation via Pydantic."
    },
    {
        "role": "user",
        "content": "Show me a basic FastAPI setup with one GET and one POST endpoint."
    }
]

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=2048,
    messages=conversation
)

print(message.content[0].text)

Important: Every message in the history counts toward your input tokens. For long conversations, this adds up. Strategies to manage this:

  • Summarize older messages periodically
  • Only include messages relevant to the current question
  • Use prompt caching (covered below) to reduce costs on repeated context

Messages should alternate between user and assistant roles (the API combines consecutive turns from the same role into one), and the conversation must start with a user message.
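One simple way to apply the "only include relevant messages" strategy is a sliding window over the history. The sketch below keeps the most recent messages and drops any leading assistant turn so the result still starts with a user message. `trim_history` is a hypothetical helper, and it assumes plain text messages; conversations that contain tool calls need their tool_use and tool_result pairs kept together, which this does not handle.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep the most recent messages, making sure the result starts with a user turn."""
    if max_messages <= 0:
        return []
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the first message is from the user.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

A window like this is cheap but loses older context entirely; summarizing the dropped turns into a single message is the usual complement.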

Step 6: Streaming Responses

Streaming returns the response token by token as it is generated, rather than waiting for the complete response. This is essential for any user-facing application where perceived latency matters.

Python:

with client.messages.stream(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python function to validate email addresses using regex, with tests."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript:

const stream = client.messages.stream({
  model: "claude-sonnet-4-6-20260401",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content:
        "Write a Python function to validate email addresses using regex, with tests.",
    },
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

Handling Stream Events

The stream emits several event types. The most useful ones:

with client.messages.stream(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
) as stream:
    for event in stream:
        if event.type == "text":
            print(event.text, end="")
        elif event.type == "message_stop":
            print("\n--- Stream complete ---")

    # After the stream completes, get the final message
    final_message = stream.get_final_message()
    print(f"Tokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")
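If you consume raw events rather than the text stream, you need to filter for text deltas and join them yourself. The sketch below shows that filtering logic on plain dictionaries that mirror the event and delta type names used in the TypeScript example above. Modeling events as dictionaries is an assumption made so the example runs without a network call; the SDKs return typed objects instead, and `collect_text` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def collect_text(events: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the text deltas from a sequence of stream events."""
    parts = []
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # Skip message_start, message_stop, and other event types.
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)
```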

Step 7: Vision (Image Analysis)

Claude can analyze images — screenshots, diagrams, charts, documents, UI mockups, and photos. You can provide images as base64-encoded data or via URL.

Base64 Method (Python):

import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

# Read and encode the image
image_data = base64.standard_b64encode(
    Path("screenshot.png").read_bytes()
).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "What errors do you see in this screenshot? List them with line numbers."
                }
            ],
        }
    ],
)

print(message.content[0].text)

URL Method (Python):

message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/architecture-diagram.png",
                    },
                },
                {
                    "type": "text",
                    "text": "Explain this architecture diagram. What are the main components and how do they communicate?"
                }
            ],
        }
    ],
)

Supported formats: JPEG, PNG, GIF, and WebP. Images can be up to 8000x8000 pixels, with optimal performance at 1568 pixels or less on the longest edge. You can include up to 100 images per API request (600 for models with a 1M context window).
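Hardcoding "image/png" breaks as soon as someone passes a JPEG. A small helper can pick the media type from the file extension and reject anything outside the four supported formats. This is a sketch using only the standard library; `image_block` and `MEDIA_TYPES` are hypothetical names, and the extension check does not verify the actual file contents.

```python
import base64
from pathlib import Path
from typing import Any, Dict

# Media types for the four supported formats listed above.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename: str, data: bytes) -> Dict[str, Any]:
    """Build a base64 image content block, rejecting unsupported formats."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix or filename}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }
```

For a file on disk, `image_block(path.name, path.read_bytes())` produces a block that can be placed in the content list next to a text block.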

Practical Vision Use Cases for Developers

  • Bug reports: Paste a screenshot of an error and ask Claude to diagnose it
  • UI review: Send a mockup or screenshot and ask for accessibility or design feedback
  • Diagram analysis: Upload architecture diagrams, flowcharts, or ERDs and ask Claude to explain or critique them
  • Document parsing: Send photos of whiteboards, handwritten notes, or printed documents for extraction
  • Code review: Screenshot code from an unfamiliar editor or tool and ask Claude to analyze it

Step 8: Tool Use (Function Calling)

Tool use lets Claude call functions that you define. Claude decides when a tool is needed based on the user’s request, returns a structured tool call, and your code executes it. This is how you give Claude access to real-time data, external APIs, databases, or any capability beyond text generation.

Python Example — Weather Lookup:

import anthropic
import json

client = anthropic.Anthropic()

# Define the tools Claude can use
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city. Use this when the user asks about current weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'San Francisco' or 'London'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit preference"
                }
            },
            "required": ["city"]
        }
    }
]

# Your actual function that fetches weather
def get_weather(city: str, units: str = "celsius") -> dict:
    # In production, this would call a real weather API
    return {
        "city": city,
        "temperature": 22,
        "units": units,
        "conditions": "partly cloudy",
        "humidity": 65
    }

# Step 1: Send the user message with tools
response = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Tokyo right now?"}
    ]
)

# Step 2: Check if Claude wants to use a tool
if response.stop_reason == "tool_use":
    # Find the tool use block
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )

    # Execute the function
    tool_name = tool_use_block.name
    tool_input = tool_use_block.input

    if tool_name == "get_weather":
        result = get_weather(**tool_input)

    # Step 3: Send the tool result back to Claude
    final_response = client.messages.create(
        model="claude-sonnet-4-6-20260401",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather like in Tokyo right now?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(result)
                    }
                ]
            }
        ]
    )

    print(final_response.content[0].text)

Building an Agentic Loop

For real applications, you want a loop that handles multiple tool calls in sequence:

def run_agent(user_message: str, tools: list, system: str = "") -> str:
    """Run an agentic loop that handles multiple tool calls."""
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6-20260401",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages
        )

        # If Claude did not request a tool (end_turn, max_tokens, etc.), return the text.
        # Checking only for "end_turn" would loop forever on other stop reasons.
        if response.stop_reason != "tool_use":
            return next(
                (block.text for block in response.content if block.type == "text"),
                ""
            )

        # Process all tool calls in this response
        messages.append({"role": "assistant", "content": response.content})

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Execute the tool (route to your functions)
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result)
                })

        messages.append({"role": "user", "content": tool_results})


def execute_tool(name: str, input_data: dict) -> dict:
    """Route tool calls to actual implementations."""
    # Register every function Claude is allowed to call. Only get_weather is
    # defined in this tutorial; add your own (e.g. search_database, send_email) here.
    tool_functions = {
        "get_weather": get_weather,
    }
    fn = tool_functions.get(name)
    if fn is None:
        return {"error": f"Unknown tool: {name}"}
    return fn(**input_data)

This pattern is the foundation for building AI agents — applications where Claude can autonomously use external tools to accomplish complex tasks.
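In a loop like this, a tool that raises an exception will crash the whole agent. One common alternative is to catch the exception and report it back in the tool result with the `is_error` flag, so Claude can retry or explain the failure. The sketch below shows that wrapping; `run_tool_safely` and the `divide` demo tool are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable, Dict


def divide(a: float, b: float) -> dict:
    """Hypothetical demo tool that fails when b is zero."""
    return {"quotient": a / b}


def run_tool_safely(
    tool_use_id: str,
    fn: Callable[..., Any],
    tool_input: Dict[str, Any],
) -> Dict[str, Any]:
    """Run a tool function and wrap its outcome in a tool_result block."""
    try:
        result = fn(**tool_input)
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(result),
        }
    except Exception as exc:
        # Report the failure to Claude instead of crashing the loop.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
```

Catching every exception is a deliberate simplification here; in production you would likely log the traceback and avoid leaking internal details in the message sent back to the model.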

Step 9: Error Handling

Production applications need proper error handling. The Anthropic SDK raises specific exceptions for different error types:

import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-6-20260401",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)

except anthropic.AuthenticationError:
    # Invalid API key
    print("Error: Invalid API key. Check your ANTHROPIC_API_KEY environment variable.")

except anthropic.RateLimitError:
    # Too many requests — implement backoff
    print("Error: Rate limited. Wait and retry with exponential backoff.")

except anthropic.BadRequestError as e:
    # Malformed request (invalid messages, bad model ID, etc.)
    print(f"Error: Bad request — {e.message}")

except anthropic.APIConnectionError:
    # Network issues
    print("Error: Cannot connect to the Anthropic API. Check your network.")

except anthropic.APIStatusError as e:
    # Other API errors (500s, etc.)
    print(f"Error: API returned status {e.status_code}: {e.message}")

Retry with Exponential Backoff

For production systems, implement automatic retries for transient errors:

import time
import anthropic

client = anthropic.Anthropic()

def call_claude_with_retry(messages, max_retries=3, model="claude-sonnet-4-6-20260401"):
    """Call Claude API with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = (2 ** attempt) + 1  # 2s, 3s, 5s, ...
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
        except anthropic.APIConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

The SDK also has built-in retry support. You can configure it when creating the client:

client = anthropic.Anthropic(
    max_retries=3,  # Default is 2
    timeout=60.0,   # Default timeout in seconds
)
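If you do write your own retry loop, two refinements are worth considering: cap the delay so it cannot grow without bound, and add random jitter so many clients do not retry at the same instant. The sketch below computes such a schedule without making any API calls. `backoff_delays` is a hypothetical helper, and the base and cap values are arbitrary defaults rather than recommendations from Anthropic.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one delay per retry: exponential growth, capped, with optional jitter."""
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        if rng is not None:
            # Pick a random delay in [0, ceiling] so clients do not retry in lockstep.
            delays.append(rng.uniform(0, ceiling))
        else:
            delays.append(ceiling)
    return delays
```

Without an `rng` the schedule is deterministic (1s, 2s, 4s, and so on up to the cap); passing `random.Random()` spreads each delay between zero and that ceiling.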

Step 10: Cost Optimization Patterns

Prompt Caching

If your system prompt or conversation prefix stays the same across requests, prompt caching can reduce costs dramatically:

# First request — writes to cache
message = client.messages.create(
    model="claude-sonnet-4-6-20260401",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a code review assistant. Here is the full project context: [large codebase context here]...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Review this pull request diff: ..."}]
)

# Subsequent requests reuse the cached system prompt at ~10% of normal input cost

Cache entries last for 5 minutes and are refreshed on each use. This is particularly effective for:

  • Chat applications where the system prompt is reused across every message
  • Code review tools where the codebase context is the same for multiple reviews
  • RAG applications where the retrieved context is the same for follow-up questions
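To get a feel for the savings, the arithmetic can be sketched as a small function. It uses the roughly 10% cache-read rate mentioned in this article. The 1.25x cost for the initial cache write is an assumption that does not come from this article, so check the current pricing page before relying on it. `cached_prefix_cost` is a hypothetical helper, and it assumes every request arrives before the cache entry expires.

```python
def cached_prefix_cost(
    prefix_tokens: int,
    requests: int,
    input_price_per_million: float,
    read_multiplier: float = 0.10,   # the ~10% cache-read rate cited above
    write_multiplier: float = 1.25,  # assumption: the first write costs more than a normal read
) -> dict:
    """Compare the input cost of a repeated prefix with and without caching."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_token = input_price_per_million / 1_000_000
    uncached = prefix_tokens * requests * per_token
    # The first request writes the cache; the remaining requests read from it.
    cached = prefix_tokens * per_token * (write_multiplier + read_multiplier * (requests - 1))
    return {"uncached": uncached, "cached": cached, "savings": uncached - cached}
```

Under these assumptions a single request is slightly more expensive with caching than without, so caching only pays off when the prefix is actually reused.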

Batch API

For non-time-sensitive workloads, the Batch API processes requests within a 24-hour window at a 50% discount:

# Create a batch of requests (in the Python SDK, batches live under client.messages)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-sonnet-4-6-20260401",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Summarize this article: ..."}]
            }
        },
        {
            "custom_id": "request-2",
            "params": {
                "model": "claude-sonnet-4-6-20260401",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Translate this to Spanish: ..."}]
            }
        }
    ]
)

# Check batch status later
status = client.messages.batches.retrieve(batch.id)
print(status.processing_status)  # "in_progress", "canceling", or "ended"

Use batches for content generation pipelines, data processing, analytics, and any workflow where real-time response is not required.
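Writing each request entry by hand does not scale past a few prompts. A small builder can generate the list from plain strings, giving every entry a unique `custom_id` so results can be matched back to their prompts. This is a sketch; `build_batch_requests` and its `id_prefix` parameter are hypothetical names, and the entry shape simply mirrors the example above.

```python
from typing import Any, Dict, List


def build_batch_requests(
    prompts: List[str],
    model: str,
    max_tokens: int = 1024,
    id_prefix: str = "request",
) -> List[Dict[str, Any]]:
    """Turn a list of prompts into batch request entries with unique custom_ids."""
    return [
        {
            "custom_id": f"{id_prefix}-{index}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for index, prompt in enumerate(prompts, start=1)
    ]
```

The returned list can be passed as the `requests` argument when creating a batch.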

Choosing the Right Model

Cost optimization starts with model selection. A rough guide:

| Task | Recommended Model | Why |
|---|---|---|
| Simple classification, extraction | Haiku 4.5 ($1/$5) | Fast, cheap, good enough |
| Code generation, debugging, writing | Sonnet 4.6 ($3/$15) | Best quality-to-cost ratio |
| Complex reasoning, architecture | Opus 4.6 ($5/$25) | Maximum intelligence |
| High-volume processing | Haiku 4.5 + Batch API | $0.50/$2.50 effective rate |

Complete Example: A Code Review Bot

Here is a practical, end-to-end example that combines a system prompt, structured JSON output, and token usage tracking:

import anthropic
import json
from pathlib import Path

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a senior code reviewer. Review code diffs for:
1. Bugs and logic errors
2. Security vulnerabilities
3. Performance issues
4. Style and readability

Respond with a JSON object containing:
- "summary": One-sentence overall assessment
- "issues": Array of {"severity": "critical|warning|info", "line": number, "message": string}
- "approved": boolean

Be honest. If the code is fine, say so. Don't invent issues."""

def review_code(diff: str) -> dict:
    """Submit a code diff for AI review."""
    message = client.messages.create(
        model="claude-sonnet-4-6-20260401",
        max_tokens=2048,
        system=SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": f"Review this diff:\n\n```diff\n{diff}\n```"
            }
        ]
    )

    response_text = message.content[0].text

    # Parse the JSON response
    # Claude may wrap it in markdown code fences, so strip those
    cleaned = response_text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1].rsplit("```", 1)[0]

    review = json.loads(cleaned)

    print(f"Summary: {review['summary']}")
    print(f"Approved: {'Yes' if review['approved'] else 'No'}")
    print(f"Issues found: {len(review['issues'])}")
    for issue in review["issues"]:
        icon = {"critical": "X", "warning": "!", "info": "i"}[issue["severity"]]
        print(f"  [{icon}] Line {issue['line']}: {issue['message']}")
    print(f"Tokens used: {message.usage.input_tokens} in, {message.usage.output_tokens} out")

    return review


# Usage
diff = """
- def process_payment(amount, user_id):
-     db.execute(f"UPDATE users SET balance = balance - {amount} WHERE id = {user_id}")
+ def process_payment(amount: float, user_id: int) -> bool:
+     if amount <= 0:
+         raise ValueError("Amount must be positive")
+     db.execute("UPDATE users SET balance = balance - %s WHERE id = %s", (amount, user_id))
+     return True
"""

review_code(diff)
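The fence-stripping step in `review_code` assumes the response is either bare JSON or a single fenced block. Models sometimes add a sentence before or after the JSON, so a more defensive parser can help. The sketch below looks for a fenced block first, falls back to the outermost braces, and checks that the three keys requested in the system prompt are present. `parse_review` is a hypothetical helper and still raises on output that is not valid JSON.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable
REQUIRED_KEYS = {"summary", "issues", "approved"}


def parse_review(response_text: str) -> Dict[str, Any]:
    """Extract and validate the review JSON from a model response."""
    text = response_text.strip()
    # Prefer the contents of a fenced block if one is present.
    fenced = re.search(FENCE + r"[a-zA-Z]*\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    else:
        # Otherwise fall back to the outermost {...} span.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("No JSON object found in response")
        text = text[start : end + 1]
    review = json.loads(text)
    if not isinstance(review, dict):
        raise ValueError("Review must be a JSON object")
    missing = REQUIRED_KEYS - review.keys()
    if missing:
        raise ValueError(f"Review is missing keys: {sorted(missing)}")
    return review
```

Validating the keys up front turns a later `KeyError` deep in the printing code into one clear error at the parsing step.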

Accessing Claude Through Other Providers

The Claude API is also available through cloud providers, which can simplify billing if you are already using these platforms:

  • AWS Bedrock — Access Claude models through your existing AWS account. Pricing may differ from direct Anthropic pricing.
  • Google Vertex AI — Available in Google Cloud with Vertex AI integration.
  • Microsoft Foundry — Access via Azure.

The SDKs support these providers with minimal code changes. Check the official documentation for provider-specific setup.

What to Build Next

Now that you have the fundamentals, here are practical project ideas in order of increasing complexity:

  1. CLI assistant — A terminal tool that answers coding questions with project context. (Uses: messages, system prompts)
  2. Code review bot — Automated PR review that runs on every push. (Uses: messages, structured output)
  3. Documentation generator — Reads your codebase and generates/updates docs. (Uses: vision for diagrams, large context)
  4. Support chatbot — Customer-facing chat with access to your knowledge base. (Uses: streaming, multi-turn, tool use for search)
  5. AI agent — An autonomous tool that can browse the web, query databases, and call APIs to complete complex tasks. (Uses: tool use, agentic loop)

Quick Reference

Current Model IDs (April 2026):

  • claude-opus-4-6-20260401
  • claude-sonnet-4-6-20260401
  • claude-haiku-4-5-20250620

Rate Limits: Rate limits depend on your usage tier, which increases automatically as you spend more. New accounts start at Tier 1. Check your current limits in the Anthropic Console.

