Function Calling vs Tool Use vs Structured Output

Ravinder · 8 min read
AI · LLM · Tool Use · Function Calling · Structured Output

OpenAI calls it function calling. Anthropic calls it tool use. Google calls it function calling too, but the schema is different. The marketing makes it sound like three innovations. It's one idea with different APIs, different reliability characteristics, and one genuinely meaningful difference you probably don't know about.

Here's what actually matters.

The Core Mechanism (Shared Across All Vendors)

Every LLM tool use implementation does the same thing:

  1. You describe available tools in the API request as structured schemas.
  2. The model decides which tool to call and generates a structured argument object.
  3. You execute the tool with those arguments.
  4. You pass the result back to the model.
  5. The model continues.

sequenceDiagram
    participant App as Application
    participant LLM as LLM
    participant Tool as Tool/API
    App->>LLM: user message + tool definitions
    LLM-->>App: tool_call(name="search", args={query: "..."})
    App->>Tool: Execute search(query="...")
    Tool-->>App: results
    App->>LLM: tool_result(results)
    LLM-->>App: Final text response

The LLM never executes anything. It generates a structured request. Your code executes it. This is critical for security — the LLM cannot directly call any function; it can only produce text that describes a call.
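
To make that concrete, here is a minimal dispatch sketch (the TOOL_REGISTRY name and the weather stub are illustrative, not any vendor's API): the model's output is inert data, and only functions you explicitly register can ever run.

# A tool call is just a (name, arguments) pair until your code acts on it.
# The registry doubles as an allowlist: unknown names never execute.
TOOL_REGISTRY = {
    "get_weather": lambda location, unit="celsius": f"22 {unit} in {location}",
}

def dispatch(tool_name: str, arguments: dict) -> str:
    if tool_name not in TOOL_REGISTRY:
        return f"Error: unknown tool {tool_name!r}"
    return TOOL_REGISTRY[tool_name](**arguments)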

Schema Differences That Actually Matter

OpenAI Function Calling

from openai import OpenAI
 
client = OpenAI()
 
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g. 'London, UK'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",  # "none" | "auto" | {"type": "function", "function": {"name": "..."}}
)
 
# Parse the tool call
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    print(tool_call.function.name)       # "get_weather"
    print(tool_call.function.arguments)  # '{"location": "Tokyo, Japan"}'

Anthropic Tool Use

import anthropic
import json
 
client = anthropic.Anthropic()
 
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "input_schema": {                        # NOTE: "input_schema", not "parameters"
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["location"],
        },
    }
]
 
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)
 
# Parse the tool use block
for block in response.content:
    if block.type == "tool_use":
        print(block.name)    # "get_weather"
        print(block.input)   # {"location": "Tokyo, Japan"}

Key schema differences:

| Field | OpenAI | Anthropic | Google |
| --- | --- | --- | --- |
| Schema key | `parameters` | `input_schema` | `parameters` |
| Tool type field | `{"type": "function", "function": {...}}` wrapper | flat object | flat object |
| Tool choice | `"auto"` / `"none"` / specific | `"auto"` / `"any"` / specific | `"AUTO"` / `"NONE"` / `"ANY"` |
| Result format | `role: "tool"` message | `tool_result` content block | `FunctionResponse` part |

None of these differences are fundamental. They're just API surface area. Abstract them.

# Thin abstraction over vendor differences
import json
from dataclasses import dataclass
from typing import Any
 
@dataclass
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]
    call_id: str
 
def parse_tool_calls_openai(response) -> list[ToolCall]:
    calls = []
    if response.choices[0].finish_reason == "tool_calls":
        for tc in response.choices[0].message.tool_calls:
            calls.append(ToolCall(
                tool_name=tc.function.name,
                arguments=json.loads(tc.function.arguments),
                call_id=tc.id,
            ))
    return calls
 
def parse_tool_calls_anthropic(response) -> list[ToolCall]:
    calls = []
    for block in response.content:
        if block.type == "tool_use":
            calls.append(ToolCall(
                tool_name=block.name,
                arguments=block.input,
                call_id=block.id,
            ))
    return calls
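
The abstraction works in the other direction too. Here is a hedged sketch (the ToolSpec type is an assumption, not a vendor API) that renders one vendor-neutral definition into each provider's schema, following the field names in the table above:

@dataclass
class ToolSpec:
    name: str
    description: str
    schema: dict[str, Any]  # plain JSON Schema, shared across vendors

def to_openai(spec: ToolSpec) -> dict:
    # OpenAI wraps the definition in a {"type": "function"} envelope
    # and calls the schema "parameters"
    return {
        "type": "function",
        "function": {
            "name": spec.name,
            "description": spec.description,
            "parameters": spec.schema,
        },
    }

def to_anthropic(spec: ToolSpec) -> dict:
    # Anthropic uses a flat object and calls the schema "input_schema"
    return {
        "name": spec.name,
        "description": spec.description,
        "input_schema": spec.schema,
    }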

Parallel Tool Calls — The Meaningful Difference

This is the one that actually matters for latency-sensitive applications.

OpenAI's GPT-4o and Claude 3.5+ both support parallel tool calling: the model can decide to call multiple tools simultaneously in a single turn.

# OpenAI — multiple tool calls in one response
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Get the weather in Tokyo and London, and the EUR/USD rate."
    }],
    tools=[weather_tool, forex_tool],
    tool_choice="auto",
    parallel_tool_calls=True,  # default True in newer models
)
 
# Response may contain multiple tool_calls
for tc in response.choices[0].message.tool_calls:
    print(tc.function.name, tc.function.arguments)
# Output:
# get_weather {"location": "Tokyo, Japan"}
# get_weather {"location": "London, UK"}
# get_forex_rate {"base": "EUR", "quote": "USD"}

Execute them concurrently:

import asyncio
 
async def execute_parallel_tool_calls(tool_calls: list[ToolCall], tool_registry: dict) -> list[dict]:
    async def run_one(tc: ToolCall):
        fn = tool_registry[tc.tool_name]
        result = await fn(**tc.arguments)
        return {"call_id": tc.call_id, "result": result}
 
    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])
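
One caveat: asyncio.gather only buys concurrency if the registered tools are awaitable. If some are plain synchronous functions, a hedged workaround is to push them onto a worker thread (run_tool here is an illustrative helper, not part of any SDK):

import asyncio
import inspect

async def run_tool(fn, **kwargs):
    # Await coroutine tools directly; run sync tools in a thread
    # so a blocking call doesn't stall the other parallel tool calls.
    if inspect.iscoroutinefunction(fn):
        return await fn(**kwargs)
    return await asyncio.to_thread(fn, **kwargs)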

Without parallel tool calls, three tools at 200ms each execute sequentially: 600ms of tool time, plus an extra model round trip per call. With parallel execution, the tool time is 200ms. This is not a marginal improvement.

When to disable parallel calls: If your tools have side effects that must be ordered (write then read), disable parallel tool calls. parallel_tool_calls=False forces the model to emit at most one tool call per turn, so execution stays sequential.

Structured Output vs Tool Use — When to Use Which

These are often confused because both produce structured JSON. They serve different purposes.

Structured output (OpenAI's response_format, Anthropic's JSON mode) extracts structured data from unstructured input. The model reads something and produces a schema-conformant JSON response. No side effects. No execution loop.

from pydantic import BaseModel
 
class Invoice(BaseModel):
    vendor: str
    total_amount: float
    currency: str
    due_date: str
    line_items: list[dict]
 
# OpenAI structured output
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract invoice fields from the text."},
        {"role": "user", "content": invoice_text},
    ],
    response_format=Invoice,
)
invoice = response.choices[0].message.parsed  # typed Invoice object

Tool use models a decision loop. The model decides to take an action, you execute it, the model responds to the result. It's for agentic behavior with side effects.

Decision guide:

| Use case | Mechanism |
| --- | --- |
| Extract fields from a document | Structured output |
| Query a database based on user intent | Tool use |
| Parse an API response into typed objects | Structured output |
| Browse the web and answer a question | Tool use |
| Validate and transform user input | Structured output |
| Send an email, post a message | Tool use |

Error Handling That Doesn't Break the Loop

Models generate malformed arguments. Tools fail. The execution loop must handle both without breaking.

async def safe_tool_dispatch(
    tool_call: ToolCall,
    tool_registry: dict,
) -> dict:
    if tool_call.tool_name not in tool_registry:
        return {
            "call_id": tool_call.call_id,
            "error": f"Unknown tool: {tool_call.tool_name}",
            "result": None,
        }
 
    tool_fn = tool_registry[tool_call.tool_name]
 
    try:
        result = await tool_fn(**tool_call.arguments)
        return {"call_id": tool_call.call_id, "result": result, "error": None}
    except TypeError as e:
        # Model passed wrong argument types or missing required args
        return {
            "call_id": tool_call.call_id,
            "error": f"Invalid arguments: {e}",
            "result": None,
        }
    except Exception as e:
        return {
            "call_id": tool_call.call_id,
            "error": f"Tool execution failed: {e}",
            "result": None,
        }

Pass errors back to the model as tool results — don't raise exceptions that abort the loop. The model can recover from a failed tool call if it sees the error message.

# Anthropic — pass error back as tool_result
messages.append({
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": tool_call.call_id,
            "content": json.dumps(dispatch_result),
            "is_error": dispatch_result["error"] is not None,
        }
    ],
})
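
The OpenAI equivalent has no is_error flag; the error simply travels in the content of an ordinary tool message (dispatch_result is the dict returned by safe_tool_dispatch above):

# OpenAI: tool results (including errors) go back as role="tool"
# messages, matched to the originating call by tool_call_id
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.call_id,
    "content": json.dumps(dispatch_result),
})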

Loop Termination and Safety

Agentic tool loops need explicit termination conditions. Without them:

  • The model can call tools indefinitely
  • Runaway loops accumulate cost
  • A bug in a tool's output can cause the model to re-call it forever

async def agent_loop(
    user_message: str,
    tools: list[dict],
    tool_registry: dict,
    max_iterations: int = 10,
) -> str:
    messages = [{"role": "user", "content": user_message}]
    iteration = 0
 
    while iteration < max_iterations:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
 
        choice = response.choices[0]
 
        if choice.finish_reason == "stop":
            # Model is done, return final answer
            return choice.message.content
 
        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)
            tool_calls = parse_tool_calls_openai(response)
            results = await execute_parallel_tool_calls(tool_calls, tool_registry)
 
            # One role="tool" message per call, matched by tool_call_id;
            # the API expects every tool_call to receive its own result
            for r in results:
                messages.append({
                    "role": "tool",
                    "tool_call_id": r["call_id"],
                    "content": json.dumps(r["result"]),
                })
            iteration += 1
            continue
 
        # Unexpected finish reason
        break
 
    return "Max iterations reached without a final answer."

max_iterations=10 is a hard ceiling. Log every loop that hits it — it means either your tools are failing repeatedly or the model is stuck.
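
Iterations aren't the only budget worth capping. A small, hedged addition (the 50k ceiling is illustrative, not a vendor default): accumulate usage.total_tokens from each response and stop the loop when spend crosses a threshold.

class TokenBudget:
    """Cumulative token guard for an agent loop."""

    def __init__(self, max_total_tokens: int = 50_000):  # illustrative ceiling
        self.max_total_tokens = max_total_tokens
        self.used = 0

    def charge(self, response) -> None:
        # OpenAI chat completions report usage.total_tokens per call
        self.used += response.usage.total_tokens
        if self.used > self.max_total_tokens:
            raise RuntimeError(f"Token budget exhausted after {self.used} tokens")

Call charge(response) immediately after each create() call inside the loop; catching the RuntimeError at the top level gives the same graceful exit as hitting max_iterations.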

Vendor-Specific Gotchas

OpenAI tool_choice="required": Forces the model to call at least one tool. Useful when you need structured extraction but dangerous in agentic loops — the model will call a tool even when it shouldn't.

Anthropic tool_choice={"type": "any"}: Same semantics as OpenAI's required. When you need the model to always use a tool, this is the switch.
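
For reference, the concrete forcing syntax on both sides, using the get_weather tool from earlier:

# OpenAI: force any tool, or one specific tool
tool_choice = "required"
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

# Anthropic: the same two levels of forcing
tool_choice = {"type": "any"}
tool_choice = {"type": "tool", "name": "get_weather"}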

Google Gemini: Function declarations use parameters like OpenAI, but the response parsing is different. Gemini returns function_call parts in the content, not a separate tool_calls array.
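
A hedged parsing sketch with the google.generativeai SDK (the model name and tool wiring are assumptions; the point is where the call shows up):

import google.generativeai as genai

# Assumes genai.configure(api_key=...) has run and `tools` holds
# function declarations with OpenAI-style "parameters" schemas
model = genai.GenerativeModel("gemini-1.5-pro", tools=tools)
response = model.generate_content("What's the weather in Tokyo?")

for part in response.candidates[0].content.parts:
    if part.function_call:  # no separate tool_calls array; inspect the parts
        print(part.function_call.name)        # "get_weather"
        print(dict(part.function_call.args))  # {"location": "Tokyo, Japan"}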

JSON argument reliability: Smaller models (GPT-4o-mini, Haiku) occasionally generate syntactically invalid JSON for complex nested schemas. Add a JSON parse validation layer before dispatching.

import json
 
def validate_tool_arguments(arguments_str: str, expected_schema: dict) -> dict:
    try:
        args = json.loads(arguments_str)
    except json.JSONDecodeError as e:
        raise ValueError(f"Model generated invalid JSON: {e}\nRaw: {arguments_str}")
 
    # Optional: validate against schema using jsonschema
    # from jsonschema import validate
    # validate(instance=args, schema=expected_schema)
 
    return args
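
In an OpenAI-style loop this slots in right before dispatch, since arguments arrive as a raw JSON string (weather_schema here stands for the tool's parameters dict):

# Validate before building the ToolCall, so malformed JSON becomes
# a catchable ValueError instead of a crash deeper in the loop
raw_args = tc.function.arguments
args = validate_tool_arguments(raw_args, expected_schema=weather_schema)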

Key Takeaways

  • "Function calling," "tool use," and "function calling" (Google) are the same pattern — the schemas differ but the mechanism is identical.
  • Parallel tool calls are the meaningful performance lever — execute multiple tool calls concurrently and cut multi-tool latency dramatically.
  • Use structured output for extraction (no side effects, no loop); use tool use for agentic action (execution loop with side effects).
  • Always pass tool errors back to the model as results — don't let them abort the loop; the model can recover.
  • Set a hard max_iterations ceiling on every agentic loop — there is no built-in termination in any vendor's API.
  • Smaller models generate invalid tool argument JSON more often — validate before dispatch, especially in production.