Genkit Python vs Pydantic AI: The Real Comparison

Pydantic AI has a good pitch. “We are to AI agents what FastAPI was to web APIs: ergonomic, type-safe, Python-native.” That’s not just marketing copy. They built on Pydantic, which is already a dependency in a large share of Python projects, and their v1.0 API is genuinely clean. The pitch is earned.

Genkit makes almost the same claim. When two frameworks reach for the same label, the label has stopped doing work. So I want to do something more useful: build the same agent in both, count what each one requires a developer to learn, and be direct about where each is actually stronger. I work on Genkit Python at Google, so I have a perspective here, but I’ll try to give Pydantic AI honest credit.

What each framework means by “FastAPI for AI”

FastAPI earned its reputation for two distinct reasons. First, it does automatic type-safe validation of request and response bodies using Pydantic. Second, it has a minimal, decorator-based routing API that most developers can learn in an afternoon.

When Pydantic AI says “FastAPI for AI,” they’re reaching for the first thing. Their core value proposition is that your agent’s output is a validated Pydantic model: result.output is guaranteed to match the type you declared in output_type. They also bring FastAPI’s dependency injection pattern (the Depends() system) into agent development, via deps_type and RunContext[Deps].

When Genkit says “FastAPI for AI,” they’re reaching for the second thing: the minimal decorator-based surface. The framework has one primary primitive, the generate() call, and everything composes from there. Tools are functions. Flows are typed async functions decorated with @ai.flow(). Middleware stacks on a use=[] parameter.

These are different claims wearing the same label, and understanding the difference matters for choosing between them.
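The first claim, validated output, can be made concrete with a small standard-library sketch. This is only a stand-in for the idea: Pydantic does far more (coercion, nested models, detailed error reports), and neither framework is implemented this way. The dataclass and the parse_output helper are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class WeatherResult:
    city: str
    temperature: str
    condition: str


def parse_output(raw: str, output_type: type):
    """Parse model text as JSON and check it against output_type's fields.

    Hypothetical helper: a simplified stand-in for schema validation.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(output_type)}
    if set(data) != expected:
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(data)}")
    hints = get_type_hints(output_type)
    for name, value in data.items():
        if not isinstance(value, hints[name]):
            raise TypeError(f"{name} should be {hints[name].__name__}")
    return output_type(**data)
```

The caller either gets a WeatherResult whose fields have the declared types, or an exception. That "typed object or error, never a loose dict" contract is what the first reading of the analogy is about.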

The same agent, both ways

Here is a simple weather agent with tool use and structured output. Pydantic AI v1.0:

from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

class WeatherResult(BaseModel):
    city: str
    temperature: str
    condition: str

@dataclass
class WeatherDeps:
    api_key: str

weather_agent = Agent(
    'google-gla:gemini-2.0-flash',
    deps_type=WeatherDeps,
    output_type=WeatherResult,
    system_prompt='You are a weather assistant.',
)

@weather_agent.tool
async def get_weather(ctx: RunContext[WeatherDeps], city: str) -> str:
    """Get current weather for a city."""
    # Stubbed response; a real tool would call a weather API using ctx.deps.api_key.
    return f"Sunny and 72°F in {city}"

deps = WeatherDeps(api_key="...")
result = weather_agent.run_sync("What's the weather in Chicago?", deps=deps)
print(result.output.temperature)

The same agent in Genkit Python 0.7.0:

from pydantic import BaseModel
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(plugins=[GoogleAI()], model='googleai/gemini-2.0-flash')

class WeatherInput(BaseModel):
    city: str

class WeatherResult(BaseModel):
    city: str
    temperature: str
    condition: str

@ai.tool()
async def get_weather(input: WeatherInput) -> str:
    """Get current weather for a city."""
    return f"Sunny and 72°F in {input.city}"

# ai.generate() is awaited, so this call has to run inside an async function.
response = await ai.generate(
    prompt="What's the weather in Chicago?",
    tools=[get_weather],
    output_schema=WeatherResult,
    output_format='json',
)
print(response.output.temperature)

A few things are worth noting side by side. The Pydantic AI version introduces six new concepts: Agent, RunContext[WeatherDeps], the WeatherDeps dataclass, deps_type, output_type, and run_sync(). The Genkit version introduces three: Genkit, generate() (including its output_schema and output_format arguments), and @ai.tool().

Look at what WeatherDeps is doing. It is the mechanism for threading external state (an API key, a database connection, an HTTP client) through the tool function. The tool receives ctx: RunContext[WeatherDeps] and accesses ctx.deps.api_key. This is FastAPI’s dependency injection pattern applied to agents: explicit, generic, type-checked. Pydantic AI commits to this pattern throughout the framework. Every tool that needs external state receives it through RunContext.

In Genkit, there is no parallel construct. If get_weather needs an API key, it closes over it from the outer scope or reads from an environment variable, the same way any other async function would handle external state. The framework does not add a dependency injection layer. For simple agents, this keeps the concept count low. For complex agents with many tools sharing many dependencies, the absence of explicit DI means more implicit reliance on module-level state.
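The two ways of getting external state into a tool can be compared in plain Python. This is a sketch with hypothetical names: Context stands in for a framework-provided run context, and neither function uses a real framework API.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

DepsT = TypeVar("DepsT")


@dataclass
class WeatherDeps:
    api_key: str


@dataclass
class Context(Generic[DepsT]):
    """Hypothetical stand-in for a run context that carries dependencies."""
    deps: DepsT


# Style 1: explicit injection. The dependency is visible in the signature,
# so a type checker can verify every tool that touches it.
def get_weather_injected(ctx: Context[WeatherDeps], city: str) -> str:
    return f"weather for {city} (key ends with {ctx.deps.api_key[-4:]})"


# Style 2: closure. A factory captures the dependency once, and the tool
# itself only takes the arguments the model supplies.
def make_get_weather(api_key: str) -> Callable[[str], str]:
    def get_weather(city: str) -> str:
        return f"weather for {city} (key ends with {api_key[-4:]})"
    return get_weather
```

In tests, the first style swaps dependencies per call by passing a different context. The second style builds a fresh tool per dependency set, which is fine for a handful of tools and gets repetitive with many.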

Observability

This is where the cost difference is largest, and it matters from the first day of development.

Pydantic AI’s observability story is Logfire, a commercial platform built by the same team. To get traces from a Pydantic AI agent, you install logfire, authenticate your local environment, and create a project:

pip install pydantic-ai logfire
logfire auth
logfire projects new

Then in code:

import logfire
from pydantic_ai import Agent

logfire.configure()
logfire.instrument_pydantic_ai()

agent = Agent('google-gla:gemini-2.0-flash', ...)
result = agent.run_sync("What's the weather in Chicago?", deps=deps)

The logfire.configure() call reads the write token from the .logfire directory created during project setup. In production, you set a LOGFIRE_TOKEN environment variable and configure a project. The free tier has limited retention; production team usage costs real money.

Genkit’s observability is built into the framework. Launching your app through the genkit CLI runs it in dev mode (GENKIT_ENV=dev) and serves the Dev UI alongside it. To see full traces, token counts, request and response bodies, latency spans, and tool call results for a running agent:

genkit start -- python app.py

Open http://localhost:4000. No account, no token, no SaaS. Full traces for the session.

I want to be fair about what this comparison is measuring. Logfire is a production observability platform that aggregates data across a fleet, stores historical traces, and supports alerting. The Genkit Dev UI is a development tool. If you need production observability on Genkit, you will reach for the OpenTelemetry exporter plugin. The comparison point is what you get at zero cost and zero setup during development, and that gap is large.

The abstraction surface

Making this concrete matters, because framework comparisons often stay vague about the actual learning surface.

For a production Pydantic AI agent, a developer needs to understand: the Agent class and its constructor parameters, RunContext[T] and how to parameterize the generic, the dependency injection pattern, the distinction between @agent.tool and @agent.tool_plain, output_type and the separate streaming result types, FallbackModel for model fallbacks, Pydantic Evals for testing agents, Logfire for observability, and the four run modes (run(), run_sync(), run_stream(), run_stream_events()).

For the same Genkit agent: the Genkit class, generate() and generate_stream(), @ai.flow(), @ai.tool(), and the middleware plugins (Retry, Fallback, ToolApproval) available via the use=[] parameter. Flows replace a separate class hierarchy for wrapping agent execution.

Neither list is long by framework standards. The distinction is that Pydantic AI’s list includes more things requiring pattern knowledge (the generic RunContext[T], the tool/tool_plain split, the streaming result type hierarchy), while Genkit’s list is mostly “here are the three decorators.”

Where Pydantic AI is stronger

The evaluation story is better right now. Pydantic Evals is a companion package from the same team, built around datasets, cases, and evaluators. Genkit has evaluators in the SDK, but they are less developed. If you need a rigorous evaluation harness from the start of a project, Pydantic AI is ahead here.

The dependency injection pattern earns its complexity for certain agent architectures. If you have an agent where fifteen different tools need access to a database connection pool, an authenticated HTTP session, and a feature flag client, RunContext[Deps] makes that dependency graph explicit and type-checked across the whole codebase. A Genkit developer would use closures or module-level state, which works but loses the type-checking guarantee and the explicit callsite documentation.

The Python ecosystem fit is genuine. If your team already writes Pydantic models for FastAPI request bodies, Pydantic AI extends that same vocabulary into agent development. The BaseModel for your HTTP responses is the same BaseModel for your agent outputs. There is real value in that continuity, especially for teams onboarding new developers.

AG-UI protocol support is native. Pydantic AI has built-in support for the AG-UI streaming protocol and Vercel AI Data Stream. If you are building an agent with a streaming frontend integration, that is less integration work than you would write with Genkit today.

Where Genkit is stronger

The zero-config Dev UI is a real advantage during development. Being able to see a full trace, inspect every model request and response, view token counts, and replay a specific flow call without any external tooling is something I use constantly when building. This is not a minor convenience.

Python and TypeScript parity matters for mixed stacks. Genkit has the same mental model in JavaScript and TypeScript: the same generate() primitive, the same flow abstraction, the same tool pattern. If you have a Python backend and a TypeScript frontend or edge layer, both sides of the team share a vocabulary. Pydantic AI is Python only.

Google Cloud deployment has specific, tested integration paths: Cloud Run, Workload Identity Federation, Secret Manager, and Cloud Trace are all first-class. For teams deploying to GCP or Firebase, that is a real reduction in integration work.

Middleware composition is straightforward. Retry, fallback, and tool approval stack on the use=[] parameter without subclassing or agent configuration:

from genkit.plugins.middleware import Retry, Fallback

response = await ai.generate(
    prompt="...",
    use=[Retry(max_retries=3), Fallback(fallback_model='googleai/gemini-flash-latest')],
)

Add what you need, remove what you do not. The middleware instances are plain Python objects.
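How a use=[] style stack composes can be sketched in plain Python. This illustrates the general middleware pattern only, not Genkit's implementation. The retry, fallback, and compose helpers are hypothetical, and so is the convention that the first entry in the list is the outermost wrapper.

```python
import asyncio
from typing import Awaitable, Callable

# A "call" takes a model name and returns text; a middleware wraps one call in another.
Call = Callable[[str], Awaitable[str]]
Middleware = Callable[[Call], Call]


def retry(max_retries: int) -> Middleware:
    """Re-run the wrapped call up to max_retries extra times on any exception."""
    def wrap(next_call: Call) -> Call:
        async def call(model: str) -> str:
            for attempt in range(max_retries + 1):
                try:
                    return await next_call(model)
                except Exception:
                    if attempt == max_retries:
                        raise
            raise AssertionError("unreachable")
        return call
    return wrap


def fallback(fallback_model: str) -> Middleware:
    """If the wrapped call fails, try once more with fallback_model."""
    def wrap(next_call: Call) -> Call:
        async def call(model: str) -> str:
            try:
                return await next_call(model)
            except Exception:
                return await next_call(fallback_model)
        return call
    return wrap


def compose(base: Call, use: list[Middleware]) -> Call:
    """Wrap base so that use[0] ends up as the outermost layer."""
    call = base
    for middleware in reversed(use):
        call = middleware(call)
    return call


async def flaky_model(model: str) -> str:
    """Fake model call: the primary model always fails."""
    if model == "primary":
        raise RuntimeError("primary is down")
    return f"answer from {model}"


def demo() -> str:
    call = compose(flaky_model, use=[retry(max_retries=3), fallback("backup")])
    return asyncio.run(call("primary"))
```

Because each layer is just a function that takes a call and returns a call, adding or removing one never touches the others.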

Choosing between them

Scenario: Python-only team, already on Pydantic ecosystem · Framework: Pydantic AI
Scenario: Fullstack: Python backend + TypeScript frontend · Framework: Genkit
Scenario: Need free local observability, no SaaS account · Framework: Genkit
Scenario: Complex dependency injection through many tools · Framework: Pydantic AI
Scenario: Google Cloud or Firebase deployment · Framework: Genkit
Scenario: Need a production evaluation framework today · Framework: Pydantic AI
Scenario: Prefer minimum framework surface area · Framework: Genkit

Where this leaves things

Both frameworks are past the interesting prototype stage. Pydantic AI hit v1.0 stable in late 2025. Genkit Python 0.7.0 is under active development at Google.

Pydantic AI means “type-safe and DI-backed like FastAPI.” Genkit means “minimal and decorator-driven like FastAPI.” Both readings are accurate; they describe different halves of the same framework.

If you are a Python-only team with heavy Pydantic investment and you need a robust evaluation framework, Pydantic AI is probably the right choice. If you are deploying to Google Cloud, have TypeScript anywhere in your stack, or you want observability without signing up for a hosted service, Genkit earns a longer look.

I have not done a rigorous performance comparison. My hands-on time with both frameworks has been primarily on Gemini models; the experience on OpenAI or Anthropic endpoints may differ. The Genkit samples used in this post are in firebase/genkit. Pydantic AI’s examples are from their official docs.

Session persistence: first-class vs DIY

This is one of the sharpest architectural differences between the two frameworks.

With Pydantic AI, you own the history:

# Pass message history manually on every turn
result1 = await agent.run("What's the weather in Chicago?")

result2 = await agent.run(
    "What about tomorrow?",
    message_history=result1.new_messages(),  # you manage this
)

# For persistence across HTTP requests or process restarts:
from pydantic_ai.messages import ModelMessagesTypeAdapter
serialized = ModelMessagesTypeAdapter.dump_json(result1.all_messages())
# store 'serialized' to Redis, Postgres, etc. — your problem

No store= parameter. No session_id. No built-in SessionStore protocol. No atomic snapshots. Pydantic AI’s official recommendation for production multi-turn persistence: integrate Temporal, Restate, or a third-party tool like Hindsight or Mem0.
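What "store it yourself" involves can be sketched with the standard library. The sketch assumes the history has already been serialized to JSON bytes, as ModelMessagesTypeAdapter.dump_json does in the snippet above. The table layout and the function names are hypothetical.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a one-table store keyed by session id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "session_id TEXT PRIMARY KEY, messages BLOB NOT NULL)"
    )
    return conn


def save_history(conn: sqlite3.Connection, session_id: str, serialized: bytes) -> None:
    """Overwrite the stored history for one session."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO history (session_id, messages) VALUES (?, ?)",
            (session_id, serialized),
        )


def load_history(conn: sqlite3.Connection, session_id: str) -> Optional[bytes]:
    """Return the stored bytes, or None for a session that has no history yet."""
    row = conn.execute(
        "SELECT messages FROM history WHERE session_id = ?", (session_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

On the next turn you would load the bytes, turn them back into messages (for example with ModelMessagesTypeAdapter.validate_json), and pass them as message_history. Locking, trimming, and concurrent writers are still left to you.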

With Genkit, sessions are a first-class primitive:

from genkit.agent import InMemorySessionStore
from genkit._core._typing import AgentInit

agent = ai.define_agent(
    name='myAgent',
    tools=[get_weather],
    store=InMemorySessionStore(),  # swap for Redis/Firestore in prod
)

# First request
conn = await agent.stream_bidi(init=AgentInit(session_id='user-123'))
await conn.send_text("What's the weather in Chicago?")
async for chunk in conn.receive():
    print(chunk.model_chunk.text, end='')
await conn.close()

# Resume — different process, different server instance — doesn't matter
conn = await agent.stream_bidi(init=AgentInit(session_id='user-123'))
await conn.send_text("What about tomorrow?")  # agent remembers Chicago

The SessionStore protocol has exactly two methods: get_snapshot() and save_snapshot(). Implement them once against your database of choice and Genkit handles the rest: append-only history, atomic snapshots, concurrent-request safety, cross-process resumption.

The difference in one sentence: Pydantic AI gives you the message list and says “store it yourself.” Genkit gives you the SessionStore interface and says “implement two methods, we handle everything else.”
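A two-method store along those lines might look like the sketch below. The method names get_snapshot and save_snapshot come from the text above; everything else is an assumption made for illustration, including the argument lists, the synchronous style, the dict-shaped snapshot, and the JSON-file backend. The real protocol may well be async and differently typed.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional, Protocol


class SessionStore(Protocol):
    """Assumed shape of a two-method store; the real signatures may differ."""

    def get_snapshot(self, session_id: str) -> Optional[dict[str, Any]]: ...

    def save_snapshot(self, session_id: str, snapshot: dict[str, Any]) -> None: ...


class JsonFileSessionStore:
    """Hypothetical backend: one JSON file per session, replaced atomically.

    Assumes session ids are trusted and safe to use as file names.
    """

    def __init__(self, directory: str) -> None:
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id: str) -> Path:
        return self._dir / f"{session_id}.json"

    def get_snapshot(self, session_id: str) -> Optional[dict[str, Any]]:
        path = self._path(session_id)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def save_snapshot(self, session_id: str, snapshot: dict[str, Any]) -> None:
        # Write to a temp file in the same directory, then rename over the
        # target, so a reader never sees a half-written snapshot.
        fd, tmp_name = tempfile.mkstemp(dir=self._dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
        os.replace(tmp_name, self._path(session_id))
```

The write-then-rename step is one simple way to get the "atomic snapshot" property on a local filesystem; a database-backed store would use a transaction instead.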

The graph execution reveal

There is one more architectural detail worth understanding before choosing between these frameworks.

Pydantic AI’s Agent is not a plain while loop. Its run loop is built on a graph execution engine.

From their docs: agent.iter() returns an async iterable over “the nodes of the agent’s underlying Graph”, specifically a graph from pydantic_graph, their own graph library. Model requests and tool calls are nodes, and output validation and retries happen as the run moves between them. The generate loop is graph traversal.

# What agent.run() does internally (simplified):
# 1. Starts a run over the agent's pydantic_graph graph
# 2. Steps through nodes: UserPromptNode, ModelRequestNode, CallToolsNode
# 3. Loops between model requests and tool calls until an End node is produced
# 4. Returns an AgentRunResult once the End node is reached

This is the same architectural decision LangGraph made, and the LangChain → LangGraph migration was painful enough to frustrate many developers. Pydantic AI made it with better types and a cleaner API, but the tradeoffs are similar: more indirection, more layers to step through in a debugger, more to understand when something breaks at 2am.

The irony: LangGraph’s graph model is worth its complexity because it enables node-level checkpointing through checkpointers (MemorySaver for in-memory state, the SQLite and Postgres savers for durability), which gives you resumable execution across failures. Pydantic AI’s graph model gives you the overhead without a comparable built-in benefit. Their session story is still “serialize the message list yourself.”
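To make "the generate loop is graph traversal" concrete, here is a toy node-based runner in plain Python. It is not pydantic_graph and does not mirror its classes. Every name is hypothetical, and the "model" is a stub that asks for one tool call and then finishes.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Shared run state: here just a list of message strings."""
    messages: list = field(default_factory=list)


@dataclass
class End:
    """Terminal node carrying the final output."""
    output: str


class ModelRequest:
    """Toy model step: request a tool once, then produce the final answer."""

    def run(self, state: State):
        tool_results = [m for m in state.messages if m.startswith("tool:")]
        if not tool_results:
            return CallTool(name="get_weather")
        return End(output=f"answer based on {tool_results[-1]}")


@dataclass
class CallTool:
    """Toy tool step: record a canned result, then hand back to the model."""
    name: str

    def run(self, state: State):
        state.messages.append(f"tool:{self.name} -> sunny")
        return ModelRequest()


def run_graph(start, state: State, max_steps: int = 10):
    """Follow node.run() results until an End node appears."""
    node, visited = start, []
    for _ in range(max_steps):
        if isinstance(node, End):
            return node.output, visited
        visited.append(type(node).__name__)
        node = node.run(state)
    raise RuntimeError("graph did not reach End within max_steps")
```

Control flow lives in the return values of the nodes rather than in one visible loop body. That buys a uniform place to hook in inspection or persistence, at the cost of reading several classes to follow one run.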

Genkit’s agent harness is a while loop:

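# The shape of the Genkit agent loop, simplified. The real one is readable in about 50 lines.
async def agent_loop(generate, execute_tools, messages, tools, max_depth=5):
    depth = 0
    while depth < max_depth:
        response = await generate(messages, tools)
        if not response.tool_calls:
            return response  # no tool requests left: this is the final answer
        results = await execute_tools(response.tool_calls)
        messages.extend(results)  # feed tool results back to the model
        depth += 1
    raise RuntimeError("max_depth reached without a final response")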

No nodes. No edges. No graph traversal. Just Python. Want retry logic? Add middleware. Want custom routing? Write an if statement. Want human-in-the-loop? ToolApproval middleware. Every extension is code you own.
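The loop shape can be exercised end to end with stand-ins for the model and the tools. This is a minimal sketch, not Genkit's source: fake_generate, Response, and run_agent are hypothetical names, and messages are plain strings to keep it short.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Response:
    text: str = ""
    tool_calls: list = field(default_factory=list)  # (tool_name, argument) pairs


async def fake_generate(messages: list, tools: dict) -> Response:
    """Stand-in model: request the weather tool once, then answer from its result."""
    tool_results = [m for m in messages if m.startswith("tool:")]
    if not tool_results:
        return Response(tool_calls=[("get_weather", "Chicago")])
    return Response(text=f"Answer based on {tool_results[-1]}")


async def get_weather(city: str) -> str:
    return f"Sunny, 72F in {city}"


async def run_agent(prompt: str, tools: dict, max_depth: int = 5) -> Response:
    messages = [f"user:{prompt}"]
    depth = 0
    while depth < max_depth:
        response = await fake_generate(messages, tools)
        if not response.tool_calls:
            return response
        for name, arg in response.tool_calls:
            if name not in tools:  # custom routing is an ordinary if statement
                messages.append(f"tool:{name} is not available")
                continue
            messages.append(f"tool:{await tools[name](arg)}")
        depth += 1
    raise RuntimeError("max_depth reached without a final answer")
```

Calling asyncio.run(run_agent("What's the weather in Chicago?", {"get_weather": get_weather})) walks two model turns and one tool call. Setting a breakpoint inside the while loop shows the whole run in one stack frame.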

Pydantic AI’s FastAPI analogy ultimately breaks here. FastAPI’s power comes from a transparent mapping: HTTP request → Python function. You always know what’s happening. Pydantic AI maps prompt → graph traversal, and the graph belongs to the framework, not to you. That is the opposite of the philosophy they invoke.

Wait, Genkit also has structured Pydantic output

One more thing worth stating directly, because it reframes the entire comparison.

Pydantic AI’s core differentiator is structured, type-safe output using Pydantic models. That’s their FastAPI analogy: the same way FastAPI uses Pydantic to validate HTTP request bodies, they use it to validate LLM responses.

Here’s what that looks like in Genkit:

from pydantic import BaseModel
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(plugins=[GoogleAI()], model='googleai/gemini-2.0-flash')

class WeatherResult(BaseModel):
    city: str
    temperature: str
    condition: str

response = await ai.generate(
    prompt="What's the weather in Chicago?",
    output_schema=WeatherResult,
    output_format='json',
)

print(response.output.temperature)  # typed, validated WeatherResult instance

output_schema takes any Pydantic BaseModel. response.output is a fully validated, typed instance. No separate output_type= parameter on an Agent class. No RunContext. No Deps. Just generate() with an output schema and an output format.

Their headline feature ships in Genkit as a pair of generate() arguments.

What else ships with it: free local observability, session persistence, a while loop instead of a graph engine, JS/TypeScript parity, and a middleware system for production patterns.

The full picture:

Structured Pydantic output: both frameworks ✅
Session persistence: Genkit ✅, Pydantic AI ❌ (DIY or Temporal)
Local observability, zero config: Genkit ✅, Pydantic AI ❌ (Logfire account and token; free tier available)
Agent = transparent while loop: Genkit ✅, Pydantic AI ❌ (graph engine)
Standard Python debugging: Genkit ✅, Pydantic AI ❌ (execution steps are graph nodes)
JS/TypeScript parity: Genkit ✅, Pydantic AI ❌ (Python only)

Pydantic AI matched the easy part of the FastAPI analogy: type-safe Pydantic validation. Genkit delivers that and doesn’t build a graph execution engine on top of it.