Genkit Python in 2026: The Definitive Getting-Started Guide

Here’s a working Genkit Python app in under 20 lines:

from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.0-flash',
)

async def main():
    response = await ai.generate(prompt='Explain async/await in one sentence.')
    print(response.text)

if __name__ == '__main__':
    ai.run_main(main())

That’s the skeleton. Every Genkit Python app follows this shape. The rest of this guide fills in the details: installation, flows, structured output, streaming, the Dev UI, and a complete end-to-end example. If you have already used the JS version, I’ll call out what differs in Python.

Why Genkit Python instead of the Vertex AI SDK directly?

You could use google-generativeai or vertexai directly. Here’s why you’d reach for Genkit instead:

Observability out of the box. Every generate() call and flow execution is traced. You get a local Dev UI that shows you request/response pairs, token counts, latencies, and span trees—without writing any logging code. In production, this hooks into OpenTelemetry.

Flows are deployable units. A flow is a typed, observable async function that the framework knows how to call, test, and expose over HTTP. You don’t wire up Flask routes or FastAPI endpoints by hand—Genkit does that.

Middleware for production patterns. Retry, fallback, tool approval, and cost tracking are first-class concepts. You drop them into use=[...] rather than wrapping every generate() call.

JS parity. If your team has a JS Genkit app and you’re adding a Python backend—or vice versa—the mental model is identical. Flows, tools, middleware, structured output: same concepts, same Dev UI.

The tradeoff: Genkit Python adds a layer of indirection. If you just need a one-shot LLM call with no tracing, the raw SDK is simpler. Genkit earns its keep once you have several flows to trace, test, and deploy.

Installation

The quick way with `uv` (recommended)

# Create a project
mkdir my-genkit-app && cd my-genkit-app
uv init --python 3.12

Add to pyproject.toml:

[project]
name = "my-genkit-app"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
    "genkit[google-genai] @ git+https://github.com/firebase/genkit.git#subdirectory=py/packages/genkit",
]

Then:

uv sync

With pip

pip install "genkit[google-genai] @ git+https://github.com/firebase/genkit.git#subdirectory=py/packages/genkit"

Important: why install from git?

The core genkit package is on PyPI at version 0.7.0. But the model plugins (like google-genai) are not yet on PyPI as standalone packages. Running pip install genkit-google-genai fails. The git install pulls both the core SDK and the plugin together via the [google-genai] extra.

API key

export GEMINI_API_KEY="your-key-here"

GoogleAI() reads this automatically. Don’t pass it as a constructor argument.
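
To make a missing key obvious at startup, here is a minimal sketch that fails fast using only the standard library. require_env is a hypothetical helper for illustration, not part of Genkit:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, '').strip()
    if not value:
        raise RuntimeError(f'{name} is not set; export it before starting the app')
    return value


# Hypothetical usage: call once, before constructing Genkit(...).
# require_env('GEMINI_API_KEY')
```

The check only confirms that the variable is present; it says nothing about whether the key itself is valid.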

Init — one object rules everything

from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],          # load model providers
    model='googleai/gemini-2.0-flash',  # default model for all calls
)

Model ID format matters. The model string is always "googleai/<model-name>". Never a bare model name. Common options:

Model: Gemini 2.0 Flash · ID: googleai/gemini-2.0-flash · When to use: Default — fast, cheap
Model: Gemini Flash (latest alias; Gemini 2.5 Flash at the time of writing) · ID: googleai/gemini-flash-latest · When to use: Balanced quality/speed
Model: Gemini 2.5 Pro · ID: googleai/gemini-2.5-pro · When to use: Best quality, slower
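
If model IDs come from config files or environment variables, a small guard can catch a bare model name before it reaches Genkit. This is a sketch under that assumption; split_model_id is a hypothetical helper, not a Genkit function:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split '<provider>/<model-name>' into its two parts, rejecting bare names."""
    provider, sep, name = model_id.partition('/')
    if not sep or not provider or not name:
        raise ValueError(
            f'expected "<provider>/<model-name>", got {model_id!r}'
        )
    return provider, name


print(split_model_id('googleai/gemini-2.0-flash'))
# ('googleai', 'gemini-2.0-flash')
```

Passing 'gemini-2.0-flash' on its own raises ValueError, which is the mistake this section warns about.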

The ai object is your app’s runtime. It holds the plugin registry, middleware registry, and async event loop. You create one and use it everywhere.

First `generate()` call

response = await ai.generate(prompt='Write a haiku about Python.')
print(response.text)

Output:

Indented stanzas,
whitespace speaks louder than words—
Guido smiles softly.

Three things to know about response:

response.text — the model’s text output (string)
response.output — structured output when you’ve specified a schema (more on this below)
response.messages — the full conversation history including this turn

Never use asyncio.run() to call async Genkit code. Use ai.run_main():

async def main():
    response = await ai.generate(prompt='Hello')
    print(response.text)

if __name__ == '__main__':
    ai.run_main(main())  # ✅
    # asyncio.run(main())  # ❌ — breaks the reflection server

ai.run_main() starts your coroutine and—in dev mode—keeps the process alive to serve the Dev UI’s reflection server.

Defining a Flow

A flow is the core Genkit primitive. It’s an async function with:

Typed input/output (Pydantic models recommended)
Automatic tracing
Callable from the Dev UI
Deployable over HTTP (via the FastAPI plugin)

from pydantic import BaseModel

class SummarizeInput(BaseModel):
    text: str
    max_words: int = 50

@ai.flow()
async def summarize(input: SummarizeInput) -> str:
    response = await ai.generate(
        prompt=f'Summarize in at most {input.max_words} words: {input.text}'
    )
    return response.text

Call it like a normal async function:

result = await summarize(SummarizeInput(
    text='Large language models are neural networks trained...',
    max_words=20,
))
print(result)
# "Large language models are neural networks trained on text to generate language."

The @ai.flow() decorator registers the function with the Genkit runtime. You can give it a custom name:

@ai.flow(name='my-summarizer')
async def summarize(input: SummarizeInput) -> str:
    ...

Structured Output with Pydantic

This is where Genkit Python shines. You define a Pydantic model, pass it as output_schema, and get a typed object back—no manual JSON parsing.

from pydantic import BaseModel

class BookReview(BaseModel):
    title: str
    author: str
    rating: int       # 1-5
    summary: str
    recommend: bool

response = await ai.generate(
    prompt='Review "The Pragmatic Programmer" by David Thomas and Andrew Hunt.',
    output_format='json',
    output_schema=BookReview,
)

review = response.output   # BookReview instance
print(f"{review.title} — {review.rating}/5 stars")
print(f"Recommend: {review.recommend}")
print(review.summary)

Use response.output, not response.json. Reaching for .json is a common mistake carried over from other SDKs.
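
The # 1-5 comment on rating is documentation only; nothing in the class enforces it. A minimal sketch of enforcing the range with Pydantic's Field constraints whenever you build a BookReview from JSON yourself. parse_review is a hypothetical helper, and whether Genkit applies the same validation to response.output is not something this sketch assumes:

```python
import json
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class BookReview(BaseModel):
    title: str
    author: str
    rating: int = Field(ge=1, le=5)  # range is now enforced, not just documented
    summary: str
    recommend: bool


def parse_review(raw_json: str) -> Optional[BookReview]:
    """Return a validated BookReview, or None if the JSON is malformed or out of range."""
    try:
        return BookReview(**json.loads(raw_json))
    except (json.JSONDecodeError, ValidationError):
        return None
```

Returning None keeps the sketch short; in a real flow you would more likely log the validation error or retry the request.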

For lists of structured objects, use Pydantic’s TypeAdapter:

from pydantic import TypeAdapter

schema = TypeAdapter(list[BookReview]).json_schema()
response = await ai.generate(
    prompt='Give me reviews of 3 classic programming books.',
    output_format='array',
    output_schema=schema,
)
books = response.output  # list of dicts

Available output_format values: 'text', 'json', 'array', 'enum', 'jsonl'.

Streaming

Streaming lets you display partial output as it arrives—critical for any user-facing app.

# generate_stream is NOT awaited — it returns synchronously
sr = ai.generate_stream(prompt='Write a short story about a robot learning to code.')

async for chunk in sr.stream:
    if chunk.text:
        print(chunk.text, end='', flush=True)

final = await sr.response   # full ModelResponse when done
print(f"\n\nTotal tokens: {final.usage.total_tokens}")

Note the asymmetry: generate() is awaited; generate_stream() is not. It returns immediately with a StreamResponse object. You then iterate .stream async, and await .response to get the final result.
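
The shape is easier to see without a model in the loop. The sketch below is a plain-asyncio stand-in, not Genkit's implementation: FakeStreamResponse and fake_generate_stream are hypothetical names that only mimic the two attributes described above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeStreamResponse:
    """Stand-in exposing the two attributes described above."""
    stream: AsyncIterator[str]   # iterate with `async for`
    response: asyncio.Future     # await for the final result


def fake_generate_stream(words: list[str]) -> FakeStreamResponse:
    """Plain (non-async) function: returns at once; work happens as the stream is consumed."""
    done = asyncio.get_running_loop().create_future()

    async def produce() -> AsyncIterator[str]:
        parts = []
        for word in words:
            await asyncio.sleep(0)  # yield control, as a network read would
            parts.append(word)
            yield word
        done.set_result(' '.join(parts))  # resolve once the stream is exhausted

    return FakeStreamResponse(stream=produce(), response=done)


async def consume(words: list[str]) -> tuple[list[str], str]:
    sr = fake_generate_stream(words)               # no await here
    chunks = [chunk async for chunk in sr.stream]  # async iteration
    final = await sr.response                      # awaited after the stream ends
    return chunks, final


if __name__ == '__main__':
    # Plain asyncio is fine here because no Genkit runtime is involved.
    print(asyncio.run(consume(['one', 'two', 'three'])))
```

In this stand-in the final result only resolves after the stream has been fully consumed, which is why the loop comes before the await.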

Streaming flows

Flows can stream chunks to callers using ActionRunContext:

from genkit import ActionRunContext

@ai.flow()
async def stream_story(subject: str, ctx: ActionRunContext) -> str:
    sr = ai.generate_stream(prompt=f'Write a 3-paragraph story about {subject}.')
    full_text = ''
    async for chunk in sr.stream:
        if chunk.text:
            ctx.send_chunk(chunk.text)
            full_text += chunk.text
    return full_text

The ctx parameter is injected by the framework when present—you don’t pass it when calling the flow.

Running Locally with the Dev Server

The Dev UI is where development actually happens. It lets you call flows interactively, inspect traces, and test structured output—without writing a test harness.

# Start the dev server (from your project directory)
genkit start -- uv run src/main.py

This starts your Python app with the Genkit reflection server enabled, then opens the Dev UI at http://localhost:4000. You’ll see all your registered flows in a sidebar. Click any flow, fill in the input JSON, and run it. Every call shows up as a trace with full request/response details.

If you’re using pip instead of uv:

genkit start -- python src/main.py

Install the Genkit CLI with: npm install -g genkit-cli

Complete End-to-End Example: Article Summarizer

Here’s a complete, working app that combines flows, structured output, and tools:

# src/main.py
from pydantic import BaseModel, Field
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI

ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.0-flash',
)


# --- Schemas ---

class ArticleInput(BaseModel):
    url: str = Field(description='Article URL to summarize')
    audience: str = Field(
        default='general',
        description='Target audience: general, technical, executive',
    )

class ArticleSummary(BaseModel):
    title: str
    one_liner: str
    key_points: list[str]
    sentiment: str   # positive, negative, neutral
    reading_time_minutes: int


# --- Tool: fetch article text ---

class FetchInput(BaseModel):
    url: str

@ai.tool()
async def fetch_article(input: FetchInput) -> str:
    """Fetch the text content of a URL."""
    import urllib.request
    try:
        with urllib.request.urlopen(input.url, timeout=10) as resp:
            # In production: use httpx + HTML stripping
            return resp.read().decode('utf-8')[:8000]
    except Exception as e:
        return f'Could not fetch article: {e}'


# --- Flow ---

@ai.flow()
async def summarize_article(input: ArticleInput) -> ArticleSummary:
    """Fetch an article and return a structured summary."""

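    # Step 1: Ask the model to call the fetch_article tool.
    # Note: fetch_response.text is the model's reply after the tool call,
    # so it may paraphrase or shorten the article rather than return it verbatim.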
    fetch_response = await ai.generate(
        prompt=f'Fetch the article at {input.url}',
        tools=[fetch_article],
    )

    article_text = fetch_response.text

    # Step 2: Summarize with structured output
    summary_response = await ai.generate(
        prompt=f"""
Summarize the following article for a {input.audience} audience.
Article:
{article_text}
""",
        output_format='json',
        output_schema=ArticleSummary,
    )

    return summary_response.output


async def main():
    result = await summarize_article(ArticleInput(
        url='https://example.com/some-article',
        audience='technical',
    ))
    print(f"Title: {result.title}")
    print(f"One-liner: {result.one_liner}")
    print(f"Sentiment: {result.sentiment}")
    print("Key points:")
    for point in result.key_points:
        print(f"  • {point}")


if __name__ == '__main__':
    ai.run_main(main())

Run it:

uv run src/main.py

Or explore it interactively:

genkit start -- uv run src/main.py

Then open http://localhost:4000, click summarize_article, and paste in any URL.
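
The fetch_article tool above returns raw HTML, and its comment points at HTML stripping as the production fix. A rough standard-library sketch of that step, assuming it is enough to drop tags plus script and style bodies; html_to_text is a hypothetical helper, and a dedicated extraction library will handle real pages better:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {'script', 'style'}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._parts.append(text)

    def text(self) -> str:
        return ' '.join(self._parts)


def html_to_text(html: str, limit: int = 8000) -> str:
    """Strip markup and truncate to `limit` characters (8000 mirrors the tool above)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:limit]
```

Applying this to the decoded response body before truncating would spend the 8000-character budget on article text rather than markup.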

Common Mistakes Quick Reference

❌ Wrong: pip install genkit-google-genai · ✅ Right: Install from git · Why: Plugin not on PyPI yet
❌ Wrong: model='gemini-2.0-flash' · ✅ Right: model='googleai/gemini-2.0-flash' · Why: Must include provider prefix
❌ Wrong: await ai.generate_stream(…) · ✅ Right: ai.generate_stream(…) (no await) · Why: Returns sync StreamResponse
❌ Wrong: asyncio.run(main()) · ✅ Right: ai.run_main(main()) · Why: Breaks reflection server
❌ Wrong: response.json · ✅ Right: response.output · Why: Correct attribute name
❌ Wrong: response.message · ✅ Right: response.text · Why: Correct attribute name
❌ Wrong: GoogleAI(api_key='…') · ✅ Right: Set GEMINI_API_KEY env var · Why: GoogleAI() reads the key from the environment
❌ Wrong: @ai.define_tool() · ✅ Right: @ai.tool() · Why: Correct decorator name
❌ Wrong: async def tool(city: str) · ✅ Right: Input as Pydantic BaseModel · Why: Gemini requires OBJECT type
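
The last row is easier to see in the JSON Schema each kind of input produces. A small sketch using Pydantic v2 (the version that provides TypeAdapter); CityInput is a hypothetical model, and how Genkit derives a tool's schema from its signature is an assumption here:

```python
from pydantic import BaseModel, TypeAdapter


class CityInput(BaseModel):
    city: str


# A bare `str` parameter maps to a top-level string schema...
bare_schema = TypeAdapter(str).json_schema()
# ...while a BaseModel maps to a top-level object with named properties.
model_schema = CityInput.model_json_schema()

print(bare_schema['type'])   # string
print(model_schema['type'])  # object
```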

What’s Next

Article 2: Multi-turn agents and the Sessions API — how Genkit manages conversation state across requests
Article 3: Middleware in production — retry, fallback, and tool approval patterns

The full SDK source is at github.com/firebase/genkit in py/. File issues there; join the discussion on Discord.