Genkit Python in 2026: The Definitive Getting-Started Guide
Here's a working Genkit Python app in under 20 lines:
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI
ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.0-flash',
)
async def main():
    response = await ai.generate(prompt='Explain async/await in one sentence.')
    print(response.text)
if __name__ == '__main__':
    ai.run_main(main())
That’s the skeleton. Every Genkit Python app follows this shape. The rest of this guide fills in the details: installation, flows, structured output, async patterns, and a complete end-to-end example. If you have already tried the JS version, I’ll call out explicitly what’s different in Python.
Why Genkit Python instead of the Vertex AI SDK directly?
You could use google-generativeai or vertexai directly. Here’s why you’d reach for Genkit instead:
Observability out of the box. Every generate() call and flow execution is traced. You get a local Dev UI that shows you request/response pairs, token counts, latencies, and span trees—without writing any logging code. In production, this hooks into OpenTelemetry.
Flows are deployable units. A flow is a typed, observable async function that the framework knows how to call, test, and expose over HTTP. You don’t wire up Flask routes or FastAPI endpoints by hand—Genkit does that.
Middleware for production patterns. Retry, fallback, tool approval, and cost tracking are first-class concepts. You drop them into use=[...] rather than wrapping every generate() call.
JS parity. If your team has a JS Genkit app and you’re adding a Python backend—or vice versa—the mental model is identical. Flows, tools, middleware, structured output: same concepts, same Dev UI.
The tradeoff: Genkit Python adds indirection. If you just need a one-shot LLM call with no tracing or observability, the raw SDK is simpler. Genkit earns its keep once you have several flows to trace, test, and deploy.
Installation
The quick way with uv (recommended)
# Create a project
mkdir my-genkit-app && cd my-genkit-app
uv init --python 3.12
Add to pyproject.toml:
[project]
name = "my-genkit-app"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = [
"genkit[google-genai] @ git+https://github.com/firebase/genkit.git#subdirectory=py/packages/genkit",
]
Then:
uv sync
With pip
pip install "genkit[google-genai] @ git+https://github.com/firebase/genkit.git#subdirectory=py/packages/genkit"
Important: why install from git?
The core genkit package is on PyPI at version 0.7.0. But the model plugins (like google-genai) are not yet on PyPI as standalone packages. Running pip install genkit-google-genai fails. The git install pulls both the core SDK and the plugin together via the [google-genai] extra.
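To confirm what actually got installed, a quick standard-library check can help. This is a minimal sketch: the helper name installed_version is made up for illustration, and it assumes the core SDK is registered under the distribution name genkit, as the PyPI note above suggests.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == '__main__':
    # Assumption: the core SDK is installed under the distribution name "genkit".
    found = installed_version('genkit')
    print(f'genkit: {found or "not installed"}')
```

If this prints "not installed" after uv sync or pip install, the install most likely went into a different environment than the one running the script.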
API key
export GEMINI_API_KEY="your-key-here"
GoogleAI() reads this automatically. Don’t pass it as a constructor argument.
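Because the key is read from the environment, a missing variable tends to surface late, on the first model call. One way to fail earlier is a small startup check. This is a sketch using only the standard library; require_api_key is a hypothetical helper, not part of Genkit.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = 'GEMINI_API_KEY') -> str:
    """Return the key from the environment, or raise with an actionable message."""
    key = env.get(name, '').strip()
    if not key:
        raise RuntimeError(f'{name} is not set; run: export {name}="your-key-here"')
    return key


if __name__ == '__main__':
    # Call this before constructing the Genkit object so the error is obvious.
    require_api_key()
    print('API key found')
```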
Init — one object rules everything
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI
ai = Genkit(
    plugins=[GoogleAI()],               # load model providers
    model='googleai/gemini-2.0-flash',  # default model for all calls
)
Model ID format matters. The model string is always "googleai/<model-name>". Never a bare model name. Common options:
- Model: Gemini 2.0 Flash · ID: googleai/gemini-2.0-flash · When to use: Default — fast, cheap
- Model: Gemini 2.5 Flash · ID: googleai/gemini-2.5-flash · When to use: Balanced quality/speed
- Model: Gemini 2.5 Pro · ID: googleai/gemini-2.5-pro · When to use: Best quality, slower
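The provider-prefix rule can be checked up front, for example when model IDs come from a config file. The following is a minimal sketch in plain Python; split_model_id is a hypothetical helper and only checks the "provider/name" shape, not whether the model exists.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model-name' into its two parts, rejecting bare names."""
    provider, sep, name = model_id.partition('/')
    if not sep or not provider or not name:
        raise ValueError(
            f'{model_id!r} is not a valid model ID; expected "<provider>/<model-name>"'
        )
    return provider, name


if __name__ == '__main__':
    print(split_model_id('googleai/gemini-2.0-flash'))
```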
The ai object is your app’s runtime. It holds the plugin registry, middleware registry, and async event loop. You create one and use it everywhere.
First generate() call
response = await ai.generate(prompt='Write a haiku about Python.')
print(response.text)
Output:
Indented stanzas,
whitespace speaks louder than words—
Guido smiles softly.
Three things to know about response:
- response.text: the model’s text output (string)
- response.output: structured output when you’ve specified a schema (more on this below)
- response.messages: the full conversation history including this turn
Never use asyncio.run() to call async Genkit code. Use ai.run_main():
async def main():
    response = await ai.generate(prompt='Hello')
    print(response.text)
if __name__ == '__main__':
    ai.run_main(main())      # ✅
    # asyncio.run(main())    # ❌ breaks the reflection server
ai.run_main() starts your coroutine and—in dev mode—keeps the process alive to serve the Dev UI’s reflection server.
Defining a Flow
A flow is the core Genkit primitive. It’s an async function with:
- Typed input/output (Pydantic models recommended)
- Automatic tracing
- Callable from the Dev UI
- Deployable over HTTP (via the FastAPI plugin)
from pydantic import BaseModel
class SummarizeInput(BaseModel):
    text: str
    max_words: int = 50
@ai.flow()
async def summarize(input: SummarizeInput) -> str:
    response = await ai.generate(
        prompt=f'Summarize in at most {input.max_words} words: {input.text}'
    )
    return response.text
Call it like a normal async function:
result = await summarize(SummarizeInput(
    text='Large language models are neural networks trained...',
    max_words=20,
))
print(result)
# "Large language models are neural networks trained on text to generate language."
The @ai.flow() decorator registers the function with the Genkit runtime. You can give it a custom name:
@ai.flow(name='my-summarizer')
async def summarize(input: SummarizeInput) -> str:
    ...
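To make "registers the function with the runtime" concrete, here is a toy registry in plain Python. It is not Genkit's implementation, which also handles tracing and schemas; toy_flow, FLOWS, and run_by_name are made-up names that only illustrate how a decorator can record a coroutine under a default or custom name so that something else (a UI, an HTTP layer) can look it up later.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Toy registry: flow name -> coroutine function.
FLOWS: Dict[str, Callable[..., Awaitable[str]]] = {}


def toy_flow(name: Optional[str] = None):
    """Toy decorator: record the coroutine function under a name, return it unchanged."""
    def decorator(fn):
        FLOWS[name or fn.__name__] = fn
        return fn
    return decorator


@toy_flow()
async def shout(text: str) -> str:
    return text.upper()


@toy_flow(name='my-whisperer')
async def whisper(text: str) -> str:
    return text.lower()


async def run_by_name(flow_name: str, payload: str) -> str:
    """Look a flow up by its registered name and call it."""
    return await FLOWS[flow_name](payload)


if __name__ == '__main__':
    # asyncio.run is fine here only because no Genkit runtime is involved.
    print(sorted(FLOWS))
    print(asyncio.run(run_by_name('my-whisperer', 'HeLLo')))
```

The decorated functions stay directly callable, which mirrors the "call it like a normal async function" behaviour described above.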
Structured Output with Pydantic
This is where Genkit Python shines. You define a Pydantic model, pass it as output_schema, and get a typed object back—no manual JSON parsing.
from pydantic import BaseModel
class BookReview(BaseModel):
    title: str
    author: str
    rating: int  # 1-5
    summary: str
    recommend: bool
response = await ai.generate(
    prompt='Review "The Pragmatic Programmer" by David Thomas and Andrew Hunt.',
    output_format='json',
    output_schema=BookReview,
)
review = response.output # BookReview instance
print(f"{review.title} — {review.rating}/5 stars")
print(f"Recommend: {review.recommend}")
print(review.summary)
Use response.output, not response.json. Reaching for .json is a common mistake carried over from other SDKs.
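For contrast, this is roughly the manual work that output_schema saves you. It is a stdlib-only sketch, not Genkit code: parse_review and the sample JSON are made up for illustration, a dataclass stands in for the Pydantic model, and the range check enforces the 1-5 constraint that the schema above states only in a comment.

```python
import json
from dataclasses import dataclass


@dataclass
class BookReview:
    title: str
    author: str
    rating: int  # 1-5
    summary: str
    recommend: bool


def parse_review(raw: str) -> BookReview:
    """Parse a raw JSON string from a model into a checked BookReview."""
    data = json.loads(raw)          # json.JSONDecodeError on malformed JSON
    review = BookReview(**data)     # TypeError on missing or unexpected keys
    if isinstance(review.rating, bool) or not isinstance(review.rating, int):
        raise ValueError('rating must be an integer')
    if not 1 <= review.rating <= 5:
        raise ValueError(f'rating out of range: {review.rating}')
    return review


if __name__ == '__main__':
    sample = ('{"title": "T", "author": "A", "rating": 4, '
              '"summary": "S", "recommend": true}')
    print(parse_review(sample))
```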
For lists of structured objects, use Pydantic’s TypeAdapter:
from pydantic import TypeAdapter
schema = TypeAdapter(list[BookReview]).json_schema()
response = await ai.generate(
    prompt='Give me reviews of 3 classic programming books.',
    output_format='array',
    output_schema=schema,
)
books = response.output # list of dicts
Available output_format values: 'text', 'json', 'array', 'enum', 'jsonl'.
Streaming
Streaming lets you display partial output as it arrives—critical for any user-facing app.
# generate_stream is NOT awaited — it returns synchronously
sr = ai.generate_stream(prompt='Write a short story about a robot learning to code.')
async for chunk in sr.stream:
    if chunk.text:
        print(chunk.text, end='', flush=True)
final = await sr.response # full ModelResponse when done
print(f"\n\nTotal tokens: {final.usage.total_tokens}")
Note the asymmetry: generate() is awaited; generate_stream() is not. It returns immediately with a StreamResponse object. You then iterate .stream async, and await .response to get the final result.
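The shape of that asymmetry can be reproduced with a toy object in plain asyncio. This is a stand-in for illustration only, not Genkit's StreamResponse: ToyStreamResponse and toy_generate_stream are hypothetical names, and the "model" simply yields a fixed list of strings.

```python
import asyncio


class ToyStreamResponse:
    """Toy handle: .stream is an async iterator, .response is an awaitable."""

    def __init__(self, pieces):
        # A future that is resolved once the stream has been fully consumed.
        self.response = asyncio.get_running_loop().create_future()
        self.stream = self._produce(pieces)

    async def _produce(self, pieces):
        seen = []
        for piece in pieces:
            await asyncio.sleep(0)  # pretend to wait for the next chunk
            seen.append(piece)
            yield piece
        self.response.set_result(''.join(seen))


def toy_generate_stream(pieces):
    # A plain def, not async def: the handle comes back immediately, no await.
    return ToyStreamResponse(pieces)


async def demo():
    sr = toy_generate_stream(['Hello', ', ', 'world'])
    chunks = [chunk async for chunk in sr.stream]
    final = await sr.response
    return chunks, final


if __name__ == '__main__':
    # asyncio.run is fine here only because no Genkit runtime is involved.
    print(asyncio.run(demo()))
```

In this toy, the final result only resolves after the stream is drained, so awaiting .response without iterating .stream would hang; whether Genkit behaves the same way is not something this sketch establishes.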
Streaming flows
Flows can stream chunks to callers using ActionRunContext:
from genkit import ActionRunContext
@ai.flow()
async def stream_story(subject: str, ctx: ActionRunContext) -> str:
    sr = ai.generate_stream(prompt=f'Write a 3-paragraph story about {subject}.')
    full_text = ''
    async for chunk in sr.stream:
        if chunk.text:
            ctx.send_chunk(chunk.text)
            full_text += chunk.text
    return full_text
The ctx parameter is injected by the framework when present—you don’t pass it when calling the flow.
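One plausible way a framework can inject a parameter "when present" is to inspect the function signature. The sketch below shows that mechanism in plain Python; it is an assumption about the general technique, not a description of Genkit's internals, and ToyContext and call_flow are hypothetical names.

```python
import asyncio
import inspect


class ToyContext:
    """Toy stand-in for a run context that collects streamed chunks."""

    def __init__(self):
        self.chunks = []

    def send_chunk(self, chunk):
        self.chunks.append(chunk)


async def call_flow(fn, payload, ctx):
    """Pass ctx only if the flow function declares a second parameter."""
    params = inspect.signature(fn).parameters
    if len(params) >= 2:
        return await fn(payload, ctx)
    return await fn(payload)


async def plain(subject: str) -> str:
    return f'story about {subject}'


async def streaming(subject: str, ctx: ToyContext) -> str:
    for piece in ('story', ' about ', subject):
        ctx.send_chunk(piece)
    return ''.join(ctx.chunks)


if __name__ == '__main__':
    ctx = ToyContext()
    # asyncio.run is fine here only because no Genkit runtime is involved.
    print(asyncio.run(call_flow(streaming, 'robots', ctx)), ctx.chunks)
    print(asyncio.run(call_flow(plain, 'robots', ToyContext())))
```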
Running Locally with the Dev Server
The Dev UI is where development actually happens. It lets you call flows interactively, inspect traces, and test structured output—without writing a test harness.
# Start the dev server (from your project directory)
genkit start -- uv run src/main.py
This starts your Python app with the Genkit reflection server enabled, then opens the Dev UI at http://localhost:4000. You’ll see all your registered flows in a sidebar. Click any flow, fill in the input JSON, and run it. Every call shows up as a trace with full request/response details.
If you’re using pip instead of uv:
genkit start -- python src/main.py
Install the Genkit CLI with: npm install -g genkit-cli
Complete End-to-End Example: Article Summarizer
Here’s a complete, working app that combines flows, structured output, and tools:
# src/main.py
from pydantic import BaseModel, Field
from genkit import Genkit
from genkit.plugins.google_genai import GoogleAI
ai = Genkit(
    plugins=[GoogleAI()],
    model='googleai/gemini-2.0-flash',
)
# --- Schemas ---
class ArticleInput(BaseModel):
    url: str = Field(description='Article URL to summarize')
    audience: str = Field(
        default='general',
        description='Target audience: general, technical, executive',
    )
class ArticleSummary(BaseModel):
    title: str
    one_liner: str
    key_points: list[str]
    sentiment: str  # positive, negative, neutral
    reading_time_minutes: int
# --- Tool: fetch article text ---
class FetchInput(BaseModel):
    url: str
@ai.tool()
async def fetch_article(input: FetchInput) -> str:
    """Fetch the text content of a URL."""
    import urllib.request
    try:
        with urllib.request.urlopen(input.url, timeout=10) as resp:
            # In production: use httpx + HTML stripping
            return resp.read().decode('utf-8')[:8000]
    except Exception as e:
        return f'Could not fetch article: {e}'
# --- Flow ---
@ai.flow()
async def summarize_article(input: ArticleInput) -> ArticleSummary:
    """Fetch an article and return a structured summary."""
    # Step 1: Fetch the article using tool use
    fetch_response = await ai.generate(
        prompt=f'Fetch the article at {input.url}',
        tools=[fetch_article],
    )
    article_text = fetch_response.text
    # Step 2: Summarize with structured output
    summary_response = await ai.generate(
        prompt=f"""
        Summarize the following article for a {input.audience} audience.
        Article:
        {article_text}
        """,
        output_format='json',
        output_schema=ArticleSummary,
    )
    return summary_response.output
async def main():
    result = await summarize_article(ArticleInput(
        url='https://example.com/some-article',
        audience='technical',
    ))
    print(f"Title: {result.title}")
    print(f"One-liner: {result.one_liner}")
    print(f"Sentiment: {result.sentiment}")
    print("Key points:")
    for point in result.key_points:
        print(f"  • {point}")
if __name__ == '__main__':
    ai.run_main(main())
Run it:
uv run src/main.py
Or explore it interactively:
genkit start -- uv run src/main.py
Then open http://localhost:4000, click summarize_article, and paste in any URL.
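The fetch_article tool above returns raw HTML, and its comment points at HTML stripping as a production step. A minimal sketch of that step using only the standard library might look like this; html_to_text and TextExtractor are hypothetical names, the list of block-level tags is a rough assumption, and a real app may prefer a dedicated HTML library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {'script', 'style'}
    BLOCK = {'p', 'div', 'br', 'li', 'tr', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append(' ')  # keep words in adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._parts.append(' ')

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse all runs of whitespace into single spaces.
        return ' '.join(''.join(self._parts).split())


def html_to_text(html: str, limit: int = 8000) -> str:
    """Strip tags from an HTML string and truncate to `limit` characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:limit]


if __name__ == '__main__':
    print(html_to_text('<h1>Title</h1><p>Hello <b>world</b>.</p>'))
```

Inside fetch_article, the return value could then be html_to_text(resp.read().decode('utf-8')), so the 8000-character budget is spent on article text rather than markup.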
Common Mistakes Quick Reference
- ❌ Wrong: pip install genkit-google-genai · ✅ Right: Install from git · Why: Plugin not on PyPI yet
- ❌ Wrong: model='gemini-2.0-flash' · ✅ Right: model='googleai/gemini-2.0-flash' · Why: Must include provider prefix
- ❌ Wrong: await ai.generate_stream(…) · ✅ Right: ai.generate_stream(…) (no await) · Why: Returns sync StreamResponse
- ❌ Wrong: asyncio.run(main()) · ✅ Right: ai.run_main(main()) · Why: Breaks reflection server
- ❌ Wrong: response.json · ✅ Right: response.output · Why: Correct attribute name
- ❌ Wrong: response.message · ✅ Right: response.text · Why: Correct attribute name
- ❌ Wrong: GoogleAI(api_key='…') · ✅ Right: Set GEMINI_API_KEY env var · Why: GoogleAI() reads the key from the environment
- ❌ Wrong: @ai.define_tool() · ✅ Right: @ai.tool() · Why: Correct decorator name
- ❌ Wrong: async def tool(city: str) · ✅ Right: Input as Pydantic BaseModel · Why: Gemini requires OBJECT type
What’s Next
- Article 2: Multi-turn agents and the Sessions API — how Genkit manages conversation state across requests
- Article 3: Middleware in production — retry, fallback, and tool approval patterns
The full SDK source is at github.com/firebase/genkit in py/. File issues there; join the discussion on Discord.