
Add support for multimodal input #961

Closed
wants to merge 5 commits into from

@Kludex Kludex commented Feb 21, 2025

This PR adds the ability to send audio and images in the user prompt.

API

The proposed API changes agent.run() (and agent.run_sync()) so that the user prompt can be a list mixing text with media items, not only a string:

from rich.pretty import pprint

from pydantic_ai import Agent
from pydantic_ai.messages import ImageUrl

image_url = 'https://goo.gle/instrument-img'

agent = Agent(model='google-gla:gemini-2.0-flash-exp')

output = agent.run_sync(["What's in the image?", ImageUrl(url=image_url)])
pprint(output)

You can also send multiple images from different sources:

from pathlib import Path

from rich.pretty import pprint

from pydantic_ai import Agent
from pydantic_ai.messages import BinaryContent, ImageUrl

image_url = 'https://goo.gle/instrument-img'
image_bytes = Path('docs/img/tree.png').read_bytes()

agent = Agent(model='google-gla:gemini-2.0-flash-exp')

output = agent.run_sync(
    [
        "What's in the image?",
        ImageUrl(url=image_url),
        BinaryContent(data=image_bytes, media_type='image/png'),
    ]
)
pprint(output)
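The example above hard-codes media_type='image/png'. When files come from arbitrary paths, the media type could instead be derived from the file extension. The following is a minimal sketch using only the standard library; `guess_media_type` is a hypothetical helper and not part of this PR, and the fallback value is an assumption.

```python
import mimetypes


def guess_media_type(path: str, default: str = 'application/octet-stream') -> str:
    """Hypothetical helper: infer a media type from a file name.

    Uses the standard library's extension table; falls back to `default`
    (an assumed choice) when the extension is unknown.
    """
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type or default


# The result could then be passed as BinaryContent(..., media_type=...).
print(guess_media_type('docs/img/tree.png'))
```

Guessing by extension does not inspect the file contents, so a mislabeled file would still get the wrong media type.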

The types added in this PR are:

  • AudioUrl
  • ImageUrl
  • BinaryContent
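To make the shape of a mixed prompt concrete, here is a rough sketch that uses stand-in dataclasses rather than the real classes from pydantic_ai.messages. The field names for ImageUrl and BinaryContent follow the constructor calls in the examples above; the `url` field on AudioUrl, the `UserContent` alias, and the `describe_parts` helper are all assumptions made for illustration and do not describe how the PR implements anything.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence, Union


# Stand-ins for the PR's types (NOT the real pydantic_ai classes).
@dataclass
class ImageUrl:
    url: str


@dataclass
class AudioUrl:
    url: str  # assumed to mirror ImageUrl


@dataclass
class BinaryContent:
    data: bytes
    media_type: str


# Hypothetical alias for anything allowed inside a user prompt.
UserContent = Union[str, ImageUrl, AudioUrl, BinaryContent]


def describe_parts(prompt: str | Sequence[UserContent]) -> list[str]:
    """Hypothetical helper: label each part of a plain or mixed prompt."""
    # A bare string is treated as a single text part.
    parts = [prompt] if isinstance(prompt, str) else list(prompt)
    labels: list[str] = []
    for part in parts:
        if isinstance(part, str):
            labels.append('text')
        elif isinstance(part, ImageUrl):
            labels.append('image-url')
        elif isinstance(part, AudioUrl):
            labels.append('audio-url')
        elif isinstance(part, BinaryContent):
            labels.append(f'binary:{part.media_type}')
        else:
            raise TypeError(f'unsupported prompt part: {type(part).__name__}')
    return labels


prompt = [
    "What's in the image?",
    ImageUrl(url='https://goo.gle/instrument-img'),
    BinaryContent(data=b'\x89PNG', media_type='image/png'),
]
print(describe_parts(prompt))  # ['text', 'image-url', 'binary:image/png']
```

A model integration would presumably do a similar per-part dispatch when converting the prompt into a provider-specific request, but that mapping is outside what this description states.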

Feedback welcome.

@Kludex Kludex changed the title from "Add support for multimodal" to "Add support for multimodal input" Feb 21, 2025
@Kludex Kludex force-pushed the support-multimodal branch from 10d6b08 to ef833f8 Compare February 22, 2025 10:01

Docs Preview

commit: ef833f8
Preview URL: https://0c6f3663-pydantic-ai-previews.pydantic.workers.dev

Member Author

Kludex commented Feb 23, 2025

Please follow #971

I couldn't fix the test suite here, so I started from scratch in #971. I found the issue and am already adding tests there.

@Kludex Kludex closed this Feb 23, 2025