Co-authored-by: David Montague <35119617+dmontagu@users.noreply.github.com>
Showing 40 changed files with 8,814 additions and 168 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# Image and Audio Input | ||
|
||
Some LLMs are now capable of understanding both audio and image content. | ||
|
||
## Image Input | ||
|
||
!!! info | ||
    Some models do not support image input. Please check the model's documentation to confirm whether it supports image input. | ||
|
||
If you have a direct URL for the image, you can use [`ImageUrl`][pydantic_ai.ImageUrl]: | ||
|
||
```py {title="main.py" test="skip" lint="skip"} | ||
from pydantic_ai import Agent, ImageUrl | ||
|
||
image_url = ImageUrl(url='https://iili.io/3Hs4FMg.png') | ||
|
||
agent = Agent(model='openai:gpt-4o') | ||
result = agent.run_sync( | ||
    [ | ||
        'What company is this logo from?', | ||
        image_url,  # reuse the ImageUrl defined above instead of building it twice | ||
    ] | ||
) | ||
print(result.data) | ||
#> This is the logo for Pydantic, a data validation and settings management library in Python. | ||
``` | ||
|
||
If you have the image locally, you can also use [`BinaryContent`][pydantic_ai.BinaryContent]: | ||
|
||
```py {title="main.py" test="skip" lint="skip"} | ||
import httpx | ||
|
||
from pydantic_ai import Agent, BinaryContent | ||
|
||
image_response = httpx.get('https://iili.io/3Hs4FMg.png') # Pydantic logo | ||
|
||
agent = Agent(model='openai:gpt-4o') | ||
result = agent.run_sync( | ||
    [ | ||
        'What company is this logo from?', | ||
        BinaryContent(data=image_response.content, media_type='image/png'),  # (1)! | ||
    ] | ||
) | ||
print(result.data) | ||
#> This is the logo for Pydantic, a data validation and settings management library in Python. | ||
``` | ||
|
||
1. To keep the example runnable, we download the image from the web; for a local file, you can read its contents with `Path().read_bytes()` instead. | ||
|
||
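The local-file variant mentioned in the annotation above could be sketched roughly as follows. This is only an illustration: `load_image_bytes` is a hypothetical helper (not part of `pydantic_ai`), and guessing the media type from the file extension with the standard-library `mimetypes` module is an assumption about how you might fill in `media_type`.

```python
import mimetypes
from pathlib import Path


def load_image_bytes(path: Path) -> tuple[bytes, str]:
    """Hypothetical helper: read a local image and guess its media type.

    Returns the raw bytes and a media type string such as 'image/png',
    i.e. the two values the BinaryContent example above is given.
    """
    # Guess the media type from the file extension only (no content sniffing).
    media_type, _ = mimetypes.guess_type(str(path))
    if media_type is None or not media_type.startswith('image/'):
        raise ValueError(f'{path.name} does not look like an image file')
    return path.read_bytes(), media_type


# Usage (assumes a file named logo.png exists next to the script):
# data, media_type = load_image_bytes(Path('logo.png'))
# content = BinaryContent(data=data, media_type=media_type)
```

Guessing from the extension keeps the sketch dependency-free; if a file might be mislabelled, passing an explicit `media_type` as in the example above is the safer choice.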
## Audio Input | ||
|
||
!!! info | ||
    Some models do not support audio input. Please check the model's documentation to confirm whether it supports audio input. | ||
|
||
You can provide audio input using either [`AudioUrl`][pydantic_ai.AudioUrl] or [`BinaryContent`][pydantic_ai.BinaryContent]. The process is analogous to the examples above. |
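
As a rough sketch of the `BinaryContent` route for audio, the snippet below builds a short WAV clip in memory with the standard library so that it runs without any file on disk. `make_sine_wav` is a hypothetical helper introduced only for this illustration, and the `'audio/wav'` media type and the commented-out agent call are assumptions modelled on the image example above, not something this page confirms.

```python
import io
import math
import struct
import wave


def make_sine_wav(freq_hz: float = 440.0, seconds: float = 0.25, sample_rate: int = 16000) -> bytes:
    """Hypothetical helper: return a mono 16-bit PCM WAV file as bytes."""
    n_frames = int(seconds * sample_rate)
    # One little-endian signed 16-bit sample per frame, at half amplitude.
    frames = b''.join(
        struct.pack('<h', int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


audio_bytes = make_sine_wav()

# Assumed usage, by analogy with the image example (requires an audio-capable model):
# result = agent.run_sync(
#     [
#         'What do you hear in this clip?',
#         BinaryContent(data=audio_bytes, media_type='audio/wav'),
#     ]
# )
```

In practice you would more likely read an existing recording with `Path().read_bytes()`; the generated tone only stands in for real audio.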