Skip to content

Perform entity extraction using Azure OpenAI structured outputs

License

Notifications You must be signed in to change notification settings

Azure-Samples/azure-openai-entity-extraction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Entity extraction with Azure OpenAI structured outputs (Python)

Open in GitHub Codespaces Open in Dev Containers

This repository includes both the infrastructure and Python files needed so that you can create an Azure OpenAI gpt-4o model deployment and then perform entity extraction using the structured outputs mode and the Python openai SDK. Example scripts are provided for extracting details from images, PDFs, webpages, and GitHub issues.

Features

Architecture diagram

Architecture diagram: Microsoft Entra managed identity connecting to Azure AI services

Getting started

You have a few options for getting started with this template. The quickest way to get started is GitHub Codespaces, since it will setup all the tools for you, but you can also set it up locally.

GitHub Codespaces

You can run this template virtually by using GitHub Codespaces. The button will open a web-based VS Code instance in your browser:

  1. Open the template (this may take several minutes):

    Open in GitHub Codespaces

  2. Open a terminal window

  3. Continue with the deployment steps

VS Code Dev Containers

A related option is VS Code Dev Containers, which will open the project in your local VS Code using the Dev Containers extension:

  1. Start Docker Desktop (install it if not already installed)

  2. Open the project:

    Open in Dev Containers

  3. In the VS Code window that opens, once the project files show up (this may take several minutes), open a terminal window.

  4. Continue with the deployment steps

Local environment

  1. Make sure the following tools are installed:

  2. Make a new directory called azure-openai-entity-extraction and clone this template into it using the azd CLI:

    azd init -t azure-openai-entity-extraction

    You can also use git to clone the repository if you prefer.

  3. Continue with the deployment steps

Deployment

  1. Login to Azure:

    azd auth login

    For GitHub Codespaces users, if the previous command fails, try:

     azd auth login --use-device-code
  2. Provision the OpenAI account:

    azd provision

    It will prompt you to provide an azd environment name (like "entityext"), select a subscription from your Azure account, and select a location where the OpenAI model is available (like "canadaeast"). Then it will provision the resources in your account and deploy the latest code.

    ⚠️ If you get an error or timeout with deployment, changing the location can help, as there may be availability constraints for the OpenAI resource. To change the location run:

    azd env set AZURE_LOCATION "yournewlocationname"
  3. When azd has finished, you should have an OpenAI account you can use locally when logged into your Azure account, and a .env file should now exist with your Azure OpenAI configuration.

  4. Then you can proceed to run the Python examples.

Running the Python examples

To run the samples, you'll either need to have already deployed the Azure OpenAI account or use GitHub models.

  1. Check that the .env file exists in the root of the project. If you deployed an Azure OpenAI account, it should have been created for you, and look like this:

    OPENAI_HOST=azure
    AZURE_OPENAI_GPT_DEPLOYMENT=gpt-4o
    AZURE_OPENAI_SERVICE=your-service-name
    AZURE_TENANT_ID=your-tenant-id-1234

    If you're using GitHub models, create a .env file with the following content:

    OPENAI_HOST=github
    GITHUB_TOKEN=

    You can create a GitHub token by following the GitHub documentation, or open this project inside GitHub Codespaces where the token is already exposed as an environment variable.

  2. If you're not already running in a Codespace or Dev Container, create a Python virtual environment.

  3. Install the requirements:

    python -m pip install -r requirements.txt
  4. Run an example by running either python example_file.py or selecting the Run button on the opened file. Available examples:

    Script filename Description
    basic_azure.py A basic example that uses deployed Azure OpenAI resource to extract from string input.
    basic_githubmodels.py A basic example that uses free gpt-4o from GitHub Models to extract from string input.
    extract_github_issue.py Fetches a public issue using the GitHub API, and then extracts details.
    extract_github_repo.py Fetches a public README using the GitHub API, and then extracts details.
    extract_image_graph.py Parses a local image of a graph and extracts details like title, axis, legend.
    extract_image_table.py Parses a local image with tables and extracts nested tabular data.
    extract_pdf_receipt.py Parses a local PDF using pymupdf, which converts it to Markdown, and extracts order details.
    extract_webpage.py Parses a blog post using BeautifulSoup, and extracts title, description, and tags.

Guidance

Costs

This template creates only the Azure OpenAI resource, which is free to provision. However, you will be charged for the usage of the Azure OpenAI chat completions API. The pricing is based on the number of tokens used, with around 1-3 tokens used per word. You can find the pricing details for the OpenAI API on the Azure Cognitive Services pricing page.

Security guidelines

This template uses keyless authentication for authenticating to the Azure OpenAI resource. This is a secure way to authenticate to Azure resources without needing to store credentials in your code. Your Azure user account is assigned the "Cognitive Services OpenAI User" role, which allows you to access the OpenAI resource. You can find more information about the permissions of this role in the Azure OpenAI documentation.

For further security, you could also deploy the Azure OpenAI inside a private virtual network (VNet) and use a private endpoint to access it. This would prevent the OpenAI resource from being accessed from the public internet.

Resources