# Asyncio for data plane calls (#435)
## Problem

The purpose of this PR is to introduce a class, `AsyncioIndex` that
provides an async version of the functionality found in the `Index`
client. This includes standard index data plane operations such as
`upsert`, `query`, etc., as well as bulk import operations
(`start_import`, `list_imports`, etc.).

## Solution

This is a very complex diff with many moving parts.

- New dependency on `aiohttp`, an asyncio-compatible http client.
- New dev dependency on `pytest-asyncio` to support async testing.
- Heavy refactoring in `pinecone/openapi_support` to introduce
asyncio-variants of existing classes: `AsyncioApiClient`,
`AsyncioEndpoint`, and `AiohttpRestClient`. I don't love the way any of
these are currently laid out, but for simplicity's sake I decided to hew
close to the existing organization since this was already going to be a
complex change.
- Adjustments to our private python openapi templates in order to
generate asyncio versions of api client (e.g.
`AsyncioVectorOperationsApi`) objects and reference the objects named
above.
- Create a new class, `AsyncioIndex` that uses these asyncio variant
objects. Since the majority of the logic (validation, etc) inside each
data plane method of `Index` was previously extracted into
`IndexRequestFactory`, the amount of actual new code needed inside this
class was minimal aside from signature changes to use `async` / `await`.
- Add new integration test covering asyncio usage with both sparse and
dense indexes.
- Very mechanical refactoring to also bring bulk import functionality
into the AsyncioIndex class as a mixin. I did not add automated tests
for these due to the external dependencies required to properly
integration test this (e.g. parquet files hosted on S3). Will need to
manually verify these in testing.
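
Because the request-building logic already lives in `IndexRequestFactory`, each async method can stay a thin wrapper around the transport call. A minimal sketch of that pattern (the classes below are hypothetical stand-ins, not the SDK's actual internals):

```python
import asyncio
from dataclasses import dataclass

# Hypothetical stand-ins for the SDK's IndexRequestFactory and asyncio API
# client; the real classes carry much more validation and configuration.
@dataclass
class QueryRequest:
    vector: list
    top_k: int
    namespace: str

class IndexRequestFactory:
    @staticmethod
    def query_request(vector, top_k=10, namespace=""):
        # Shared validation lives here so the sync Index and AsyncioIndex
        # methods can both stay thin.
        if top_k < 1:
            raise ValueError("top_k must be >= 1")
        return QueryRequest(vector=vector, top_k=top_k, namespace=namespace)

class AsyncioIndex:
    def __init__(self, api):
        self._api = api  # e.g. an asyncio-variant API client object

    async def query(self, vector, top_k=10, namespace=""):
        request = IndexRequestFactory.query_request(vector, top_k, namespace)
        return await self._api.query(request)  # only the transport differs

class FakeApi:
    async def query(self, request):
        return {"matches": [], "namespace": request.namespace}

result = asyncio.run(
    AsyncioIndex(FakeApi()).query([0.1, 0.2], top_k=2, namespace="ns1")
)
print(result)  # {'matches': [], 'namespace': 'ns1'}
```

The sync and async clients differ only in the final `await`; everything before it is shared.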

Also:
- Drop Python 3.8, which is now end-of-life.
- Removed `ddtrace` dev dependency for logging test info in Datadog.
This was giving me a lot of annoying errors when running tests locally.
I will troubleshoot and bring it back in later.
- Updated `jinja` and `virtualenv` versions in our poetry.lock file to
resolve dependabot alerts
- Work to implement the asyncio codepath for gRPC was previously handled
in a different diff.

## Usage

In a standalone script, you might do something like this:

```python
import random
import asyncio
from pinecone import Pinecone

async def main():
    pc = Pinecone(api_key="key")
    async with pc.AsyncioIndex(name="index-name") as index:
        tasks = [
            index.query(
                vector=[random.random()] * 1024,
                namespace="ns1",
                include_values=False,
                include_metadata=True,
                top_k=2,
            )
            for _ in range(20)
        ]
        
        # Execute 20 queries in parallel
        results = await asyncio.gather(*tasks)
        print(results)
    
asyncio.run(main())
```
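
When fanning out many concurrent queries like this, it can help to cap the number of in-flight requests. One generic asyncio pattern for that (plain stdlib, independent of the SDK):

```python
import asyncio

async def fake_query(i):
    # Stand-in for an index.query(...) coroutine
    await asyncio.sleep(0.01)
    return i * i

async def gather_limited(coros, limit=5):
    # A semaphore caps how many coroutines are awaited at once
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

async def main():
    return await gather_limited((fake_query(i) for i in range(20)), limit=5)

results = asyncio.run(main())
print(results[:4])  # [0, 1, 4, 9]
```

`asyncio.gather` preserves input order, so results line up with the queries that produced them even though they complete out of order.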

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [x] This change requires a documentation update
- [x] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan

Describe specific steps for validating this change.
jhamon authored Jan 28, 2025
1 parent 9da6b04 commit 56d20cb
Showing 94 changed files with 6,715 additions and 906 deletions.
45 changes: 45 additions & 0 deletions .github/actions/test-asyncio/action.yaml
@@ -0,0 +1,45 @@
name: 'Test Asyncio'
description: 'Runs tests on the Pinecone data plane'

inputs:
  spec:
    description: 'The deploy spec of the index'
    required: true
  use_grpc:
    description: 'Whether to use gRPC or REST'
    required: true
  freshness_timeout_seconds:
    description: 'The number of seconds to wait for the index to become fresh'
    required: false
    default: '60'
  PINECONE_API_KEY:
    description: 'The Pinecone API key'
    required: true
  python_version:
    description: 'The version of Python to use'
    required: false
    default: '3.9'

runs:
  using: 'composite'
  steps:
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python_version }}

    - name: Setup Poetry
      uses: ./.github/actions/setup-poetry
      with:
        include_grpc: ${{ inputs.use_grpc }}
        include_dev: 'true'

    - name: Run data plane tests
      id: data-plane-asyncio-tests
      shell: bash
      run: poetry run pytest tests/integration/data_asyncio -s -vv
      env:
        PINECONE_API_KEY: ${{ inputs.PINECONE_API_KEY }}
        USE_GRPC: ${{ inputs.use_grpc }}
        SPEC: ${{ inputs.spec }}
        FRESHNESS_TIMEOUT_SECONDS: ${{ inputs.freshness_timeout_seconds }}
7 changes: 1 addition & 6 deletions .github/actions/test-data-plane/action.yaml
@@ -59,13 +59,8 @@ runs:
- name: Run data plane tests
id: data-plane-tests
shell: bash
-run: poetry run pytest tests/integration/data --ddtrace
+run: poetry run pytest tests/integration/data
env:
-DD_CIVISIBILITY_AGENTLESS_ENABLED: true
-DD_API_KEY: ${{ inputs.DATADOG_API_KEY }}
-DD_SITE: datadoghq.com
-DD_ENV: ci
-DD_SERVICE: pinecone-python-client
PINECONE_API_KEY: ${{ inputs.PINECONE_API_KEY }}
USE_GRPC: ${{ inputs.use_grpc }}
METRIC: ${{ inputs.metric }}
2 changes: 1 addition & 1 deletion .github/actions/test-dependency-grpc/action.yaml
@@ -60,7 +60,7 @@ runs:
timeout_minutes: 5
max_attempts: 3
retry_on: error
-command: poetry run pytest tests/dependency/grpc -s -v --ddtrace
+command: poetry run pytest tests/dependency/grpc -s -v
env:
PINECONE_API_KEY: ${{ inputs.PINECONE_API_KEY }}
INDEX_NAME: ${{ inputs.index_name }}
2 changes: 1 addition & 1 deletion .github/actions/test-dependency-rest/action.yaml
@@ -39,7 +39,7 @@ runs:
timeout_minutes: 5
max_attempts: 3
retry_on: error
-command: poetry run pytest tests/dependency/rest -s -v --ddtrace
+command: poetry run pytest tests/dependency/rest -s -v
env:
PINECONE_API_KEY: '${{ inputs.PINECONE_API_KEY }}'
INDEX_NAME: '${{ inputs.index_name }}'
6 changes: 3 additions & 3 deletions .github/workflows/pr.yaml
@@ -22,7 +22,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-python-version: [3.8, 3.12]
+python-version: [3.9, 3.12]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
@@ -33,7 +33,7 @@
uses: ./.github/actions/setup-poetry
- name: Package
run: poetry build

build-docs:
name: Build docs with pdoc
runs-on: ubuntu-latest
@@ -43,4 +43,4 @@
- name: Build docs with pdoc
uses: './.github/actions/build-docs'
with:
python-version: 3.11
python-version: 3.11
10 changes: 1 addition & 9 deletions .github/workflows/testing-dependency.yaml
@@ -3,13 +3,6 @@ name: Dependency Testing
on:
workflow_call: {}

-env:
-DD_CIVISIBILITY_AGENTLESS_ENABLED: true
-DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
-DD_SITE: datadoghq.com
-DD_ENV: ci
-DD_SERVICE: pinecone-python-client

jobs:
dependency-matrix-setup:
name: Deps setup
@@ -36,7 +29,6 @@ jobs:
fail-fast: false
matrix:
python_version:
- 3.8
- 3.9
- "3.10"
grpcio_version:
@@ -124,7 +116,7 @@ jobs:
fail-fast: false
matrix:
python_version:
- 3.8
- 3.9
- 3.11
urllib3_version:
- 1.26.0
46 changes: 30 additions & 16 deletions .github/workflows/testing-integration.yaml
@@ -2,20 +2,13 @@ name: "Integration Tests"
'on':
workflow_call: {}

-env:
-DD_CIVISIBILITY_AGENTLESS_ENABLED: true
-DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
-DD_SITE: datadoghq.com
-DD_ENV: ci
-DD_SERVICE: pinecone-python-client

jobs:
plugin-inference:
name: Test inference plugin
runs-on: ubuntu-latest
strategy:
matrix:
python_version: [3.8, 3.12]
python_version: [3.9, 3.12]
steps:
- uses: actions/checkout@v4
- name: 'Set up Python ${{ matrix.python_version }}'
@@ -27,7 +20,7 @@ jobs:
with:
include_grpc: 'true'
- name: 'Run integration tests'
-run: poetry run pytest tests/integration/inference -s -vv --ddtrace
+run: poetry run pytest tests/integration/inference -s -vv
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
@@ -38,7 +31,7 @@
strategy:
fail-fast: false
matrix:
-python_version: [3.8, 3.12]
+python_version: [3.9, 3.12]
use_grpc: [true, false]
metric:
- cosine
@@ -58,6 +51,27 @@ jobs:
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
freshness_timeout_seconds: 600
skip_weird_id_tests: 'true'

  test-asyncio:
    name: Data plane asyncio
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python_version: [3.9, 3.12]
        use_grpc: [false, true]
        spec:
          - '{ "serverless": { "region": "us-west-2", "cloud": "aws" }}'
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/test-asyncio
        with:
          python_version: '${{ matrix.python_version }}'
          use_grpc: '${{ matrix.use_grpc }}'
          spec: '${{ matrix.spec }}'
          PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
          freshness_timeout_seconds: 600

# data-plane-pod:
# name: Data plane pod integration tests
# runs-on: ubuntu-latest
@@ -85,7 +99,7 @@ jobs:
pineconeEnv:
- prod
testConfig:
- python-version: 3.8
- python-version: 3.9
pod: { environment: 'us-east1-gcp'}
- python-version: 3.12
pod: { environment: 'us-east4-gcp'}
@@ -100,7 +114,7 @@ jobs:
uses: ./.github/actions/setup-poetry
- name: 'Run integration tests (REST, prod)'
if: matrix.pineconeEnv == 'prod'
-run: poetry run pytest tests/integration/control/pod -s -v --ddtrace
+run: poetry run pytest tests/integration/control/pod -s -v
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
@@ -110,7 +124,7 @@ jobs:
METRIC: 'cosine'
- name: 'Run integration tests (REST, staging)'
if: matrix.pineconeEnv == 'staging'
-run: poetry run pytest tests/integration/control/pod -s -v --ddtrace
+run: poetry run pytest tests/integration/control/pod -s -v
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_CONTROLLER_HOST: 'https://api-staging.pinecone.io'
@@ -128,7 +142,7 @@ jobs:
pineconeEnv:
- prod
testConfig:
- python-version: 3.8 # Do one test run with 3.8 for sanity check
- python-version: 3.9 # Do one test run with 3.9 for sanity check
pod: { environment: 'us-east1-gcp'}
serverless: { cloud: 'aws', region: 'us-west-2'}
- python-version: 3.12
@@ -145,7 +159,7 @@ jobs:
uses: ./.github/actions/setup-poetry
- name: 'Run integration tests (REST, prod)'
if: matrix.pineconeEnv == 'prod'
-run: poetry run pytest tests/integration/control/serverless -s -vv --ddtrace
+run: poetry run pytest tests/integration/control/serverless -s -vv
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
@@ -154,7 +168,7 @@ jobs:
SERVERLESS_REGION: '${{ matrix.testConfig.serverless.region }}'
- name: 'Run integration tests (REST, staging)'
if: matrix.pineconeEnv == 'staging'
-run: poetry run pytest tests/integration/control/serverless -s -vv --ddtrace
+run: poetry run pytest tests/integration/control/serverless -s -vv
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_CONTROLLER_HOST: 'https://api-staging.pinecone.io'
12 changes: 2 additions & 10 deletions .github/workflows/testing-unit.yaml
@@ -2,13 +2,6 @@ name: "Unit Tests"
'on':
workflow_call: {}

-env:
-DD_CIVISIBILITY_AGENTLESS_ENABLED: true
-DD_API_KEY: ${{ secrets.DATADOG_API_KEY }}
-DD_SITE: datadoghq.com
-DD_ENV: ci
-DD_SERVICE: pinecone-python-client

jobs:
unit-tests:
name: Unit tests
@@ -17,7 +10,6 @@ jobs:
fail-fast: false
matrix:
python-version:
- 3.8
- 3.9
- '3.10'
- 3.11
@@ -37,10 +29,10 @@ jobs:
include_grpc: '${{ matrix.use_grpc }}'
include_types: true
- name: Run unit tests (REST)
-run: poetry run pytest --cov=pinecone --timeout=120 tests/unit --ddtrace
+run: poetry run pytest --cov=pinecone --timeout=120 tests/unit
- name: Run unit tests (GRPC)
if: ${{ matrix.use_grpc == true }}
-run: poetry run pytest --cov=pinecone/grpc --timeout=120 tests/unit_grpc --ddtrace
+run: poetry run pytest --cov=pinecone/grpc --timeout=120 tests/unit_grpc
- name: mypy check
env:
INCLUDE_GRPC: '${{ matrix.use_grpc }}'
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,3 +1,5 @@
scratch

# IDEs
.idea

@@ -137,7 +139,7 @@ venv.bak/
.ropeproject

# pdocs documentation
# We want to exclude any locally generated artifacts, but we rely on
# We want to exclude any locally generated artifacts, but we rely on
# keeping documentation assets in the docs/ folder.
docs/*
!docs/pinecone-python-client-fork.png
@@ -155,4 +157,4 @@ dmypy.json
*.hdf5
*~

tests/integration/proxy_config/logs
tests/integration/proxy_config/logs
5 changes: 3 additions & 2 deletions README.md
@@ -1,17 +1,18 @@
# Pinecone Python SDK
# Pinecone Python SDK
![License](https://img.shields.io/github/license/pinecone-io/pinecone-python-client?color=orange) [![CI](https://github.com/pinecone-io/pinecone-python-client/actions/workflows/pr.yaml/badge.svg)](https://github.com/pinecone-io/pinecone-python-client/actions/workflows/pr.yaml)

The official Pinecone Python SDK.

For more information, see the docs at https://docs.pinecone.io


## Documentation

- [**Reference Documentation**](https://sdk.pinecone.io/python/index.html)

### Upgrading the SDK

#### Upgrading from `4.x` to `5.x`
#### Upgrading from `4.x` to `5.x`

As part of an overall move to stop exposing generated code in the package's public interface, an obscure configuration property (`openapi_config`) was removed in favor of individual configuration options such as `proxy_url`, `proxy_headers`, and `ssl_ca_certs`. All of these properties were available in v3 and v4 releases of the SDK, with deprecation notices shown to affected users.

2 changes: 1 addition & 1 deletion codegen/apis
Submodule apis updated from 8562ca to 63e97d
11 changes: 11 additions & 0 deletions codegen/build-oas.sh
@@ -145,6 +145,17 @@ for module in "${modules[@]}"; do
generate_client $module
done

# This also exists in the generated module code, but we need to reference it
# in the pinecone.openapi_support package as well without creating a circular
# dependency.
version_file="pinecone/openapi_support/api_version.py"
echo "# This file is generated by codegen/build-oas.sh" > $version_file
echo "# Do not edit this file manually." >> $version_file
echo "" >> $version_file

echo "API_VERSION = '${version}'" >> $version_file
echo "APIS_REPO_SHA = '$(git rev-parse :codegen/apis)'" >> $version_file

# Even though we want to generate multiple packages, we
# don't want to duplicate every exception and utility class.
# So we do a bit of surgery to find these shared files
2 changes: 1 addition & 1 deletion codegen/python-oas-templates
4 changes: 4 additions & 0 deletions pinecone/__init__.py
@@ -15,3 +15,7 @@
from .models import *

from .utils import __version__

import logging

logging.getLogger("pinecone_plugin_interface").setLevel(logging.CRITICAL)
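
The one-line addition to `pinecone/__init__.py` above uses the standard stdlib pattern for quieting a chatty third-party logger. A minimal illustration (the logger name here is hypothetical):

```python
import logging

# Same pattern as the __init__.py change above: raise the level on a noisy
# third-party logger so only CRITICAL records pass through.
noisy = logging.getLogger("some_plugin")  # hypothetical logger name
noisy.setLevel(logging.CRITICAL)

noisy.warning("this is suppressed")          # dropped: below CRITICAL
print(noisy.isEnabledFor(logging.WARNING))   # False
```

Because logger objects are process-wide singletons keyed by name, setting the level once at import time silences the logger everywhere.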
