Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asyncio for data plane calls #435

Merged
merged 15 commits into from
Jan 28, 2025
Merged

Conversation

jhamon
Copy link
Collaborator

@jhamon jhamon commented Jan 28, 2025

Problem

The purpose of this PR is to introduce a class, AsyncioIndex that provides an async version of the functionality found in the Index client. This includes standard index data plane operations such as upsert, query, etc as well as bulk import operations (start_import, list_imports, etc).

Solution

This is a very complex diff with many moving parts.

  • New dependency on aiohttp, an asyncio-compatible http client.
  • New dev dependency on pytest-asyncio to support async testing
  • Heavy refactoring in pinecone/openapi_support to introduce asyncio-variants of existing classes: AsyncioApiClient, AsyncioEndpoint, and AiohttpRestClient. I don't love the way any of these are currently laid out, but for simplicity sake I decided to hew close to the existing organization since this was already going to be a complex change.
  • Adjustments to our private python openapi templates in order to generate asyncio versions of api client (e.g. AsyncioVectorOperationsApi) objects and reference the objects named above.
  • Create a new class, AsyncioIndex that uses these asyncio variant objects. Since the majority of the logic (validation, etc) inside each data plane method of Index was previously extracted into IndexRequestFactory, the amount of actual new code needed inside this class was minimal async from signature changes to use async / await.
  • Add new integration test covering asyncio usage with both sparse and dense indexes.
  • Very mechnical refactoring to also bring bulk import functionality into the AsyncioIndex class as a mixin. I did not add automated tests for these due to the external dependencies required to properly integration test this (e.g. parquet files hosted on S3). Will need to manually verify these in testing.

Also:

  • Drop python 3.8, which is now end of life
  • Removed ddtrace dev dependency for logging test info in datadog. This was giving me a lot of annoying errors when running tests locally. I will troubleshoot and bring it back in later.
  • Updated jinja and virtualenv versions in our poetry.lock file to resolve dependabot alerts
  • Work to implement the asyncio codepath for GRPC was previously handled in a different diff

Usage

In a standalone script, you might do something like this:

import random
import asyncio
from pinecone import Pinecone

async def main():
    pc = Pinecone(api_key="key")
    async with pc.AsyncioIndex(name="index-name") as index:
        tasks = [
            index.query(
                vector=[random.random()] * 1024,
                namespace="ns1",
                include_values=False,
                include_metadata=True,
                top_k=2
        ) for _ in range(20)]
        
        # Execute 20 queries in parallel
        results = await asyncio.gather(*tasks)
        print(results)
    
asyncio.run(main())

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.

@jhamon jhamon changed the base branch from main to release-candidate/2025-01 January 28, 2025 13:44
@jhamon jhamon marked this pull request as ready for review January 28, 2025 14:53
@jhamon jhamon requested a review from austin-denoble January 28, 2025 14:53
@jhamon jhamon merged commit 56d20cb into release-candidate/2025-01 Jan 28, 2025
73 checks passed
@jhamon jhamon deleted the jhamon/asyncio-rest branch January 28, 2025 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant