Define enum classes with control plane configuration options (#437)
## Problem

Many configuration fields take string inputs even though only a
limited set of values is accepted. It's poor UX to have to consult
documentation or examples to learn which string values are
available. It also means code editors cannot offer type-hint support
to keep people moving quickly.

## Solution

- Create Enum classes for control plane configuration fields under
  `pinecone.enums`:
  - General index configs: `Metric`, `VectorType`, `DeletionProtection`
  - Serverless spec: `CloudProvider`, `AwsRegion`, `GcpRegion`,
    `AzureRegion`
  - Pod spec: `PodIndexEnvironment`, `PodType`

kwargs that accept these values are loosely typed as the union of the
enum type and string. This should prevent unnecessary breaking changes
and maintain flexibility to accept new values that may not be available
or known at the time this SDK release is published. For example, if in
the future Pinecone can deploy to more Azure regions, this loose typing
would allow a person to pass that configuration as region without
necessarily having to update their SDK to satisfy a type check.
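
As a rough illustration of the loose typing described above, here is a minimal, self-contained sketch. The `Region` enum and `normalize_region` helper are hypothetical stand-ins and not part of the SDK; they only mirror the `isinstance(...)`/`.value` pattern used in this commit.

```python
from enum import Enum
from typing import Union


class Region(Enum):
    """Hypothetical enum, standing in for something like AzureRegion."""

    EAST_US_2 = "eastus2"


def normalize_region(region: Union[Region, str]) -> str:
    """Return the plain string for either an enum member or a raw string."""
    return region.value if isinstance(region, Region) else str(region)


# A known value can be passed as an enum member...
known = normalize_region(Region.EAST_US_2)

# ...while a value the enum does not list yet (hypothetical here) still
# passes through as a plain string.
unknown = normalize_region("westeurope")
```

Because the parameter is annotated as `Union[Region, str]`, a type checker accepts both calls, and only the enum path gets editor autocompletion.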

## Usage: Serverless

```python
# Old way, which still works but requires you to know what values are available
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key='key')

pc.create_index(
    name="my-index",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws", 
        region="us-west-2"
    ),
    vector_type="sparse"
)
```

```python
# New way, using enum types
from pinecone import (
    Pinecone, 
    ServerlessSpec, 
    Metric, 
    VectorType, 
    CloudProvider, 
    AwsRegion
)

pc = Pinecone(api_key='key')

pc.create_index(
    name="my-index",
    dimension=1024,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS, 
        region=AwsRegion.US_WEST_2
    ),
    vector_type=VectorType.SPARSE
)

```

## Usage: Pods

```python
# old way, you have to know all the magic strings

from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key='key')

pc.create_index(
    name="my-index",
    dimension=1024,
    spec=PodSpec(
        pod_type='s1.x4',
        environment="us-east1-gcp"
    ),
)

# Later, when scaling
pc.configure_index(
    name="my-index",
    pod_type="s1.x8"
)
```

```python
# New way, using enum types
from pinecone import (
    Pinecone, 
    PodSpec, 
    PodIndexEnvironment,
    PodType
)

pc = Pinecone(api_key='key')

pc.create_index(
    name="my-index",
    dimension=1024,
    spec=PodSpec(
        environment=PodIndexEnvironment.US_EAST1_GCP,
        pod_type=PodType.S1_X4
    )
)

# Later, when scaling
pc.configure_index(
    name="my-index",
    pod_type=PodType.S1_X8
)
```

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)
jhamon authored Jan 29, 2025
1 parent 1ed5932 commit e522c7d
Showing 18 changed files with 297 additions and 53 deletions.
1 change: 1 addition & 0 deletions pinecone/__init__.py
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@
from .control import *
from .data import *
from .models import *
from .enums import *

from .utils import __version__

38 changes: 25 additions & 13 deletions pinecone/control/pinecone.py
@@ -1,6 +1,6 @@
import time
import logging
from typing import Optional, Dict, Any, Union, Literal
from typing import Optional, Dict, Any, Union

from .index_host_store import IndexHostStore
from .pinecone_interface import PineconeDBControlInterface
@@ -18,7 +18,7 @@
ConfigureIndexRequest,
ConfigureIndexRequestSpec,
ConfigureIndexRequestSpecPod,
DeletionProtection,
DeletionProtection as DeletionProtectionModel,
IndexSpec,
IndexTags,
ServerlessSpec as ServerlessSpecModel,
@@ -31,6 +31,7 @@
from pinecone.utils import parse_non_empty_args, docslinks

from pinecone.data import _Index, _AsyncioIndex, _Inference
from pinecone.enums import Metric, VectorType, DeletionProtection, PodType

from pinecone_plugin_interface import load_and_install as install_plugins

@@ -179,19 +180,26 @@ def create_index(
name: str,
spec: Union[Dict, ServerlessSpec, PodSpec],
dimension: Optional[int] = None,
metric: Optional[Literal["cosine", "euclidean", "dotproduct"]] = "cosine",
metric: Optional[Union[Metric, str]] = Metric.COSINE,
timeout: Optional[int] = None,
deletion_protection: Optional[Literal["enabled", "disabled"]] = "disabled",
vector_type: Optional[Literal["dense", "sparse"]] = "dense",
deletion_protection: Optional[Union[DeletionProtection, str]] = DeletionProtection.DISABLED,
vector_type: Optional[Union[VectorType, str]] = VectorType.DENSE,
tags: Optional[Dict[str, str]] = None,
):
api_instance = self.index_api
# Convert Enums to their string values if necessary
metric = metric.value if isinstance(metric, Metric) else str(metric)
vector_type = vector_type.value if isinstance(vector_type, VectorType) else str(vector_type)
deletion_protection = (
deletion_protection.value
if isinstance(deletion_protection, DeletionProtection)
else str(deletion_protection)
)

if vector_type == "sparse" and dimension is not None:
if vector_type == VectorType.SPARSE.value and dimension is not None:
raise ValueError("dimension should not be specified for sparse indexes")

if deletion_protection in ["enabled", "disabled"]:
dp = DeletionProtection(deletion_protection)
dp = DeletionProtectionModel(deletion_protection)
else:
raise ValueError("deletion_protection must be either 'enabled' or 'disabled'")

@@ -202,6 +210,7 @@ def create_index(

index_spec = self._parse_index_spec(spec)

api_instance = self.index_api
api_instance.create_index(
create_index_request=CreateIndexRequest(
**parse_non_empty_args(
@@ -301,17 +310,19 @@ def configure_index(
self,
name: str,
replicas: Optional[int] = None,
pod_type: Optional[str] = None,
deletion_protection: Optional[Literal["enabled", "disabled"]] = None,
pod_type: Optional[Union[PodType, str]] = None,
deletion_protection: Optional[Union[DeletionProtection, str]] = None,
tags: Optional[Dict[str, str]] = None,
):
api_instance = self.index_api
description = self.describe_index(name=name)

if deletion_protection is None:
dp = DeletionProtection(description.deletion_protection)
dp = DeletionProtectionModel(description.deletion_protection)
elif isinstance(deletion_protection, DeletionProtection):
dp = DeletionProtectionModel(deletion_protection.value)
elif deletion_protection in ["enabled", "disabled"]:
dp = DeletionProtection(deletion_protection)
dp = DeletionProtectionModel(deletion_protection)
else:
raise ValueError("deletion_protection must be either 'enabled' or 'disabled'")

@@ -330,7 +341,8 @@ def configure_index(

pod_config_args: Dict[str, Any] = {}
if pod_type:
pod_config_args.update(pod_type=pod_type)
new_pod_type = pod_type.value if isinstance(pod_type, PodType) else pod_type
pod_config_args.update(pod_type=new_pod_type)
if replicas:
pod_config_args.update(replicas=replicas)

13 changes: 7 additions & 6 deletions pinecone/control/pinecone_interface.py
@@ -1,6 +1,6 @@
from abc import ABC, abstractmethod

from typing import Optional, Dict, Union, Literal
from typing import Optional, Dict, Union


from pinecone.config import Config
@@ -9,6 +9,7 @@


from pinecone.models import ServerlessSpec, PodSpec, IndexList, CollectionList
from pinecone.enums import Metric, VectorType, DeletionProtection, PodType


class PineconeDBControlInterface(ABC):
@@ -177,10 +178,10 @@ def create_index(
name: str,
spec: Union[Dict, ServerlessSpec, PodSpec],
dimension: Optional[int],
metric: Optional[Literal["cosine", "euclidean", "dotproduct"]] = "cosine",
metric: Optional[Union[Metric, str]] = Metric.COSINE,
timeout: Optional[int] = None,
deletion_protection: Optional[Literal["enabled", "disabled"]] = "disabled",
vector_type: Optional[Literal["dense", "sparse"]] = "dense",
deletion_protection: Optional[Union[DeletionProtection, str]] = DeletionProtection.DISABLED,
vector_type: Optional[Union[VectorType, str]] = VectorType.DENSE,
):
"""Creates a Pinecone index.
@@ -377,8 +378,8 @@ def configure_index(
self,
name: str,
replicas: Optional[int] = None,
pod_type: Optional[str] = None,
deletion_protection: Optional[Literal["enabled", "disabled"]] = None,
pod_type: Optional[Union[PodType, str]] = None,
deletion_protection: Optional[Union[DeletionProtection, str]] = None,
tags: Optional[Dict[str, str]] = None,
):
"""This method is used to scale configuration fields for your pod-based Pinecone index.
18 changes: 18 additions & 0 deletions pinecone/enums/__init__.py
@@ -0,0 +1,18 @@
from .clouds import CloudProvider, AwsRegion, GcpRegion, AzureRegion
from .deletion_protection import DeletionProtection
from .metric import Metric
from .pod_index_environment import PodIndexEnvironment
from .pod_type import PodType
from .vector_type import VectorType

__all__ = [
"CloudProvider",
"AwsRegion",
"GcpRegion",
"AzureRegion",
"DeletionProtection",
"Metric",
"PodIndexEnvironment",
"PodType",
"VectorType",
]
30 changes: 30 additions & 0 deletions pinecone/enums/clouds.py
@@ -0,0 +1,30 @@
from enum import Enum


class CloudProvider(Enum):
"""Cloud providers available for use with Pinecone serverless indexes"""

AWS = "aws"
GCP = "gcp"
AZURE = "azure"


class AwsRegion(Enum):
"""AWS (Amazon Web Services) regions available for use with Pinecone serverless indexes"""

US_EAST_1 = "us-east-1"
US_WEST_2 = "us-west-2"
EU_WEST_1 = "eu-west-1"


class GcpRegion(Enum):
"""GCP (Google Cloud Platform) regions available for use with Pinecone serverless indexes"""

US_CENTRAL1 = "us-central1"
EUROPE_WEST4 = "europe-west4"


class AzureRegion(Enum):
"""Azure regions available for use with Pinecone serverless indexes"""

EAST_US = "eastus2"
15 changes: 15 additions & 0 deletions pinecone/enums/deletion_protection.py
@@ -0,0 +1,15 @@
from enum import Enum


class DeletionProtection(Enum):
"""The DeletionProtection setting of an index indicates whether the index
can be deleted using the delete_index() method.
If disabled, the index can be deleted. If enabled, calling delete_index()
will raise an error.
This setting can be changed using the configure_index() method.
"""

ENABLED = "enabled"
DISABLED = "disabled"
11 changes: 11 additions & 0 deletions pinecone/enums/metric.py
@@ -0,0 +1,11 @@
from enum import Enum


class Metric(Enum):
"""
The metric specifies how Pinecone should calculate the distance between vectors when querying an index.
"""

COSINE = "cosine"
EUCLIDEAN = "euclidean"
DOTPRODUCT = "dotproduct"
20 changes: 20 additions & 0 deletions pinecone/enums/pod_index_environment.py
@@ -0,0 +1,20 @@
from enum import Enum


class PodIndexEnvironment(Enum):
"""
These environment strings are used to specify where a pod index should be deployed.
"""

US_WEST1_GCP = "us-west1-gcp"
US_CENTRAL1_GCP = "us-central1-gcp"
US_WEST4_GCP = "us-west4-gcp"
US_EAST4_GCP = "us-east4-gcp"
NORTHAMERICA_NORTHEAST1_GCP = "northamerica-northeast1-gcp"
ASIA_NORTHEAST1_GCP = "asia-northeast1-gcp"
ASIA_SOUTHEAST1_GCP = "asia-southeast1-gcp"
US_EAST1_GCP = "us-east1-gcp"
EU_WEST1_GCP = "eu-west1-gcp"
EU_WEST4_GCP = "eu-west4-gcp"
US_EAST1_AWS = "us-east-1-aws"
EASTUS_AZURE = "eastus-azure"
20 changes: 20 additions & 0 deletions pinecone/enums/pod_type.py
@@ -0,0 +1,20 @@
from enum import Enum


class PodType(Enum):
"""
PodType represents the available pod types for a pod index.
"""

P1_X1 = "p1.x1"
P1_X2 = "p1.x2"
P1_X4 = "p1.x4"
P1_X8 = "p1.x8"
S1_X1 = "s1.x1"
S1_X2 = "s1.x2"
S1_X4 = "s1.x4"
S1_X8 = "s1.x8"
P2_X1 = "p2.x1"
P2_X2 = "p2.x2"
P2_X4 = "p2.x4"
P2_X8 = "p2.x8"
14 changes: 14 additions & 0 deletions pinecone/enums/vector_type.py
@@ -0,0 +1,14 @@
from enum import Enum


class VectorType(Enum):
"""
VectorType is used to specify the type of vector you will store in the index.
Dense vectors are used to store dense embeddings, which are vectors with non-zero values in most of the dimensions.
Sparse vectors are used to store sparse embeddings, which allow vectors with zero values in most of the dimensions to be represented concisely.
"""

DENSE = "dense"
SPARSE = "sparse"
3 changes: 2 additions & 1 deletion pinecone/models/__init__.py
@@ -1,10 +1,11 @@
from .index_description import ServerlessSpecDefinition, PodSpecDefinition
from .collection_description import CollectionDescription
from .serverless_spec import ServerlessSpec
from .pod_spec import PodSpec
from .pod_spec import PodSpec, PodType
from .index_list import IndexList
from .collection_list import CollectionList
from .index_model import IndexModel
from ..enums.metric import Metric

__all__ = [
"CollectionDescription",
62 changes: 38 additions & 24 deletions pinecone/models/pod_spec.py
@@ -1,7 +1,11 @@
from typing import NamedTuple, Optional, Dict
from dataclasses import dataclass, field
from typing import Optional, Dict, Union

from ..enums import PodIndexEnvironment, PodType

class PodSpec(NamedTuple):

@dataclass(frozen=True)
class PodSpec:
"""
PodSpec represents the configuration used to deploy a pod-based index.
@@ -33,41 +37,51 @@ class PodSpec(NamedTuple):
This value combines pod type and pod size into a single string. This configuration is your main lever for vertical scaling.
"""

metadata_config: Optional[Dict] = {}
metadata_config: Optional[Dict] = field(default_factory=dict)
"""
If you are storing a lot of metadata, you can use this configuration to limit the fields which are indexed for search.
If you are storing a lot of metadata, you can use this configuration to limit the fields which are indexed for search.
This configuration should be a dictionary with the key 'indexed' and the value as a list of fields to index.
For example, if your vectors have metadata along like this:
```python
from pinecone import Vector
vector = Vector(
id='237438191',
values=[...],
metadata={
'productId': '237438191',
'description': 'Stainless Steel Tumbler with Straw',
'category': 'kitchen',
'price': '19.99'
}
)
```
You might want to limit which fields are indexed with metadata config such as this:
Example:
```
{'indexed': ['field1', 'field2']}
```
"""

source_collection: Optional[str] = None
"""
The name of the collection to use as the source for the pod index. This configuration is only used when creating a pod index from an existing collection.
"""

def asdict(self):
def __init__(
self,
environment: Union[PodIndexEnvironment, str],
pod_type: Union[PodType, str] = "p1.x1",
replicas: Optional[int] = None,
shards: Optional[int] = None,
pods: Optional[int] = None,
metadata_config: Optional[Dict] = None,
source_collection: Optional[str] = None,
):
object.__setattr__(
self,
"environment",
environment.value if isinstance(environment, PodIndexEnvironment) else str(environment),
)
object.__setattr__(
self, "pod_type", pod_type.value if isinstance(pod_type, PodType) else str(pod_type)
)
object.__setattr__(self, "replicas", replicas)
object.__setattr__(self, "shards", shards)
object.__setattr__(self, "pods", pods)
object.__setattr__(
self, "metadata_config", metadata_config if metadata_config is not None else {}
)
object.__setattr__(self, "source_collection", source_collection)

def asdict(self) -> Dict:
"""
Returns the PodSpec as a dictionary.
"""
return {"pod": self._asdict()}
return {"pod": self.__dict__}
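
The `PodSpec` change above pairs `@dataclass(frozen=True)` with a hand-written `__init__` that assigns fields through `object.__setattr__`. A minimal sketch of that pattern follows; the `Size` enum and `Spec` class are hypothetical stand-ins and not the SDK's own classes.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum
from typing import Dict, Union


class Size(Enum):
    """Hypothetical enum, standing in for something like PodType."""

    SMALL = "small"
    LARGE = "large"


@dataclass(frozen=True)
class Spec:
    """Hypothetical frozen spec that always stores plain strings."""

    name: str
    size: str = "small"

    def __init__(self, name: str, size: Union[Size, str] = "small"):
        # frozen=True blocks ordinary attribute assignment, so the custom
        # __init__ writes fields through object.__setattr__ instead.
        object.__setattr__(self, "name", str(name))
        object.__setattr__(
            self, "size", size.value if isinstance(size, Size) else str(size)
        )

    def asdict(self) -> Dict[str, Dict[str, str]]:
        # Copy the instance dict so callers cannot mutate the frozen object.
        return {"spec": dict(self.__dict__)}


from_enum = Spec("a", Size.LARGE)
from_str = Spec("a", "large")
```

Since enum members are converted to their string values at construction time, `from_enum` and `from_str` compare equal, and assigning to a field afterwards raises `FrozenInstanceError`. The sketch returns a copy of `__dict__`, whereas the diff returns `self.__dict__` directly; that difference is a choice made for this illustration only.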