diff --git a/CHANGELOG.md b/CHANGELOG.md index af3f4f78..d7d0ce64 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,8 @@ ## Unreleased - Added option to Controller to permit caching unsafe HTTP methods. +- Support AWS S3 storages. (#164) +- Move `typing_extensions` from requirements.txt to pyproject.toml. (#161) ## 0.0.21 (29th December, 2023) diff --git a/README.md b/README.md index 3e168250..06f1a790 100644 --- a/README.md +++ b/README.md @@ -36,7 +36,7 @@ - 🧠 **Smart**: Attempts to clearly implement RFC 9111, understands `Vary`, `Etag`, `Last-Modified`, `Cache-Control`, and `Expires` headers, and *handles response re-validation automatically*. - ⚙️ **Configurable**: You have complete control over how the responses are stored and serialized. - 📦 **From the package**: - - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system), [Redis](https://en.wikipedia.org/wiki/Redis), and [SQLite](https://en.wikipedia.org/wiki/SQLite) backends. + - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system), [Redis](https://en.wikipedia.org/wiki/Redis), [SQLite](https://en.wikipedia.org/wiki/SQLite), and [AWS S3](https://aws.amazon.com/s3/) backends. - Built-in support for [JSON](https://en.wikipedia.org/wiki/JSON), [YAML](https://en.wikipedia.org/wiki/YAML), and [pickle](https://docs.python.org/3/library/pickle.html) serializers. - 🚀 **Very fast**: Your requests will be even faster if there are *no IO operations*. 
diff --git a/docker-compose.yml b/docker-compose.yml index a09b2e36..7d921ceb 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -5,4 +5,4 @@ services: image: 'redis:7.0.12-alpine' command: 'redis-server' ports: - - "127.0.0.1:6379:6379" \ No newline at end of file + - "127.0.0.1:6379:6379" diff --git a/docs/advanced/storages.md b/docs/advanced/storages.md index 530b27a5..7804cac4 100644 --- a/docs/advanced/storages.md +++ b/docs/advanced/storages.md @@ -222,49 +222,55 @@ If you do this, `Hishel` will delete any stored responses whose ttl has expired. In this example, the stored responses were limited to 1 hour. -## Which storage is the best? +### :material-aws: AWS S3 storage -Let's start with some basic benchmarks to see which one is the fastest. +`Hishel` has built-in [AWS S3](https://aws.amazon.com/s3/) support, allowing users to store responses in the cloud. -So there are the results of the benchmarks, where we simply sent 1000 synchronous requests to [hishel.com](https://hishel.com). +Example: -| Storage | Time | -| ----------- | ---- | -| `FileStorage` | 0.4s | -| `SQLiteStorage` | 2s | -| `RedisStorage` | 0.5s | -| `InMemoryStorage` | 0.2s | +```python +import hishel +storage = hishel.S3Storage(bucket_name="cached_responses") +client = hishel.CacheClient(storage=storage) +``` -!!! note - It is important to note that the results may differ for your environment due to a variety of factors that we ignore. +Or if you are using Transports +```python -In most cases, `FileStorage`, `RedisStorage` and `InMemoryStorage` are significantly faster than `SQLiteStorage`, but `SQLiteStorage` can be used if you already have a well-configured sqlite database and want to keep cached responses close to your application data. +import httpx +import hishel -For each storage option, there are some benefits. 
+storage = hishel.S3Storage(bucket_name="cached_responses") +transport = hishel.CacheTransport(httpx.HTTPTransport(), storage=storage) +``` -FileStorage +#### Custom AWS S3 client -1. **0 configuration** -2. **very fast** -3. **easy access** +If you want to manually configure the client instance, pass it to Hishel. -RedisStorage +```python +import hishel +import boto3 -1. **can be shared** -2. **very fast** -3. **redis features** +s3_client = boto3.client('s3') -InMemoryStorage +storage = hishel.S3Storage(bucket_name="cached_responses", client=s3_client) +client = hishel.CacheClient(storage=storage) +``` + +#### Responses ttl in S3Storage -1. **temporary cache** -2. **very fast** +You can explicitly specify the ttl for stored responses in this manner. -SQLiteStorage +```python +import hishel + +storage = hishel.S3Storage(ttl=3600) +``` + +If you do this, `Hishel` will delete any stored responses whose ttl has expired. +In this example, the stored responses were limited to 1 hour. -1. **can be shared** -2. **sqlite features** -!!! tip - Any [serializer](serializers.md) can be used with any storage because they are all fully compatible. 
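A note on the ttl behaviour documented above: expiry is a plain age check of each stored object against the configured ttl. A minimal, boto3-free sketch of that comparison (the `expired_keys` helper is hypothetical, for illustration only; it mirrors the `Key`/`LastModified` fields of an S3 object listing):

```python
from datetime import datetime, timedelta, timezone

def expired_keys(objects, ttl_ms):
    # `objects` is a list of (key, last_modified) pairs, mirroring the
    # Key/LastModified fields an S3 list_objects response exposes.
    # ttl_ms is the ttl converted to milliseconds.
    now = datetime.now(timezone.utc)
    cutoff = timedelta(milliseconds=ttl_ms)
    return [key for key, last_modified in objects if now - last_modified > cutoff]

# An object written two hours ago exceeds a one-hour (3_600_000 ms) ttl:
stale = datetime.now(timezone.utc) - timedelta(hours=2)
fresh = datetime.now(timezone.utc)
print(expired_keys([("hishel-a", stale), ("hishel-b", fresh)], 3_600_000))  # ['hishel-a']
```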
diff --git a/docs/contributing.md b/docs/contributing.md index defc907b..28b08a0b 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -26,7 +26,7 @@ git switch -c my-feature-name - **scripts/install** _Set up the virtual environment and install all the necessary dependencies_ - **scripts/lint** _Runs linter, formatter, and unasync to enforce code style_ - **scripts/check** _Runs all the necessary checks, including linter, formatter, static type analyzer, and unasync checks_ -- **scripts/install** _Runs `scripts/check` + `pytest` over the coverage._ +- **scripts/test** _Runs `scripts/check` + `pytest` over the coverage._ Example: diff --git a/docs/index.md b/docs/index.md index 9ca11ebe..ce0e2879 100644 --- a/docs/index.md +++ b/docs/index.md @@ -36,7 +36,7 @@ - :brain: **Smart**: Attempts to clearly implement RFC 9111, understands `Vary`, `Etag`, `Last-Modified`, `Cache-Control`, and `Expires` headers, and *handles response re-validation automatically*. - :gear: **Configurable**: You have complete control over how the responses are stored and serialized. - :package: **From the package**: - - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system) :file_folder: , [Redis](https://en.wikipedia.org/wiki/Redis) :simple-redis:, and [SQLite](https://en.wikipedia.org/wiki/SQLite) :simple-sqlite: backends. + - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system) :file_folder: , [Redis](https://en.wikipedia.org/wiki/Redis) :simple-redis:, [SQLite](https://en.wikipedia.org/wiki/SQLite) :simple-sqlite: , and [AWS S3](https://aws.amazon.com/s3/) :material-aws: backends. - Built-in support for [JSON](https://en.wikipedia.org/wiki/JSON) :simple-json: , [YAML](https://en.wikipedia.org/wiki/YAML) :simple-yaml:, and [pickle](https://docs.python.org/3/library/pickle.html) serializers. - :rocket: **Very fast**: Your requests will be even faster if there are *no IO operations*. 
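One implementation detail worth calling out for the storage modules in this diff: `boto3` is an optional dependency, so it is imported under `try`/`except ImportError` and only checked when the S3 storage is actually constructed. A standalone sketch of that guard pattern (`fake_s3_sdk` and `S3BackedStorage` are made-up names standing in for `boto3` and the real storage class):

```python
try:
    import fake_s3_sdk  # deliberately absent module, standing in for boto3
except ImportError:
    fake_s3_sdk = None

class S3BackedStorage:
    def __init__(self) -> None:
        # Fail loudly only when the S3 backend is actually used,
        # so the base install keeps working without the extra.
        if fake_s3_sdk is None:
            raise RuntimeError(
                "The `S3BackedStorage` was used, but the required packages "
                "were not found. Install the extra: pip install hishel[s3]"
            )

try:
    S3BackedStorage()
except RuntimeError as exc:
    print(exc)
```

This keeps the import cost and failure mode local to the one feature that needs the extra.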
diff --git a/hishel/_async/_storages.py b/hishel/_async/_storages.py index 1bc73f83..88a42447 100644 --- a/hishel/_async/_storages.py +++ b/hishel/_async/_storages.py @@ -5,6 +5,11 @@ from copy import deepcopy from pathlib import Path +try: + import boto3 +except ImportError: # pragma: no cover + boto3 = None # type: ignore + try: import anysqlite except ImportError: # pragma: no cover @@ -16,13 +21,14 @@ from hishel._serializers import BaseSerializer, clone_model from .._files import AsyncFileManager +from .._s3 import AsyncS3Manager from .._serializers import JSONSerializer, Metadata from .._synchronization import AsyncLock from .._utils import float_seconds_to_int_milliseconds logger = logging.getLogger("hishel.storages") -__all__ = ("AsyncFileStorage", "AsyncRedisStorage", "AsyncSQLiteStorage", "AsyncInMemoryStorage") +__all__ = ("AsyncFileStorage", "AsyncRedisStorage", "AsyncSQLiteStorage", "AsyncInMemoryStorage", "AsyncS3Storage") StoredResponse: TypeAlias = tp.Tuple[Response, Request, Metadata] @@ -402,3 +408,89 @@ async def _remove_expired_caches(self) -> None: for key in keys_to_remove: self._cache.remove_key(key) + + +class AsyncS3Storage(AsyncBaseStorage): # pragma: no cover + """ + AWS S3 storage. 
+ + :param bucket_name: The name of the bucket to store the responses in + :type bucket_name: str + :param serializer: Serializer capable of serializing and de-serializing http responses, defaults to None + :type serializer: tp.Optional[BaseSerializer], optional + :param ttl: Specifies the maximum number of seconds that the response can be cached, defaults to None + :type ttl: tp.Optional[tp.Union[int, float]], optional + :param client: A client for S3, defaults to None + :type client: tp.Optional[tp.Any], optional + """ + + def __init__( + self, + bucket_name: str, + serializer: tp.Optional[BaseSerializer] = None, + ttl: tp.Optional[tp.Union[int, float]] = None, + client: tp.Optional[tp.Any] = None, + ) -> None: + super().__init__(serializer, ttl) + + if boto3 is None: # pragma: no cover + raise RuntimeError( + ( + f"The `{type(self).__name__}` was used, but the required packages were not found. " + "Check that you have `Hishel` installed with the `s3` extension as shown.\n" + "```pip install hishel[s3]```" + ) + ) + + self._bucket_name = bucket_name + client = client or boto3.client("s3") + self._s3_manager = AsyncS3Manager(client=client, bucket_name=bucket_name, is_binary=self._serializer.is_binary) + self._lock = AsyncLock() + + async def store(self, key: str, response: Response, request: Request, metadata: Metadata) -> None: + """ + Stores the response in the cache. 
+ + :param key: Hashed value of concatenated HTTP method and URI + :type key: str + :param response: An HTTP response + :type response: httpcore.Response + :param request: An HTTP request + :type request: httpcore.Request + :param metadata: Additional information about the stored response + :type metadata: Metadata + """ + + async with self._lock: + serialized = self._serializer.dumps(response=response, request=request, metadata=metadata) + await self._s3_manager.write_to(path=key, data=serialized) + + await self._remove_expired_caches() + + async def retrieve(self, key: str) -> tp.Optional[StoredResponse]: + """ + Retrieves the response from the cache using its key. + + :param key: Hashed value of concatenated HTTP method and URI + :type key: str + :return: An HTTP response and its HTTP request. + :rtype: tp.Optional[StoredResponse] + """ + + await self._remove_expired_caches() + async with self._lock: + try: + return self._serializer.loads(await self._s3_manager.read_from(path=key)) + except Exception: + return None + + async def aclose(self) -> None: # pragma: no cover + return + + async def _remove_expired_caches(self) -> None: + if self._ttl is None: + return + + async with self._lock: + converted_ttl = float_seconds_to_int_milliseconds(self._ttl) + await self._s3_manager.remove_expired(ttl=converted_ttl) diff --git a/hishel/_s3.py b/hishel/_s3.py new file mode 100644 index 00000000..c09ab63f --- /dev/null +++ b/hishel/_s3.py @@ -0,0 +1,54 @@ +import typing as tp +from datetime import datetime, timedelta, timezone + +from anyio import to_thread + + +class S3Manager: + def __init__(self, client: tp.Any, bucket_name: str, is_binary: bool = False): + self._client = client + self._bucket_name = bucket_name + self._is_binary = is_binary + + def write_to(self, path: str, data: tp.Union[bytes, str]) -> None: + path = "hishel-" + path + if isinstance(data, str): + data = data.encode("utf-8") + + self._client.put_object(Bucket=self._bucket_name, Key=path, Body=data) + 
+ def read_from(self, path: str) -> tp.Union[bytes, str]: + path = "hishel-" + path + response = self._client.get_object( + Bucket=self._bucket_name, + Key=path, + ) + + content = response["Body"].read() + + if self._is_binary: # pragma: no cover + return tp.cast(bytes, content) + + return tp.cast(str, content.decode("utf-8")) + + def remove_expired(self, ttl: int) -> None: + for obj in self._client.list_objects(Bucket=self._bucket_name).get("Contents", []): + if not obj["Key"].startswith("hishel-"): # pragma: no cover + continue + + if datetime.now(timezone.utc) - obj["LastModified"] > timedelta(milliseconds=ttl): + self._client.delete_object(Bucket=self._bucket_name, Key=obj["Key"]) + + +class AsyncS3Manager: + def __init__(self, client: tp.Any, bucket_name: str, is_binary: bool = False): + self._sync_manager = S3Manager(client, bucket_name, is_binary) + + async def write_to(self, path: str, data: tp.Union[bytes, str]) -> None: + return await to_thread.run_sync(self._sync_manager.write_to, path, data) + + async def read_from(self, path: str) -> tp.Union[bytes, str]: + return await to_thread.run_sync(self._sync_manager.read_from, path) + + async def remove_expired(self, ttl: int) -> None: + return await to_thread.run_sync(self._sync_manager.remove_expired, ttl) diff --git a/hishel/_sync/_storages.py b/hishel/_sync/_storages.py index ecbce657..8e02dec2 100644 --- a/hishel/_sync/_storages.py +++ b/hishel/_sync/_storages.py @@ -5,6 +5,11 @@ from copy import deepcopy from pathlib import Path +try: + import boto3 +except ImportError: # pragma: no cover + boto3 = None # type: ignore + try: import sqlite3 except ImportError: # pragma: no cover @@ -16,13 +21,14 @@ from hishel._serializers import BaseSerializer, clone_model from .._files import FileManager +from .._s3 import S3Manager from .._serializers import JSONSerializer, Metadata from .._synchronization import Lock from .._utils import float_seconds_to_int_milliseconds logger = logging.getLogger("hishel.storages") 
-__all__ = ("FileStorage", "RedisStorage", "SQLiteStorage", "InMemoryStorage") +__all__ = ("FileStorage", "RedisStorage", "SQLiteStorage", "InMemoryStorage", "S3Storage") StoredResponse: TypeAlias = tp.Tuple[Response, Request, Metadata] @@ -402,3 +408,89 @@ def _remove_expired_caches(self) -> None: for key in keys_to_remove: self._cache.remove_key(key) + + +class S3Storage(BaseStorage): # pragma: no cover + """ + AWS S3 storage. + + :param bucket_name: The name of the bucket to store the responses in + :type bucket_name: str + :param serializer: Serializer capable of serializing and de-serializing http responses, defaults to None + :type serializer: tp.Optional[BaseSerializer], optional + :param ttl: Specifies the maximum number of seconds that the response can be cached, defaults to None + :type ttl: tp.Optional[tp.Union[int, float]], optional + :param client: A client for S3, defaults to None + :type client: tp.Optional[tp.Any], optional + """ + + def __init__( + self, + bucket_name: str, + serializer: tp.Optional[BaseSerializer] = None, + ttl: tp.Optional[tp.Union[int, float]] = None, + client: tp.Optional[tp.Any] = None, + ) -> None: + super().__init__(serializer, ttl) + + if boto3 is None: # pragma: no cover + raise RuntimeError( + ( + f"The `{type(self).__name__}` was used, but the required packages were not found. " + "Check that you have `Hishel` installed with the `s3` extension as shown.\n" + "```pip install hishel[s3]```" + ) + ) + + self._bucket_name = bucket_name + client = client or boto3.client("s3") + self._s3_manager = S3Manager(client=client, bucket_name=bucket_name, is_binary=self._serializer.is_binary) + self._lock = Lock() + + def store(self, key: str, response: Response, request: Request, metadata: Metadata) -> None: + """ + Stores the response in the cache. 
+ + :param key: Hashed value of concatenated HTTP method and URI + :type key: str + :param response: An HTTP response + :type response: httpcore.Response + :param request: An HTTP request + :type request: httpcore.Request + :param metadata: Additional information about the stored response + :type metadata: Metadata + """ + + with self._lock: + serialized = self._serializer.dumps(response=response, request=request, metadata=metadata) + self._s3_manager.write_to(path=key, data=serialized) + + self._remove_expired_caches() + + def retrieve(self, key: str) -> tp.Optional[StoredResponse]: + """ + Retrieves the response from the cache using its key. + + :param key: Hashed value of concatenated HTTP method and URI + :type key: str + :return: An HTTP response and its HTTP request. + :rtype: tp.Optional[StoredResponse] + """ + + self._remove_expired_caches() + with self._lock: + try: + return self._serializer.loads(self._s3_manager.read_from(path=key)) + except Exception: + return None + + def close(self) -> None: # pragma: no cover + return + + def _remove_expired_caches(self) -> None: + if self._ttl is None: + return + + with self._lock: + converted_ttl = float_seconds_to_int_milliseconds(self._ttl) + self._s3_manager.remove_expired(ttl=converted_ttl) diff --git a/pyproject.toml b/pyproject.toml index 75cb79f4..613d7329 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -29,7 +29,8 @@ classifiers = [ "Topic :: Internet :: WWW/HTTP", ] dependencies = [ - "httpx>=0.22.0" + "httpx>=0.22.0", + "typing_extensions>=4.8.0" ] [project.optional-dependencies] @@ -46,6 +47,10 @@ sqlite = [ "anysqlite>=0.0.5" ] +s3 = [ + "boto3>=1.15.0,<=1.15.3" +] + [project.urls] Homepage = "https://hishel.com" Source = "https://github.com/karpetrosyan/hishel" @@ -88,7 +93,8 @@ filterwarnings = [] [tool.coverage.run] omit = [ "venv/*", - "hishel/_sync/*" + "hishel/_sync/*", + "hishel/_s3.py" ] include = ["hishel/*", "tests/*"] diff --git a/requirements.txt b/requirements.txt index 
12a5b4d1..d602c170 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,4 @@ --e .[yaml,redis,sqlite] +-e .[yaml,redis,sqlite,s3] # linting ruff==0.1.6 @@ -10,13 +10,13 @@ mkdocs-material==9.5.1 # tests pytest==7.4.3 -pytest-asyncio==0.21.1 -types-redis==4.6.0.7 +pytest-asyncio==0.23.3 +types-boto3==1.0.2 +types-redis==4.6.0.11 anyio==4.1.0 -trio==0.23.1 +trio==0.24.0 coverage==7.3.2 types-PyYAML==6.0.12.12 -typing_extensions==4.8.0 # build hatch==1.7.0 diff --git a/unasync.py b/unasync.py index 93ff2b8f..db66b294 100644 --- a/unasync.py +++ b/unasync.py @@ -18,6 +18,8 @@ ("AsyncRedisStorage", "RedisStorage"), ("AsyncSQLiteStorage", "SQLiteStorage"), ("AsyncInMemoryStorage", "InMemoryStorage"), + ("AsyncS3Storage", "S3Storage"), + ("AsyncS3Manager", "S3Manager"), ("import redis.asyncio as redis", "import redis"), ("AsyncCacheTransport", "CacheTransport"), ("AsyncBaseTransport", "BaseTransport"),
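For reviewers skimming `hishel/_s3.py` above: `AsyncS3Manager` gets its async interface by delegating every blocking boto3 call to a worker thread via `anyio.to_thread.run_sync`, which is also why unasync can map it straight onto `S3Manager`. The same wrap-the-sync-object pattern, sketched with only the standard library (`asyncio.to_thread`, Python 3.9+; the manager classes here are illustrative stand-ins, not Hishel's real ones):

```python
import asyncio

class SyncManager:
    # Stand-in for S3Manager: any object whose methods block on I/O.
    def read_from(self, path: str) -> str:
        return f"data at {path}"  # imagine a blocking network call here

class AsyncManager:
    # Mirrors AsyncS3Manager: same interface, but each call is run in a
    # worker thread so the event loop is never blocked by the sync code.
    def __init__(self) -> None:
        self._sync = SyncManager()

    async def read_from(self, path: str) -> str:
        return await asyncio.to_thread(self._sync.read_from, path)

print(asyncio.run(AsyncManager().read_from("hishel-key")))  # data at hishel-key
```

The design keeps exactly one implementation of the S3 logic; the async wrapper is pure delegation.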