
Commit

Merge branch 'master' into unsafe_methods
karpetrosyan authored Jan 26, 2024
2 parents 958538f + 2004ebc commit dc02a05
Showing 12 changed files with 296 additions and 42 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,8 @@
## Unreleased

- Added option to Controller to permit caching unsafe HTTP methods.
- Support AWS S3 storages. (#164)
- Move `typing_extensions` from requirements.txt to pyproject.toml. (#161)

## 0.0.21 (29th December, 2023)

2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@
- 🧠 **Smart**: Attempts to clearly implement RFC 9111, understands `Vary`, `Etag`, `Last-Modified`, `Cache-Control`, and `Expires` headers, and *handles response re-validation automatically*.
- ⚙️ **Configurable**: You have complete control over how the responses are stored and serialized.
- 📦 **From the package**:
-  - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system), [Redis](https://en.wikipedia.org/wiki/Redis), and [SQLite](https://en.wikipedia.org/wiki/SQLite) backends.
+  - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system), [Redis](https://en.wikipedia.org/wiki/Redis), [SQLite](https://en.wikipedia.org/wiki/SQLite), and [AWS S3](https://aws.amazon.com/s3/) backends.
- Built-in support for [JSON](https://en.wikipedia.org/wiki/JSON), [YAML](https://en.wikipedia.org/wiki/YAML), and [pickle](https://docs.python.org/3/library/pickle.html) serializers.
- 🚀 **Very fast**: Your requests will be even faster if there are *no IO operations*.

2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -5,4 +5,4 @@ services:
    image: 'redis:7.0.12-alpine'
    command: 'redis-server'
    ports:
      - "127.0.0.1:6379:6379"
64 changes: 35 additions & 29 deletions docs/advanced/storages.md
@@ -222,49 +222,55 @@ If you do this, `Hishel` will delete any stored responses whose ttl has expired.
In this example, the stored responses were limited to 1 hour.


### :material-aws: AWS S3 storage

`Hishel` has built-in [AWS S3](https://aws.amazon.com/s3/) support, allowing users to store responses in the cloud.

Example:

```python
import hishel

storage = hishel.S3Storage(bucket_name="cached_responses")
client = hishel.CacheClient(storage=storage)
```

Or if you are using Transports:

```python
import httpx
import hishel

storage = hishel.S3Storage(bucket_name="cached_responses")
transport = hishel.CacheTransport(httpx.HTTPTransport(), storage=storage)
```

#### Custom AWS S3 client

If you want to manually configure the client instance, pass it to Hishel.

```python
import boto3
import hishel

s3_client = boto3.client('s3')

storage = hishel.S3Storage(bucket_name="cached_responses", client=s3_client)
client = hishel.CacheClient(storage=storage)
```

#### Responses ttl in S3Storage

You can explicitly specify the ttl for stored responses in this manner.

```python
import hishel

storage = hishel.S3Storage(ttl=3600)
```

If you do this, `Hishel` will delete any stored responses whose ttl has expired.
In this example, the stored responses were limited to 1 hour.

## Which storage is the best?

Let's start with some basic benchmarks to see which one is the fastest.

Here are the results of the benchmarks, where we simply sent 1000 synchronous requests to [hishel.com](https://hishel.com).

| Storage | Time |
| ----------------- | ---- |
| `FileStorage` | 0.4s |
| `SQLiteStorage` | 2s |
| `RedisStorage` | 0.5s |
| `InMemoryStorage` | 0.2s |

!!! note
    It is important to note that the results may differ in your environment due to a variety of factors that we ignore.

In most cases, `FileStorage`, `RedisStorage`, and `InMemoryStorage` are significantly faster than `SQLiteStorage`, but `SQLiteStorage` can be used if you already have a well-configured SQLite database and want to keep cached responses close to your application data.

Each storage option has its own benefits.

FileStorage

1. **zero configuration**
2. **very fast**
3. **easy access**

RedisStorage

1. **can be shared**
2. **very fast**
3. **redis features**

InMemoryStorage

1. **temporary cache**
2. **very fast**

SQLiteStorage

1. **can be shared**
2. **sqlite features**

!!! tip
    Any [serializer](serializers.md) can be used with any storage because they are all fully compatible.
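The ttl eviction rule used by `S3Storage` can be sketched with the standard library alone. This is an illustration, not `Hishel`'s implementation: `is_expired` and the sample timestamps are hypothetical, and `float_seconds_to_int_milliseconds` only mirrors the helper of that name in the codebase, assuming a plausible truncating conversion.

```python
from datetime import datetime, timedelta, timezone


def float_seconds_to_int_milliseconds(seconds: float) -> int:
    # One plausible conversion: truncate the ttl to whole milliseconds.
    return int(seconds * 1000)


def is_expired(last_modified: datetime, ttl_milliseconds: int, now: datetime) -> bool:
    # An entry expires once its age exceeds the configured ttl.
    return now - last_modified > timedelta(milliseconds=ttl_milliseconds)


now = datetime(2024, 1, 26, 12, 0, tzinfo=timezone.utc)
ttl_ms = float_seconds_to_int_milliseconds(3600)  # the 1 hour ttl from the example

print(is_expired(now - timedelta(minutes=90), ttl_ms, now))  # → True
print(is_expired(now - timedelta(minutes=30), ttl_ms, now))  # → False
```

A response stored 90 minutes ago is older than the 1 hour ttl and would be deleted; one stored 30 minutes ago survives.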

2 changes: 1 addition & 1 deletion docs/contributing.md
@@ -26,7 +26,7 @@ git switch -c my-feature-name
- **scripts/install** _Set up the virtual environment and install all the necessary dependencies_
- **scripts/lint** _Runs linter, formatter, and unasync to enforce code style_
- **scripts/check** _Runs all the necessary checks, including linter, formatter, static type analyzer, and unasync checks_
-- **scripts/install** _Runs `scripts/check` + `pytest` over the coverage._
+- **scripts/test** _Runs `scripts/check` + `pytest` over the coverage._

Example:

2 changes: 1 addition & 1 deletion docs/index.md
@@ -36,7 +36,7 @@
- :brain: **Smart**: Attempts to clearly implement RFC 9111, understands `Vary`, `Etag`, `Last-Modified`, `Cache-Control`, and `Expires` headers, and *handles response re-validation automatically*.
- :gear: **Configurable**: You have complete control over how the responses are stored and serialized.
- :package: **From the package**:
-  - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system) :file_folder: , [Redis](https://en.wikipedia.org/wiki/Redis) :simple-redis:, and [SQLite](https://en.wikipedia.org/wiki/SQLite) :simple-sqlite: backends.
+  - Built-in support for [File system](https://en.wikipedia.org/wiki/File_system) :file_folder: , [Redis](https://en.wikipedia.org/wiki/Redis) :simple-redis:, [SQLite](https://en.wikipedia.org/wiki/SQLite) :simple-sqlite: , and [AWS S3](https://aws.amazon.com/s3/) :material-aws: backends.
- Built-in support for [JSON](https://en.wikipedia.org/wiki/JSON) :simple-json: , [YAML](https://en.wikipedia.org/wiki/YAML) :simple-yaml:, and [pickle](https://docs.python.org/3/library/pickle.html) serializers.
- :rocket: **Very fast**: Your requests will be even faster if there are *no IO operations*.

94 changes: 93 additions & 1 deletion hishel/_async/_storages.py
@@ -5,6 +5,11 @@
from copy import deepcopy
from pathlib import Path

try:
    import boto3
except ImportError:  # pragma: no cover
    boto3 = None  # type: ignore

try:
    import anysqlite
except ImportError:  # pragma: no cover
@@ -16,13 +21,14 @@
from hishel._serializers import BaseSerializer, clone_model

from .._files import AsyncFileManager
from .._s3 import AsyncS3Manager
from .._serializers import JSONSerializer, Metadata
from .._synchronization import AsyncLock
from .._utils import float_seconds_to_int_milliseconds

logger = logging.getLogger("hishel.storages")

-__all__ = ("AsyncFileStorage", "AsyncRedisStorage", "AsyncSQLiteStorage", "AsyncInMemoryStorage")
+__all__ = ("AsyncFileStorage", "AsyncRedisStorage", "AsyncSQLiteStorage", "AsyncInMemoryStorage", "AsyncS3Storage")

StoredResponse: TypeAlias = tp.Tuple[Response, Request, Metadata]

@@ -402,3 +408,89 @@ async def _remove_expired_caches(self) -> None:

        for key in keys_to_remove:
            self._cache.remove_key(key)


class AsyncS3Storage(AsyncBaseStorage):  # pragma: no cover
    """
    AWS S3 storage.

    :param bucket_name: The name of the bucket to store the responses in
    :type bucket_name: str
    :param serializer: Serializer capable of serializing and de-serializing http responses, defaults to None
    :type serializer: tp.Optional[BaseSerializer], optional
    :param ttl: Specifies the maximum number of seconds that the response can be cached, defaults to None
    :type ttl: tp.Optional[tp.Union[int, float]], optional
    :param client: A client for S3, defaults to None
    :type client: tp.Optional[tp.Any], optional
    """

    def __init__(
        self,
        bucket_name: str,
        serializer: tp.Optional[BaseSerializer] = None,
        ttl: tp.Optional[tp.Union[int, float]] = None,
        client: tp.Optional[tp.Any] = None,
    ) -> None:
        super().__init__(serializer, ttl)

        if boto3 is None:  # pragma: no cover
            raise RuntimeError(
                (
                    f"The `{type(self).__name__}` was used, but the required packages were not found. "
                    "Check that you have `Hishel` installed with the `s3` extension as shown.\n"
                    "```pip install hishel[s3]```"
                )
            )

        self._bucket_name = bucket_name
        client = client or boto3.client("s3")
        self._s3_manager = AsyncS3Manager(client=client, bucket_name=bucket_name, is_binary=self._serializer.is_binary)
        self._lock = AsyncLock()

    async def store(self, key: str, response: Response, request: Request, metadata: Metadata) -> None:
        """
        Stores the response in the cache.

        :param key: Hashed value of concatenated HTTP method and URI
        :type key: str
        :param response: An HTTP response
        :type response: httpcore.Response
        :param request: An HTTP request
        :type request: httpcore.Request
        :param metadata: Additional information about the stored response
        :type metadata: Metadata
        """

        async with self._lock:
            serialized = self._serializer.dumps(response=response, request=request, metadata=metadata)
            await self._s3_manager.write_to(path=key, data=serialized)

        await self._remove_expired_caches()

    async def retrieve(self, key: str) -> tp.Optional[StoredResponse]:
        """
        Retrieves the response from the cache using its key.

        :param key: Hashed value of concatenated HTTP method and URI
        :type key: str
        :return: An HTTP response and its HTTP request.
        :rtype: tp.Optional[StoredResponse]
        """

        await self._remove_expired_caches()
        async with self._lock:
            try:
                return self._serializer.loads(await self._s3_manager.read_from(path=key))
            except Exception:
                return None

    async def aclose(self) -> None:  # pragma: no cover
        return

    async def _remove_expired_caches(self) -> None:
        if self._ttl is None:
            return

        async with self._lock:
            converted_ttl = float_seconds_to_int_milliseconds(self._ttl)
            await self._s3_manager.remove_expired(ttl=converted_ttl)
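The optional-dependency guard at the top of `__init__` follows a common pattern: import at module load, record the failure, and raise only when the feature is actually used. A self-contained sketch, where `fictional_s3_sdk` is a deliberately nonexistent module and `S3BackedStorage` is a made-up name rather than part of `Hishel`:

```python
# The module is imported once at load time; failure is recorded, not raised.
try:
    import fictional_s3_sdk  # hypothetical optional dependency, intentionally absent
except ImportError:
    fictional_s3_sdk = None


class S3BackedStorage:
    """Made-up class; raises only when the S3 backend is actually requested."""

    def __init__(self) -> None:
        if fictional_s3_sdk is None:
            raise RuntimeError(
                f"The `{type(self).__name__}` was used, but the required packages "
                "were not found. Install the extra, e.g. `pip install mypkg[s3]`."
            )


try:
    S3BackedStorage()
except RuntimeError as exc:
    print(f"refused: {exc}")
```

This keeps the package importable without the extra installed, while still giving a clear error at the point of use.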
54 changes: 54 additions & 0 deletions hishel/_s3.py
@@ -0,0 +1,54 @@
import typing as tp
from datetime import datetime, timedelta, timezone

from anyio import to_thread


class S3Manager:
    def __init__(self, client: tp.Any, bucket_name: str, is_binary: bool = False):
        self._client = client
        self._bucket_name = bucket_name
        self._is_binary = is_binary

    def write_to(self, path: str, data: tp.Union[bytes, str]) -> None:
        path = "hishel-" + path
        if isinstance(data, str):
            data = data.encode("utf-8")

        self._client.put_object(Bucket=self._bucket_name, Key=path, Body=data)

    def read_from(self, path: str) -> tp.Union[bytes, str]:
        path = "hishel-" + path
        response = self._client.get_object(
            Bucket=self._bucket_name,
            Key=path,
        )

        content = response["Body"].read()

        if self._is_binary:  # pragma: no cover
            return tp.cast(bytes, content)

        return tp.cast(str, content.decode("utf-8"))

    def remove_expired(self, ttl: int) -> None:
        for obj in self._client.list_objects(Bucket=self._bucket_name).get("Contents", []):
            if not obj["Key"].startswith("hishel-"):  # pragma: no cover
                continue

            if datetime.now(timezone.utc) - obj["LastModified"] > timedelta(milliseconds=ttl):
                self._client.delete_object(Bucket=self._bucket_name, Key=obj["Key"])


class AsyncS3Manager:
    def __init__(self, client: tp.Any, bucket_name: str, is_binary: bool = False):
        self._sync_manager = S3Manager(client, bucket_name, is_binary)

    async def write_to(self, path: str, data: tp.Union[bytes, str]) -> None:
        return await to_thread.run_sync(self._sync_manager.write_to, path, data)

    async def read_from(self, path: str) -> tp.Union[bytes, str]:
        return await to_thread.run_sync(self._sync_manager.read_from, path)

    async def remove_expired(self, ttl: int) -> None:
        return await to_thread.run_sync(self._sync_manager.remove_expired, ttl)
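`AsyncS3Manager` delegates each blocking method to a worker thread via `anyio.to_thread.run_sync`, since boto3 has no async API. The same facade pattern can be sketched with the standard library alone, using `asyncio.to_thread` in place of `anyio`; `SyncStore` and `AsyncStore` are hypothetical stand-ins, with an in-memory dict instead of a real S3 client:

```python
import asyncio


class SyncStore:
    """Blocking store; stands in for S3Manager wrapping a boto3 client."""

    def __init__(self) -> None:
        self._data: dict = {}

    def write_to(self, path: str, data: bytes) -> None:
        # Prefix keys the same way S3Manager does, to namespace cache entries.
        self._data["hishel-" + path] = data

    def read_from(self, path: str) -> bytes:
        return self._data["hishel-" + path]


class AsyncStore:
    """Async facade; each call runs the sync method in a worker thread."""

    def __init__(self) -> None:
        self._sync = SyncStore()

    async def write_to(self, path: str, data: bytes) -> None:
        return await asyncio.to_thread(self._sync.write_to, path, data)

    async def read_from(self, path: str) -> bytes:
        return await asyncio.to_thread(self._sync.read_from, path)


async def main() -> None:
    store = AsyncStore()
    await store.write_to("response-key", b"serialized response")
    print(await store.read_from("response-key"))  # → b'serialized response'


asyncio.run(main())
```

Running the blocking call in a thread keeps the event loop free while the (potentially slow) network I/O completes.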
