Commit aed757d

added continuous optimization

vnkn committed Oct 17, 2024
1 parent 5d74836

Showing 1 changed file with 54 additions and 0 deletions: docs/features/continuous_optimization.mdx

# Continuous Optimization with Live Data

Continuous optimization is the practice of continuously tuning machine learning experiments against live data inputs, so that models remain adaptive and responsive to changing conditions. It is especially valuable for experiments that depend on real-time data updates, such as applications built on constantly evolving datasets from sources like cloud storage or APIs.

The setup below demonstrates how to implement continuous optimization with live data retrieved from an Amazon S3 bucket. It combines a hyperparameter tuning run with an experiment configuration that fetches the latest data for evaluation, so that by adjusting model hyperparameters against fresh data, the experiment optimizes for better performance over time. The short sketch that follows shows, conceptually, what fetching the latest data from S3 involves; the full experiment example comes after it.
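
Conceptually, a continuous data source amounts to re-reading the newest evaluation file from the bucket on every run. The sketch below illustrates that idea with `boto3`; it is for intuition only, since the `Dataset` class in the example handles this fetch internally, and `fetch_latest_eval_data` is a hypothetical helper name.

```python
import json
import boto3

# Illustrative only: the Dataset's continuous_source performs an equivalent
# fetch internally on each run.
def fetch_latest_eval_data(bucket_name: str, json_file_key: str) -> list:
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket_name, Key=json_file_key)
    # The JSON file is expected to contain the evaluation records.
    return json.loads(obj["Body"].read())
```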

## Example Python Code

```python
# Standard-library imports
import os
from datetime import datetime

# Framework imports. The module paths below are assumptions for illustration;
# adjust them to match your installed versions of the Nomadic SDK and LlamaIndex.
from nomadic.experiment import Experiment      # path assumed
from nomadic.model import OpenAIModel          # path assumed
from nomadic.dataset import Dataset            # path assumed
from nomadic.tuner import tune                 # path assumed
from llama_index.core.evaluation import SemanticSimilarityEvaluator
from llama_index.embeddings.openai import OpenAIEmbedding

experiment = Experiment(
    name="Sample_Nomadic_Experiment",
    model=OpenAIModel(api_keys={"OPENAI_API_KEY": os.environ["OPENAI_API_KEY"]}),
    # The evaluation dataset is re-fetched from S3 on each run, so the
    # experiment always evaluates against the latest uploaded data.
    evaluation_dataset=Dataset(
        continuous_source={
            "bucket_name": "testBucket",
            "json_file_key": "YOUR_KEY_HERE",
        }
    ),
    # Hyperparameters that are subject to tuning.
    params={"temperature", "max_tokens"},
    # Score each run by the semantic similarity of outputs to reference answers.
    evaluator=SemanticSimilarityEvaluator(embed_model=OpenAIEmbedding()),
    # current_hp_values = {...}
    # fixed_param_dict = {...}
)

results = experiment.run(
    # Candidate values to explore for each tuned hyperparameter.
    param_dict={
        "temperature": tune.choice([0.1, 0.5, 0.9]),
        "max_tokens": tune.choice([50, 100, 200]),
    },
    # Only evaluate on data uploaded after this cutoff date.
    evaluation_dataset_cutoff_date=datetime.fromisoformat("2023-09-01T00:00:00"),
)

# Retrieve the top-ranked run and report its hyperparameters and score.
best_result = experiment.experiment_result.run_results[0]
print(f"Best result: {best_result.params} - Score: {best_result.score}")
```
### Explanation

#### Experiment Configuration:
- `name`: Assigns a name to your experiment.
- `model`: Specifies the model to use, here an `OpenAIModel` with the necessary API key.
- `evaluation_dataset`: Uses a `Dataset` that fetches data from a continuous source defined by your S3 bucket and JSON file key, ensuring that the experiment runs with the latest data.
- `params`: Lists the hyperparameters ("temperature" and "max_tokens") that are subject to tuning during the experiment.
- `evaluator`: Sets up the evaluation method, here a `SemanticSimilarityEvaluator` that scores results by the semantic similarity between model outputs and reference answers, with `OpenAIEmbedding` providing the embeddings (a conceptual sketch of this kind of scoring follows the list).
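
To build intuition for what a semantic-similarity evaluator does, the sketch below scores a model output against a reference answer by embedding both texts and comparing them with cosine similarity. This is a conceptual illustration, not the library's internal implementation; `semantic_score` and `embed` are hypothetical names.

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_score(embed, model_output: str, reference: str) -> float:
    # `embed` stands in for any text-embedding callable,
    # e.g. OpenAIEmbedding().get_text_embedding.
    return cosine_similarity(embed(model_output), embed(reference))
```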

#### Running the Experiment:
- `param_dict`: Defines the candidate values to explore for each hyperparameter via `tune.choice()`; in this example, `temperature` and `max_tokens`.
- `evaluation_dataset_cutoff_date`: Filters the dataset to include only data uploaded after the specified cutoff date (here, September 1, 2023), so that older data does not influence the results (illustrated in the sketch after this list).
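
The cutoff-date filter can be pictured as a simple timestamp comparison over the records fetched from S3. The snippet below is illustrative only; the actual filtering happens inside the library, and the `uploaded_at` field name and the boundary handling are assumptions.

```python
from datetime import datetime

cutoff = datetime.fromisoformat("2023-09-01T00:00:00")

records = [
    {"uploaded_at": "2023-08-15T12:00:00", "prompt": "...", "reference": "..."},
    {"uploaded_at": "2023-09-10T09:30:00", "prompt": "...", "reference": "..."},
]

# Keep only records uploaded after the cutoff (the 2023-09-10 record here).
fresh = [
    r for r in records
    if datetime.fromisoformat(r["uploaded_at"]) > cutoff
]
```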

#### Retrieving Results:
- After running the experiment, the best result is retrieved from the experiment's run results.
- The code prints the best hyperparameter settings and their corresponding evaluation score, which can guide further model optimization; a small helper for ranking all runs is sketched below.
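
If you want more than the single top run, a helper along the following lines can rank every run by score. It assumes only what the example above already shows: that `experiment_result.run_results` is a sequence of results exposing `.params` and `.score`, and that higher scores are better.

```python
def summarize_runs(experiment, top_k: int = 3) -> None:
    # Rank all runs by score, best first, and print the top few.
    ranked = sorted(
        experiment.experiment_result.run_results,
        key=lambda run: run.score,
        reverse=True,  # assumes higher scores are better
    )
    for run in ranked[:top_k]:
        print(f"params={run.params} score={run.score}")

# Usage: summarize_runs(experiment)
```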

This setup integrates live data into a continuous optimization process, making it easier to keep machine learning models updated and aligned with current data trends. One common pattern is to re-run the experiment on a schedule so that each pass tunes against the newest data, as sketched below.
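
As a rough sketch of the "continuous" part, the loop below re-runs the tuning job on a fixed interval with a rolling cutoff date, so each pass evaluates against whatever data has recently landed in the S3 bucket. The loop structure, the rolling 30-day window, and calling `experiment.run()` repeatedly in this way are assumptions for illustration; in production you would more likely trigger the run from a scheduler such as cron or Airflow.

```python
import time
from datetime import datetime, timedelta

def optimize_continuously(experiment, param_dict, interval_seconds: int = 24 * 60 * 60):
    # Hypothetical driver loop: each pass tunes against data from the last 30 days.
    while True:
        experiment.run(
            param_dict=param_dict,
            evaluation_dataset_cutoff_date=datetime.now() - timedelta(days=30),
        )
        best = experiment.experiment_result.run_results[0]
        print(f"Latest best params: {best.params} (score={best.score})")
        time.sleep(interval_seconds)
```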
