🎓 LLM Drifts: How Is ChatGPT’s Behavior Changing over Time?

Large language model (LLM) services such as GPT-4 and GPT-3.5 are widely used. However, when and how these models are updated over time is opaque. To help fill this gap, this repository contains (i) a diverse set of datasets and (ii) generations from popular LLMs (including GPT-4 and GPT-3.5) on these datasets over time.

🔍 Main Findings

Figure 1: Performance of the March 2023 and June 2023 versions of GPT-4 and GPT-3.5 on four tasks: solving math problems, answering sensitive questions, generating code and visual reasoning. The performances of GPT-4 and GPT-3.5 can vary substantially over time, and for the worse in some tasks.

What are the main findings? In a nutshell, performance can shift substantially over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%), but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) on this task. We hope that releasing the datasets and generations helps the community better understand how LLM services drift. The figure above gives a quantitative summary.

🚀 Reproducing the Results

You can directly run the Google Colab Notebook to reproduce the monitored performance drifts in our paper. You don't need API keys to get started. You can also use the local intro notebook directly.

💾 Datasets and Generations

The datasets and generations can be found under generation/. Each CSV file corresponds to one dataset. Each record (row) corresponds to one query and the generation from one LLM service.

Figure 2: The first few rows of the LLM generations on the PRIME dataset.

The figure above shows the first few rows of generation/PRIME_EVAL.csv. Each row includes the model, the query parameters (such as temperature), the query, the reference answer, the generated answer, and the latency. Such information could be leveraged to study various aspects of LLM services.
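As a rough illustration of how such records could be analyzed, the sketch below computes per-model accuracy by comparing the generated answer with the reference answer. It is not the repository's evaluation code. The column names (`model`, `ref_answer`, `answer`), the model labels, and the toy rows are all assumptions made up for this example, so check the header of the actual CSV file and adjust them. Exact string matching is also a simplification; real generations may need answer parsing first.

```python
import csv
import io
from collections import defaultdict

# Toy rows invented for illustration only; they are NOT data from the repository.
# The column names and model labels are hypothetical and may differ from the
# real header of generation/PRIME_EVAL.csv.
SAMPLE_CSV = """model,temperature,query,ref_answer,answer,latency
gpt-4-0314,0.1,Is 17077 a prime number?,Yes,Yes,1.2
gpt-4-0314,0.1,Is 17078 a prime number?,No,No,1.1
gpt-4-0613,0.1,Is 17077 a prime number?,Yes,No,0.9
gpt-4-0613,0.1,Is 17078 a prime number?,No,No,0.8
"""


def accuracy_by_model(rows, model_col="model", ref_col="ref_answer", gen_col="answer"):
    """Return {model: fraction of rows whose generation matches the reference}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        model = row[model_col]
        total[model] += 1
        # Case-insensitive exact match; a simplification of real answer parsing.
        if row[gen_col].strip().lower() == row[ref_col].strip().lower():
            correct[model] += 1
    return {model: correct[model] / total[model] for model in total}


# To use a real file instead, replace io.StringIO(SAMPLE_CSV) with
# open("generation/PRIME_EVAL.csv", newline="") and adjust the column names.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
acc = accuracy_by_model(rows)
for model, value in sorted(acc.items()):
    print(f"{model}: {value:.1%}")
```

Grouping by the model column puts different versions of the same service side by side, which is the comparison the drift analysis relies on.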

📚 Read More

You can find more details in the academic paper:

How Is ChatGPT’s Behavior Changing over Time?

🎯 Reference

If you use our findings and/or datasets in a research paper, please cite our work as follows:

@article{chen2023LLMDrift,
  title={How Is ChatGPT’s Behavior Changing over Time?},
  author={Chen, Lingjiao and Zaharia, Matei and Zou, James},
  journal={arXiv preprint arXiv:2307.09009},
  year={2023}
}
