docs: clarified language on "prompt template" vs. "prompt"
MoritzLaurer committed Nov 28, 2024
1 parent b62097a commit d54497b
Showing 3 changed files with 48 additions and 23 deletions.
10 changes: 5 additions & 5 deletions docs/index.md
@@ -1,15 +1,15 @@
# Prompts on the Hugging Face Hub
# Prompt templates on the Hugging Face Hub

Prompts have become key artifacts for researchers and practitioners working with AI. There is, however, no standardized way of sharing prompts. Prompts are shared on the HF Hub in [.txt files](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier/blob/main/utils/prompt.txt), in [HF datasets](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts), as strings in [model cards](https://huggingface.co/OpenGVLab/InternVL2-8B#grounding-benchmarks), or on GitHub as [python strings](https://github.com/huggingface/cosmopedia/tree/main/prompts), in [JSON, YAML](https://github.com/hwchase17/langchain-hub/blob/master/prompts/README.md), or in [Jinja2](https://github.com/argilla-io/distilabel/tree/main/src/distilabel/steps/tasks/templates).
Prompt templates have become key artifacts for researchers and practitioners working with AI. There is, however, no standardized way of sharing prompt templates. Prompts and prompt templates are shared on the HF Hub in [.txt files](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier/blob/main/utils/prompt.txt), in [HF datasets](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts), as strings in [model cards](https://huggingface.co/OpenGVLab/InternVL2-8B#grounding-benchmarks), or on GitHub as [python strings](https://github.com/huggingface/cosmopedia/tree/main/prompts), in [JSON, YAML](https://github.com/hwchase17/langchain-hub/blob/master/prompts/README.md), or in [Jinja2](https://github.com/argilla-io/distilabel/tree/main/src/distilabel/steps/tasks/templates).



## Objectives and non-objectives of this library
### Objectives
1. Provide a Python library that simplifies and standardises the sharing of prompts on the Hugging Face Hub.
2. Start an open discussion on the best way of standardizing and encouraging the sharing of prompts on the HF Hub, building upon the HF Hub's existing repository types and ensuring interoperability with other prompt-related libraries.
1. Provide a Python library that simplifies and standardizes the sharing of prompt templates on the Hugging Face Hub.
2. Start an open discussion on the best way of standardizing and encouraging the sharing of prompt templates on the HF Hub, building upon the HF Hub's existing repository types and ensuring interoperability with other prompt-related libraries.
### Non-Objectives:
- Compete with full-featured prompting libraries like [LangChain](https://github.com/langchain-ai/langchain), [ell](https://docs.ell.so/reference/index.html), etc. The objective is, instead, a simple solution for sharing prompts on the HF Hub, which is compatible with other libraries and which the community can build upon.
- Compete with full-featured prompting libraries like [LangChain](https://github.com/langchain-ai/langchain), [ell](https://docs.ell.so/reference/index.html), etc. The objective is, instead, a simple solution for sharing prompt templates on the HF Hub, which is compatible with other libraries and which the community can build upon.


## Quick start
31 changes: 15 additions & 16 deletions docs/repo_types_examples.md
@@ -1,5 +1,5 @@

# Prompts on the HF Hub
# Prompt templates on the HF Hub

The HF Hub is currently organized around three main repository types:

@@ -12,7 +12,7 @@ Prompt templates can be integrated into any of these repository types as .yaml o


## 1. Prompt templates as independent artifacts in model repos
Many prompt templates can be reused with various models and are not linked to specific model weights. These prompt templates can be shared in an HF model repo, where the model card provides a description and usage instructions, and prompts are shared via .yaml or .json files in the same repository.
Many prompt templates can be reused with various models and are not linked to specific model weights. These prompt templates can be shared in an HF model repo, where the model card provides a description and usage instructions, and prompt templates are shared via .yaml or .json files in the same repository.
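For orientation, a minimal sketch of this workflow, assuming the `MoritzLaurer/example_prompts` repo and `code_teacher.yaml` file used elsewhere in these docs:

```python
from hf_hub_prompts import PromptTemplateLoader

# Download a prompt template stored as a YAML file in a Hub model repo
prompt_template = PromptTemplateLoader.from_hub(
    repo_id="MoritzLaurer/example_prompts",
    filename="code_teacher.yaml"
)

# Populate the placeholders to turn the template into a prompt
prompt = prompt_template.populate_template(
    concept="list comprehension",
    programming_language="Python"
)
print(prompt.content)
```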


<details>
@@ -113,16 +113,16 @@ prompt_template = PromptTemplateLoader.from_hub(


## 2. Sharing prompts together with model weights
Some open-weight LLMs have been trained to exhibit specific behaviours with specific prompts.
The vision language model [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) was trained to predict bounding boxes for manually specified areas with a special prompt;
the VLM [Molmo](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19) was trained to predict point coordinates of objects of images with a special prompt; etc.
Some open-weight LLMs have been trained to exhibit specific behaviours with specific prompt templates.
The vision language model [InternVL2](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e) was trained to predict bounding boxes for manually specified areas with a special prompt template;
the VLM [Molmo](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19) was trained to predict point coordinates of objects in images with a special prompt template; etc.

These prompts are currently either mentioned unsystematically in model cards or need to be tracked down on github or paper appendices by users.
These prompt templates are currently either mentioned unsystematically in model cards or need to be tracked down by users on GitHub or in paper appendices.

`hf_hub_prompts` proposes to share these types of prompts in YAML files in the model repository together with the model weights.
`hf_hub_prompts` proposes to share these types of prompt templates in YAML or JSON files in the model repository together with the model weights.
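As a rough illustration, such a file might look like the sketch below; the structure follows the standardized format from `docs/standard_prompt_format.md`, and the placeholder name `{region_to_detect}` is a hypothetical choice rather than the exact field used in the linked repo:

```yaml
# Hypothetical sketch of an InternVL2 special-task prompt template (not the exact file in the linked repo)
prompt:
  messages:
    - role: "user"
      content: "Please provide the bounding box coordinate of the region this sentence describes: <ref>{region_to_detect}</ref>"
```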

<details>
<summary>1. Example: Sharing the <a href="https://huggingface.co/MoritzLaurer/open_models_special_prompts">InternVL2 special task prompts</a></summary>
<summary>1. Example: Sharing the <a href="https://huggingface.co/MoritzLaurer/open_models_special_prompts">InternVL2 special task prompt templates</a></summary>

```python
# download image prompt template
@@ -145,7 +145,7 @@ print(messages)
# 'text': 'Please provide the bounding box coordinate of the region this sentence describes: <ref>the bird</ref>'}]}]
```

This prompt can then directly be used in a vLLM container, e.g. hosted on HF Inference Endpoints, using the OpenAI messages format and client.
This populated prompt can then directly be used in a vLLM container, e.g. hosted on HF Inference Endpoints, using the OpenAI messages format and client.

```py
from openai import OpenAI
@@ -173,21 +173,20 @@ response.choices[0].message.content

## 3. Attaching prompts to datasets
LLMs are increasingly used to help create datasets, for example for quality filtering or synthetic text generation.
The prompts used for creating a dataset are currently unsystematically shared on GitHub ([example](https://github.com/huggingface/cosmopedia/tree/main/prompts)),
The prompt templates used for creating a dataset are currently unsystematically shared on GitHub ([example](https://github.com/huggingface/cosmopedia/tree/main/prompts)),
referenced in dataset cards ([example](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu#annotation)), or stored in .txt files ([example](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier/blob/main/utils/prompt.txt)),
hidden in paper appendices or not shared at all.
This makes reproducibility unnecessarily difficult.

To facilitate reproduction, these dataset prompts can be shared in YAML files in HF dataset repositories together with metadata on generation parameters, model_ids etc.
To facilitate reproduction, these dataset prompt templates can be shared in YAML files in HF dataset repositories together with metadata on generation parameters, model_ids, etc.
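A rough sketch of what such a file could contain; the message text and metadata keys are illustrative assumptions, not the actual FineWeb-Edu template or a fixed schema:

```yaml
# Illustrative sketch only; message text and metadata keys are assumptions
prompt:
  messages:
    - role: "user"
      content: "Score the educational value of the following web text on a scale from 0 to 5: {text}"
  metadata:
    model_id: "meta-llama/Meta-Llama-3-70B-Instruct"
    generation_parameters:
      temperature: 0.0
      max_new_tokens: 512
```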


<details>
<summary>1. Example: the <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">FineWeb-edu</a> prompt</summary>
The FineWeb-Edu dataset was created by prompting `Meta-Llama-3-70B-Instruct` to score the educational value of web texts.
The authors <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu#annotation">provide the prompt</a> in a .txt file.
The authors <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu#annotation">provide the prompt template</a> in a <a href="https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier/blob/main/utils/prompt.txt">.txt</a> file.

When provided in a YAML file in the dataset repo, the prompt can easily be loaded and supplemented with metadata
like the model_id or generation parameters for easy reproducibility.
When provided in a YAML/JSON file in the dataset repo, the prompt template can easily be loaded and supplemented with metadata like the model_id or generation parameters for easy reproducibility.
See this <a href="https://huggingface.co/datasets/MoritzLaurer/dataset_prompts">example dataset repository</a>.
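A minimal sketch of loading and populating such a template, where the filename and input variable name are assumptions for illustration:

```python
from hf_hub_prompts import PromptTemplateLoader

# Load the scoring template from the dataset repo
# (filename is an assumed example; check the repo for the actual file name)
prompt_template = PromptTemplateLoader.from_hub(
    repo_id="MoritzLaurer/dataset_prompts",
    filename="fineweb-edu-prompt.yaml"
)

# Populate the template's placeholder with a web text to score
# (the input variable name is an assumption)
prompt = prompt_template.populate_template(text="Some web text to score ...")
print(prompt.content)
```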


@@ -230,11 +229,11 @@ print(outputs[0]["generated_text"][-1])
<details>
<summary>2. Example: the <a href="https://huggingface.co/collections/HuggingFaceTB/cosmopedia-65d4e44c693d9451ce4344d6">Cosmopedia dataset</a></summary>
Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1.
The dataset shares it's prompts on <a href="https://github.com/huggingface/cosmopedia/tree/main/prompts">GitHub</a>
The dataset shares its prompt templates on <a href="https://github.com/huggingface/cosmopedia/tree/main/prompts">GitHub</a>
with a <a href="https://github.com/huggingface/cosmopedia/blob/main/prompts/auto_math_text/build_science_prompts.py">custom build logic</a>.
The prompts are not available in the <a href="https://huggingface.co/datasets/HuggingFaceTB/cosmopedia/tree/main">HF dataset repo</a>.

The prompts could be directly added to the dataset repository in the standardized YAML format.
The prompt templates could be added directly to the dataset repository in the standardized YAML/JSON format, as sketched below.

</details>
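One possible way of adding such a file, sketched with the `huggingface_hub` client (file name and write access are assumed; the template file itself would need to be created first):

```python
from huggingface_hub import HfApi

api = HfApi()

# Upload a local prompt template file into the dataset repository
# (file name is illustrative; requires write access to the repo)
api.upload_file(
    path_or_fileobj="cosmopedia_textbook_prompt.yaml",
    path_in_repo="cosmopedia_textbook_prompt.yaml",
    repo_id="HuggingFaceTB/cosmopedia",
    repo_type="dataset",
)
```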

30 changes: 28 additions & 2 deletions docs/standard_prompt_format.md
@@ -11,8 +11,7 @@ A prompt template YAML or JSON file must follow the following standardized struc
This structure is inspired by the LangChain [PromptTemplate](https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.prompt.PromptTemplate.html)
and [ChatPromptTemplate](https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html).


Example YAML prompt template:
Example prompt template in YAML:
```yaml
prompt:
messages:
@@ -33,6 +32,33 @@ prompt:
author: "Karl Marx"
```
**Naming convention:** We call a file a *"prompt template"* when it has placeholders ({...}) for dynamically populating the template, like an f-string. This makes the file more useful and reusable for different use cases. Once the placeholders in the template are populated with specific input variables, we call it a *"prompt"*.
The following example illustrates how the prompt template becomes a prompt.
```python
>>> # 1. Download a prompt template:
>>> from hf_hub_prompts import PromptTemplateLoader
>>> prompt_template = PromptTemplateLoader.from_hub(
... repo_id="MoritzLaurer/example_prompts",
... filename="code_teacher.yaml"
... )

>>> # 2. Inspect the template and its input variables:
>>> prompt_template.messages
[{'role': 'system', 'content': 'You are a coding assistant who explains concepts clearly and provides short examples.'}, {'role': 'user', 'content': 'Explain what {concept} is in {programming_language}.'}]
>>> prompt_template.input_variables
['concept', 'programming_language']

>>> # 3. Populate the template with its input variables
>>> prompt = prompt_template.populate_template(
... concept="list comprehension",
... programming_language="Python"
... )
>>> prompt.content
[{'role': 'system', 'content': 'You are a coding assistant who explains concepts clearly and provides short examples.'}, {'role': 'user', 'content': 'Explain what list comprehension is in Python.'}]
```
## Pros/Cons for different file formats for sharing prompt templates

0 comments on commit d54497b
