Commit

feat: introduce jinja2 renderer; and use {{...}} as standard template variable syntax; and introduce better variable validation and naming
MoritzLaurer committed Dec 1, 2024
1 parent d54497b commit 5565b03
Showing 10 changed files with 650 additions and 141 deletions.
8 changes: 4 additions & 4 deletions docs/standard_prompt_format.md
@@ -5,20 +5,20 @@ The library expects prompt templates to be stored as modular YAML or JSON files.
A prompt template YAML or JSON file must follow this standardized structure:

- Top-level key (required): `prompt`. This top-level key signals to the parser that the content of the file is a prompt template.
- Second-level key (required): *Either* `messages` *or* `template`. If `messages`, the prompt template must be provided as a list of dictionaries following the OpenAI messages format. This format is recommended for use with LLM APIs or inference containers. If `template`, the prompt template should be provided as a single string. Input variable placeholders for populating the prompt template are denoted with curly brackets, similar to Python f-strings.
- Second-level key (required): *Either* `messages` *or* `template`. If `messages`, the prompt template must be provided as a list of dictionaries following the OpenAI messages format. This format is recommended for use with LLM APIs or inference containers. If `template`, the prompt template should be provided as a single string. Variable placeholders for populating the prompt template string are denoted with double curly brackets {{...}}.
- Second-level keys (optional): (1) `input_variables`: an optional list of variables for populating the prompt template. This is also used for input validation; (2) `metadata`: other information, such as the source, date, author etc.; (3) any other key of relevance, such as `client_settings` with parameters for reproducibility with a specific inference client, or `metrics` from evaluations on specific datasets.

This structure is inspired by the LangChain [PromptTemplate](https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.prompt.PromptTemplate.html)
and [ChatPromptTemplate](https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html).

Example prompt template in YAML:
Example prompt template following the standard in YAML:
```yaml
prompt:
messages:
- role: "system"
content: "You are a coding assistant who explains concepts clearly and provides short examples."
- role: "user"
content: "Explain what {concept} is in {programming_language}."
content: "Explain what {{concept}} is in {{programming_language}}."
input_variables:
- concept
- programming_language
@@ -32,7 +32,7 @@ prompt:
author: "Karl Marx"
```
**Naming convention:** We call a file a *"prompt template"*, when it has placeholders ({...}) for dynamically populating the template like an f-string. This makes files more useful and reusable by others for different use-cases. Once the placeholders in the template are populated with specific input variables, we call it a *"prompt"*.
**Naming convention:** We call a file a *"prompt template"* when it has placeholders ({{...}}) for dynamically populating the template, similar to an f-string. This makes files more useful and reusable by others for different use cases. Once the placeholders in the template are populated with specific input variables, we call it a *"prompt"*.
The following example illustrates how the prompt template becomes a prompt.
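The worked example itself is collapsed in this diff view. As an illustration of the populate step, here is a minimal sketch of double-brace substitution; the `populate` helper is hypothetical and only mirrors the idea, not the library's actual renderer implementation:

```python
import re

def populate(template: str, **variables) -> str:
    """Replace {{name}} placeholders with the given input variables."""
    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in variables:
            # Mirrors the input validation role of `input_variables`
            raise KeyError(f"Missing input variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

# The user message from the YAML example above becomes a prompt:
template = "Explain what {{concept}} is in {{programming_language}}."
prompt = populate(template, concept="recursion", programming_language="Python")
print(prompt)  # Explain what recursion is in Python.
```

Passing the populated string around (rather than the template) is what the naming convention above calls a *"prompt"*.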
144 changes: 106 additions & 38 deletions examples/example-usage.ipynb
@@ -13,7 +13,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "c4b5ab80-b526-408d-92f3-0ecdf5a92845",
"metadata": {},
"outputs": [],
@@ -50,7 +50,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "947ac23c",
"metadata": {},
"outputs": [
@@ -60,7 +60,7 @@
"<module 'hf_hub_prompts.tools' from '/Users/moritzlaurer/huggingface/projects/hf-hub-prompts/hf_hub_prompts/tools.py'>"
]
},
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@@ -76,6 +76,109 @@
"importlib.reload(hf_hub_prompts.tools)\n"
]
},
{
"cell_type": "markdown",
"id": "30a9c0c8",
"metadata": {},
"source": [
"### Example Jinja2 use"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2f616b8d",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.chdir(\"/Users/moritzlaurer/huggingface/projects/hf-hub-prompts\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f0446fe1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PopulatedPrompt(content=[{'role': 'system', 'content': 'You are an expert translator who can translate English text to German, French, Chinese.\\n\\nHere are some example translations:\\nEnglish: \"Good morning, how are you?\" translates to German: \"Guten Morgen, wie geht es dir?\"\\nEnglish: \"The weather is beautiful today\" translates to Chinese: \"今天天气很好\"\\n\\nAdditional guidance: \\n- Provide a strictly faithful translation that prioritizes the original meaning over naturalness.\\n'}])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from hf_hub_prompts import PromptTemplateLoader\n",
"\n",
"prompt_template = PromptTemplateLoader.from_local(\"./tests/test_data/translate_jinja2.yaml\")\n",
"\n",
"few_shot_examples = [\n",
" {\n",
" \"source_lang\": \"English\",\n",
" \"target_lang\": \"German\",\n",
" \"source_text\": \"Good morning, how are you?\",\n",
" \"target_text\": \"Guten Morgen, wie geht es dir?\"\n",
" },\n",
" {\n",
" \"source_lang\": \"English\",\n",
" \"target_lang\": \"Chinese\",\n",
" \"source_text\": \"The weather is beautiful today\",\n",
" \"target_text\": \"今天天气很好\"\n",
" }\n",
"]\n",
"\n",
"prompt = prompt_template.populate_template(languages=\"German, French, Chinese\", few_shot_examples=few_shot_examples, strictly_faithful_translation=True)\n",
"\n",
"prompt\n",
"\n",
"# TODO: should have way to also display populated prompt nicely\n",
"#prompt.content\n",
"\n",
"# TODO: does not display Jinja2 properly\n",
"#prompt_template.display(format=\"yaml\")\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "46d0413b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TextPromptTemplate(renderer=<hf_hub_prompts.prompt_templates.DoubleBraceRender..., renderer_type='double_brace', template='Translate the following text to {{language}}:\\n{{..., input_variables=['language', 'text'], metadata={'name': 'Simple Translator', 'description': 'A si..., other_data={})\n"
]
},
{
"data": {
"text/plain": [
"PopulatedPrompt(content='Translate the following text to German:\\nHello world')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt_template = PromptTemplateLoader.from_local(\"./tests/test_data/translate.yaml\")\n",
"print(prompt_template)\n",
"\n",
"prompt = prompt_template.populate_template(language=\"German\", text=\"Hello world\")\n",
"\n",
"prompt\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "509335ed",
@@ -131,41 +234,6 @@
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "307d7efb",
"metadata": {},
"outputs": [
{
"ename": "AttributeError",
"evalue": "module 'hf_hub_prompts' has no attribute 'load_tool'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[4], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mhf_hub_prompts\u001b[39;00m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mos\u001b[39;00m\n\u001b[0;32m----> 4\u001b[0m tool \u001b[38;5;241m=\u001b[39m \u001b[43mhf_hub_prompts\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_tool\u001b[49m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m./tests/test_data/get_stock_price.py\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 6\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mUninstalled dependencies:\u001b[39m\u001b[38;5;124m\"\u001b[39m, tool\u001b[38;5;241m.\u001b[39mreturn_uninstalled_dependencies())\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mTool class:\u001b[39m\u001b[38;5;124m\"\u001b[39m, tool\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__dict__\u001b[39m)\n",
"\u001b[0;31mAttributeError\u001b[0m: module 'hf_hub_prompts' has no attribute 'load_tool'"
]
}
],
"source": [
"import hf_hub_prompts\n",
"import os\n",
"\n",
"tool = hf_hub_prompts.load_tool(\"./tests/test_data/get_stock_price.py\")\n",
"\n",
"print(\"Uninstalled dependencies:\", tool.return_uninstalled_dependencies())\n",
"\n",
"print(\"Tool class:\", tool.__dict__)\n",
"\n",
"print(\"OpenAI function:\", tool.to_openai_function())\n",
"\n",
"result = tool(ticker=\"AAPL\", days=\"5d\")\n",
"\n",
"print(\"Result:\", result)\n"
]
},
{
"cell_type": "markdown",
"id": "35adccfb",
3 changes: 3 additions & 0 deletions hf_hub_prompts/__init__.py
@@ -1,3 +1,4 @@
from .constants import Jinja2SecurityLevel, RendererType
from .hub_api import PromptTemplateLoader, ToolLoader, list_prompt_templates, list_tools
from .populated_prompt import PopulatedPrompt
from .prompt_templates import BasePromptTemplate, ChatPromptTemplate, TextPromptTemplate
@@ -14,4 +15,6 @@
"ToolLoader",
"list_tools",
"Tool",
"RendererType",
"Jinja2SecurityLevel",
]
10 changes: 10 additions & 0 deletions hf_hub_prompts/constants.py
@@ -0,0 +1,10 @@
from typing import Literal


# File extensions
VALID_PROMPT_EXTENSIONS = (".yaml", ".yml")
VALID_TOOL_EXTENSIONS = (".py",)

# Template types
RendererType = Literal["double_brace", "single_brace", "jinja2"]
Jinja2SecurityLevel = Literal["strict", "standard", "relaxed"]
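The `Literal` aliases above give static type checkers a closed set of allowed values; at runtime a loader can still validate arbitrary user input against the same set via `typing.get_args`. A hedged sketch of such a runtime check — the `validate_renderer` helper is hypothetical and not part of this commit:

```python
from typing import Literal, get_args

# Same aliases as in hf_hub_prompts/constants.py
RendererType = Literal["double_brace", "single_brace", "jinja2"]
Jinja2SecurityLevel = Literal["strict", "standard", "relaxed"]

def validate_renderer(value: str) -> str:
    """Fail early if a string is not one of the allowed renderer types."""
    allowed = get_args(RendererType)  # ("double_brace", "single_brace", "jinja2")
    if value not in allowed:
        raise ValueError(f"Invalid renderer {value!r}; expected one of {allowed}")
    return value

print(validate_renderer("jinja2"))  # jinja2
```

Keeping the tuple of allowed values derived from the `Literal` (instead of duplicating it) means the static and runtime checks cannot drift apart.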
49 changes: 37 additions & 12 deletions hf_hub_prompts/hub_api.py
@@ -5,23 +5,20 @@
import sys
import warnings
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Union
from typing import Any, Dict, List, Literal, Optional, Set, Union

import yaml
from huggingface_hub import HfApi, hf_hub_download
from huggingface_hub.utils import validate_repo_id

from .constants import VALID_PROMPT_EXTENSIONS, RendererType
from .prompt_templates import ChatPromptTemplate, TextPromptTemplate
from .tools import Tool


logger = logging.getLogger(__name__)


VALID_PROMPT_EXTENSIONS = (".yaml", ".yml") # can be extended to other file types in the future
VALID_TOOL_EXTENSIONS = (".py",) # can be extended to other file types in the future


class PromptTemplateLoader:
"""Class for loading prompt templates from different sources.
@@ -45,7 +42,12 @@ class PromptTemplateLoader:
"""

@classmethod
def from_local(cls, path: Union[str, Path]) -> Union[TextPromptTemplate, ChatPromptTemplate]:
def from_local(
cls,
path: Union[str, Path],
renderer: Optional[RendererType] = None,
jinja2_security_level: Literal["strict", "standard", "relaxed"] = "standard",
) -> Union[TextPromptTemplate, ChatPromptTemplate]:
"""Load a prompt template from a local YAML file.
Args:
@@ -80,11 +82,19 @@ def from_local(cls, path: Union[str, Path]) -> Union[TextPromptTemplate, ChatPromptTemplate]:
f"Error details: {str(e)}"
) from e

return cls._load_template_from_yaml(prompt_file)
return cls._load_template_from_yaml(
prompt_file, renderer=renderer, jinja2_security_level=jinja2_security_level
)

@classmethod
def from_hub(
cls, repo_id: str, filename: str, repo_type: str = "model", revision: Optional[str] = None
cls,
repo_id: str,
filename: str,
repo_type: str = "model",
revision: Optional[str] = None,
renderer: Optional[RendererType] = None,
jinja2_security_level: Literal["strict", "standard", "relaxed"] = "standard",
) -> Union[TextPromptTemplate, ChatPromptTemplate]:
"""Load a prompt template from the Hugging Face Hub.
@@ -168,11 +178,16 @@ def from_hub(

# Add prompt URL to metadata
prompt_url = f"https://huggingface.co/{repo_id}/blob/main/{filename}"
return cls._load_template_from_yaml(prompt_file, prompt_url=prompt_url)
return cls._load_template_from_yaml(
prompt_file, prompt_url=prompt_url, renderer=renderer, jinja2_security_level=jinja2_security_level
)

@staticmethod
def _load_template_from_yaml(
prompt_file: Dict[str, Any], prompt_url: Optional[str] = None
prompt_file: Dict[str, Any],
prompt_url: Optional[str] = None,
renderer: Optional[RendererType] = None,
jinja2_security_level: Literal["strict", "standard", "relaxed"] = "standard",
) -> Union[TextPromptTemplate, ChatPromptTemplate]:
"""Internal method to load a template from parsed YAML data.
@@ -197,9 +212,19 @@ def _load_template_from_yaml(

# Determine which PromptTemplate class to instantiate
if "messages" in prompt_data:
return ChatPromptTemplate(prompt_data=prompt_data, prompt_url=prompt_url)
return ChatPromptTemplate(
prompt_data=prompt_data,
prompt_url=prompt_url,
renderer=renderer,
jinja2_security_level=jinja2_security_level,
)
elif "template" in prompt_data:
return TextPromptTemplate(prompt_data=prompt_data, prompt_url=prompt_url)
return TextPromptTemplate(
prompt_data=prompt_data,
prompt_url=prompt_url,
renderer=renderer,
jinja2_security_level=jinja2_security_level,
)
else:
raise ValueError(
f"Invalid YAML structure under 'prompt' key: {list(prompt_data.keys())}. "
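The second-level-key dispatch shown in this hunk (`messages` selects `ChatPromptTemplate`, `template` selects `TextPromptTemplate`) can be sketched independently of the library; the string labels below are stand-ins for the two template classes:

```python
from typing import Any, Dict

def classify_prompt_data(prompt_data: Dict[str, Any]) -> str:
    """Mirror the dispatch in _load_template_from_yaml: 'messages' -> chat, 'template' -> text."""
    if "messages" in prompt_data:
        return "chat"
    if "template" in prompt_data:
        return "text"
    raise ValueError(
        f"Invalid YAML structure under 'prompt' key: {list(prompt_data.keys())}. "
        "Expected either 'messages' or 'template'."
    )

print(classify_prompt_data({"template": "Translate {{text}} to {{language}}."}))  # text
```

Because exactly one of the two keys is required by the standard, an unrecognized structure fails loudly here rather than producing a half-initialized template object.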