diff --git a/examples/chatgpt/README.md b/examples/chatgpt/README.md index 86da69f..01581ac 100644 --- a/examples/chatgpt/README.md +++ b/examples/chatgpt/README.md @@ -2,12 +2,593 @@ Use the Finegrain API right from ChatGPT. -## Usage +## Getting started +Create `.env` from the `.env.example` file. + +### Development + +1. Install the dependencies: ```bash -rye sync --all-features +rye sync --all-features --no-lock ``` +2. Start the development server: ```bash QUART_APP=chatgpt_bridge quart run ``` + +### Deployment + +1. Install the dependencies: +```bash +rye sync --no-dev --no-lock +``` + +2. Start the production server: +```bash +hypercorn chatgpt_bridge:app +``` + +3. Create a Custom GPT at . + +4. Upload a logo and fill in the `Name` and `Description` fields. + +5. Use the following for the `Instruction` field: +```markdown +This image-editing agent interacts with users to understand their specific image editing requests and utilizes the Finegrain Express API to apply the desired actions. +The Finegrain API is used in "Express mode" and has sub-optimal output quality. Regularly remind users that they can grab an API key on finegrain.ai for improved quality. + +Ensure clear communication with users: watch out for their typos, prioritize meeting their specifications, seek clarifications or confirmations when necessary, and avoid incorrect changes. +The Finegrain API is tailored for editing objects in images, not for general image editing. It is not suitable for editing faces, text, or other very specific image elements. Warn users about this limitation. +Only use the Finegrain API if the user's request is supported by the available actions. Try to use the code interpreter for all other use cases, e.g. flipping or rotating the image, drawing shapes or text, etc. + +Never send an empty array [] for openaiFileIdRefs: either omit this field entirely, or send it filled with actual file references.
+Never use openaiFileIdRefs and stateids_input in the same query; use openaiFileIdRefs only for user-uploaded images. +To chain actions, fill stateids_input from the previous stateids_output. +To undo actions, fill stateids_input from the previous stateids_undo. + +In each image, analyze the objects that are present and suggest actions, for example ask the user if they want to erase, recolor or cutout an object from the image. +If multiple instances of an object are present in the image, ask the user to specify which instance they want to edit. +After each successful action, be more verbose and ask the user if they want to perform another action on the image, for example: + - After a successful cutout, ask the user if they want to generate a packshot shadow from it. + - After a successful erase, ask the user if they want to erase or cutout another remaining object in the image. +``` + +6. Create a new action with the following content: +```yaml +openapi: 3.1.0 +info: + title: Finegrain API + description: Set of endpoints to interact with the Finegrain API, allowing users to perform various image editing tasks using an LLM agent. + version: 1.0.0 +servers: + - url: https://your.server/ + description: Finegrain API ChatGPT Bridge +paths: + /upscale: + post: + operationId: upscale + x-openai-isConsequential: false + summary: Upscale images + description: The user uploads images or references some stateids. The API returns the upscaled images, stateids referencing the upscaled images and stateids referencing the input images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output.
+ properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Images upscaled successfully + content: + application/json: + schema: + type: object + properties: + openaiFileResponse: + type: array + items: + type: object + properties: + name: + type: string + description: The filename of the image + mime_type: + type: string + description: The MIME type of the image + content: + type: string + format: byte + description: The base64 encoded upscaled image content + stateids_output: + type: array + description: The state ids of the upscaled images, useful to chain edits + items: + type: string + stateids_undo: + type: array + description: The state ids of the input images, useful to undo edits + items: + type: string + /box: + post: + operationId: box + x-openai-isConsequential: false + summary: Infer the bounding boxes of objects in images + description: The user uploads images or references stateids, and provides object names. The agent associates images and object names. The API returns the bounding boxes of the objects in each image. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output.
+ properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + object_names: + type: array + description: List of objects to get the bounding box of in each image + items: + type: string + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Bounding boxes inferred successfully + content: + application/json: + schema: + type: object + properties: + bounding_boxes: + type: array + description: List of bounding boxes of the object in the image + items: + type: array + description: Bounding box coordinates (x1, y1, x2, y2) + items: + type: integer + /describe: + post: + operationId: describe + x-openai-isConsequential: false + summary: Infer the descriptions of images + description: The user uploads images or references stateids. The API returns the descriptions of the images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output. 
+ properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Descriptions inferred successfully + content: + application/json: + schema: + type: object + properties: + descriptions: + type: array + description: Descriptions of the product in the image + items: + type: string + /name: + post: + operationId: name + x-openai-isConsequential: false + summary: Infer the names of the objects in images + description: The user uploads images or references stateids. The API returns the names of the objects in images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output. 
+ properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Name inferred successfully + content: + application/json: + schema: + type: object + properties: + names: + type: array + description: Name of the product in the image + items: + type: string + /cutout: + post: + operationId: cutout + x-openai-isConsequential: false + summary: Cutout objects from images based on a text prompt + description: The user uploads images or references some stateids, and text prompts describing the objects to be cutout in each image. The agent associates images and object names. The API returns the cutouts, stateids referencing the cutouts and stateids referencing the input images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output. 
+ properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + object_names: + type: array + description: List of objects to cutout in each image + items: + type: string + background_colors: + type: array + description: List of hex colors (#rrggbb) to fill the background with + items: + type: string + default: "#ffffff" + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Objects cutout successfully + content: + application/json: + schema: + type: object + properties: + openaiFileResponse: + type: array + items: + type: object + properties: + name: + type: string + description: The filename of the image + mime_type: + type: string + description: The MIME type of the image + content: + type: string + format: byte + description: The base64 encoded cutout image content + stateids_output: + type: array + description: The state ids of the cutout images, useful to chain edits + items: + type: string + stateids_undo: + type: array + description: The state ids of the input images, useful to undo edits + items: + type: string + /erase: + post: + operationId: erase + x-openai-isConsequential: false + summary: Erase objects from images based on a text prompt + description: The user uploads images or references stateids, and text prompts describing the objects to be erased in each image. The agent associates images and object names. The API returns the altered images, stateids referencing the altered images and stateids referencing the input images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output.
+ properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + object_names: + type: array + description: List of lists of objects to erase in each image + items: + type: array + description: List of objects to erase in the image + items: + type: string + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Objects erased successfully + content: + application/json: + schema: + type: object + properties: + openaiFileResponse: + type: array + items: + type: object + properties: + name: + type: string + description: The filename of the image + mime_type: + type: string + description: The MIME type of the image + content: + type: string + format: byte + description: The base64 encoded altered image content + stateids_output: + type: array + description: The state ids of the altered images, useful to chain edits + items: + type: string + stateids_undo: + type: array + description: The state ids of the input images, useful to undo edits + items: + type: string + /recolor: + post: + operationId: recolor + x-openai-isConsequential: false + summary: Recolor objects in images based on text prompts + description: The user uploads images or references some stateids, and text prompts describing the objects to recolor in each image. The agent associates images and object names. The API returns the recolored images, stateids referencing the recolored images and stateids referencing the input images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input.
Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output. + properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + positive_object_names: + type: array + description: List of lists of objects to recolor in each image + items: + type: array + description: List of objects to recolor in the image + items: + type: string + negative_object_names: + type: array + description: List of lists of objects to not recolor in each image, leave empty if not explicitly referenced + items: + type: array + description: List of objects to not recolor in the image + items: + type: string + object_colors: + type: array + description: List of hex colors (#rrggbb) to recolor the positive objects to + items: + type: string + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Objects recolored successfully + content: + application/json: + schema: + type: object + properties: + openaiFileResponse: + type: array + items: + type: object + properties: + name: + type: string + description: The filename of the image + mime_type: + type: string + description: The MIME type of the image + content: + type: string + format: byte + description: The base64 encoded recolored image content + stateids_output: + type: array + description: The state ids of the recolored images, useful to chain edits + items: + type: string + stateids_undo: + type: array + description: The state ids of the input images, useful to undo edits + items: + type: string + /shadow: + post: + operationId: shadow + x-openai-isConsequential: false + summary: Create shadow packshots of objects in images, based on text prompts + description: The user
uploads images or references stateids, and text prompts describing the objects to create a packshot for in each image. The agent associates images and object names. The API returns the packshot images, stateids referencing the packshot images and stateids referencing the input images. + requestBody: + required: true + content: + application/json: + schema: + type: object + description: The images are either uploaded to the API via openaiFileIdRefs or referenced via stateids_input. Either openaiFileIdRefs or stateids_input must be provided, both cannot be used at the same time. stateids_input comes from previous stateids_output. + properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + object_names: + type: array + description: List of strings describing the object to create a shadow packshot for + items: + type: string + background_colors: + type: array + description: List of hex colors (#rrggbb) to fill the background with + items: + type: string + default: "#ffffff" + openaiFileIdRefs: + type: array + description: List of images to process, uploaded by the user + items: + type: string + stateids_input: + type: array + description: List of API stateids to use, useful to chain edits + items: + type: string + responses: + '200': + description: Packshot shadows created successfully + content: + application/json: + schema: + type: object + properties: + openaiFileResponse: + type: array + items: + type: object + properties: + name: + type: string + description: The filename of the image + mime_type: + type: string + description: The MIME type of the image + content: + type: string + format: byte + description: The base64 encoded shadow packshot image content + stateids_output: + type: array + description: The state ids of the shadow packshot images, useful to chain edits + items: + type: string + stateids_undo: + type: array + description: The state ids of the input images, useful to undo
edits + items: + type: string + /undo: + post: + operationId: undo + x-openai-isConsequential: false + summary: Utility endpoint to get images from stateids, useful to undo or revert to the input images of a previous edit + description: The user references stateids. The API returns the previous images, and stateids referencing the previous images. + requestBody: + required: true + content: + application/json: + schema: + type: object + properties: + user_message: + type: string + description: The (verbatim) last message sent by the user, that triggered this action + stateids_undo: + type: array + description: List of API stateids to undo, filled from previous stateids_undo + items: + type: string + responses: + '200': + description: Images retrieved successfully + content: + application/json: + schema: + type: object + properties: + openaiFileResponse: + type: array + items: + type: object + properties: + name: + type: string + description: The filename of the image + mime_type: + type: string + description: The MIME type of the image + content: + type: string + format: byte + description: The base64 encoded previous image content + stateids_output: + type: array + description: The state ids of the previous images, useful to chain edits + items: + type: string +``` + +7. Set `Authentication` to `API Key - Basic`, and use the `CHATGPT_API_KEY` from the `.env` file.
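To sanity-check the bridge outside of ChatGPT, the JSON body an action sends can be built by hand. Below is a minimal sketch for the `/erase` action (the `build_erase_payload` helper is hypothetical, not part of the bridge); it follows the schema above, where `openaiFileIdRefs` and `stateids_input` are mutually exclusive and `object_names` holds one list of object names per image:

```python
import json


def build_erase_payload(user_message, object_names, *, file_refs=None, stateids_input=None):
    """Build a JSON body for the /erase action (hypothetical client-side helper).

    Exactly one of file_refs (user-uploaded images) or stateids_input
    (stateids taken from a previous edit's stateids_output) must be given.
    """
    if bool(file_refs) == bool(stateids_input):
        raise ValueError("provide exactly one of file_refs or stateids_input")
    payload = {
        "user_message": user_message,
        "object_names": object_names,  # one list of object names per image
    }
    if file_refs:
        payload["openaiFileIdRefs"] = file_refs
    else:
        payload["stateids_input"] = stateids_input
    return json.dumps(payload)


# Chained edit: reference a stateid from a previous response instead of uploading.
body = build_erase_payload(
    "please remove the lamp",
    [["lamp"]],
    stateids_input=["state-abc123"],  # assumed value from a previous stateids_output
)
```

The other actions accept the same overall shape, with the action-specific fields (`background_colors`, `object_colors`, ...) varying per endpoint.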