Anirud/multi modal model support #169

anirudTT · 2025-02-05T15:04:39Z

Changelog

Introduces support for 11B Vision Llama model vision with changes in model request prompt to support URLs or images as base64.
Improves Rag Table by showing PDF name metadata.
Resolved input area resizing issues to ensure consistent UX across different screen sizes.
Added functionality to show which rag context is selected in the chat, enhancing clarity and interactions.
Images shared in the chat are now visible and can be resized if needed.
Default prompt is passed when the text is empty along with an image, but if text is provided, it’s passed through as-is.
Added ability to pass text-based files (markdown, shell code, etc. except PDFs) which are extracted and passed to the LLM as a rag context, with support for passing multiple files.

RAG Pill displayed in chat thread

Default prompt not used when user passes a prompt

Multiple code / text based files can be passed tru to the llm

Attaching/ Uploading the code file and asking the llm to explain the code

Known Issues / In progress

In the same chat thread, if a second question is asked about the image pasted in a previous chat, it isn’t correctly passed through to the prompt templating.
Adding a warning to users to use rag when uploading PDFs or connecting it to the same flow of the rag table via the input area. in chatui

…structure

- to be minimized or maximized - opened via a dialog box

- also add background blur when image is open in max view - adds z index for the image dialog box

* copy run agent container + helper func * copied in updated docker views * copied in model utils to stream agent response * copy in model views * added agent view * copy in all frontend components * add search api key to docker compose yml * copy in updated model urls * added requirements for dockerfile * rename hf_model_id * remove commetned code intepretor tool * added fix so agent works with other llama models * fix requirements in dockerfile * add thread id to match stateful chat * add readme * add agent workflow diagram * Update README.md * Delete app/api/agent_control/Agent.png * Add files via upload * Delete app/api/agent_control/Agent.png * Add files via upload * Update README.md * Delete app/api/agent_control/Agent.png * Add files via upload * Delete app/api/agent_control/Agent.png * Add files via upload * Update README.md * fix link href (#180) * refactor(chat history component): improve file handling and add RAG support - Add RAG datasource integration with metadata display - Create reusable FileDisplay component for file management - Implement FileViewerDialog for improved file preview experience - Support both image and non-image file types with download option - Clean up file handling logic and separate from image-specific code - Add visual indicator for RAG-enabled messages * Show RAG pill based on the message's stored RAG context * feat(add support in chat component): - Use the RAG datasource from the message if available * move image display to its own component * include rag source name when selected * refactor(types): clean up and organize type definitions - Remove redundant and commented-out interfaces - Group related interfaces together (chat, inference, file, voice) - Add proper JSDoc comments for better documentation - Consolidate duplicate type definitions - Add explicit typing for RAG-related interfaces * add pdfjs-dist to test * feat: improve file display - show images in better aspect ratio * display file display for images ,code files and other file types * add icons for file display in chat thread * extend types * extend to add - File extensions mapping for code files and other file types * extend to allow for files to be passed as text * fix alignment * limit upload to a single image file * set focused state in input area * feat: add ability to process multiple code and or text file types and send to model * re add resizing input are * fix copy button logic * Anirud/update vllm setup steps (#189) * update readme to reflect new flow * fix readme issues * add Supported models tab: pointing to tt-inference-server readme * docs: Update main readme - add better quick start guide - add better notes for running in development mode * docs: re add Mock model steps * docs: fix links * docs: fix vllm * Update HowToRun_vLLM_Models.md * Update HowToRun_vLLM_Models.md

…/tenstorrent/tt-studio into anirud/multi-modal-model-support

…el-support

mvanniasingheTT · 2025-03-03T20:03:58Z

All features work well! Just ran into one issue. When giving an image url, the response is not as expected.

Taking an image in as an URL seems to not work for me as expected

mvanniasingheTT · 2025-03-03T20:12:14Z

Sometimes, if it can't fetch the URL, the previous input image will be referred to. This only happened for some URLs. For example:

But this is the 2nd link: https://images.squarespace-cdn.com/content/v1/607f89e638219e13eee71b1e/1684821560422-SD5V37BAG28BURTLIXUQ/michael-sum-LEpfefQf4rU-unsplash.jpg?format=2500w

anirudTT · 2025-03-04T15:06:40Z

Sometimes, if it can't fetch the URL, the previous input image will be referred to. This only happened for some URLs. For example:

But this is the 2nd link: https://images.squarespace-cdn.com/content/v1/607f89e638219e13eee71b1e/1684821560422-SD5V37BAG28BURTLIXUQ/michael-sum-LEpfefQf4rU-unsplash.jpg?format=2500w

ok great catch , I logged both as a issues , which I will fix

anirudTT mentioned this pull request Feb 6, 2025

rc v1.2.0 #174

Merged

anirudTT marked this pull request as ready for review February 7, 2025 16:16

This comment was marked as outdated.

Sign in to view

anirudTT requested review from bgoelTT, tstescoTT and mvanniasingheTT February 7, 2025 16:17

This comment was marked as outdated.

Sign in to view

anirudTT self-assigned this Feb 25, 2025

anirudTT mentioned this pull request Feb 25, 2025

Use Default Prompt only if user prompt is empty #206

Open

anirudTT added 21 commits February 25, 2025 19:24

fix ts bug

a42556e

add console log

8eef00f

adds support to input area to allow for file upload

9f10864

adds flow to send over base64 files to model

80f55b8

add support to show model names in rag management table

eca4b6e

add icon + have it look similar to models deployed table

61ffd38

add updates to handle uploading image file

ab6a52b

adds new file data + url ts props

dcdc61b

add utility file to handle upload and encode images to base64

320f3fb

modify to support showing images

a305ee7

fix upload open / close in input area

eb6c15c

modify to: If files are uploaded or url are sent, to use new message …

a7f664d

…structure

update 11b vision to new image tag from tt-inference-server

c055bb2

add image linking component

339e7d7

show image as a link preview in chat thread

5790e90

adds some improvements

ff95335

allow url / links to be passed in correct post structure

5db7227

add better user feedback to input upload area

b9ba0ae

add tooltips to upload icon to help convey to the user

adce895

add better error feedback on uploads

c2cfe1f

adds features for the image sent via user:

9b41c41

- to be minimized or maximized - opened via a dialog box

anirudTT added 6 commits February 25, 2025 19:25

- adds border bettwen image and text in chat thread

d0ebba4

- also add background blur when image is open in max view - adds z index for the image dialog box

change default prompt to allow user prompt + image to be passed to model

3afb319

add loading and error states to rag view

7ab1147

updates rag form error handling and toast

82d1711

Adds better control when user tries to replace upload rag document

60ad6d6

anirudTT force-pushed the anirud/multi-modal-model-support branch from 4c39c27 to f8314c9 Compare February 25, 2025 19:26

anirudTT added 3 commits February 25, 2025 20:44

fix merge issues

29fb80a

remove unused console log

75f657f

Merge branch 'dev' into anirud/multi-modal-model-support

5ea5c48

Merge branch 'anirud/multi-modal-model-support' of https://github.com…

0af2715

…/tenstorrent/tt-studio into anirud/multi-modal-model-support

anirudTT linked an issue Feb 27, 2025 that may be closed by this pull request

Multi Modal Model Support #216

Open

anirudTT requested a review from bgoelTT February 27, 2025 23:13

Merge remote-tracking branch 'origin/dev' into anirud/multi-modal-mod…

7b2a73d

…el-support

Merge branch 'dev' into anirud/multi-modal-model-support

aa8025c

mvanniasingheTT approved these changes Mar 4, 2025

View reviewed changes

anirudTT mentioned this pull request Mar 4, 2025

URL not correctly send to 11b vision multi modal modals #222

Open

anirudTT mentioned this pull request Mar 4, 2025

11b vision model url issues #224

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Anirud/multi modal model support #169

Anirud/multi modal model support #169

anirudTT commented Feb 5, 2025 •

edited

Loading

This comment was marked as outdated.

This comment was marked as outdated.

mvanniasingheTT commented Mar 3, 2025 •

edited

Loading

mvanniasingheTT commented Mar 3, 2025

anirudTT commented Mar 4, 2025

Anirud/multi modal model support #169

Are you sure you want to change the base?

Anirud/multi modal model support #169

Conversation

anirudTT commented Feb 5, 2025 • edited Loading

Changelog

This comment was marked as outdated.

This comment was marked as outdated.

mvanniasingheTT commented Mar 3, 2025 • edited Loading

mvanniasingheTT commented Mar 3, 2025

anirudTT commented Mar 4, 2025

anirudTT commented Feb 5, 2025 •

edited

Loading

mvanniasingheTT commented Mar 3, 2025 •

edited

Loading