Releases · tenstorrent/tt-inference-server
v0.0.3
What's Changed
- Release Candidate v0.0.3: Improved setup.sh and documentation with supported models list by @tstescoTT in #103
Full Changelog:
- #101 Consider links when using a local model
- #102 update documentation to show all supported models and give correct links, mark experimental preview models
- setup.sh supports all base and Instruct models
- #106 add host disk and RAM check in setup.sh addressing #76 (see the resource-check sketch after this list)
- #107 update setup documentation in vllm-tt-metal-llama3/README.md to describe other models and put setup and installation first to avoid confusion
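The host resource check from #106 runs inside setup.sh itself; as a rough illustration of the idea, here is a minimal Python sketch, assuming a Linux host and hypothetical thresholds (the real limits live in setup.sh):

```python
import shutil

# Hypothetical thresholds for illustration; setup.sh defines the real ones.
MIN_FREE_DISK_GB = 500
MIN_TOTAL_RAM_GB = 64

def check_host_resources(path: str = "/") -> None:
    free_gb = shutil.disk_usage(path).free / 1024**3
    # /proc/meminfo reports MemTotal in kB on Linux.
    with open("/proc/meminfo") as f:
        meminfo = dict(line.split(":", 1) for line in f)
    total_ram_gb = int(meminfo["MemTotal"].split()[0]) / 1024**2
    if free_gb < MIN_FREE_DISK_GB:
        raise SystemExit(f"need {MIN_FREE_DISK_GB} GB free disk, have {free_gb:.0f} GB")
    if total_ram_gb < MIN_TOTAL_RAM_GB:
        raise SystemExit(f"need {MIN_TOTAL_RAM_GB} GB RAM, have {total_ram_gb:.0f} GB")
```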
Compare: v0.0.2...v0.0.3
Co-authored-by: Pavle Petrovic ppetrovic@tenstorrent.com
v0.0.2
What's Changed
- Release Candidate v0.0.2: Qwen2.5 72B support by @tstescoTT in #100
Full Changelog:
- #75: Check if a token can access a model on HuggingFace (see the token-check sketch after this list)
- Workaround for outdated apt-get keyring (#90)
- Provide a model from your local storage (#91)
- add CONTAINER_APP_UID Docker ARG for cloud image and dev image with development instructions (#94)
- remove git lfs from tt-metal build
- tstesco/qwen25-support (#98)
- avoid jq dependency when checking HF_TOKEN has repo access
- adding Qwen2.5-72B-Instruct setup
- update example scripts for minimal text request with user input without dependencies (see the request sketch at the end of this section)
- adding qwen setup support
- increase output tokens for example usage
- add TODO note on tt-metal cache in new impl
- add prompt to example output
- fix prompt generation text without images
- adding default env vars for Qwen2.5 72B
- Tstesco/doc update qwen25 (#99)
- update docs
- add container id
- fix import for non-Qwen models
- update ghcr ref
- update main model impl README.md
- remove tt-metal-llama3-70b, archive tt-metal-mistral-7b (#97)
- Consider links when using a local model (#101)
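#75 and the jq-free follow-up check whether HF_TOKEN can read a model before attempting a download. A minimal sketch of the same idea against the public Hugging Face Hub API, assuming the `requests` package and that a 200 response implies read access (gated or private repos the token cannot read typically return 401/403):

```python
import os
import requests

def token_can_access(repo_id: str, token: str) -> bool:
    # 200 means the token can read the repo's metadata; gated/private
    # repos without access typically return 401 or 403.
    resp = requests.get(
        f"https://huggingface.co/api/models/{repo_id}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    return resp.status_code == 200

if __name__ == "__main__":
    print(token_can_access("meta-llama/Llama-3.1-70B-Instruct", os.environ["HF_TOKEN"]))
```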
Co-authored-by: Pavle Petrovic ppetrovic@tenstorrent.com
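The dependency-free example request mentioned above can be reproduced with only the Python standard library against vLLM's OpenAI-compatible `/v1/completions` endpoint; the URL, model id, and auth variable below are placeholders for whatever your deployment uses:

```python
import json
import os
import urllib.request

API_URL = os.environ.get("API_URL", "http://localhost:8000/v1/completions")  # placeholder
payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",  # placeholder model id
    "prompt": input("prompt> "),
    "max_tokens": 128,
}
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Send the server's API key/JWT if authentication is enabled.
        "Authorization": f"Bearer {os.environ.get('AUTHORIZATION', '')}",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["text"])
```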
v0.0.1
What's Changed
- Added mistral 7b from project falcon repo by @mvanniasingheTT in #1
- tstesco/uplift-llama3 by @tstescoTT in #2
- Add Llama 3.1 70B T3K inference server prefill+decode by @tstescoTT in #6
- Anirud/add licenses by @anirudTT in #8
- Llama 3.1 70B T3K with continuous batching by @tstescoTT in #12
- Mistral7B with Prefill and N300 Support by @mvanniasingheTT in #15
- license file updated for formatting and readme file fleshed out by @sbennettTT in #18
- add GH workflow by @anirudTT in #20
- evals and benchmarking structure by @tstescoTT in #21
- Adding Mock Model + Test Directory (Redo) by @mvanniasingheTT in #29
- online mock_model and test scripts by @tstescoTT in #30
- rm tt-metal-llama3-70b/src/scripts/demo_llama3_alpaca_eval.py by @tstescoTT in #31
- vLLM deploy script and example client scripts by @tstescoTT in #32
- Updating Mistral README w/ Warning by @mvanniasingheTT in #38
- Add Logging Utils by @mvanniasingheTT in #39
- Prompt generation CLI and utils by @tstescoTT in #41
- Logging utils and config by @tstescoTT in #42
- Tests with mock model vllm open ai inference server by @tstescoTT in #43
- Offline inference benchmarking by @tstescoTT in #44
- Uplift vllm-tt-metal-llama3-70b by @tstescoTT in #45
- Add pyproject.toml, update README.md, fix add_spdx_header.py by @tstescoTT in #46
- move ttnn mocking and setup_mock_model_weights to mock_vllm_model.py by @tstescoTT in #47
- remove old dockerfile vllm.llama3.src.base.inference.v0.52.0.Dockerfile by @tstescoTT in #48
- update eval scripts by @tstescoTT in #49
- Add Locust testing scripts and configurations by @milank94 in #50 (see the Locust sketch after this list)
- fix ttnn mock for all import cases by @tstescoTT in #54
- Fix early exiting behaviour in setup_weights and add setup of cache d… by @bgoelTT in #60
- Add YoloV4 inference server by @bgoelTT in #52
- adding online benchmarking scripts by @tstescoTT in #55
- Add JWT API key authentication to YOLOv4 by @bgoelTT in #65 (see the JWT sketch after this list)
- adding test vllm script by @tstescoTT in #58
- benchmark and evals changes for Llama 3.1 70B v0 drop testing by @tstescoTT in #59
- Llama 3.x model support, setup.sh script multiple model support using HF download by @tstescoTT in #67
- rename PERSISTENT_VOLUME to MODEL_VOLUME in legacy documentation by @tstescoTT in #68
- update documentation links and instructions for setup by @tstescoTT in #69
- YOLOv4 improvements by @bgoelTT in #80
- Llama 3.x multimodal support for evaluations and benchmarking by @tstescoTT in #79
- #77: fix permissions setup of mounted volumes by @tstescoTT in #78
- Release Candidate v0.0.1 by @tstescoTT in #88
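The Locust scripts from #50 drive HTTP load against the inference server; a minimal locustfile sketch, with hypothetical pacing, endpoint, and payload:

```python
from locust import HttpUser, task, between

class InferenceUser(HttpUser):
    wait_time = between(1, 3)  # hypothetical think time between requests

    @task
    def completion(self):
        # Endpoint and payload are illustrative; point them at your server.
        self.client.post(
            "/v1/completions",
            json={"model": "placeholder-model", "prompt": "Hello", "max_tokens": 32},
        )
```

Run with, e.g., `locust -f locustfile.py --host http://localhost:8000`.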
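For the JWT API key authentication added to YOLOv4 in #65, the general pattern is to issue a signed token and verify it on each request. A minimal sketch using PyJWT with a hypothetical shared secret, not the server's actual auth code:

```python
import datetime
import jwt  # PyJWT

SECRET = "change-me"  # hypothetical; load a real secret from the environment

def issue_token(subject: str) -> str:
    payload = {
        "sub": subject,
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```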
New Contributors
- @anirudTT made their first contribution in #8
- @sbennettTT made their first contribution in #18
- @milank94 made their first contribution in #50
- @bgoelTT made their first contribution in #60
Full Changelog: https://github.com/tenstorrent/tt-inference-server/commits/v0.0.1