Releases · tenstorrent/tt-inference-server
v0.0.3
What's Changed
- Release Candidate v0.0.3: Improved setup.sh and documentation with supported models list by @tstescoTT in #103
Full Changelog:
- #101 Consider links when using a local model
- #102 update documentation to show all supported models and give correct links, mark experimental preview models
- setup.sh supports all base and Instruct models
- #106 add host disk and RAM check in setup.sh addressing #76 (see the resource-check sketch after this list)
- #107 update setup documentation in vllm-tt-metal-llama3/README.md to describe other models and put setup and installation first to avoid confusion
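The host resource check from #106 runs inside setup.sh itself; as a rough illustration of the idea, here is a minimal Python sketch, assuming a Linux host and hypothetical thresholds (the real limits live in setup.sh):

```python
import shutil

# Hypothetical thresholds for illustration; setup.sh defines the real ones.
MIN_FREE_DISK_GB = 500
MIN_TOTAL_RAM_GB = 64

def check_host_resources(path: str = "/") -> None:
    free_gb = shutil.disk_usage(path).free / 1024**3
    # /proc/meminfo reports MemTotal in kB on Linux.
    with open("/proc/meminfo") as f:
        meminfo = dict(line.split(":", 1) for line in f)
    total_ram_gb = int(meminfo["MemTotal"].split()[0]) / 1024**2
    if free_gb < MIN_FREE_DISK_GB:
        raise SystemExit(f"need {MIN_FREE_DISK_GB} GB free disk, have {free_gb:.0f} GB")
    if total_ram_gb < MIN_TOTAL_RAM_GB:
        raise SystemExit(f"need {MIN_TOTAL_RAM_GB} GB RAM, have {total_ram_gb:.0f} GB")
```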
Compare: v0.0.2...v0.0.3
Co-authored-by: Pavle Petrovic ppetrovic@tenstorrent.com
v0.0.2
What's Changed
- Release Candidate v0.0.2: Qwen2.5 72B support by @tstescoTT in #100
Full Changelog:
- #75: Check if a token can access a model on HuggingFace (see the token-check sketch after this list)
- Workaround for outdated apt-get keyring (#90)
- Provide a model from your local storage (#91)
- add CONTAINER_APP_UID Docker ARG for cloud image and dev image with development instructions (#94)
- remove git lfs from tt-metal build
- tstesco/qwen25-support (#98)
- avoid jq dependency when checking HF_TOKEN has repo access
- adding Qwen2.5-72B-Instruct setup
- update example scripts for minimal text request with user input without dependencies (see the request sketch at the end of this section)
- adding qwen setup support
- increase output tokens for example usage
- add TODO note on tt-metal cache in new impl
- add prompt to example output
- fix prompt generation text without images
- adding default env vars for Qwen2.5 72B
- Tstesco/doc update qwen25 (#99)
- update docs
- add container id
- fix import for non-Qwen models
- update ghcr ref
- update main model impl README.md
- remove tt-metal-llama3-70b, archive tt-metal-mistral-7b (#97)
- Consider links when using a local model (#101)
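#75 and the jq-free follow-up check whether HF_TOKEN can read a model before attempting a download. A minimal sketch of the same idea against the public Hugging Face Hub API, assuming the `requests` package and that a 200 response implies read access (gated or private repos the token cannot read typically return 401/403):

```python
import os
import requests

def token_can_access(repo_id: str, token: str) -> bool:
    # 200 means the token can read the repo's metadata; gated/private
    # repos without access typically return 401 or 403.
    resp = requests.get(
        f"https://huggingface.co/api/models/{repo_id}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    return resp.status_code == 200

if __name__ == "__main__":
    print(token_can_access("meta-llama/Llama-3.1-70B-Instruct", os.environ["HF_TOKEN"]))
```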
Co-authored-by: Pavle Petrovic ppetrovic@tenstorrent.com
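The dependency-free example request mentioned above can be reproduced with only the Python standard library against vLLM's OpenAI-compatible `/v1/completions` endpoint; the URL, model id, and auth variable below are placeholders for whatever your deployment uses:

```python
import json
import os
import urllib.request

API_URL = os.environ.get("API_URL", "http://localhost:8000/v1/completions")  # placeholder
payload = {
    "model": "meta-llama/Llama-3.1-70B-Instruct",  # placeholder model id
    "prompt": input("prompt> "),
    "max_tokens": 128,
}
req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Send the server's API key/JWT if authentication is enabled.
        "Authorization": f"Bearer {os.environ.get('AUTHORIZATION', '')}",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["text"])
```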
v0.0.1
What's Changed
- Added mistral 7b from project falcon repo by @mvanniasingheTT in #1
- tstesco/uplift-llama3 by @tstescoTT in #2
- Add Llama 3.1 70B T3K inference server prefill+decode by @tstescoTT in #6
- Anirud/add licenses by @anirudTT in #8
- Llama 3.1 70B T3K with continuous batching by @tstescoTT in #12
- Mistral7B with Prefill and N300 Support by @mvanniasingheTT in #15
- license file updated for formatting and readme file fleshed out by @sbennettTT in #18
- add GH workflow by @anirudTT in #20
- evals and benchmarking structure by @tstescoTT in #21
- Adding Mock Model + Test Directory (Redo) by @mvanniasingheTT in #29
- online mock_model and test scripts by @tstescoTT in #30
- rm tt-metal-llama3-70b/src/scripts/demo_llama3_alpaca_eval.py by @tstescoTT in #31
- vLLM deploy script and example client scripts by @tstescoTT in #32
- Updating Mistral README w/ Warning by @mvanniasingheTT in #38
- Add Logging Utils by @mvanniasingheTT in #39
- Prompt generation CLI and utils by @tstescoTT in #41
- Logging utils and config by @tstescoTT in #42
- Tests with mock model vllm open ai inference server by @tstescoTT in #43
- Offline inference benchmarking by @tstescoTT in #44
- Uplift vllm-tt-metal-llama3-70b by @tstescoTT in #45
- Add pyproject.toml, update README.md, fix add_spdx_header.py by @tstescoTT in #46
- move ttnn mocking and setup_mock_model_weights to mock_vllm_model.py by @tstescoTT in #47
- remove old dockerfile vllm.llama3.src.base.inference.v0.52.0.Dockerfile by @tstescoTT in #48
- update eval scripts by @tstescoTT in #49
- Add Locust testing scripts and configurations by @milank94 in #50 (see the Locust sketch after this list)
- fix ttnn mock for all import cases by @tstescoTT in #54
- Fix early exiting behaviour in setup_weights and add setup of cache d… by @bgoelTT in #60
- Add YoloV4 inference server by @bgoelTT in #52
- adding online benchmarking scripts by @tstescoTT in #55
- Add JWT API key authentication to YOLOv4 by @bgoelTT in #65 (see the JWT sketch after this list)
- adding test vllm script by @tstescoTT in #58
- benchmark and evals changes for Llama 3.1 70B v0 drop testing by @tstescoTT in #59
- Llama 3.x model support, setup.sh script multiple model support using HF download by @tstescoTT in #67
- rename PERSISTENT_VOLUME to MODEL_VOLUME in legacy documentation by @tstescoTT in #68
- update documentation links and instructions for setup by @tstescoTT in #69
- YOLOv4 improvements by @bgoelTT in #80
- Llama 3.x multimodal support for evaluations and benchmarking by @tstescoTT in #79
- #77: fix permissions setup of mounted volumes by @tstescoTT in #78
- Release Candidate v0.0.1 by @tstescoTT in #88
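The Locust scripts from #50 drive HTTP load against the inference server; a minimal locustfile sketch, with hypothetical pacing, endpoint, and payload:

```python
from locust import HttpUser, task, between

class InferenceUser(HttpUser):
    wait_time = between(1, 3)  # hypothetical think time between requests

    @task
    def completion(self):
        # Endpoint and payload are illustrative; point them at your server.
        self.client.post(
            "/v1/completions",
            json={"model": "placeholder-model", "prompt": "Hello", "max_tokens": 32},
        )
```

Run with, e.g., `locust -f locustfile.py --host http://localhost:8000`.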
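For the JWT API key authentication added to YOLOv4 in #65, the general pattern is to issue a signed token and verify it on each request. A minimal sketch using PyJWT with a hypothetical shared secret, not the server's actual auth code:

```python
import datetime
import jwt  # PyJWT

SECRET = "change-me"  # hypothetical; load a real secret from the environment

def issue_token(subject: str) -> str:
    payload = {
        "sub": subject,
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on bad tokens.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```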
New Contributors
- @anirudTT made their first contribution in #8
- @sbennettTT made their first contribution in #18
- @milank94 made their first contribution in #50
- @bgoelTT made their first contribution in #60
Full Changelog: https://github.com/tenstorrent/tt-inference-server/commits/v0.0.1