Fixes to start up DeepSeekR1 #219
Comments
Aside: when I was done using DeepSeekR1, I went to Deployments and pressed 'Delete', and the QuietBox rebooted. I haven't dug into that yet.
@windwardly: Thanks for opening this issue and trying out the DeepSeek model via […]. As you correctly noticed, we do not yet have support / a dropdown in […]. Usually, if something goes wrong during setup, the persistence storage directory for the model (e.g., […]) […]. It's amazing that you eventually got it running! 💯
Any time a model is deleted from the deployment page/table, a tt-smi reset is run on the Tenstorrent board; this is by design and is triggered automatically as long as the Tenstorrent devices are mounted to the backend container.
It's more than the board getting reset: the whole machine rebooted, losing state (well, vim keeps edits on disk, but that means I need to find them and […]
Oh, that's new. We have never observed this before. Just so I understand: you deleted the model, and the whole system rebooted?
Yes. A snippet from […]
FYI update: yes, the reboot is repeatable. I'll work on tracking it down later.
The reboot happens independently of […]
Problem
Running DeepSeekR1 in tt-studio
Steps
I used tt-inference-server to set up a tt_studio_persistent_volume for DeepSeek-R1-Distill-Llama-70B. The weights were downloaded; everything looked good.
I then ran startup.sh, and in the tt-studio web page there was no DeepSeek selection on the Home Screen.
Expected behavior
That I could select and deploy DeepSeekR1
A Remedy
I guessed at making and filling in an entry in tt-studio/app/api/shared_config/model_config.py (please correct):
The Docker image could now be selected, but the deploy failed.
I ran some tests just starting the docker image to see where it was failing.
It failed on
T3K permissions
The directory tt_studio_persistent_volume/volume_id_tt-metal-DeepSeek-R1-Distill-Llama-70B-v0.0.1/model_weights/DeepSeek-R1-Distill-Llama-70B/T3K was owned by root.
Aside: running find /home/container_app_user -user root also lists the ~/cache_root/huggingface hierarchy. That doesn't seem to cause a problem.
I switched them all to userid/groupid 1000 and restarted the Docker container manually, and it converted the DeepSeek weights into the T3K directory.
I shut down the manual image and started up tt-studio.
The DeepSeek-R1-Distill-Llama-70B model deployed and runs. Yay!
==
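For anyone hitting the same ownership problem, the check and fix described above can be sketched roughly as follows. This is only an illustration: the helper name list_not_owned_by is hypothetical, the example host paths are assumptions about where the persistent volume lives, and UID/GID 1000 is simply the value used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path under a directory that is NOT owned
# by the given numeric user id (find accepts a numeric id for -user).
list_not_owned_by() {
    local dir="$1" uid="$2"
    find "$dir" ! -user "$uid" -print
}

# Example usage (assumed host path; adjust to the actual persistent volume):
#   list_not_owned_by ./tt_studio_persistent_volume 1000
# Anything listed could then be handed to the container user, e.g.:
#   sudo chown -R 1000:1000 ./tt_studio_persistent_volume
```

Listing first and only then running chown makes it easier to see whether the root-owned paths are limited to the T3K directory or also include other trees such as the huggingface cache.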
Additional cleanup: all files under /home/container_app_user were set to executable.
In the container: /usr/local/bin/docker-entrypoint.sh
In the tt-inference-server repo: docker-entrypoint.sh
I think the chmod -R 2775 "$var_dir" there should operate only on the directories, something like:
find "$var_dir" -type d -print0 | xargs -0 chmod 2775
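To illustrate the difference, here is a small sketch that can be run in a scratch directory. The function name set_dir_mode and the file names are hypothetical; this is not the actual content of docker-entrypoint.sh, just a demonstration of the directory-only variant suggested above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a mode to directories only, leaving regular
# files (and their executable bits) untouched.
set_dir_mode() {
    local mode="$1" dir="$2"
    find "$dir" -type d -print0 | xargs -0 chmod "$mode"
}

# Demonstration in a throwaway directory.
demo="$(mktemp -d)"
mkdir -p "$demo/weights/sub"
touch "$demo/weights/sub/model.bin"
chmod 644 "$demo/weights/sub/model.bin"

set_dir_mode 2775 "$demo/weights"

# The file keeps mode 644; chmod -R 2775 would have made it executable.
ls -l "$demo/weights/sub/model.bin"
ls -ld "$demo/weights/sub"
```

Files that were already made executable by the recursive chmod could be reset with something like find "$var_dir" -type f -exec chmod a-x {} +, assuming none of them are actually meant to be executable.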