Hello,

I hope this message finds you well. Thank you for providing the repository; it has been immensely helpful, and I was able to run the example.py script successfully.

I have also read the associated paper, which gave me a solid understanding of the project's context. However, I have a few questions about the practical usage of the repository.

I have managed to work with video and text inputs, but I am unsure how to use the "RT-2" component, which is designed for Video-Language-Action interaction. I may be overlooking something, and I would appreciate any guidance or clarification you could provide in this regard.

Additionally, while I have been able to obtain results from video and text inputs, I would appreciate some clarification on how to interpret them. If you could shed some light on the meaning or implications of these outputs, it would be immensely helpful.
Thank you very much for your assistance, and I look forward to your response.
Best regards,
For reference, here is the code I have been running:
import torch
from rt2.model import RT2

model = RT2()

# a batch of two random example video clips and their paired language instructions
video = torch.randn(2, 3, 6, 224, 224)
instructions = [
    'bring me that apple sitting on the table',
    'please pass the butter'
]

# compute the train logits
train_logits = model.train(video, instructions)

# set the model to evaluation mode
model.model.eval()

# compute the eval logits with a conditional scale of 3
eval_logits = model.eval(video, instructions, cond_scale=3.)
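Regarding the second question, here is a minimal sketch of how I am currently reading the eval logits. It assumes they are per-position distributions over discretized action-token bins, as described in the RT-2 paper; the decode_actions helper and the assumed tensor layout are my own guesses and not part of the repository, so please correct me if the actual output format differs.

import torch

def decode_actions(logits: torch.Tensor) -> torch.Tensor:
    # logits: (batch, sequence_length, num_bins) -- assumed layout, not confirmed by the repo
    # greedy decoding: pick the most likely discretized action bin at each position
    return logits.argmax(dim=-1)

# e.g. with the eval logits computed above:
# actions = decode_actions(eval_logits)  # (batch, sequence_length) bin indices

Is something along these lines the intended way to turn the logits into robot actions, or is there a dedicated decoding utility in the repository that I have missed?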