CogVLM (agents) does poorly with text still #196
-
Replies: 2 comments
-
We've received your feedback. Try using scans instead of photos; scanned images give better results.
-
Hi, thanks for your discussion. Charts and tables are relatively difficult forms of OCR, and CogAgent / VLM may fail in some cases. Thanks for bringing it up; we are actively improving our model on this.
We've just released another checkpoint called cogagent-vqa, which has better performance on single-round VQA. How about trying that? Also, we suggest phrasing queries as complete sentences, such as "What's the xxx of xxx?".
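As a rough illustration of the complete-sentence suggestion, here is a minimal sketch that turns a keyword pair into a full question before it is sent to the model. The helper name `build_query` and its parameters are hypothetical and not part of CogVLM or CogAgent; how the resulting string is passed to the model depends on the demo script you use.

```python
def build_query(attribute: str, subject: str) -> str:
    """Hypothetical helper: phrase a keyword pair as a complete-sentence question."""
    attribute = attribute.strip()
    subject = subject.strip()
    if not attribute or not subject:
        # A bare keyword is exactly the kind of query the reply advises against.
        raise ValueError("both attribute and subject are required")
    return f"What's the {attribute} of {subject}?"


if __name__ == "__main__":
    # Instead of sending just "total revenue 2022", send a full question.
    print(build_query("total revenue", "the 2022 column"))
```

The resulting string would then be used as the text query alongside the image, in place of a bare keyword list.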