CogVLM (agents) does poorly with text still #196

Closed Answered by wenyihong
pseudotensor asked this question in Bad Case

Hi, thanks for raising this. Charts and tables are relatively difficult forms of OCR, and CogAgent / CogVLM may fail in some cases. We appreciate the report and are actively improving our model on this.

We've just released another checkpoint called cogagent-vqa, which performs better on single-round VQA. Could you try that one?

Also, we suggest phrasing queries as complete sentences, such as "What's the xxx of xxx?".
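As a rough illustration of the complete-sentence suggestion, here is a minimal sketch of a helper that turns a keyword pair into a full question before it is sent to the model. The function name `to_full_question` and its parameters are hypothetical and not part of the CogVLM / CogAgent code; it only builds the query string and does not call the model.

```python
def to_full_question(attribute: str, subject: str) -> str:
    """Build a complete-sentence query of the form "What's the X of Y?".

    Hypothetical helper for illustration only; it is not part of the
    CogVLM / CogAgent repository.
    """
    attribute = attribute.strip()
    subject = subject.strip()
    if not attribute or not subject:
        raise ValueError("both attribute and subject are required")
    return f"What's the {attribute} of {subject}?"


# Instead of sending a bare fragment such as "revenue 2021",
# send a full question about the chart or table.
query = to_full_question("revenue", "the 2021 column")
print(query)
```

The resulting string would then be passed as the text query alongside the image, in whatever way your inference script already does.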

Answer selected by zRzRzRzRzRzRzR

This discussion was converted from issue #194 on December 16, 2023 09:26.