CogVLM (agents) does poorly with text still #196

pseudotensor · 2023-12-15T16:36:02Z

fastrotate3.png.zip

zRzRzRzRzRzRzR (Maintainer) · 2023-12-16T09:27:02Z

We've received your feedback. Try not to use photos; scanned images will give better results.

wenyihong (Maintainer) · 2023-12-17T04:10:43Z

Hi, thanks for the discussion. Charts and tables are relatively difficult forms of OCR, and CogAgent/CogVLM may fail in some cases. Thanks for bringing this up; we are actively improving our model in this area.

We've just released another checkpoint, cogagent-vqa, which performs better on single-round VQA. Could you try that?

We also suggest phrasing queries as complete sentences, such as "What's the xxx of xxx?".
