abdur75648/V-Zen

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

Introduction

V-Zen is a multimodal large language model (MLLM) designed for efficient GUI understanding and precise grounding. Its architecture is intended to improve performance on GUI automation tasks; see the paper cited below for details.

Code Availability

Coming Soon...

Citation

If you find this work useful, please consider citing the following paper:

@article{author2024vzen,
      title={V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM},
      author={Abdur Rahman and Rajat Chawla and Muskaan Kumar and Arkajit Datta and Adarsh Jha and Mukunda NS and Ishaan Bhola},
      journal={arXiv preprint arXiv:2405.15341},
      year={2024},
      eprint={2405.15341},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2405.15341}, 
}