📖 Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.

Ruiyang-061X/Awesome-MLLM-Reasoning

Stargazers Forks Contributors MIT License

Awesome MLLM Reasoning

โญโญโญ If you find this repo useful, please star it!

🔥 for papers with >50 citations or repositories with >200 stars.
📖 for papers accepted by reputable conferences/journals.

⚒️ Technical Report

2025

  1. OpenAI o3-mini (31 Jan, 2025)

2024

  1. OpenAI o1 (12 Sept, 2024)

📝 Survey

2025

  1. Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models (16 Jan 2025)

    Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li

2024

  1. Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning (10 Jan 2024)

    Yiqi Wang, Wentao Chen, Xiaotian Han, Xudong Lin, Haiteng Zhao, Yongfei Liu, Bohan Zhai, Jianbo Yuan, Quanzeng You, Hongxia Yang

  2. Reasoning with Large Language Models, a Survey (16 Jul 2024)

    Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back

  3. 📖 LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models (1 Apr 2024, COLM 2024)

    Yadong Zhang, Shaoguang Mao, Tao Ge, Xun Wang, Adrian de Wynter, Yan Xia, Wenshan Wu, Ting Song, Man Lan, Furu Wei

2022

  1. 🔥📖 Towards Reasoning in Large Language Models: A Survey (20 Dec 2022, ACL 2023 Findings)

    Jie Huang, Kevin Chen-Chuan Chang

📈 Benchmark

2025

  1. EMMA Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark [Code] (9 Jan 2025)

    Yunzhuo Hao, Jiawei Gu, Huichen Will Wang, Linjie Li, Zhengyuan Yang, Lijuan Wang, Yu Cheng

2024

  1. Polymath Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark [Code] (6 Oct 2024)

    Himanshu Gupta, Shreyas Verma, Ujjwala Anantheswaran, Kevin Scaria, Mihir Parmar, Swaroop Mishra, Chitta Baral

  2. 📖 MLLM-CompBench MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs [Code] (23 Jul 2024, NeurIPS 2024)

    Jihyung Kil, Zheda Mai, Justin Lee, Zihe Wang, Kerrie Cheng, Lemeng Wang, Ye Liu, Arpita Chowdhury, Wei-Lun Chao

  3. LogicVista LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts [Code] (6 Jul 2024)

    Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang

  4. 🔥📖 Visual CoT Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning [Code] (25 Mar 2024, NeurIPS 2024 Spotlight)

    Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

  5. 🔥 Mementos Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences [Code] (19 Jan 2024)

    Xiyao Wang, Yuhang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

↑ Back to Top ↑

✨ Paper

2025

  1. Virgo Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [Code] (3 Jan 2025)

    Yifan Du, Zikang Liu, Yifan Li, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, Weipeng Chen, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen

2024

  1. 🔥 Mulberry Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Code] (24 Dec 2024)

    Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao

  2. AtomThink AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [Code] (18 Nov 2024)

    Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang

  3. 🔥 LLaVA-CoT LLaVA-CoT: Let Vision Language Models Reason Step-by-Step [Code] (15 Nov 2024)

    Guowei Xu, Peng Jin, Hao Li, Yibing Song, Lichao Sun, Li Yuan

  4. 📖 IRED Learning Iterative Reasoning through Energy Diffusion [Code] [Website] (17 Jun 2024, ICML 2024)

    Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum

  5. 📖 GeoReasoner GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model [Code] (3 Jun 2024, ICML 2024)

    Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng

  6. 🔥📖 VoT Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition [Code] [Website] (7 May 2024, ICML 2024 Oral)

    Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Meishan Zhang, Mong-Li Lee, Wynne Hsu

  7. Cantor Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [Code] (24 Apr 2024)

    Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

  8. 📖 Momentor Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning [Code] (18 Feb 2024, ICML 2024)

    Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang

  9. 📖 ContPhy ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [Code] [Website] (9 Feb 2024, ICML 2024)

    Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, Chuang Gan

  10. 📖 ConTextual ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models [Code] [Website] (24 Jan 2024, ICML 2024)

    Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng

2023

  1. 🔥📖 MM-CoT Multimodal Chain-of-Thought Reasoning in Language Models [Code] (2 Feb 2023, TMLR)

    Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

↑ Back to Top ↑

๐Ÿบ Contributing

Issues and pull requests that add related papers to this curated list are warmly welcome. Feel free to submit any relevant additions to help expand and improve this collection.

💡 Contributors