Hello 🎊

I am Jinxi He (何锦熙),
a Master's student at Carnegie Mellon University,
supervised by Prof. Katia Sycara.

My research focuses on hallucination in Multi-modal Large Language Models (MLLMs) and a variety of generation tasks.
I am also deeply interested in Robot Learning, particularly long-horizon visual task planning and execution.

News 🐝

[2025.04] - Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning [arXiv] [github]
[2025.04] - Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting [arXiv] [github]
[2025.03] - VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity [arXiv] [website]
