I am currently a research engineer on the Qwen team at Alibaba Inc.
I obtained my Master's degree from Peking University, under the supervision of Dongyan Zhao.
I have had the wonderful experience of working with Zilong Zheng @ BIGAI, Cihang Xie @ UCSC, and Alan L. Yuille @ JHU.
My current work primarily focuses on omni-modal language models (omni-LMs). I am especially interested in studies that offer novel insights and impactful applications.
I am looking for interns to work on omni-LM and open-world modeling research. Please feel free to contact me!
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao
PDF | Code | Cite
Open-Source Projects
Open-Omni-Nexus
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
Multimodal Needle In A Video Haystack
Pressure-testing large video-language models (LVLMs): performing multimodal retrieval over videos of varying lengths and measuring accuracy (a minimal sketch of the protocol is shown below).
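For illustration, here is a minimal Python sketch of the pressure-test loop, under assumed interfaces: `query_model` stands in for the model under test and frames are plain Python objects; these names are hypothetical placeholders, not the project's actual harness.

```python
import random
from typing import Callable, Dict, List

def evaluate_needle_retrieval(
    query_model: Callable[[List[object], str], str],  # hypothetical: (frames, question) -> answer text
    haystacks: List[List[object]],                    # videos as frame lists of varying lengths
    needle_frame: object,                             # the target image the model must find
    question: str,                                    # question answerable only from the needle frame
    expected_answer: str,
) -> Dict[int, bool]:
    """Insert the needle frame at a random depth in each haystack video and
    record whether the model answers correctly, keyed by video length."""
    results: Dict[int, bool] = {}
    for frames in haystacks:
        depth = random.randint(0, len(frames))                     # random insertion depth
        probed = frames[:depth] + [needle_frame] + frames[depth:]  # hide the needle
        prediction = query_model(probed, question)
        results[len(probed)] = prediction.strip().lower() == expected_answer.lower()
    return results
```

Plotting the resulting accuracy against video length shows how retrieval degrades as the haystack grows.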
Streaming Grounded SAM 2
Grounded SAM 2 for tracking objects in streaming video using natural language queries (a minimal pipeline sketch is shown below).
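A minimal sketch of the streaming pipeline, assuming hypothetical interfaces: `ground_text_to_box` stands in for an open-vocabulary detector (e.g. Grounding DINO) and `start_tracker` for SAM-2-style streaming mask propagation; neither is the project's actual API.

```python
from typing import Callable, Iterable, Iterator, Optional

def track_by_text(
    frames: Iterable,             # decoded frames arriving as a stream
    text_query: str,              # e.g. "the person in the red jacket"
    ground_text_to_box: Callable, # hypothetical: (frame, text) -> bounding box or None
    start_tracker: Callable,      # hypothetical: (frame, box) -> tracker with .mask and .update()
) -> Iterator[Optional[object]]:
    """Ground the text query on the first frame where the target is detected,
    then propagate its segmentation mask frame by frame."""
    tracker = None
    for frame in frames:
        if tracker is None:
            box = ground_text_to_box(frame, text_query)  # open-vocabulary detection step
            if box is None:
                yield None                               # target not visible yet
                continue
            tracker = start_tracker(frame, box)          # initialize mask tracking from the box
            yield tracker.mask                           # first mask from the grounded box
        else:
            yield tracker.update(frame)                  # streaming mask propagation
```

The design keeps detection and tracking decoupled: language grounding runs only until the target is found, after which the cheaper per-frame tracker takes over.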
Open-Source Learning Hub
Colorful Multimodal Research
Recent advances propelled by large language models (LLMs), spanning domains including vision, audio, agents, robotics, and fundamental sciences such as mathematics.
Language Modeling Research Hub
A comprehensive compendium for enthusiasts and scholars studying language models (LMs), with a particular focus on large language models (LLMs).
Multimodal Memory Research
A reading list on memory-augmented multimodal research, covering multimodal context modeling, memory in vision and robotics, and external memory/knowledge-augmented MLLMs.