Yuxuan Wang

I am currently a research engineer at the NLP Lab of the Beijing Institute for General Artificial Intelligence (BIGAI), led by Zilong Zheng and Song-Chun Zhu. Before that, I obtained my Master's degree from Peking University under the supervision of Dongyan Zhao, and I completed a summer internship at Johns Hopkins University, mentored by Zhuowan Li and Alan L. Yuille. I also collaborate with Cihang Xie at the University of California, Santa Cruz. My research focuses on video-language learning and multimodal agents, and I am especially drawn to work that offers novel insights and practical applications.

I am looking for a PhD position. Please feel free to reach out!
Scholar | GitHub | CV

Selected Publications (* = equal contribution)

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
Yueqian Wang, Yuxuan Wang, Kai Chen, Dongyan Zhao
AAAI 2024 | PDF | Code | Cite
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao
ACL 2023 | PDF | Code | Homepage | Cite
Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training
Yuxuan Wang, Jianghui Wang, Dongyan Zhao, Zilong Zheng
ACL 2023 Findings | PDF | Code | Cite
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
Xueliang Zhao*, Yuxuan Wang*, Chongyang Tao, Chenshuo Wang, Dongyan Zhao
EMNLP 2022 Findings | PDF | Code | Cite

Preprints (* = equal contribution)

LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng
PDF | Code & Demo | Cite
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang*, Yuxuan Wang*, Dongyan Zhao, Zilong Zheng
PDF | Code | Homepage | Cite
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang, Xiaojun Meng, Jianxin Liang, Yuxuan Wang, Qun Liu, Dongyan Zhao
PDF | Code | Cite
Teaching Text-to-Image Models to Communicate
Xiaowen Sun*, Jiazhan Feng*, Yuxuan Wang, Yuxuan Lai, Dongyan Zhao
PDF | Cite

Open-Source Learning Hub

Colorful Multimodal Research
A collection of recent advances driven by large language models (LLMs), spanning domains including vision, audio, agents, robotics, and fundamental sciences such as mathematics.
Language Modeling Research Hub
A comprehensive resource for enthusiasts and scholars studying language models (LMs), with a particular focus on large language models (LLMs).