VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
Yuxuan Wang*, Yiqi Song*, Cihang Xie, Yang Liu, Zilong Zheng
ICCV 2025
PDF | Code | Homepage | Cite

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang, Yueqian Wang, Bo Chen, Tong Wu, Dongyan Zhao, Zilong Zheng
CVPR 2025
PDF | Code (OmniMMI) | Code (M4) | Homepage | Cite

Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
Yueqian Wang, Xiaojun Meng, Yuxuan Wang, Jianxin Liang, Qun Liu, Dongyan Zhao
AAAI 2025
PDF | Code | Cite

Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng
EMNLP 2024
PDF | Code & Demo | Cite

ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yuxuan Wang, Alan Yuille, Zhuowan Li, Zilong Zheng
COLM 2024
PDF | Code | Cite

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
Yueqian Wang, Yuxuan Wang, Kai Chen, Dongyan Zhao
AAAI 2024
PDF | Code | Cite

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao
ACL 2023
PDF | Code | Homepage | Cite

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training
Yuxuan Wang, Jianghui Wang, Dongyan Zhao, Zilong Zheng
ACL 2023 Findings
PDF | Code | Cite

Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
Xueliang Zhao*, Yuxuan Wang*, Chongyang Tao, Chenshuo Wang, Dongyan Zhao
EMNLP 2022 Findings
PDF | Code | Cite