Multimodal Needle In A Video Haystack
Pressure-testing large video-language models (LVLMs): multimodal retrieval from LVLMs at various video lengths to measure accuracy.
Streaming Grounded SAM 2
Grounded SAM 2 for streaming video tracking using natural language queries.
Open-Source Learning Hub
Colorful Multimodal Research
Recent advances driven by large language models (LLMs) across a range of domains, including vision, audio, agents, robotics, and fundamental sciences such as mathematics.
Language Modeling Research Hub
A comprehensive resource collection for enthusiasts and scholars exploring language models (LMs), with a particular focus on large language models (LLMs).
Multimodal Memory Research
Reading list of memory-augmented multimodal research, including multimodal context modeling, memory in vision and robotics, and external memory/knowledge-augmented MLLMs.