Long-Form Video Understanding through Multi-Modal Large Language Models
Date:
Invited talk at Microsoft Research Asia on long-form video understanding with multimodal LLMs, covering benchmarks, architectures, and open problems in ultra-long egocentric video reasoning.