Long-Form Video Understanding through Multi-Modal Large Language Models

Date:

Invited talk at Microsoft Research Asia on long-form video understanding with multimodal LLMs, covering benchmarks, architectures, and open problems in ultra-long egocentric video reasoning.