Publications
See also my Google Scholar profile.
2026
- arXiv preprint arXiv:2601.14895. Unified 3D memory architecture with metric anchoring and fast retrieval for spatial AI systems — language grounding and question answering over long-horizon video.
- IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. A novel approach to video quality assessment using caption-embedded multimodal perception for compressed video analysis without reference frames.
- IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026. Exploring how multimodal large language models can understand and reason about temporal and spatial relationships in egocentric video content.
- International Conference on Learning Representations (ICLR) 2026. MARC: memory-augmented reinforcement-learning token compression for efficient video understanding, enabling better processing of long-form video content.
2025
- Conference on Empirical Methods in Natural Language Processing (EMNLP) 2025, Findings. A comprehensive benchmark for evaluating AI models on extremely long egocentric video sequences, addressing challenges in temporal video understanding.
- IEEE/CVF International Conference on Computer Vision (ICCV) 2025. A large-scale dataset of cultural landmarks and terrains designed for advancing Gaussian-based scene rendering techniques in computer vision.
- arXiv preprint arXiv:2509.16811. A prompt-driven agentic system that autonomously comprehends and edits long-form, story-driven video — bridging LLM planning and video understanding.
- arXiv preprint arXiv:2507.11336. An omni captioning model and new benchmarks for detailed description of user-generated video content across diverse domains.
- arXiv preprint arXiv:2503.12332. A scalable Mamba-based autoregressive pretraining recipe for long-form video understanding.
- arXiv preprint arXiv:2502.15228. A universal time-series motion-recognition pipeline that automates model selection and training across heterogeneous sensor streams.
- arXiv preprint arXiv:2502.12297. A low-latency streaming gesture-recognition framework for on-device real-time input in XR and wearable settings.
2024
- arXiv preprint arXiv:2411.12778. A temporal computing platform that maintains persistent visual memory across long-horizon interactions — the technical vision underpinning the Memories.ai product.
- IEEE Transactions on Visualization and Computer Graphics (TVCG), 30(11):7441–7451 (ISMAR 2024). A ring-based mid-air gesture typing system for AR glasses, achieving 25 WPM at 96% accuracy using a smart-ring input device and a deep-learning word-prediction framework. Meta patent filed.
- IEEE Transactions on Visualization and Computer Graphics (TVCG), 30(11):7118–7128 (ISMAR 2024). The first pre-trained foundation model for word-gesture decoding in XR. Coarse trajectory discretization plus pre-training generalizes across keyboard layouts, improving accuracy by ~40% over prior methods.
- arXiv preprint arXiv:2411.00489. A comprehensive survey establishing a human-inspired taxonomy for AI long-term memory; a widely cited foundational reference for emerging work on persistent AI systems.
- IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2024. The first open-world continual-learning framework for gesture recognition — handles unseen gesture classes and distribution shifts in deployed XR systems.
- IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2024. A foundational encode–store–retrieve architecture for AI memory augmentation via language-encoded egocentric perception. The core technology behind the Memories.ai platform.
- IEEE Transactions on Visualization and Computer Graphics (TVCG), 30(9):6493–6506. A comprehensive evaluation of controller-based raycasting methods for text entry in virtual reality environments, improving alphanumeric and special-character input efficiency.
- IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2024. An automatic gesture-annotation framework that reduces manual labelling effort by roughly 90% while measurably improving downstream gesture-recognition accuracy. FG is the leading venue for face and gesture recognition.
2023
- IEEE Transactions on Visualization and Computer Graphics (TVCG), 29(11):4622–4632. Fast and robust mid-air gesture typing for AR headsets using 3D trajectory decoding. The core framework has been adopted by 15+ research groups internationally. Demo video.
- arXiv preprint arXiv:2310.08101. A conversational, autonomous agent that generates prompts on the fly for LLM-powered intelligent text-entry techniques.
- ACM Conference on Human Factors in Computing Systems (CHI) 2023. The first framework for explainable AI in augmented reality — surfacing AI decision-making through augmented visual overlays. Published at CHI, the premier HCI venue (acceptance rate ~25%).
2022
- IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2022. Personalization of a mid-air gesture keyboard using multi-objective Bayesian optimization to trade off typing speed and accuracy for each user.
- IEEE Transactions on Visualization and Computer Graphics (TVCG), 28(11):3618–3628 (presented at ISMAR 2022). An open-source toolkit enabling rapid prototyping and evaluation of key-gesture spotting in VR/AR. Demo video.
- 27th International Conference on Intelligent User Interfaces (IUI) 2022. The first context-aware multi-turn dialogue system for augmentative and alternative communication. KWickChat expands a small bag of keywords into fluent, context-appropriate sentences — improving text-entry speed for people who rely on AAC.
- International Conference on Learning Representations (ICLR) 2022. A context-dependent reinforcement-learning method using a Hierarchical Dirichlet Process prior to handle discrete Markovian context evolution. Published at ICLR, a top-tier machine-learning venue.
2021
- 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2021. An imaginative GAN that automatically augments skeleton-based motion data, improving downstream hand-gesture and human-action recognition accuracy.
- IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2021. Simulation of realistic human motion trajectories for mid-air gesture typing, enabling data-driven AR text-entry research without costly user studies.