Embodied AI and Robotic Foundation Models

Foundation models for perception, action, and embodiment in real environments.

This project focuses on robot foundation models that connect visual perception, world understanding, planning, and physical interaction.

Robots: Aloha Stationary ×2, Unitree G1, InspireHand, AGIBot G1, and more.
Facilities: 3D printer.

Selected Publications


  • Daichi Yashima, Shuhei Kurita, Yusuke Oda, Komei Sugiura, “ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding”, Conference on Computer Vision and Pattern Recognition (CVPR2026).
  • Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori, “Developing Vision-Language-Action Model from Egocentric Videos”, IEEE International Conference on Robotics & Automation (ICRA2026).
  • Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori, “Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision”, Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR2025).
  • Daichi Azuma(*), Taiki Miyanishi(*), Shuhei Kurita(*), Motoaki Kawanabe, “ScanQA: 3D Question Answering for Spatial Scene Understanding”, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR2022).