Language Models and Vision-Language Models
Foundation models for language understanding, multimodal reasoning, and interactive agents.
This project studies large language models and vision-language models for instruction following, multimodal grounding, and agentic interaction.
Selected Publications
- Issa Sugiura, Shuhei Kurita, Yusuke Oda, and Ryuichiro Higashinaka, “Llama-Mimi: Speech Language Models with Interleaved Semantic and Acoustic Tokens,” arXiv:2509.14882.
- Eri Onami, Shuhei Kurita, Taiki Miyanishi, and Taro Watanabe, “JDocQA: Japanese Document Question Answering Dataset for Generative Language Models,” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
- Shunya Kato, Shuhei Kurita, Chenhui Chu, and Sadao Kurohashi, “ARKitSceneRefer: Text-based Localization of Small Objects in Diverse Real-World 3D Indoor Scenes,” In Findings of the Association for Computational Linguistics: EMNLP 2023 (EMNLP 2023 Findings).
- Shuhei Kurita, Daisuke Kawahara, and Sadao Kurohashi, “Neural Joint Model for Transition-based Chinese Syntactic Analysis,” In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).