Language Models and Vision-Language Models
Foundation models for language understanding, multimodal reasoning, and interactive agents.
This project studies large language models and vision-language models for instruction following, multimodal grounding, and agentic interaction.
Selected Publications
- Issa Sugiura, Shuhei Kurita, Yusuke Oda, and Ryuichiro Higashinaka, “Llama-Mimi: Speech Language Models with Interleaved Semantic and Acoustic Tokens,” arXiv:2509.14882.
- Eri Onami, Shuhei Kurita, Taiki Miyanishi, and Taro Watanabe, “JDocQA: Japanese Document Question Answering Dataset for Generative Language Models,” In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).
- Shunya Kato, Shuhei Kurita, Chenhui Chu, and Sadao Kurohashi, “ARKitSceneRefer: Text-based Localization of Small Objects in Diverse Real-World 3D Indoor Scenes,” In Findings of the Association for Computational Linguistics: EMNLP 2023 (EMNLP 2023 Findings).
- Shuhei Kurita, Daisuke Kawahara, and Sadao Kurohashi, “Neural Joint Model for Transition-based Chinese Syntactic Analysis,” In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).