Toyohashi University of Technology

Kitaoka, Norihide

Affiliation Department of Computer Science and Engineering
Title Professor
Fields of Research Speech information processing
Degree Dr. of Eng. (Toyohashi University of Technology)
Academic Societies IEEE, ISCA, APSIPA, IEICE, IPSJ, ASJ, JSAI, ANLP
E-mail kitaoka
Please append "tut.jp" to the end of the address above.
Laboratory website URL http://www.slp.cs.tut.ac.jp
Researcher information URL(researchmap) Researcher information

Research

Almost all humans communicate through spoken dialog, the most natural method of communication. If computers can recognize, manage, and synthesize speech, speech can serve not only as the best means of communication but also as a data storage medium. I work on spoken language technologies.

Theme 1: Speech recognition

Overview

Transcribing monologues such as lectures is a very promising research area. We improve acoustic modeling of the human voice using deep learning models.

Selected publications and works

Takahiro Kinouchi, Atsunori Ogawa, Yukoh Wakabayashi, Kengo Ohta, Norihide Kitaoka, “Domain adaptation using non-parallel target domain corpus for self-supervised learning-based automatic speech recognition,” SPEECH COMMUNICATION, Vol. 174, 103303, (8 pages) Oct., 2025.
Daiki Mori, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka, “Recognition of target domain Japanese speech using language model replacement,” EURASIP Journal on Audio, Speech and Music Processing, Article number: 40 (2024), 14 pages, 2024. (DOI: 10.1186/s13636-024-00360-8)

Keywords

Speech recognition, deep learning model

Theme 2: Friendly spoken dialog system

Overview

Novice users' first impression of a spoken dialog system is often that it is unnatural: the time lag between a human utterance and the system's reply is so long that the user cannot tell whether the system is working. This is one reason why users do not find spoken dialog systems comfortable and friendly to use. We therefore focus on prosodic features of dialog, such as timing and pitch change. Our dialog system now speaks with appropriate prosodic features that take previous user utterances into account. When the dialog gets lively, the pitch of the system's utterances follows the user's pitch. We also study semantic dialog strategies, and are now developing a robust and natural response generation method for a system that takes its own misunderstandings into account.

Selected publications and works

Kazuya Tsubokura, Yurie Iribe, Norihide Kitaoka, “Analysis of the Relationship between User Response to Dialog Breakdown and Personality Traits,” Advanced Robotics, Vol. 37, Issue 21, pp. 1-10, Nov., 2023. (DOI: 10.1080/01691864.2023.2279610)
Norihide Kitaoka, Masashi Takeuchi, Ryota Nishimura, Seiichi Nakagawa, “Response timing detection using prosodic and linguistic information for human-friendly spoken dialog systems,” Transactions of the Japanese Society for Artificial Intelligence, Vol. 20, No. 3 SP-E, pp. 220-228, Mar., 2005.

Keywords

Spoken dialog system

Theme 3: Multimodal interface

Overview

Humans often use gestures such as finger pointing and gaze to convey their intentions. We are trying to realize this kind of interaction between humans and machines.
Consider operating an autonomous vehicle. How do you tell it where you want to go and where you want to turn? Finger pointing and gaze would be useful here. We are developing an autonomous vehicle with such an interface!

Selected publications and works

Tamon Mikawa, Yasuhisa Fujii, Yukoh Wakabayashi, Kengo Ohta, Ryota Nishimura, Norihide Kitaoka, “Improving Listening Head Generation Performance Using Speech Representations from Self-Supervised Learning,” Proc. APSIPA ASC 2025, Oct., 2025.

Keywords

Multimodal interface, autonomous vehicle

Title of class

Introduction to Data Structures
Formal language
Spoken Language Processing

