
Kitaoka, Norihide
| Affiliation | Department of Computer Science and Engineering | 
|---|---|
| Title | Professor | 
| Fields of Research | Speech information processing | 
| Degree | Dr. of Eng. (Toyohashi University of Technology) | 
| Academic Societies | IEEE, ISCA, APSIPA, IEICE, IPSJ, ASJ, JSAI, ANLP | 
| E-mail | kitaoka (please append "tut.jp" to the end of the address) |
| Laboratory website URL | http://www.slp.cs.tut.ac.jp | 
| Researcher information URL(researchmap) | Researcher information | 
Research
Almost all humans communicate through spoken dialog, the most natural method of communication. If computers can recognize, manage, and synthesize speech, speech can serve not only as the best means of communication but also as a data storage medium. I work on spoken language technologies.
Theme 1: Speech recognition
Overview
Transcribing monologues such as lectures is a very promising research area. We improve acoustic modeling of the human voice using deep learning models.
Selected publications and works
Takahiro Kinouchi, Atsunori Ogawa, Yukoh Wakabayashi, Kengo Ohta, Norihide Kitaoka, “Domain adaptation using non-parallel target domain corpus for self-supervised learning-based automatic speech recognition,” SPEECH COMMUNICATION, Vol. 174, 103303, (8 pages) Oct., 2025.
Daiki Mori, Kengo Ohta, Ryota Nishimura, Atsunori Ogawa, Norihide Kitaoka, “Recognition of target domain Japanese speech using language model replacement,” EURASIP Journal on Audio, Speech and Music Processing, Article number: 40 (2024), 14 pages, 2024. (DOI: 10.1186/s13636-024-00360-8)
Theme 2: Friendly spoken dialog system
Overview
Novice users' first impression of a spoken dialog system is often that it is unnatural: the time lag between a human utterance and the system's reply is so long that the user cannot tell whether the system is working. This is one of the reasons why users do not feel that spoken dialog systems can be used in a comfortable, friendly manner. We therefore focus on prosodic features of dialog, such as timing and pitch change. Our dialog system now speaks with appropriate prosodic features that take the user's previous utterances into account. When the dialog gets lively, the pitch of the system's utterances follows the user's pitch. We also study semantic dialog strategies, and are currently developing a robust and natural response generation method for a system that takes its own misunderstandings into account.
Selected publications and works
Kazuya Tsubokura, Yurie Iribe, Norihide Kitaoka, “Analysis of the Relationship between User Response to Dialog Breakdown and Personality Traits,” Advanced Robotics, Vol. 37, Issue 21, pp. 1-10, Nov., 2023. (DOI: 10.1080/01691864.2023.2279610)
Norihide Kitaoka, Masashi Takeuchi, Ryota Nishimura, Seiichi Nakagawa, “Response timing detection using prosodic and linguistic information for human-friendly spoken dialog systems,” Transactions of the Japanese Society for Artificial Intelligence, Vol. 20, No. 3 SP-E, pp. 220-228, Mar., 2005.
Theme 3: Multimodal interface
Overview
Humans often use gestures such as finger pointing and gaze to convey their intentions. We are trying to realize this kind of interaction between humans and machines.
Consider the operation of an autonomous vehicle. How do you let it know where you want to go and where you want to turn? Finger pointing and gaze would be useful for this. We are developing an autonomous vehicle with such an interface.
Selected publications and works
Tamon Mikawa, Yasuhisa Fujii, Yukoh Wakabayashi, Kengo Ohta, Ryota Nishimura, Norihide Kitaoka, “Improving Listening Head Generation Performance Using Speech Representations from Self-Supervised Learning,” Proc. APSIPA ASC 2025, Oct., 2025.
Title of class
Introduction to Data Structures
Formal language
Spoken Language Processing

