|Affiliation||Department of Computer Science and Engineering|
|Fields of Research||Speech information processing|
|Degree||Dr. of Eng. (Toyohashi University of Technology)|
|Academic Societies||IEEE, ISCA, APSIPA, IEICE, IPSJ, ASJ, JSAI|
Please append "tut.jp" to the end of the address above.
|Laboratory website URL||https://sites.google.com/site/a5gistokushimau/|
|Researcher information URL (researchmap)||Researcher information|
Almost all humans use spoken dialog, which is the most natural method of communication. If computers can recognize, manage, and synthesize speech, speech can serve not only as the best means of communication but also as a data storage medium. I work on spoken language technologies.
Transcribing monologues such as lectures is a very promising research area. We improve acoustic modeling of the human voice with models such as Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), as well as statistical language modeling with N-grams. We also improve decoding algorithms.
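As a minimal illustration of the N-gram idea, the sketch below trains a bigram model with add-one (Laplace) smoothing and scores word sequences. The function names, the `<s>`/`</s>` sentence padding, and the choice of Laplace smoothing are assumptions for illustration only; practical recognizers typically use longer histories and stronger smoothing such as Kneser-Ney, and this is not the lab's implementation.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigram histories and bigrams, padding each sentence with <s> and </s>."""
    history = Counter()  # c(prev): how often each token precedes another token
    bigrams = Counter()  # c(prev, word)
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        history.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Predictable vocabulary: every word seen in training plus the end marker.
    vocab = {w for words in sentences for w in words} | {"</s>"}
    return history, bigrams, vocab


def bigram_prob(prev, word, history, bigrams, vocab):
    """Add-one (Laplace) smoothed P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))


def sentence_logprob(words, history, bigrams, vocab):
    """Natural-log probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(
        math.log(bigram_prob(p, w, history, bigrams, vocab))
        for p, w in zip(tokens[:-1], tokens[1:])
    )


corpus = [["the", "lecture", "starts"], ["the", "lecture", "ends"]]
h, b, v = train_bigram(corpus)
print(bigram_prob("the", "lecture", h, b, v))  # 3/7, about 0.4286
```

Because every history count is spread over the same vocabulary, the smoothed probabilities for each history sum to one over that vocabulary; out-of-vocabulary words are not handled in this sketch.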
Selected publications and works
Norihide Kitaoka, Daisuke Enami, Seiichi Nakagawa, "Effect of acoustic and linguistic contexts on human and machine speech recognition," Computer Speech and Language, Vol. 28, pp. 769-787, Feb., 2014.
Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda, "Acoustic model training using pseudo-speaker feature generated by MLLR transformations for robust speech recognition," IEICE Trans. Inf. & Syst., Vol. E95-D, No. 10, pp. 2479-2485, Oct., 2012.
Theme 2: Friendly spoken dialog system
For novice users, the first impression of a spoken dialog system is often that it feels unnatural: the time lag between the user's utterance and the system's reply is so long that the user cannot tell whether the system is working. This is one reason why users do not feel they can use spoken dialog systems comfortably and in a friendly manner. We therefore focus on prosodic features such as timing and pitch changes in dialog. Our dialog system has begun to speak with appropriate prosodic features that take the user's previous utterances into account. When the dialog becomes lively, the pitch of the system's utterances follows the user's pitch. We also study semantic dialog strategies, and we are now developing a robust and natural response generation method for a system that takes its own misunderstandings into account.
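The pitch-following behavior described above could be sketched, very roughly, as a per-turn update that moves the system's target F0 toward the user's F0, with the step size scaled by how lively the dialog is. This is a hypothetical rule, not the published method: the class name `PitchEntrainer`, the 120 Hz default pitch, the `max_rate` cap, and a liveliness score in [0, 1] supplied by some other component are all assumptions, and the update works in Hz for simplicity rather than on a perceptual scale such as semitones.

```python
class PitchEntrainer:
    """Hypothetical per-turn rule that moves the system's target F0 toward the user's F0."""

    def __init__(self, base_f0=120.0, max_rate=0.5):
        self.f0 = base_f0          # current target F0 of system speech, in Hz (assumed default)
        self.max_rate = max_rate   # largest fraction of the F0 gap closed in one turn

    def update(self, user_f0, liveliness):
        """Return the new target F0 after a user utterance with mean pitch user_f0.

        liveliness is assumed to be a score in [0, 1] from some other component:
        0 keeps the current pitch, 1 closes max_rate of the gap to the user's pitch.
        """
        liveliness = min(max(liveliness, 0.0), 1.0)  # clamp out-of-range scores
        self.f0 += self.max_rate * liveliness * (user_f0 - self.f0)
        return self.f0


e = PitchEntrainer()
print(e.update(user_f0=200.0, liveliness=0.0))  # 120.0: calm dialog, pitch stays put
print(e.update(user_f0=200.0, liveliness=1.0))  # 160.0: lively dialog, halfway to the user
```

Capping the step with `max_rate` keeps the system from jumping straight to the user's pitch, so the change stays gradual even in a very lively exchange.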
Selected publications and works
Norihide Kitaoka, Yuji Kinoshita, Sunao Hara, Chiyomi Miyajima, Kazuya Takeda, "A graph-based spoken dialog strategy utilizing multiple understanding hypotheses," Transactions of the Japanese Society for Artificial Intelligence, Vol. 29, No. 1, Jan., 2014.
Sunao Hara, Norihide Kitaoka, Kazuya Takeda, "Field data collection of a distributed spoken dialog system for music retrieval and its evaluation," Global Engineering, Science, and Technology society International Transaction on Computer Science and Engineering, Vol. 64, No. 1, pp. 33-58, May, 2011.
Norihide Kitaoka, Masashi Takeuchi, Ryota Nishimura, Seiichi Nakagawa, "Response timing detection using prosodic and linguistic information for human-friendly spoken dialog systems," Transactions of the Japanese Society for Artificial Intelligence, Vol. 20, No. 3 SP-E, pp. 220-228, Mar., 2005.
Humans often use gestures such as finger pointing and gaze to convey their intentions. We are trying to realize this kind of interaction between humans and machines.
Consider operating an autonomous vehicle. How do you tell it where you want to go and where you want to turn? It would be useful if you could simply point and look. We are developing an autonomous vehicle with such an interface!
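One simple way to turn a pointing gesture into a destination is to cast a ray from the eye through the fingertip and pick the candidate location whose direction is closest to that ray. The sketch below assumes 3D positions in a common vehicle frame, a dictionary of named candidate targets, and a 10-degree acceptance cone; `resolve_pointing` and its parameters are hypothetical and do not describe the interface under development.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _angle(u, v):
    """Angle in radians between nonzero vectors u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp guards against rounding


def resolve_pointing(eye, fingertip, candidates, max_angle_deg=10.0):
    """Return the candidate whose direction is closest to the eye->fingertip ray.

    candidates maps a name to a 3D position. Returns None if the ray is degenerate
    or no candidate lies within max_angle_deg of the ray.
    """
    ray = _sub(fingertip, eye)
    if not any(ray):
        return None  # fingertip coincides with the eye: no direction
    best_name, best_angle = None, math.radians(max_angle_deg)
    for name, pos in candidates.items():
        direction = _sub(pos, eye)
        if not any(direction):
            continue  # candidate at the eye position has no direction
        angle = _angle(ray, direction)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


targets = {"parking lot": (30.0, 10.0, 1.2), "gas station": (0.0, 30.0, 1.2)}
print(resolve_pointing(eye=(0.0, 0.0, 1.2), fingertip=(0.3, 0.1, 1.2), candidates=targets))
# parking lot
```

A gaze ray could be scored against the candidates in the same way; how pointing and gaze are combined is left open here.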
Title of class
Introduction to Data Structures