

Human Pose Estimation for Care Robots Using Deep Learning

Efficient generation of big data for pose estimation
By Jun Miura

A research group led by Professor Jun Miura has developed a method that uses deep learning to estimate a variety of human poses from depth data alone. Deep learning requires a large volume of training data, so the group also developed a technique that generates such data efficiently using computer graphics and motion capture technologies. The generated data is freely available and is expected to contribute to progress across a wide range of related fields.

Against the backdrop of a declining birthrate, an aging population, and a shortage of nursing and care staff, expectations are growing that care robots will help meet society's needs. For example, robots are expected to check on residents' condition while patrolling nursing homes and similar facilities. When evaluating a person's condition, a first step is to estimate the pose (standing, sitting, fallen, etc.), and most methods to date have relied on camera images. Such methods, however, raise privacy concerns and are difficult to apply in dark environments. The research group (Kaichiro Nishi, a 2016 master's program graduate, and Professor Miura) has therefore developed a method for pose recognition that uses depth data alone (Fig. 1).

Fig. 1: Example of pose estimation using depth data. Left: experiment scene (this image is not used for estimation); center: depth data corresponding to the extracted person region; right: estimation result (the colors correspond to the parts of the body).

For poses such as standing and sitting, in which body parts can be recognized relatively easily, methods and instruments that estimate poses with high precision already exist. Care settings, however, require recognizing a much wider variety of poses, such as recumbent (lying down) and crouching positions, and this has remained a challenge. With the recent progress of deep learning (a technique using multi-layer neural networks), methods for estimating complex poses from images are advancing. Deep learning requires a large amount of training data. For image data this is relatively easy to produce, since a person can see and label each body part in an image, and some image datasets have been made publicly available. In depth data, however, the boundaries between body parts are hard to see, which makes generating training data difficult.
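As an illustrative sketch of this kind of approach (not the network from the paper, whose architecture is not detailed here), the following PyTorch snippet defines a small fully convolutional network that assigns a body-part label to every pixel of a depth image. The layer sizes and the 12 classes (11 body parts plus background) are assumptions for illustration.

```python
# Minimal sketch: per-pixel body-part labeling from a depth image.
# Architecture, sizes, and class count are illustrative assumptions.
import torch
import torch.nn as nn

class DepthPartSegNet(nn.Module):
    """Fully convolutional net: 1-channel depth in, per-pixel part logits out."""
    def __init__(self, num_classes: int = 12):  # 11 parts + background (assumed)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(depth))  # (B, num_classes, H, W)

# One training step on synthetic (depth, label) pairs:
model = DepthPartSegNet()
loss_fn = nn.CrossEntropyLoss()
depth = torch.randn(4, 1, 128, 128)            # batch of depth images
labels = torch.randint(0, 12, (4, 128, 128))   # per-pixel part labels
loss = loss_fn(model(depth), labels)
loss.backward()
```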

Fig. 2: Procedure for generating training data.

To address this, the research group established a method for generating a large amount of training data by combining computer graphics (CG) and motion capture technologies (Fig. 2). The method first creates CG models of various body shapes. It then annotates each model with part labels (11 parts, including the head, torso, and right upper arm) and with skeleton information, including the position of each joint. As a result, a CG model can be put into an arbitrary pose simply by supplying the joint angles obtained from a motion capture system. Fig. 3 shows examples of generated training data for various sitting poses.

Fig. 3: Examples of generated training data for various sitting positions. First row: body-part label images; second row: depth data.
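As a toy sketch of the joint-angle-driven posing at the heart of this pipeline, the snippet below computes forward kinematics for a simple 2-D kinematic chain: given joint angles such as a motion capture system would supply, it returns the joint positions. The segment lengths and the 2-D simplification are illustrative assumptions; the actual method poses full 3-D CG body models with 11 labeled parts.

```python
# Toy forward kinematics: joint angles in, joint positions out.
# Segment lengths and the 2-D chain are made up for illustration.
import numpy as np

def pose_chain(segment_lengths, joint_angles):
    """Walk a kinematic chain, accumulating relative joint rotations."""
    positions = [np.zeros(2)]
    heading = 0.0
    for length, angle in zip(segment_lengths, joint_angles):
        heading += angle  # each joint rotates relative to its parent
        step = length * np.array([np.cos(heading), np.sin(heading)])
        positions.append(positions[-1] + step)
    return np.array(positions)

# e.g. a torso / upper-arm / forearm chain bent at the elbow
lengths = [0.5, 0.3, 0.25]              # metres (assumed)
angles = np.radians([90, -40, -70])     # mocap-style joint angles
print(pose_chain(lengths, angles))
```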

With this method, training data can be generated for any combination of body shape and pose. So far, we have created and released a total of about 100,000 data items, covering sitting positions (with and without occlusions) and several recumbent poses. This data is freely available for research purposes. In the future, we will also release the human models and the detailed data-generation procedure so that anyone can easily produce their own data. We hope that this will contribute to the progress of related fields.
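As a sketch of how such released (depth, body-part label) pairs might be consumed as training samples, the snippet below writes and reloads one pair; the file names, .npy format, and image size are hypothetical, not the actual layout of the released dataset.

```python
# Hypothetical loader for (depth, label) training pairs.
# File naming and .npy format are assumptions, not the dataset's real layout.
import numpy as np

# Stand-in for one released sample: a depth map and its per-pixel part labels.
np.save("sample_0001_depth.npy", np.random.rand(240, 320).astype(np.float32))
np.save("sample_0001_label.npy", np.random.randint(0, 12, (240, 320)))

def load_pair(prefix: str):
    """Load one training pair; both arrays share the same HxW shape."""
    depth = np.load(f"{prefix}_depth.npy")    # float depth map (metres assumed)
    labels = np.load(f"{prefix}_label.npy")   # int body-part label per pixel
    assert depth.shape == labels.shape
    return depth, labels

depth, labels = load_pair("sample_0001")
print(depth.shape, np.unique(labels))
```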

The result of this research was published in Pattern Recognition on June 3, 2017.

This research was partially supported by JSPS Kakenhi (Grants-in-Aid for Scientific Research) No. 25280093.


Kaichiro Nishi and Jun Miura (2017). Generation of human depth images with body part labels for complex human pose recognition. Pattern Recognition.






Researcher Profile

Name: Jun Miura
Affiliation: Department of Computer Science and Engineering
Title: Professor
Fields of Research: Intelligent Robotics / Robot Vision / Artificial Intelligence