No.10, Sep 2017, Feature Story

Mastering Data Science - semantic understanding of multifarious big data and its social implementation

Masaki Aono

Nowadays it is common for large amounts of data to be stored and used on the internet via SNS and other online data repositories. Data science technology, including information retrieval and data mining (which allows you to search and extract latent but potentially important pieces of information from the data), is vital in an era flooded by so-called "big data". Professor Masaki Aono has been a leading figure in knowledge data engineering research since the 1980s, and his goal is to keep conducting challenging research in the area of data science. Professor Aono started with 3D shape retrieval and has gradually incorporated various state-of-the-art technologies, ranging from multivariate data analysis and traditional machine learning to deep learning.

Interview and report by Madoka Tainaka

World Champions in a Plant Identifying Contest

In 2016, a research team led by Professor Masaki Aono won first place for identification accuracy in an international contest called “PlantCLEF2016”. PlantCLEF2016 is a competition where participants use image processing technology to automatically classify 1,000 plants from photographic images.

Fig.1 Guess each plant’s name from a given set of crowdsourced photos.
CREDIT: Copyright © PlantCLEF 2016

Professor Aono tells us, "Actually, our research team has been participating in the international image annotation competition ImageCLEF since 2012. The focus of this year’s PlantCLEF contest was on plants. You had to guess each plant’s name based on some crowdsourced photos, which is quite a difficult task. For example, some photos include a human hand or a tripod, meaning that the data given to participants naturally includes 'noise' completely unrelated to plants. Furthermore, the photos could be close-ups, very tiny or even out of focus. I guess the reason we could come home with the first prize is that we came up with a way of extracting 'features' from metadata, which helped us get more correct answers in spite of the formidable task."

As the amount of data used in the contests increases, the research lab has been building and upgrading its own GPU (Graphics Processing Unit) machines every year, and this time the team incorporated deep learning to exploit the full capacity of those machines. Deep learning is a technology that has been gaining attention in the field of AI.

However, the victory by Professor Aono’s team does not come down to deep learning alone.
"In 2014, we also won the first prize at the ImageCLEF I mentioned earlier, but that time we used ontology and traditional machine learning rather than deep learning. Ontology is a hierarchical system that explicitly expresses concepts and systematically describes the relationships between them. In other words, it is a vocabulary set that the computer uses to successfully orchestrate texts and images. By taking advantage of ontology, we managed to annotate images with a high level of accuracy, while other teams using deep learning could not," the professor explains.

The Key is How to Combine Features

Professor Aono says that the team’s strength lies in its ‘feature combination technology’. A feature is a numerical expression of the characteristics or attributes observed in the target data. Even when using ontology, it is crucial to configure the features correctly. Until now, features were designed and extracted manually (so-called “hand-crafted features”), but as the amount of data increased, the team began to adopt deep learning to complement them.
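
The article does not describe how the team actually combines its features. As a rough illustration of the general idea only, the sketch below uses one common approach (often called early fusion): each feature vector is scaled to unit length and the results are concatenated into a single vector. The function names and the sample values are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def combine_features(*feature_vectors):
    """Early fusion: normalize each feature vector, then concatenate them."""
    combined = []
    for vec in feature_vectors:
        combined.extend(l2_normalize(vec))
    return combined

# Hypothetical values: a 3-bin color histogram (hand-crafted) and a
# 2-dimensional embedding taken from a neural network (learned).
hand_crafted = [3.0, 4.0, 0.0]
deep = [0.0, 2.0]

fused = combine_features(hand_crafted, deep)
print(fused)  # [0.6, 0.8, 0.0, 0.0, 1.0]
```

Normalizing before concatenation keeps one feature type from dominating simply because its raw values are larger.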

"We came up with a new method for deep learning as well. Out of the 100,000 photos in the training data, we gave the computer various types of images (images that were out of focus, images that were zoomed in on different parts) so that it would be able to deal with any kind of image. We also incorporated the date and time stamps attached to the data, which participants were allowed to use as metadata. From these we could infer that images taken at almost the same time were most likely to be of the same plant. In the 2016 contest, somehow no other team tried to use this metadata. Ironically for the other teams, it was the use of this metadata that improved our performance and made it possible for us to become world champions."
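
The timestamp idea in the quote above can be sketched as follows. This is not the team's implementation, which the article does not detail. It assumes that each photo already has per-class scores from some classifier, groups photos whose capture times are close together, and lets every photo in a group share the label with the highest summed score. The 60-second gap, the field names and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

def group_by_time(photos, max_gap=timedelta(seconds=60)):
    """Group photos whose consecutive timestamps are at most max_gap apart."""
    ordered = sorted(photos, key=lambda p: p["taken_at"])
    groups = []
    for photo in ordered:
        if groups and photo["taken_at"] - groups[-1][-1]["taken_at"] <= max_gap:
            groups[-1].append(photo)   # close in time: same group
        else:
            groups.append([photo])     # large gap: start a new group
    return groups

def pooled_label(group):
    """Sum the per-class scores over a group and return the best class."""
    totals = {}
    for photo in group:
        for label, score in photo["scores"].items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get)

# Hypothetical classifier outputs; photo "b" is blurry and misclassified on its own.
photos = [
    {"id": "a", "taken_at": datetime(2016, 5, 1, 10, 0, 0), "scores": {"rose": 0.9, "tulip": 0.1}},
    {"id": "b", "taken_at": datetime(2016, 5, 1, 10, 0, 20), "scores": {"rose": 0.4, "tulip": 0.6}},
    {"id": "c", "taken_at": datetime(2016, 5, 1, 11, 30, 0), "scores": {"rose": 0.2, "tulip": 0.8}},
]

labels = {p["id"]: pooled_label(g) for g in group_by_time(photos) for p in g}
print(labels)  # {'a': 'rose', 'b': 'rose', 'c': 'tulip'}
```

In this toy example, photo "b" would be labeled "tulip" on its own, but pooling it with photo "a", taken 20 seconds earlier, changes its label to "rose".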

Convolution is an underlying technology that aids the extraction of features in deep learning. The team came up with some new ideas for changing its internal structure, which gave them another advantage over the other teams. The credit for this goes to a clever second-year master's exchange student from Malaysia. The results were even featured in Springer’s international journal ‘Multimedia Tools and Applications’.

Fig.2 Proposed deep learning approach of Convolutional Neural Network (CNN) and Residual Network (ResNet)

Setting yourself apart is an important task in the competitive world of data engineering research. In particular, you cannot hope to compete head-on with Google, Facebook and other big IT companies that have access to enormous amounts of data and resources for their research and development. This is why Professor Aono first chose to work on 3D shape retrieval, a less popular research domain that big companies had not substantially focused on, yet one that was manageable in terms of difficulty.

"Three-dimensional CAD (Computer-Aided Design) is being used in a variety of situations, from the design of machine parts to architectural design to CG creation, but creating the underlying models of the geometric shapes currently takes both time and skill. We developed technology that searches an existing database of 3D shapes and finds, with high precision, models that are similar to a query such as a 2D image, a sketch, or a 3D shape acquired with a commercially available motion sensor such as Kinect. If you can search for similar models, there is no need to build a model from scratch, which improves efficiency and means that even novices can design complicated parts using CAD."

The system’s retrieval accuracy is currently the highest in the world among methods that search for similar 3D shapes without prior supervised learning. Professor Aono tells us he is in the middle of applying for several new patents, and that some of the technology is being transferred to companies that collaborated in the research.
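
The article does not say which shape features or similarity measure the system uses. Purely as an illustration of similarity-based retrieval without supervised training, the sketch below assumes that every 3D model and every query has already been converted to a fixed-length feature vector, and ranks the database by cosine similarity to the query. The model names and vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, database, top_k=2):
    """Return the names of the top_k database models most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical 3-dimensional shape descriptors for three stored models.
database = {
    "gear": [1.0, 0.0, 0.0],
    "bolt": [0.0, 1.0, 0.0],
    "sprocket": [0.9, 0.1, 0.0],
}
query = [1.0, 0.05, 0.0]  # descriptor of the sketch or scan being searched for

print(retrieve(query, database))  # ['gear', 'sprocket']
```

A designer could then start from the closest retrieved model instead of building a new one from scratch.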

I want to put my research results to good use in the real world, in areas such as self-driving cars and agriculture.

In addition, Professor Aono is working on research that gathers time-series data from different types of sensors and, by learning from that data in advance, predicts what may happen next.

"For example, I am conducting research into controlling the environment of a greenhouse to maximize the amount of vegetables and fruits that can be harvested. We equip the greenhouse with sensors that measure humidity, moisture, sunlight, CO2 and so on, then wirelessly collect and analyze the data. In the past, I conducted demonstration experiments in greenhouses to suggest the best way of controlling the environment in order to maximize the yield of tomatoes."

With regard to processing time-series (chronological) data, the professor is also pursuing research into BCI (Brain Computer Interface), which uses data from the human brain to predict a person’s psychological state or to allow communication with a computer simply by thinking. He is also beginning research that uses LSTM (Long Short-Term Memory), a type of deep learning that can handle both long-term and short-term dependencies in sequential data. This technique can be used, for example, to guess the author of a book from a piece of writing, or to automatically create a piece of writing in the style of a specific author.

His research on scene graphs for automatically annotating images is also interesting. A scene graph represents the relationship either between two objects or between an object and its attribute as a so-called 'directed graph structure' (a graph composed of nodes and directed edges).
"For example, if we can automatically extract objects in an image such as roads, trees, cars and the sky, and add notes about the relationships between them, it would become possible for the computer to answer questions such as what kind of trees there are, whether it’s sunny, or whether there are any obstructions in front of us.

Fig.3 Example of a scene graph. A red-green-red path indicates the relationship between two nodes, where the type of the relation is represented by the text in the green node. A red-blue path indicates the relationship between an object and its attribute.

If we continue to progress with this research, we could use it in car navigation systems to give commands such as ‘turn right at the brick building in front of you’. If we could add notes to the constantly changing scenery, I think it may help develop self-driving cars in the future," says Professor Aono.
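
A scene graph of the kind described above can be held in a very small data structure. The sketch below is illustrative only and is not taken from Professor Aono's system. It stores object-to-attribute links and directed (subject, relation, object) triples, then answers the sort of question mentioned in the interview. The class name, method names and scene contents are hypothetical.

```python
from collections import defaultdict

class SceneGraph:
    """Directed graph of objects, their attributes, and relations between objects."""

    def __init__(self):
        self.attributes = defaultdict(set)  # object -> set of attribute words
        self.relations = []                 # directed (subject, predicate, object) triples

    def add_attribute(self, obj, attribute):
        self.attributes[obj].add(attribute)

    def add_relation(self, subject, predicate, obj):
        self.relations.append((subject, predicate, obj))

    def subjects_of(self, predicate, obj):
        """Return every subject linked to obj by the given predicate."""
        return [s for s, p, o in self.relations if p == predicate and o == obj]

# Hypothetical scene seen from a car.
graph = SceneGraph()
graph.add_attribute("building", "brick")
graph.add_attribute("sky", "sunny")
graph.add_relation("building", "in front of", "car")
graph.add_relation("tree", "beside", "road")

print(graph.subjects_of("in front of", "car"))  # ['building']
print(sorted(graph.attributes["building"]))     # ['brick']
```

With the two facts combined, a navigation system could refer to "the brick building in front of you", as in the professor's example.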

"I’m troubled by the amount of things I want to do," adds the professor. His research includes a number of different topics, all of which are building blocks critical to the development of the increasingly influential field of AI. No doubt the professor will continue to contribute actively to this effort in the future.

Reporter's Note

Professor Aono started working at IBM after graduating from the University of Tokyo with a Master’s degree in CG and CAD. From the late 1990s he started looking at information retrieval. He became a professor at Toyohashi University of Technology in 2003 and accelerated his research. “I want to construct a system that can correctly identify things even better than a human specialist.” That is the motivation behind his research.

The professor feels that he gets his love of nature, including an obsession with classification, from his father, a former curator of the Kurashiki Museum of Natural History in Okayama Prefecture. On his days off he enjoys taking photos of flowers and birds. Recently he succeeded in photographing a red-flanked bluetail, a bird he had been chasing for a long time. "One day I would like to try and use data processing to identify birds and insects." The professor has an endless curiosity. Professor Aono’s ‘search for a bluebird’ continues inside his research.

Researcher Profile

Dr. Masaki Aono


Dr. Masaki Aono received the BS and MS degrees from the Department of Information Science at the University of Tokyo, and the PhD degree from the Department of Computer Science at Rensselaer Polytechnic Institute, New York. He was with the IBM Tokyo Research Laboratory from 1984 to 2003. In 2002 and 2003, he was a visiting teacher at Waseda University. He is currently a professor at the Graduate School of Computer Science and Engineering Department, Toyohashi University of Technology.

Reporter Profile

Madoka Tainaka

Madoka Tainaka is a freelance editor, writer and interpreter. She graduated in Law from Chuo University, Japan. She served as chief editor of “Nature Interface” magazine and as a committee member for the promotion of Information and Science Technology at MEXT (Ministry of Education, Culture, Sports, Science and Technology).
