Keynote (Jiawei Han)

Mining Heterogeneous Information Networks: The Next Frontier

Jiawei Han
Abel Bliss Professor
Department of Computer Science
University of Illinois at Urbana-Champaign

Real-world physical and abstract data objects are interconnected, forming gigantic networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, and medications, and links such as visits, diagnoses, and treatments are intertwined, providing rich information and forming a heterogeneous information network. Effective analysis of large-scale heterogeneous information networks poses an interesting but critical challenge.

In this talk, we present a set of data mining scenarios in heterogeneous information networks and show that mining such networks is a new and promising frontier in data mining research. Departing from many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and can uncover surprisingly rich knowledge from interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining interconnected data. The examples to be used in this discussion include (1) meta path-based similarity search, (2) rank-based clustering, (3) rank-based classification, (4) meta path-based link/relationship prediction, (5) relation strength-aware mining, as well as a few other recent developments. We will also point out some promising research directions and argue that mining heterogeneous information networks is the next frontier in data mining.
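
To make the notions of typed nodes and meta paths concrete, here is a minimal Python sketch. It is only an illustration and is not taken from the talk. The toy medical network, the names (`node_type`, `links`, `count_meta_path_instances`), and the simplification of describing a meta path by node types alone (ignoring link types) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy network: every node carries a type.
node_type = {
    "alice": "patient", "bob": "patient", "carol": "patient",
    "dr_lee": "doctor", "dr_kim": "doctor",
}

# Typed links as (source, link type, target) triples.
links = [
    ("alice", "visits", "dr_lee"),
    ("bob", "visits", "dr_lee"),
    ("bob", "visits", "dr_kim"),
    ("carol", "visits", "dr_kim"),
]


def build_adjacency(links):
    """Build an undirected adjacency list from the typed links."""
    adj = defaultdict(list)
    for source, _link_type, target in links:
        adj[source].append(target)
        adj[target].append(source)
    return adj


def count_meta_path_instances(start, meta_path, adj, node_type):
    """Count path instances that start at `start` and follow `meta_path`.

    `meta_path` is a sequence of node types, e.g.
    ("patient", "doctor", "patient"). Returns a dict mapping each
    reachable end node to the number of matching path instances.
    """
    if node_type[start] != meta_path[0]:
        return {}
    counts = {start: 1}
    for wanted_type in meta_path[1:]:
        next_counts = defaultdict(int)
        for node, n_paths in counts.items():
            for neighbor in adj[node]:
                if node_type[neighbor] == wanted_type:
                    next_counts[neighbor] += n_paths
        counts = dict(next_counts)
    return counts


adj = build_adjacency(links)
shared_doctor = count_meta_path_instances(
    "bob", ("patient", "doctor", "patient"), adj, node_type
)
print(shared_doctor)
```

In this sketch, patients connected by many patient-doctor-patient path instances share doctors. Counts of this kind are one simple ingredient that a meta path-based similarity measure could build on; a homogeneous graph model, which drops the node types, cannot express such a query directly.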