Asia-Pacific Track (Abstract and Bio)
Program Schedule of Asia-Pacific Track
Information Processing in Social Networks
Ming-Syan Chen, National Taiwan University, Taiwan
Abstract
In current social networks, a user may have hundreds of friends and find it very time consuming to categorize and tag every friend manually. When a user initiates an activity by issuing a corresponding query, he/she needs to consider the relationships among candidate attendees to find a group of mutually close friends. Meanwhile, he/she also needs to consider the schedules of candidate attendees to find an activity period available to all of them. It would certainly be desirable to improve the efficiency of this process. In this talk, information processing in social networks will first be reviewed in three phases, namely (i) from content to social relationship, (ii) mining on social relationships, and (iii) from social relationships to content organization. In addition, we shall present an effective procedure that helps a user organize an event with proper attendees, with minimum total social distance and a commonly available time. Moreover, it is noted that the information retrieved from social networks can also facilitate user-dependent and human-centric services. In light of this, we shall explore the quality of recommendation obtained by incorporating the notions of social filtering and collaborative filtering. Finally, it is recognized that cloud computing offers many new capabilities for storing and processing huge amounts of heterogeneous data in social networks. In view of this, we shall also examine how this paradigm shift will affect information processing in social networks.
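The event-organization problem mentioned in the abstract (minimum total social distance plus a commonly available time) can be made concrete with a small sketch. Everything below is illustrative, not the speaker's actual procedure: the names, data layout, and the brute-force search over candidate groups are assumptions for exposition.

```python
# Hypothetical sketch: pick the k-person group with the smallest total
# pairwise social distance that still shares at least one free time slot.
# Brute force over all groups; a real procedure would prune this search.
from itertools import combinations

def organize_event(distances, availability, k):
    """distances: dict frozenset({u, v}) -> social distance;
    availability: dict user -> set of free time slots;
    returns (best_group, common_slots, total_distance) or None."""
    users = list(availability)
    best = None
    for group in combinations(users, k):
        common = set.intersection(*(availability[u] for u in group))
        if not common:
            continue  # no single time slot works for everyone
        total = sum(distances[frozenset(pair)]
                    for pair in combinations(group, 2))
        if best is None or total < best[2]:
            best = (set(group), common, total)
    return best

# Toy data: four candidate attendees, numbered time slots.
distances = {frozenset({"a", "b"}): 1, frozenset({"a", "c"}): 2,
             frozenset({"b", "c"}): 1, frozenset({"a", "d"}): 5,
             frozenset({"b", "d"}): 4, frozenset({"c", "d"}): 6}
availability = {"a": {1, 2}, "b": {2, 3}, "c": {2}, "d": {1, 2, 3}}
group, slots, total = organize_event(distances, availability, 3)
print(group, slots, total)  # {'a', 'b', 'c'} {2} 4
```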
Bio
Ming-Syan Chen received the M.S. and Ph.D. degrees in Computer, Information and Control Engineering from The University of Michigan, Ann Arbor, MI, USA, in 1985 and 1988, respectively. He is now a Distinguished Research Fellow and the Director of the Research Center for Information Technology Innovation (CITI) at Academia Sinica, Taiwan, and is also a Distinguished Professor in the EE Department, National Taiwan University. He was a research staff member at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA from 1988 to 1996, the Director of the Graduate Institute of Communication Engineering at NTU from 2003 to 2006, and the President/CEO of the Institute for Information Industry (III) from 2007 to 2008. His research interests include databases, data mining, and multimedia networking. Dr. Chen is a recipient of the Academic Award of the Ministry of Education, Taiwan, the NSC (National Science Council) Distinguished Research Award, the Pan Wen Yuan Distinguished Research Award, the Teco Award, the Honorary Medal of Information, the K.-T. Li Research Breakthrough Award, and the IBM Outstanding Innovation Award. Dr. Chen is a Fellow of ACM and a Fellow of IEEE.
Experience with Discovering Knowledge by Acquiring It
Paul Compton, University of New South Wales, Australia
Abstract
Machines and people have complementary skills in knowledge discovery. Automated techniques can process enormous amounts of data to find new relationships, but these are generally represented by fairly simple models. People, on the other hand, are endlessly inventive in creating models to explain the data at hand, but have trouble developing consistent overall models that explain all the data that might occur in a domain; and the larger the model, the more difficult it becomes to maintain consistency. Ripple-Down Rules is a technique developed to allow people to make real-time updates to a model whenever they notice data that the model does not yet explain, while at the same time maintaining consistency. This allows an entire knowledge base to be built through such updates while it is already in use. There are now hundreds of Ripple-Down Rules knowledge bases in use, and this talk presents some observations from log files tracking how people build these systems, as well as some recent research on how such techniques can be used to add greater specificity to the simpler models developed by automated techniques.
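The update-in-context mechanism the abstract describes can be sketched in a few lines. This is a simplified, illustrative variant (the class names, features, and conclusions are invented, and real Ripple-Down Rules systems also distinguish true/false branches and validate new rules against stored cornerstone cases):

```python
# Minimal Ripple-Down Rules sketch: each rule may carry exception rules
# that are only consulted when the rule itself fires, so a new rule added
# as an exception refines behavior without disturbing earlier conclusions.
class Rule:
    def __init__(self, cond, conclusion, cornerstone=None):
        self.cond = cond                  # predicate on a case (a dict)
        self.conclusion = conclusion
        self.cornerstone = cornerstone    # case that motivated this rule
        self.except_rules = []            # tried only if this rule fires

    def classify(self, case):
        if not self.cond(case):
            return None
        # An exception that fires overrides this rule's conclusion.
        for ex in self.except_rules:
            result = ex.classify(case)
            if result is not None:
                return result
        return self.conclusion

root = Rule(lambda c: True, "normal")  # default rule always fires
# The expert sees a misclassified case and adds an exception in context:
root.except_rules.append(
    Rule(lambda c: c["tsh"] > 4.0, "hypothyroid", cornerstone={"tsh": 9.1}))
# A later refinement becomes an exception to the exception, so behavior
# on previously handled cases is preserved (consistency).
root.except_rules[0].except_rules.append(
    Rule(lambda c: c["on_medication"], "check dosage"))

print(root.classify({"tsh": 1.0, "on_medication": False}))  # normal
print(root.classify({"tsh": 9.1, "on_medication": False}))  # hypothyroid
print(root.classify({"tsh": 9.1, "on_medication": True}))   # check dosage
```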
Bio
Paul Compton is an emeritus professor of Computer Science at the University of New South Wales, where he was head of the School of Computer Science and Engineering for 12 years. In the 1980s he was involved in the development of GARVAN-ES1, one of the first medical expert systems to go into routine clinical use. The maintenance problems with this system led to his interest in incremental knowledge acquisition and, in particular, the development of Ripple-Down Rules as a way of minimizing maintenance effort. His research over the last 20 years has focused on extending this incremental approach to a range of different problems. Ripple-Down Rules have been commercialized by a number of companies, with applications ranging from interpreting pathology reports (Pacific Knowledge Systems) to data cleansing for big data applications (IBM).
Developing Data Mining Applications
Geoff Holmes, University of Waikato, New Zealand
Abstract
In this talk I will review several real-world applications developed at the University of Waikato over the past 15 years. These include the use of near-infrared spectroscopy coupled with data mining as an alternative laboratory technique for predicting compound concentrations in soil and plant samples, and the analysis of gas chromatography-mass spectrometry (GCMS) data, a technique used in environmental applications to determine, for example, the petroleum content of soil and water samples. I will then briefly discuss how experience with these applications has led to the development of an open-source framework for application development.
Bio
Geoffrey Holmes is Dean of the Faculty of Computing and Mathematical Sciences at the University of Waikato, Hamilton, New Zealand. He received a Ph.D. in mathematics from Southampton University in 1986. Following a research position in the Engineering Department at Cambridge University, UK, he joined the Computer Science Department at Waikato. His research interests include topics in machine learning, data stream mining, open-source machine learning, and application development. He is a senior member of the Waikato machine learning group, which has provided open-source solutions to the community such as WEKA and MOA. He has been actively involved in promoting open-source software for machine learning and is an action editor for JMLR MLOSS. He has extensive experience of application development and of providing frameworks for data mining and data stream mining application development. Most recently, he has been working on instance-incremental methods for data streams across a wide range of tasks.
Building an Engine for Big Data
Masaru Kitsuregawa, University of Tokyo, Japan
Abstract
An IT program was launched in Japan to build a powerful engine for big data, and quite recently its initial version was commercialized. This presentation will give a brief overview of the project and introduce some of its potential applications.
Bio
Masaru Kitsuregawa is the recipient of the 2009 SIGMOD Edgar F. Codd Innovations Award for contributions to high-performance database technology. Kitsuregawa made major contributions to the development of hash-join algorithms, which significantly improved the performance of join operations in relational database systems. That work has influenced related research in areas such as query execution, plan optimization and dynamic query-workload balancing, as well as the development of commercial database products. He implemented the hash-based approach on a variety of platforms, including the Functional Disk System and multi-node PC clusters, demonstrating its substantial advantages through detailed evaluations. He has also applied hash-based strategies to parallel association mining and showed its effectiveness there. His contributions in the hardware area include a high-speed sorting system with a sophisticated memory management algorithm. That work was eventually commercialized in collaboration with colleagues, and won the Datamation sort benchmark in 2000.
Similarity Search in Real World Networks
Cuiping Li, Renmin University, China
Abstract
Recently there has been a lot of interest in graph-based analysis. One of its most important aspects is measuring similarity between nodes and performing similarity search in a graph. For example, in social networks such as Facebook, the system may want to recommend potential friends to a particular user based on the connections between users. In customer-product networks such as eBay, one may wish to recommend products to users based on purchase history. In this talk, I will introduce some methods for computing vertex similarity and their applications to similarity search in real-world networks.
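The abstract does not name the specific similarity measures covered, so as one simple representative of the family, here is a hedged sketch of neighborhood-overlap (Jaccard) similarity and a naive top-k similarity search built on it; the graph and names are invented for illustration:

```python
# Jaccard vertex similarity: the overlap of two vertices' neighbor sets
# divided by their union. Vertices with many shared connections score high.
def jaccard_similarity(adj, u, v):
    """adj: dict vertex -> set of neighbors."""
    nu, nv = adj[u], adj[v]
    if not nu and not nv:
        return 0.0
    return len(nu & nv) / len(nu | nv)

def top_k_similar(adj, u, k):
    """Similarity search: the k vertices most similar to u."""
    scores = {v: jaccard_similarity(adj, u, v) for v in adj if v != u}
    return sorted(scores, key=scores.get, reverse=True)[:k]

adj = {"alice": {"bob", "carol"}, "bob": {"alice", "carol", "dave"},
       "carol": {"alice", "bob"}, "dave": {"bob"},
       "erin": {"bob", "carol"}}
# erin shares exactly alice's neighborhood, so she ranks first.
print(top_k_similar(adj, "alice", 2))  # ['erin', 'dave']
```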
Bio
Dr. Cuiping Li is currently a Professor of the Information School, Renmin University of China. Her current research interests include profit-based data mining, large scale information network analysis, and data warehousing. She has published over 30 research papers in major international conferences and journals including SIGMOD, VLDB, and KDD.
Interaction and Collective Intelligence in Internet Computing
Deyi Li, Chinese Academy of Engineering, China
Abstract
Network interconnection, information interoperability, and crowd interaction on the Internet may inspire computation models beyond the Turing machine. Since humans play an important role in Internet computing, human-machine and machine-machine interactions have become the kernel of Internet computing. The Internet is not simply equivalent to a huge virtual computer, or to a set of computers. On the Internet, human behaviors are uncertain, and the interactions and influence among people are also uncertain. These uncertainties cannot be described by the Turing machine or by traditional interaction machines. As a new computation platform, Internet computing requires new theories and methods. By combining topology from mathematics with field theory from physics, we propose the topological potential approach, which sets up a virtual field over the topological space to reflect individual activities, local effects, and preferential attachment. This approach can be used to study the emergence of collective intelligence. Here, we introduce three case studies to illustrate the analysis of collective intelligence on the Internet and discuss some potential applications of the topological potential approach.
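A hedged sketch of what a topological potential can look like: in published topological-potential formulations, each node exerts a field on the others that decays Gaussianly with hop distance, so well-connected nodes accumulate high potential. The particular kernel, the unit node masses, and the sigma value below are illustrative assumptions, not necessarily the speaker's definition:

```python
# phi(v) = sum over other reachable nodes u of m_u * exp(-(d(v,u)/sigma)^2),
# where d is hop distance. Hubs accumulate the highest potential.
import math
from collections import deque

def hop_distances(adj, src):
    """BFS hop distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def topological_potential(adj, sigma=1.0, mass=None):
    mass = mass or {v: 1.0 for v in adj}
    phi = {}
    for v in adj:
        dist = hop_distances(adj, v)
        phi[v] = sum(mass[u] * math.exp(-(d / sigma) ** 2)
                     for u, d in dist.items() if u != v)
    return phi

# Toy star-like topology: "hub" touches three nodes, "d" hangs off "c".
adj = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"},
       "c": {"hub", "d"}, "d": {"c"}}
phi = topological_potential(adj)
print(max(phi, key=phi.get))  # hub
```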
Bio
Deyi Li is a member of the Chinese Academy of Engineering and of the International Academy of Sciences for Europe and Asia. He is the Chairman of the Chinese Artificial Intelligence Society (http://caai.cn/), a professor of software engineering at Tsinghua University, China, and head of the Information Science Directorate of the Natural Science Foundation of China. His current research interests include networked data mining, artificial intelligence with uncertainty, and cloud computing.
Algorithms for Mining Uncertain Graph Data
Jianzhong Li, Harbin Institute of Technology, China
Abstract
With the rapid development of advanced data acquisition techniques such as high-throughput biological experiments and wireless sensor networks, large amounts of graph-structured data, graph data for short, have been collected in a wide range of applications. Discovering knowledge from graph data has seen many applications and received much research attention. Recently, it has been observed that uncertainties are inherent in the structure of some graph data. For example, protein-protein interaction (PPI) data can be represented as a graph, where vertices represent proteins and edges represent PPIs. Due to the limits of PPI detection methods, it is uncertain whether a detected PPI exists in practice. Other examples of uncertain graph data include the topologies of wireless sensor networks, social networks, and so on. Managing and mining such large-scale uncertain graph data is of both theoretical and practical significance. Much solid work on uncertain graph mining, covering models, semantics, methodology, and algorithms, has been carried out in the last few years, and a number of research papers on managing and mining uncertain graph data have been published in database and data mining conferences such as VLDB, ICDE, KDD, CIKM, and EDBT. This talk focuses on the data models, semantics, computational complexity, and algorithms of uncertain graph mining. Some typical research in the field will also be introduced, including frequent subgraph pattern mining, dense subgraph detection, reliable subgraph discovery, and clustering of uncertain graph data.
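The standard semantics behind uncertain graph mining, which the talk's topics (e.g., reliable subgraph discovery) build on, is the possible-world model: each edge exists independently with its probability, and a query is answered over the distribution of deterministic worlds. The sketch below illustrates this with exact enumeration on a toy PPI-style graph; the exponential cost of enumeration is precisely why specialized algorithms are needed:

```python
# Reliability query under possible-world semantics: the probability that
# s and t are connected, summed over all 2^|E| deterministic worlds.
from itertools import product

def reliability(edges, s, t):
    """edges: list of ((u, v), prob); returns P[s connected to t]."""
    total = 0.0
    for world in product([0, 1], repeat=len(edges)):
        prob = 1.0
        present = set()
        for bit, ((u, v), p) in zip(world, edges):
            prob *= p if bit else (1 - p)
            if bit:
                present.add((u, v))
        # Check s-t connectivity in this deterministic world via DFS.
        stack, seen = [s], {s}
        while stack:
            x = stack.pop()
            for (u, v) in present:
                for a, b in ((u, v), (v, u)):
                    if a == x and b not in seen:
                        seen.add(b)
                        stack.append(b)
        if t in seen:
            total += prob
    return total

# Toy uncertain network: a direct A-C edge plus a two-hop path via B.
edges = [(("A", "B"), 0.9), (("B", "C"), 0.8), (("A", "C"), 0.5)]
print(round(reliability(edges, "A", "C"), 4))  # 0.86
```

By hand: P = P(AC) + P(no AC) * P(AB) * P(BC) = 0.5 + 0.5 * 0.9 * 0.8 = 0.86, matching the enumeration.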
Bio
Jianzhong Li is a professor and the chairman of the Department of Computer Science and Engineering at the Harbin Institute of Technology, China. He was a visiting scholar at the University of California at Berkeley in 1985. From 1986 to 1987 and from 1992 to 1993, he was a scientist in the Information Research Group of the Department of Computer Science at Lawrence Berkeley National Laboratory, USA. He was also a visiting professor at the University of Minnesota at Minneapolis, Minnesota, USA, from 1991 to 1992 and from 1998 to 1999. His current research interests include database systems, data-intensive supercomputing, and wireless sensor networks. He has published more than 200 papers in refereed journals and conference proceedings. He has served on the program committees of major computer science and technology conferences, including SIGMOD, VLDB, ICDE, INFOCOM, ICDCS, and WWW, has served on the editorial boards of distinguished journals, including Knowledge and Data Engineering, and has refereed papers for various journals and proceedings.
Cross-Media Knowledge Discovery
Zhongzhi Shi, Chinese Academy of Sciences, China
Abstract
In this talk I introduce cloud-computing-based cross-media knowledge discovery. We propose a framework for cross-media semantic understanding that contains discriminative modeling, generative modeling, and cognitive modeling. Within cognitive modeling, a new model called CAM is proposed that is suitable for cross-media semantic understanding. We develop an agent-aided model for load balancing in cloud computing environments, and for quality of service we present a utility function to evaluate cloud performance. A Cross-Media Intelligent Retrieval System (CMIRS), which is managed by the ontology-based knowledge system KMSphere, will be illustrated. Finally, directions for further research on cloud-computing-based cross-media knowledge discovery will be pointed out and discussed.
Bio
Zhongzhi Shi is a professor at the Institute of Computing Technology, Chinese Academy of Sciences, where he leads the Intelligence Science Laboratory. His research interests include intelligence science, image processing and cognitive computing, machine learning, multi-agent systems, the semantic Web, and service computing. Professor Shi has published 14 monographs, 15 books, and more than 450 research papers in journals and conferences. He won a second-grade National Award for Science and Technology Progress of China in 2002, and two second-grade Awards for Science and Technology Progress from the Chinese Academy of Sciences in 1998 and 2001, respectively. He is a senior member of IEEE, a member of AAAI and ACM, and Chair of IFIP WG 12.2. He serves as Chief Editor of the Series on Intelligence Science and has served as Vice President of the Chinese Association of Artificial Intelligence.
Understanding Users' Satisfaction for Search Engine Evaluation
Gordon Sun, Tencent Technology, China
Abstract
To fulfill users' search needs, a search engine must have good performance, easy-to-use functionality, and good search result quality. Search quality evaluation becomes challenging when users' satisfaction cannot be judged from a single search, and when, even within a single search, judgments from various sources are inconsistent. In this talk, I will discuss how users' satisfaction can be decomposed into different components, how we measure those components by various means (human judgment, automatic computation over query logs, and outsourcing), and the pros and cons of each, with their operational implications. As an outlook, I will postulate potential evaluation approaches for better capturing users' satisfaction.
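The abstract does not name any particular metric, but as one common way to turn per-result human judgments into a single quality score, here is a sketch of DCG/nDCG, a standard graded-relevance measure for ranked search results (offered only as background, not as the speaker's method):

```python
# Discounted cumulative gain: graded judgments are summed with a
# logarithmic position discount, then normalized by the ideal ordering.
import math

def dcg(gains):
    """DCG of a ranked list of graded relevance judgments."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """DCG normalized by the ideal (sorted) ranking, in [0, 1]."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Judgments for the top-4 results of one query (3 = perfect, 0 = bad).
print(round(ndcg([3, 0, 2, 1]), 2))  # 0.93
```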
Bio
Gordon Sun has been working on algorithmic search technology since 1998, when he joined Inktomi (the leading US search engine company of the 1990s) as senior scientist and architect. He also worked for two other search engine companies, WiseNut and LookSmart, as director of R&D from 2001 to 2003, before joining Yahoo in early 2004, where he led the Global Search Relevance team as director of research until 2009. Gordon graduated from the University of Science and Technology of China, went to the US through the CUSPEA program (sponsored by Nobel Prize winner Prof. T. D. Lee) in 1981, and received his Ph.D. in theoretical physics from the University of Iowa in 1984. He worked as research faculty at the University of Maryland for 9 years before moving to Silicon Valley in 1993 to join Communication Intelligence Inc., a leading handwriting recognition provider, as Chief Scientist. He has broad experience, knowledge, and publications in neural networks, pattern recognition, machine learning, data mining, information retrieval, speech recognition, handwriting recognition, and nonlinear dynamics.
Bayesian Relational Data Analysis
Naonori Ueda, NTT Communication Lab, Japan
Abstract
Recently, many collections of relational data have appeared in diverse areas such as the Internet, social networks, customer shopping records, and bioinformatics. The main goal of relational data analysis is to discover latent structure in the data. Conventional data mining algorithms based on exhaustive enumeration have an inherent limitation for this purpose because of their combinatorial nature. In contrast, in machine learning many statistical models have been proposed for relational data analysis. In this talk, I will first review the statistical approach, especially the Bayesian approach, to relational data analysis, together with recent advances from the machine learning literature. Then, as future research, I will also talk about a statistical approach to combining multiple relational data sets.
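As an example of the kind of statistical model used for discovering latent structure in relational data (chosen for illustration; the abstract does not name the speaker's models), here is the generative side of a stochastic block model: each node gets a latent cluster, and the probability of a link depends only on the cluster pair. A Bayesian treatment would put priors on these parameters and infer the clusters from observed links, e.g., by Gibbs sampling; only the forward sampling step is sketched here:

```python
# Sample a binary relation from a two-cluster stochastic block model:
# dense links within clusters, sparse links across them.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_sbm(n, cluster_probs, link_probs):
    """cluster_probs: prior over clusters; link_probs[(a, b)]: edge prob."""
    clusters = random.choices(range(len(cluster_probs)),
                              weights=cluster_probs, k=n)
    relation = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a, b = clusters[i], clusters[j]
            if random.random() < link_probs[(a, b)]:
                relation[i][j] = 1
    return clusters, relation

link_probs = {(0, 0): 0.9, (1, 1): 0.9, (0, 1): 0.05, (1, 0): 0.05}
clusters, relation = sample_sbm(8, [0.5, 0.5], link_probs)
within = sum(relation[i][j] for i in range(8) for j in range(8)
             if clusters[i] == clusters[j])
across = sum(relation[i][j] for i in range(8) for j in range(8)
             if clusters[i] != clusters[j])
print(within, across)  # within-cluster links should dominate
```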
Bio
Naonori Ueda received the B.S., M.S., and Ph.D. degrees in Communication Engineering from Osaka University, Osaka, Japan, in 1982, 1984, and 1992, respectively. In 1984, he joined the Electrical Communication Laboratories, NTT, Japan, where he was engaged in research on image processing, pattern recognition, and computer vision. In 1991, he joined the NTT Communication Science Laboratories, where he invented a significant learning principle for optimal vector quantizer design and developed several novel learning algorithms, including the deterministic annealing EM (DAEM) algorithm, ensemble learning, the split-and-merge EM (SMEM) algorithm, semi-supervised learning, a variational Bayesian model search algorithm for mixture models with application to speech recognition, and probabilistic generative models for multi-labeled text on the WWW. His current research interests include parametric and non-parametric Bayesian approaches to machine learning, pattern recognition, data mining, signal processing, and cyber-physical systems. From 1993 to 1994, he was a visiting scholar at Purdue University, West Lafayette, USA. Currently, he is a director of NTT Communication Science Laboratories. He is an associate editor of Neurocomputing and the Journal of Neural Networks, and is a member of the Institute of Electronics, Information, and Communication Engineers (IEICE) and IEEE.
A New Challenge of Information Processing in the 21st Century
Bo Zhang, Tsinghua University, China
Abstract
In the web era, we are confronted with a huge amount of raw data and a tremendous change in man-machine interaction modes. We have to deal with the content (semantics) of data rather than its form alone. Traditional information processing approaches face a new challenge, since they cannot deal with the semantic meaning, or content, of information, yet humans handle such problems easily. What is needed is a new information processing strategy that engages with the content of information by learning some mechanisms from human beings. Therefore, we need (1) a set of robust detectors for detecting semantically meaningful features, such as boundaries and shapes in images, or words and sentences in text, and (2) a set of methods that can effectively analyze and exploit the information structures that encode the content of information. Over the past 40 years, probability theory has made great progress and has provided a set of mathematical tools for representing and analyzing information structures. In the talk we will discuss what difficulties we face, what we can do, and how we should proceed in content-based information processing.
Bio
Bo Zhang is now a professor in the Department of Computer Science and Technology at Tsinghua University and a member of the Chinese Academy of Sciences. He graduated from the Department of Automatic Control at Tsinghua University in 1958 and has been a faculty member there since then. From February 1980 to February 1982 he was a visiting scholar at the University of Illinois at Urbana-Champaign, USA. He is now the chairman of the steering committee of the Research Institute of Information Technology, Tsinghua University, a technical advisor to the Fujian government, and a member of the Technical Advisory Board of Microsoft Research Asia. He was the founding director of the State Key Lab of Intelligent Technology and Systems from 1991 to 1996. From 1987 to 1994 he served as a member of the specialist group for the Intelligent Robots theme of the National "863" High-Tech Program. He won an ICL European Artificial Intelligence Prize, a third award of the National Natural Science Prize, a third award of the National Science and Technology Progress Prize, first and second awards of the Science and Technology Progress Prize from the State Educational Commission, a first award of the Science & Technology Progress Prize from the Electronic Industry Ministry, and a first award of the Science & Technology Progress Prize from the Committee of National Defense.
Social Media Data Analysis for Revealing Collective Behaviors
Aoying Zhou, East China Normal University, China
Abstract
Along with the development of Web 2.0 applications, social media services have attracted many users and become their hands-on toolkits for recording life, sharing ideas, and social networking. Though social media services are essentially web or mobile applications and services, they combine user-generated content and social networks, so that information can be created, transmitted, transformed, and consumed in cyberspace. Social media data are thus a kind of sensor for users' real lives. Social media data are usually of low quality: pieces of information in social media are typically short, informally presented, and embedded in specific contexts highly related to the physical world. It is therefore challenging to extract semantics from social media data. However, we argue that, given sufficient social media data, collective user behaviors can be sensed, studied, and even predicted in certain circumstances. Our study is conducted on data from two services, Twitter and Sina Weibo, the most popular microblogging services worldwide and in China, respectively. Collective behaviors are the actions of a large number of diverse people, being neither conforming nor deviant, and various collective behaviors are studied in the context of social media. Our studies show that there are various information flow patterns in social media, some of which are similar to those of traditional media such as newspapers, while others are embedded deep in the social network structure. The evolution of hotspots is highly affected by external stimulation, the social network structure, and individual users' activities. Furthermore, social media tends to become immune to repeated similar external stimulations. Last but not least, there are considerable differences in user behavior between Twitter and Sina Weibo.
Bio
Aoying Zhou is a professor and deputy dean of the School of Software Engineering at East China Normal University (ECNU), where he directs the Institute of Massive Computing. He received his bachelor's and master's degrees in Computer Science from Sichuan University, Chengdu, in 1985 and 1988, respectively, and his Ph.D. from Fudan University in 1993. Before joining ECNU in 2008, he worked in the Computer Science Department at Fudan University from 1993 to 2007, where he was department chair from 1999 to 2002. He was a visiting scholar in the Berkeley Scholar Program at UC Berkeley in 2005. He is a winner of the National Science Fund for Distinguished Young Scholars supported by NSFC and holds a professorship appointment under the Cheung Kong Scholars Program. He is the vice-director of ACM SIGMOD China and of the Database Technology Committee of the China Computer Federation. He serves on the editorial boards of prestigious journals such as the VLDB Journal and the WWW Journal. His research interests include Web data management, data management for data-intensive computing, management of uncertain data, data mining and data streams, and distributed storage and P2P computing.