Industry Practice Expo

The Industry Practice Expo track will comprise invited technical talks and panel discussions/debates by leading experts in the world of applied data mining and knowledge discovery. The expo will feature highly influential speakers who have directly contributed to successful data mining applications in their respective fields. The talks and discussions will focus on innovative, leading-edge, large-scale industry or government applications of data mining in areas such as finance, healthcare, bioinformatics, public policy, infrastructure (transportation, utilities, etc.), telecommunications, social media, and computational advertising. (IPE in KDD 2011)

The objective of the Industry Practice Expo track is to bring together leading industry and government practitioners to share insights and experiences that will inspire the KDD community and spread awareness of the variety of seminal, innovative, and proven applications of data mining and knowledge discovery in industry and government. This track will complement the established Industry and Government track at KDD, which focuses on peer-reviewed publications.

Confirmed Speakers

Aug. 13th (Monday), Venue: China National Convention Center (CNCC)

15:00-16:20, 309B, Chair: Ramasamy Uthurusamy

Yong Shi: China's National Personal Credit Scoring System: A Real-Life Intelligent Knowledge Application
Rich Holada: Maximizing Return and Minimizing Cost with the Right Decision Management Systems, (Slides)

16:40-18:00, 309B, Chair: Michael Zeller

Wei-Ying Ma: Semantic Search and a New Moore's Law Effect in Knowledge Engineering, (Slides)
Christian Posse: Key Lessons Learned Building Recommender Systems for Large-Scale Social Networks, (Slides)

Aug. 14th (Tuesday), Venue: China National Convention Center (CNCC)

15:00-16:20, 309B, Chair: Chid Apte

Graham Williams: Ensembles and Model Delivery for Tax Compliance, (Slides)
Seymour Douglas: Leveraging Predictive Modeling to Reduce Signal Theft in a Multi-Service Organization Environment

16:40-18:00, 309B, Chair: Rajesh Parekh

Chih-Jen Lin: Experiences and Lessons in Developing Industry-Strength Machine Learning and Data Mining Software, (Slides)
Bharat R Rao: Leveraging Data Mining to improve healthcare

TITLE: China's National Personal Credit Scoring System: A Real-Life Intelligent Knowledge Application
SPEAKER: Yong Shi, Executive Deputy Director, Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences/Strategic Advisor, China Data Technology (Suzhou) Co., Ltd

ABSTRACT: The Credit Reference Centre (CRC) of the People’s Bank of China (PBC) has built a big-data resource: the largest personal credit database in the world, with 800 million people’s accounts collected from all commercial banks in China since 2003. From June 2006 to Sept 2009, the Research Centre on Fictitious Economy and Data Science, Chinese Academy of Sciences (CASFEDS) and the CRC jointly developed China’s National Personal Credit Scoring System, known as “China Score”, a unique and advanced KDD application of intelligent knowledge management on this big data. The system is expected eventually to serve China’s entire population of 1.3 billion in daily financial activities such as bank accounts, credit card applications, mortgages, and personal loans. It may become one of the most influential applications of KDD techniques for humankind. This talk will introduce the key components of the China Score project, including its objectives, modeling process, the KDD techniques used, intelligent knowledge management, and experience from the project’s development. In addition, the talk will outline a number of policy recommendations based on the China Score project, which has potentially influenced the Chinese Government’s strategic decision making on China’s economic development.

Dr. Yong Shi, a Senior Member of the IEEE, has served as the Executive Deputy Director of the Chinese Academy of Sciences Research Center on Fictitious Economy & Data Science, China, since 2007. He was the Charles W. and Margre H. Durham Distinguished Professor of Information Technology, College of Information Science and Technology, Peter Kiewit Institute, University of Nebraska, USA, from 1999 to 2004. Dr. Shi's research interests include business intelligence, data mining, and multiple criteria decision making. He has published more than 17 books, over 200 papers in various journals, and numerous conference and proceedings papers. He is the Editor-in-Chief of the International Journal of Information Technology and Decision Making (SCI) and a member of the editorial boards of a number of academic journals. Dr. Shi has received many distinguished awards, including the Georg Cantor Award of the International Society on Multiple Criteria Decision Making (MCDM), 2009; the Fudan Prize of Distinguished Contribution in Management, Fudan Premium Fund of Management, China, 2009; the Outstanding Young Scientist Award, National Natural Science Foundation of China, 2001; and Speaker of the Distinguished Visitors Program (DVP) for 1997-2000, IEEE Computer Society. He has consulted or worked on business projects for a number of international companies in KDD and intelligent knowledge management.

TITLE: Maximizing Return and Minimizing Cost with the Right Decision Management Systems
SPEAKER: Rich Holada, Vice President, Predictive Analytics, IBM Software Group

ABSTRACT: The ability to achieve operational efficiency, product leadership and customer intimacy still eludes many organizations due, in large part, to the chaos of business. Inconsistent prioritization and decision making, poor visibility between systems, poorly controlled processes, and individual front-line decisions that seem small but in totality have a huge impact make it difficult for organizations to link strategy to execution and back. During this presentation, we will demonstrate how automating and optimizing decisions (operational efficiency) with business rules and predictive models enables better data-driven results across the enterprise, and how this is implemented at the point of impact (customer intimacy) to transform an organization and support market leadership.
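To make the idea of combining business rules with predictive models more concrete, here is a minimal Python sketch of one decision point. Everything in it is a hypothetical illustration, not IBM SPSS behavior or any IBM API: the `Customer` fields, the `churn_score` (assumed to come from some already-trained model), the 0.6 threshold, and the action labels are all assumptions. Rules that encode non-negotiable policy run first; the model score only chooses among the cases the rules allow.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields chosen for illustration only.
    age: int
    annual_income: float
    churn_score: float  # assumed output of a predictive model, in [0, 1]


def decide_offer(c: Customer, score_threshold: float = 0.6) -> str:
    """Apply business rules first, then act on a predictive model's score."""
    # Hard business rules encode policy that must hold whatever the model says.
    if c.age < 18:
        return "decline: minimum-age rule"
    if c.annual_income <= 0:
        return "refer: invalid income"
    # Among customers who pass the rules, the model score drives the action.
    if c.churn_score >= score_threshold:
        return "offer: retention discount"
    return "no action"


print(decide_offer(Customer(age=17, annual_income=30000.0, churn_score=0.9)))
# decline: minimum-age rule
print(decide_offer(Customer(age=40, annual_income=50000.0, churn_score=0.7)))
# offer: retention discount
print(decide_offer(Customer(age=40, annual_income=50000.0, churn_score=0.3)))
# no action
```

Keeping rules and model separate in this way means the policy can be audited and changed without retraining, while the threshold can be tuned against business cost.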

Rich Holada heads IBM’s predictive analytics group, which includes the SPSS line of software and solutions. Holada is responsible for setting the strategic and technological direction of the group, as well as overseeing its day-to-day operations, from global product development to sales and marketing. With Business Analytics as a key growth area for IBM, Holada is focused on helping customers solve real-world business problems through predictive analytics to create a Smarter Planet.

Previously, Holada was CTO for predictive analytics, with global responsibility for product marketing, product strategy, research and development, and technical support for the IBM SPSS predictive analytics brand. He joined SPSS, prior to its acquisition by IBM, in November 2006 as Senior Vice President of Research & Development, bringing nearly 20 years of software research and development background and diverse experience in the technology industry.

Before that, Holada held senior posts with Oracle Corporation and PeopleSoft, Inc., both leading CRM technology organizations. At PeopleSoft, Inc., he was the senior technical executive responsible for creating the industry-leading PeopleSoft CRM product offering.

Holada has also held senior research and development positions at Trimark Technologies, Inc., where he transformed the firm from a consulting operation to a software product company, as well as at Intelligent Trading Systems, Inc. and Sun Microsystems, Inc.

Holada received his B.S. in Computer Science from the University of Illinois at Urbana-Champaign and his Juris Doctor, cum laude, from John Marshall Law School in Chicago.

TITLE: Semantic Search and a New Moore's Law Effect in Knowledge Engineering
SPEAKER: Wei-Ying Ma, Microsoft Research Asia

ABSTRACT: Historically, Moore’s law has described the phenomenon of exponential improvement in technology: a virtuous cycle in which the rate of technology improvement is proportional to the technology itself. For example, chip performance doubled every 18-24 months because better processors supported the development of better layout tools, which in turn supported the development of even better processors. In this talk, I will describe a new Moore’s law that is emerging in knowledge engineering, driven by the self-reinforcing nature of three trends and technical advancements: big data, machine learning, and crowdsourcing. I will explain how we can take advantage of this new effect to develop a new generation of semantic and knowledge-based search engines. Specifically, my presentation will cover three areas. The first area is knowledge acquisition, in which our goal is to build large and comprehensive entity and knowledge graphs to complement the web and social graphs. In support of this goal, I will introduce techniques for entity extraction and knowledge base construction through interactive mining and crowdsourcing. The second area is knowledge management, in which our goal is to support advanced analytical queries by combining probabilistic knowledge with a distributed platform. The final area is knowledge-empowered search and applications, in which our goal is to use the knowledge we have acquired and curated to enable a wealth of applications. As the culmination of this work, I will show how we are now able to understand search queries, enable new entity-centric search experiences, and provide direct answers to natural language queries.
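The doubling arithmetic behind the classic Moore's law can be made explicit: if performance doubles every T months, the improvement after t years is a factor of 2^(12t/T). A minimal Python sketch (the function name is ours, chosen for illustration):

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Improvement factor after `years` if performance doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months  # number of doubling periods elapsed
    return 2 ** doublings


print(growth_factor(3, 18))             # 4.0  (two doublings)
print(growth_factor(10, 24))            # 32.0 (five doublings)
print(round(growth_factor(10, 18), 1))  # 101.6
```

Over a decade, the 18-24 month doubling range therefore corresponds to roughly a 32-fold to 100-fold improvement, which is what makes a self-reinforcing cycle so consequential.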

Dr. Wei-Ying Ma is an Assistant Managing Director at Microsoft Research Asia, where he oversees multiple research groups, including Web Search and Mining, Natural Language Computing, Data Management and Analytics, and Internet Economics and Computational Advertising. He and his team of researchers have developed many key technologies that have been transferred to Microsoft’s Online Services Division, including the Bing search engine and Microsoft Advertising. He has published more than 250 papers at international conferences and in journals. He is a Fellow of the IEEE and a Distinguished Scientist of the ACM. He currently serves on the editorial boards of ACM Transactions on Information Systems (TOIS) and the ACM/Springer Multimedia Systems Journal. He is a member of the International World Wide Web (WWW) Conferences Steering Committee. In recent years, he served as program co-chair of WWW 2008, program co-chair of the Pacific Rim Conference on Multimedia (PCM) 2007, general co-chair of the Asia Information Retrieval Symposium (AIRS) 2008, and general co-chair of ACM SIGIR 2011. Before joining Microsoft in 2001, Wei-Ying was with Hewlett-Packard Labs in Palo Alto, California, where he worked in the fields of multimedia content analysis and adaptation. From 1994 to 1997, he was engaged in the Alexandria Digital Library project at the University of California, Santa Barbara. He received a Bachelor of Science in electrical engineering from National Tsing Hua University in Taiwan in 1990, and a Master of Science and a doctorate in electrical and computer engineering from the University of California, Santa Barbara, in 1994 and 1997, respectively.

TITLE: Key Lessons Learned Building Recommender Systems for Large-Scale Social Networks
SPEAKER: Christian Posse, LinkedIn

ABSTRACT: By helping members to connect, discover and share relevant content or find a new career opportunity, recommender systems have become a critical component of user growth and engagement for social networks. The multidimensional nature of engagement and diversity of members on large-scale social networks have generated new infrastructure and modeling challenges and opportunities in the development, deployment and operation of recommender systems.
This presentation will address some of these issues, focusing on the modeling side, where new research is much needed, while also describing a recommendation platform that enables real-time recommendation updates at scale as well as batch computations, and cross-leverage between different product recommendations. Topics on the modeling side will include optimizing for multiple competing objectives, reconciling contradictory business goals, modeling user intent and interest to optimize the placement and timeliness of recommendations, utility metrics beyond CTR that leverage real-time tracking of both explicit and implicit user feedback, gathering training data for new product recommendations, virality-preserving online testing, and virtual profiling.
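One common way to optimize for multiple competing objectives is linear scalarization: score each candidate by a weighted sum of per-objective predictions and rank by that utility. The Python sketch below illustrates this under stated assumptions; it is not LinkedIn's method, and the objective names (`ctr`, `apply_rate`), the candidate items, and the weights are all hypothetical. It shows how changing the weights, i.e. the business trade-off, can flip the ranking.

```python
from typing import Dict, List, Tuple


def rank_items(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank items by a weighted sum of per-objective scores (linear scalarization)."""
    scored = []
    for item, objectives in candidates.items():
        # An objective missing for an item contributes 0 to its utility.
        utility = sum(w * objectives.get(name, 0.0) for name, w in weights.items())
        scored.append((item, utility))
    # Highest utility first; ties broken by item id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical per-item predictions from two models: click-through and apply rate.
candidates = {
    "job_a": {"ctr": 0.10, "apply_rate": 0.02},
    "job_b": {"ctr": 0.05, "apply_rate": 0.06},
}
print([item for item, _ in rank_items(candidates, {"ctr": 1.0})])
# ['job_a', 'job_b']  (optimizing clicks alone)
print([item for item, _ in rank_items(candidates, {"ctr": 1.0, "apply_rate": 2.0})])
# ['job_b', 'job_a']  (weighting applications flips the order)
```

A fixed weighted sum is the simplest choice; constrained formulations (e.g. maximize one objective subject to a floor on another) are a frequent alternative when goals genuinely conflict.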

Dr. Christian Posse is Principal Scientist at LinkedIn Inc., where he leads the development of recommendation solutions as well as the next-generation online experimentation platform. Prior to LinkedIn, Dr. Posse was a founding member and technology lead of the Cisco Systems Inc. Network Collaboration Business Unit, where he designed the search and advanced social analytics of Pulse, Cisco’s network-based search and collaboration platform for the enterprise. Prior to Cisco, Dr. Posse worked in a wide range of environments, from faculty positions at US universities to leading R&D at software companies and at a US National Laboratory in the social networks, biological networks and behavioral analytics fields. His interests are diverse and include predictive analytics, search and recommendation engines, social network analytics, computational social and behavioral sciences, computational linguistics, and information fusion. He has written over 40 peer-reviewed scientific publications and holds several patents in those fields. Dr. Posse has a PhD in Statistics from the Swiss Federal Institute of Technology, Switzerland.

TITLE: Ensembles and Model Delivery for Tax Compliance
SPEAKER: Graham Williams, Director of Data Mining, Australian Taxation Office

ABSTRACT: Revenue authorities characteristically have a large store of historic audit data, with outcomes, ready for analysis. The Australian Taxation Office established one of the largest data mining teams in Australia in 2004 as a foundation for becoming a knowledge-based organisation. Today, every tax return lodged in Australia is risk assessed by one or more models developed through data mining, generally based on historic data. We observe that the traditional modelling approaches, including random forests, generally deliver models of similar accuracy. We take advantage of combining different model types and modelling approaches for risk scoring, and in particular report on recent research that increases the diversity of the trees that make up a random forest. We also review, in a practical context, how such models are evaluated and delivered.
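The simplest way to combine different model types for risk scoring is to average their scores. The Python sketch below shows that idea only; it is not the Australian Taxation Office's method, and the two toy models (`rule_model`, `ratio_model`), the `income`/`deductions` fields, and all constants are hypothetical. Averaging helps most when the models make different errors, which is also the motivation for increasing the diversity of trees in a random forest.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Case = Dict[str, float]
Model = Callable[[Case], float]


def rule_model(case: Case) -> float:
    """Hypothetical hand-written rule: flag unusually large deduction claims."""
    return 0.9 if case["deductions"] > 0.5 * case["income"] else 0.1


def ratio_model(case: Case) -> float:
    """Hypothetical linear score on the deduction ratio, clipped to [0, 1]."""
    score = 0.2 + 0.5 * case["deductions"] / case["income"]
    return min(1.0, max(0.0, score))


def ensemble_risk(models: Sequence[Model], case: Case) -> float:
    """Combine heterogeneous models by averaging their risk scores."""
    return mean(model(case) for model in models)


case = {"income": 100.0, "deductions": 60.0}
print(round(ensemble_risk([rule_model, ratio_model], case), 3))  # 0.7
```

Plain averaging assumes the scores are on comparable scales; in practice, calibrating each model's output first (or averaging ranks) is often safer.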

Dr Graham Williams is Director of Data Mining at the Australian Taxation Office, and previously Principal Computer Scientist for Data Mining with CSIRO. He is a Senior International Expert and Visiting Professor of the Chinese Academy of Sciences at the Shenzhen Institutes of Advanced Technology, and Adjunct Professor in Data Mining, Fraud Prevention and Security at the University of Canberra and the Australian National University. Graham is an active machine learning researcher and regularly teaches data mining courses. He is the author of the freely available Rattle software for data mining and of the Rattle book published by Springer in 2011: Data Mining with Rattle and R: The Art of Excavating Knowledge from Data. Graham has been involved in data mining projects for clients from government and industry for over 25 years. His research developments include ensemble learning (1988) and hot spots discovery (1997). He is involved in numerous international artificial intelligence and data mining research activities and conferences, has edited a number of books, and has authored many academic and industry papers. He is chair of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) and the Australasian Conference on Data Mining (AusDM). His passion is ensuring that data mining technology is readily accessible to all who wish to use it, supporting innovation and the wide sharing of our knowledge.

TITLE: Leveraging Predictive Modeling to Reduce Signal Theft in a Multi-Service Organization Environment
SPEAKER: Seymour Douglas, Director, Analytic & Forensic Technology, Deloitte Financial Advisory Services LLP

ABSTRACT: Signal theft can be defined as the interdiction, consumption or usage of carrier signal from a provider’s network without payment, or with payment of an amount less than the level of service consumed. Signal theft, which can reflect open technical network issues, failed electronic countermeasures or operational gaps, is estimated to cost cable industry providers more than $5 billion annually. This session will discuss the business challenges associated with quantifying signal theft-related losses, outline some of the countermeasures taken by multi-service organizations (MSOs), and then present views on developing predictive models to help identify the likelihood of signal theft in a given environment. We will examine the performance of certain machine learning algorithms as well as the data challenges associated with both the architecture construction and the analytical efforts, and conclude with a lessons-learned discussion and views on future approaches.

Seymour Douglas is a Director in the Data Analytics group of the Analytic & Forensic Technology Consulting (“AFT”) practice within Deloitte Financial Advisory Services LLP (“Deloitte FAS”) and has more than 20 years of experience in customer, predictive and data analytics.

He specializes in designing processes for gathering data from disparate sources and merging the data to create and deploy high performance predictive modeling systems that can support fraud detection and monitoring, recommender systems, high frequency forecasting, data enrichment, text mining and revenue assurance. Seymour has led numerous projects using analytics and information technology to help identify improper payments and financial transactions.

Prior to joining Deloitte, Seymour was responsible at a communications firm for developing and deploying analytic solutions for fraud detection and customer valuation, and for analyzing telco billing data (CABS, CDR, switch data SS7, ISUP, Cabledata). In addition, Seymour was part of the CRM Analytics practice of a consulting firm, focused on developing data-driven analytics systems for the financial services industry.

Seymour has consulted to the World Bank at Caribbean Development Bank focusing on developing and leveraging country risk models. He has also served on the faculty at Emory University where he taught undergraduate and graduate courses in microeconomics, international finance and economic development.

Seymour received a Bachelor of Business Administration Degree in Math and Economics from the University of the West Indies, and holds a Ph.D. in Econometrics from Temple University.

TITLE: Experiences and Lessons in Developing Industry-Strength Machine Learning and Data Mining Software
SPEAKER: Chih-Jen Lin, National Taiwan University and eBay Research Labs

ABSTRACT: Traditionally, academic machine learning and data mining researchers focus on proposing new algorithms. The task of implementing these methods is often left to companies that develop software packages. However, the gap between the two sides has caused some problems. First, the practical deployment of new algorithms still involves challenging issues that need to be studied by researchers. Second, without further investigation after publishing their papers, researchers have no opportunity to work on real problems or to see how their methods are used. In this talk, we discuss our experiences in developing two machine learning packages, LIBSVM and LIBLINEAR, which have been widely used in both academia and industry. We demonstrate that interaction with users led us to identify some important research problems. For example, the decision to study and then support multi-class SVM was essential in the early stage of developing LIBSVM. The birth of LIBLINEAR was driven by the need to classify large-scale documents in Internet companies. For fast training on large-scale problems, we had to create new algorithms different from those used in LIBSVM for kernel SVM. We present some practical uses of LIBLINEAR for Internet applications. Finally, we give lessons learned and future perspectives for developing industry-strength machine learning and data mining software.
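For linear SVMs, one of the solvers LIBLINEAR offers is dual coordinate descent, which updates one dual variable at a time while keeping the weight vector w up to date, so each step costs time proportional to the nonzeros of a single example. That is the property that makes sparse, large-scale document classification fast. The pure-Python sketch below illustrates the idea for the L2-regularized, L1-loss (hinge) SVM; it is a teaching sketch, not LIBLINEAR's code or API, and it omits the bias term, shrinking, and a stopping tolerance. The toy data and all function names are hypothetical.

```python
import random
from typing import Dict, List

SparseVec = Dict[int, float]  # feature index -> value, as in sparse text features


def dot(w: Dict[int, float], x: SparseVec) -> float:
    """Sparse dot product touching only the nonzeros of x."""
    return sum(w.get(j, 0.0) * v for j, v in x.items())


def train_linear_svm(X: List[SparseVec], y: List[int], C: float = 1.0,
                     epochs: int = 20, seed: int = 0) -> Dict[int, float]:
    """Dual coordinate descent for an L2-regularized, L1-loss (hinge) linear SVM.

    Keeps w = sum_i alpha_i * y_i * x_i up to date, so each coordinate
    update costs O(nnz(x_i)) instead of a pass over the whole dataset.
    """
    rng = random.Random(seed)
    alpha = [0.0] * len(X)
    w: Dict[int, float] = {}
    q_diag = [dot(x, x) for x in X]  # Q_ii = x_i . x_i (since y_i^2 = 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit coordinates in a random order each epoch
        for i in order:
            if q_diag[i] == 0.0:
                continue  # an all-zero example cannot change w
            grad = y[i] * dot(w, X[i]) - 1.0  # partial derivative of the dual w.r.t. alpha_i
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)  # clip to [0, C]
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                for j, v in X[i].items():
                    w[j] = w.get(j, 0.0) + delta * y[i] * v
    return w


def predict(w: Dict[int, float], x: SparseVec) -> int:
    return 1 if dot(w, x) >= 0.0 else -1


# Toy data: feature 0 signals the positive class, feature 1 the negative class.
X = [{0: 1.0}, {0: 2.0, 1: 0.5}, {1: 1.0}, {1: 2.0, 0: 0.5}]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

By contrast, kernel SVM solvers such as the one in LIBSVM work with kernel values between pairs of examples rather than an explicit w, which is why a different algorithm pays off when the features are already high-dimensional and sparse.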

Chih-Jen Lin is currently a distinguished professor at the Department of Computer Science, National Taiwan University and a visiting principal research scientist at eBay Research Labs. He obtained his B.S. degree from National Taiwan University in 1993 and Ph.D. degree from University of Michigan in 1998. His major research areas include machine learning, data mining, and numerical optimization.

Chih-Jen Lin is best known for his work on support vector machines (SVM) for data classification. His software LIBSVM is one of the most widely used and cited SVM packages. Nearly all major companies apply his software for classification and regression applications. More recently, his team developed the software LIBLINEAR for large-scale document classification. This package has quickly gained attention as Internet companies routinely use it for their applications.

Chih-Jen Lin has received many awards for his research work. A recent one is the ACM KDD 2010 best paper award. He is an IEEE Fellow and an ACM Distinguished Scientist for his contributions to machine learning algorithms and software design. More information about him can be found at http://www.csie.ntu.edu.tw/~cjlin.

TITLE: Leveraging Data Mining to improve healthcare
SPEAKER: Bharat R Rao, Head, Center for Innovations, Siemens Health Services

ABSTRACT: Healthcare is undergoing a dramatic transformation, from an inefficient, costly, reactive and ad-hoc model of care delivery to a more efficient, outcomes-based, proactive and knowledge-driven model that aims to control skyrocketing costs and improve patient outcomes. While the recent drive to record patient data in electronic health records (EHRs) will provide the foundation for this transformation, data mining of EHRs will play a critical role in achieving these goals.
In this talk, we describe three major areas of healthcare where data mining has helped support marked improvements: mining medical images to help physicians detect abnormalities (e.g., cancer); mining the EHR for automated measurement of quality and compliance with the standard of care (an essential step for improving care); and rapid learning systems that mine large numbers of EMRs to discover key predictors of disease risk and outcome, and to support personalized therapy recommendations. We will also discuss some emerging healthcare data mining challenges in the areas of population management and social networking, as healthcare changes from a (patient-)visit-centered model to a patient-centered model of care.
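Automated quality measurement usually reduces to a numerator/denominator calculation: among patients eligible for a measure, what share received the recommended care? The Python sketch below shows only that skeleton. The `PatientRecord` structure and the condition and treatment labels are hypothetical, and real measures add exclusions, time windows, and extraction of facts from free-text notes, which is where the data mining described above comes in; this is not how Soarian Quality Measures or any Siemens product works internally.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class PatientRecord:
    """Hypothetical, heavily simplified extract of one patient's EHR."""
    patient_id: str
    diagnoses: Set[str] = field(default_factory=set)
    treatments: Set[str] = field(default_factory=set)


def compliance_rate(records: Iterable[PatientRecord], condition: str,
                    required_care: str) -> float:
    """Share of eligible patients (denominator) who received the required care (numerator)."""
    eligible = [r for r in records if condition in r.diagnoses]  # denominator
    if not eligible:
        return float("nan")  # the measure is undefined with no eligible patients
    met = sum(1 for r in eligible if required_care in r.treatments)  # numerator
    return met / len(eligible)


records = [
    PatientRecord("p1", {"heart_failure"}, {"ace_inhibitor"}),
    PatientRecord("p2", {"heart_failure"}, set()),
    PatientRecord("p3", {"asthma"}, {"inhaler"}),
]
print(compliance_rate(records, "heart_failure", "ace_inhibitor"))  # 0.5
```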

Bharat Rao, PhD is Senior Director and Head of the newly-formed Center for Innovations for Siemens Health Services, based in Malvern, PA. The Center for Innovations has been established with the vision to foster thought-leadership for Siemens in the dynamic field of healthcare IT. The Center’s goals are to create a continuous-innovation pipeline of new products, services and capabilities; to establish collaborations with luminary customers, academic & industry partners; and to drive an innovation agenda that impacts the entire Health Services portfolio and workforce.

Previously, Dr. Rao led the Knowledge Solutions group, Healthcare Analytics and Business Intelligence, which develops and deploys data analytics solutions that analyze millions of patient records, impacting three major areas in healthcare: automated quality measurement and decision support from hospital EMRs, computer-aided diagnosis systems to identify suspicious lesions on medical images, and predictive models for personalized medicine. The group launched the first-to-market startup offering in healthcare quality, Soarian Quality Measures (and its cloud counterpart, the Quality Reporting Service), which is now an essential part of Siemens’ solution to satisfy the meaningful use requirements for US health reform.

Dr. Rao has received multiple international awards, including the ACM SIGKDD (Data Mining society) Service Award in 2011 for "service to society for pioneering data mining applications in healthcare products that reduce healthcare costs and improve patient care." He was also named the Siemens Inventor of the Year in 2005, awarded yearly to one employee in Siemens Healthcare (45,000 employees worldwide) for the REMIND data mining platform. He is the only two-time winner of the International Data Mining Case Studies & Practice Prize, for the best deployed industrial and government data mining application, awarded by IEEE & ACM respectively.

Dr. Rao is recognized as a leading international expert in machine learning, healthcare analytics and mining ‘big data.’ He has been granted 45 patents (50 more pending), received multiple best paper awards and has published over 100 scholarly publications and one book. He is currently leading an international consortium to build a Euro-US cancer research health IT network for developing personalized therapies for lung cancer.

Dr. Rao received a B.Tech in Electronics Engineering from the Indian Institute of Technology, Madras, and an M.S. and Ph.D. focusing on machine learning from the Dept. of Electrical & Computer Engineering, University of Illinois, Urbana-Champaign, in 1993. After his PhD, he joined Siemens Corporate Research, and formed the Data Mining group. In 2002, he moved to Siemens Healthcare to help found the "Computer-Aided Diagnosis & Therapy" group.

Dr. Rao's passions outside of the sphere of Science and Business include the sport of Cricket, Classic Rock, the history of Science, and the study of Philosophy and Religion. He is married and has two children.