KDD Cup: User Modeling based on Microblog Data and Search Click Data


The 2012 KDD Cup has concluded, with over 900 teams competing on two tracks. Winners will be announced at the conference opening session, and a KDD Cup workshop will be held on Aug 12, 2012.

Please visit the external KDD Cup Website for more details.

This year's KDD Cup is sponsored by Tencent Inc., China's largest Internet company in terms of active users (over 700 million as of Jan. 2012). Tencent owns a full portfolio of popular products in China, including instant messaging, email, a news portal, a search engine, online games, blogging and micro-blogging, offering a rich opportunity to build user models for highly effective user intent prediction and result recommendation. This year's KDD Cup consists of two separate tasks.

Task 1. Social Network Mining on Microblogs (Weibo)

Tencent Weibo (http://t.qq.com/) offers a wealth of social-networking information. For the 2012 KDD Cup, the released data is a sampled snapshot of Tencent Weibo users' preferences for various items, consisting of the recommendations made to users and the history of follow relations. Items (persons, organizations and groups) are organized in a hierarchy: each item belongs to specific categories, and each category belongs to higher-level categories. In the competition, both users and items are represented by anonymized numeric IDs, so that no identifying information is revealed. The data covers 10 million users and 50,000 items, with over 300 million recommendation records and about three million social-networking "following" actions. The privacy-protected user information is also very rich, and user activities are timestamped.

Task 1 is to predict which users a given user will follow, among all potential users.

Task 2. User Click Modeling based on Search Engine Log Data

Online advertising has been the financial support of the Internet industry for years. Three successful kinds of computational ad systems are search ad, contextual ad and social networking ad systems. Search ad systems retrieve and rank ads for a given query, and display the resulting ads together with results from the search engine. When a user clicks on an ad, the advertiser pays the search engine for its help with promotion. Ads are ranked to maximize users' satisfaction, advertisers' return on investment and the search engine's revenue. Contextual ad systems involve an additional role, the publishers, who own Internet properties such as Web sites, forums or mobile apps. Programs embedded in these properties request ads from the ad system, which finds ads that semantically match the content of the properties. Recently, a third kind of computational ad system, the social network ad system, has been gaining popularity and attention; it ranks ads with consideration of social relationships.

In all of the aforementioned systems, a key algorithmic component is predicting the click-through rate of ads (the predicted value is called pCTR). This is because all such systems optimize monetization under economic rules (e.g., the Generalized Second Price auction, the mechanism behind Google AdWords and others), and these rules require the ads' pCTR values to rank ads and to price clicks. The closer the pCTR is to the true click-through rate, the more effective the monetization. User information, including demographics and historical behavior on search engines, e-business platforms, social networks and micro-blogs, is likely to be valuable for improving the accuracy of pCTR in all of the above systems.
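
To illustrate why pCTR matters for both ranking and pricing, the following is a minimal sketch of a second-price style ad auction in Python. It is not Tencent's or any search engine's actual implementation. The names `Ad` and `run_auction`, the scoring rule (bid times pCTR) and the zero price for the last ad are assumptions chosen for illustration; real systems add reserve prices, quality factors and other details.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    # Hypothetical record for illustration only.
    name: str
    bid: float   # advertiser's maximum price per click
    pctr: float  # predicted click-through rate, in (0, 1]


def run_auction(ads, slots):
    """Rank ads by bid * pCTR and charge each winner the smallest
    per-click price that still keeps it above the next-ranked ad."""
    ranked = sorted(ads, key=lambda a: a.bid * a.pctr, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Price per click: the next ad's score divided by this ad's pCTR.
            price = nxt.bid * nxt.pctr / ad.pctr
        else:
            # Assumption: no competitor below means a price of zero;
            # a real system would apply a reserve price instead.
            price = 0.0
        results.append((ad.name, round(price, 4)))
    return results


ads = [Ad("A", 2.0, 0.05), Ad("B", 3.0, 0.02), Ad("C", 1.0, 0.08)]
print(run_auction(ads, slots=2))  # [('A', 1.6), ('C', 0.75)]
```

In this toy example, ad B has the highest bid but the lowest pCTR, so it wins no slot. An error in the pCTR estimate would change both the ranking and the prices charged, which is why prediction accuracy bears directly on monetization.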

Task 2's aim is to accurately predict the ads' click-through rate in online computational ad systems.


Feb 20, 2012 Competition announcement linked to KDD official site
Mar 1, 2012 Registration opens (dataset ready for the public)
Mar 15, 2012 Competition begins
Jun 1, 2012 Competition ends (submission deadline)
Jun 5, 2012 Results compiled
Jun 8, 2012 Winners notified
Aug 12, 2012 Workshop

*Note that this is only an initial announcement. Stay tuned for more detailed announcements.

KDDCUP 2012 Organizers

  • Dr. Gordon Sun, Chief Scientist, Tencent Inc.
  • Dr. Yading Aden Yue, Expert Researcher, Tencent Inc.
  • Dr. Yi Wang, Deputy Director, Contextual Advertising Platform, Tencent Inc.
  • Mr. Yanzhi Niu, Scientist, Weibo, Tencent Inc.
  • Dr. Yong Nicky Li, Leader, Data Mining Group, Tencent Inc.
  • Mr. Leostar Zhou, Researcher, Contextual Advertising Platform, Tencent Inc.
  • Mr. Ubi Wang, Researcher, Contextual Advertising Platform, Tencent Inc.
  • Mr. Kokomo Huang, Researcher, Weibo, Tencent Inc.

Link to KDD Cup Call for Proposals.