Humane Data Mining
Rakesh Agrawal
Search Labs, Microsoft Research, Mountain View, CA
Data mining has made tremendous strides in the last decade. It is time to take data mining to the next level of contributions, while continuing to innovate for the current mainstream market. We postulate that a fruitful future direction could be humane data mining: applications that benefit individuals. Potential applications include personal data mining (e.g., personal health), enabling people to get a grip on their world (e.g., dealing with the long tail of search), enabling people to become creative (e.g., inventions arising from linking non-interacting scientific literature), enabling people to make contributions to society (e.g., education collaboration networks), and data-driven science (e.g., studying ecological disasters and brain disorders). Rooting our future work in these (and similar) applications will lead to new data mining abstractions, algorithms, and systems.
Rakesh Agrawal is a Microsoft Technical Fellow and heads the Search Labs in Microsoft Research. He is the recipient of the ACM-SIGKDD First Innovation Award, ACM-SIGMOD Edgar F. Codd Innovations Award, ACM-SIGMOD Test of Time Award, VLDB 10-Yr Most Influential Paper Award, ICDE Most Influential Paper Award, and Computerworld First Horizon Award. He is a Member of the National Academy of Engineering, a Fellow of ACM, and a Fellow of IEEE. Scientific American named him to the list of 50 top scientists and technologists in 2003.
Prior to joining Microsoft in March 2006, Rakesh was an IBM Fellow and led the Quest group at the IBM Almaden Research Center. Earlier, he was with the Bell Laboratories, Murray Hill from 1983 to 1989. He also worked for 3 years at India's premier company, the Bharat Heavy Electricals Ltd. He received the M.S. and Ph.D. degrees in Computer Science from the University of Wisconsin-Madison in 1983. He also holds a B.E. degree in Electronics and Communication Engineering from IIT-Roorkee, and a two-year Post Graduate Diploma in Industrial Engineering from the National Institute of Industrial Engineering (NITIE), Bombay.
Rakesh is well known for developing fundamental data mining concepts and technologies and for pioneering key concepts in data privacy, including Hippocratic Database, Sovereign Information Sharing, and Privacy-Preserving Data Mining. IBM's commercial data mining product, Intelligent Miner, grew out of his work. His research has been incorporated into other IBM products, including DB2 Mining Extender, DB2 OLAP Server and WebSphere Commerce Server, and has influenced several other commercial and academic products, prototypes and applications. His other technical contributions include the Polyglot object-oriented type system, the Alert active database system, Ode (Object database and environment), Alpha (an extension of relational databases with generalized transitive closure), the Nest distributed system, transaction management, and database machines.
Rakesh has been granted 60 patents. He has published more than 150 research papers, many of them considered seminal. He has written the most cited and the second most cited of all papers in the fields of databases and data mining (13th and 15th most cited across all computer science as of February 2007 in CiteSeer). Wikipedia lists one of his papers as one of the most influential database papers. His papers have been cited more than 6500 times, with more than 15 of them receiving more than 100 citations each. He is the most cited author in the field of database systems. His work has been featured in the New York Times Year in Review, the New York Times Science section, and several other publications.
Unsolved Problems in Search (and how we might approach them)
W. Bruce Croft
Department of Computer Science, University of Massachusetts Amherst
Search applications have become ubiquitous and very successful. Major
advances have been made in understanding how to deliver effective results
very efficiently for a class of queries. As the range of applications
broadens to include Web search, desktop search, enterprise search, vertical
search, social search, etc., the number of new research challenges has
appeared to grow rather than shrink. Many of these challenges are variations
on underlying themes and principles that information retrieval has focused
on for more than 40 years. In this talk, the unsolved problems arising from
new search applications will be categorized and discussed in terms of
information retrieval models, and some potential paths to solutions for
these problems will be outlined.
W. Bruce Croft is a Distinguished Professor in the Department of Computer Science at the University of Massachusetts, Amherst, which he joined in 1979. In 1992, he founded the Center for Intelligent Information Retrieval
(CIIR), which combines basic research with technology transfer to a variety
of government and industry partners. Dr. Croft was Chair of the department.
He received the B.Sc.(Honors) degree in 1973, and an M.Sc. in Computer
Science in 1974 from Monash University in Melbourne, Australia. His Ph.D. in
Computer Science was from the University of Cambridge, England in 1979.
His research interests are in many areas of information retrieval, including
retrieval models, representation, Web search, query processing,
cross-lingual retrieval, and search architectures. He has published more
than 200 articles on these and other subjects, has served on numerous
program committees, and has been involved in the organization of many
workshops and conferences.
Dr. Croft was a member of the National Research Council Computer Science and
Telecommunications Board, 2000-2003, and Editor-in-Chief of ACM Transactions
on Information Systems, 1995-2002. Dr. Croft was elected a Fellow of ACM in
1997, received the Research Award from the American Society for Information
Science and Technology in 2000, and received the Gerard Salton Award from
the ACM Special Interest Group in Information Retrieval (SIGIR) in 2003.
Markov Logic: A Unifying Language for Information and Knowledge Management
Pedro Domingos
Department of Computer Science and Engineering, University of Washington
Modern information and knowledge management is characterized by high
degrees of complexity and uncertainty. Complexity is well handled by
first-order logic, and uncertainty by probabilistic graphical models.
What has been sorely missing is a seamless combination of the two.
Markov logic provides this by attaching weights to logical formulas
and treating them as templates for features of Markov random fields.
I will survey Markov logic representation, inference, learning and
applications. Inference algorithms combine ideas from satisfiability
testing, resolution, Markov chain Monte Carlo and belief propagation.
Learning algorithms involve statistical weight learning and inductive
logic programming. Markov logic has been successfully applied to a wide
range of information and knowledge management problems, including
information extraction, entity resolution, ontology learning, link
prediction, heterogeneous knowledge bases, and others. It is the basis
of the open-source Alchemy system.
(Joint work with Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson,
Parag Singla, Marc Sumner, and Jue Wang.)
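As a rough illustration of what "attaching weights to logical formulas" can mean, the sketch below uses the standard Markov logic distribution, in which a world's probability is proportional to exp(sum of weight times number of true groundings of each formula). The domain (two people), the predicates Smokes and Cancer, the single formula and its weight of 1.5 are all assumptions chosen for illustration. The brute-force enumeration stands in for the inference algorithms named in the abstract, and none of this is the Alchemy API.

```python
import itertools
import math

# Hypothetical toy domain: two people and two unary predicates.
PEOPLE = ["Anna", "Bob"]
ATOMS = [(pred, person) for pred in ("Smokes", "Cancer") for person in PEOPLE]


def count_smoking_implies_cancer(world):
    """Count the true groundings of Smokes(x) => Cancer(x) in a world."""
    return sum(
        (not world[("Smokes", p)]) or world[("Cancer", p)] for p in PEOPLE
    )


# Each entry pairs an assumed weight with a grounding-count function.
FORMULAS = [(1.5, count_smoking_implies_cancer)]


def unnormalized(world):
    """exp(sum_i w_i * n_i(world)): the world's unnormalized probability."""
    return math.exp(sum(w * n(world) for w, n in FORMULAS))


def all_worlds():
    """Yield every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def conditional(query, evidence):
    """P(query atom is true | evidence) by enumerating all worlds."""
    numerator = denominator = 0.0
    for world in all_worlds():
        if all(world[atom] == value for atom, value in evidence.items()):
            weight = unnormalized(world)
            denominator += weight
            if world[query]:
                numerator += weight
    return numerator / denominator


p = conditional(("Cancer", "Anna"), {("Smokes", "Anna"): True})
print(round(p, 4))  # 0.8176
```

With only this one formula, the conditional probability reduces to the logistic function of the weight, so a larger weight makes the rule behave more like a hard logical constraint. Enumeration is exponential in the number of ground atoms, so it is only practical for toy domains like this one.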
Pedro Domingos is Associate Professor of Computer Science and
Engineering at the University of Washington. His research interests
are in artificial intelligence, machine learning and data mining. He
received a PhD in Information and Computer Science from the University
of California at Irvine, and is the author or co-author of over 100
technical publications. He is a member of the advisory board of JAIR,
a member of the editorial board of the Machine Learning journal, and a
co-founder of the International Machine Learning Society. He was
program co-chair of KDD-2003, and has served on numerous program
committees. He has received several awards, including a Sloan
Fellowship, an NSF CAREER Award, a Fulbright Scholarship, an IBM
Faculty Award, and best paper awards at KDD-98, KDD-99 and PKDD-2005.