4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering, in conjunction with the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, August 11-14, 2013.
Following the success of the MultiClust workshops at KDD 2010, ECML PKDD 2011, SDM 2012, as well as the success of the 3Clust workshop at PAKDD 2012, we invite submissions to the 4th MultiClust workshop on multiple clusterings, multi-view data, and multi-source knowledge-driven clustering to be held in conjunction with SIGKDD 2013.
Multiple views and data sources require clustering techniques capable of providing several distinct analyses of the data. The cross-disciplinary research topic on multiple clustering has thus received significant attention in recent years. However, since it is relatively young, important research challenges remain. Specifically, we observe an emerging interest in discovering multiple clustering solutions from very high dimensional and complex databases. Detecting alternatives while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include: how to define redundancy among clusterings; whether existing algorithms can be modified to accommodate the finding of multiple solutions; how many solutions should be extracted; how to select among far too many possible solutions; how to evaluate and visualize results; how to most effectively help the data analysts in finding what they are looking for. Recent work tackles this problem by looking for non-redundant, alternative, disparate or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, theory on result summarization, consensus mining, and general techniques coping with complex and high dimensional databases. At the same time, the topic of multiple clustering solutions has opened novel challenges in these research fields.
Overall, this cross-disciplinary research endeavor has recently received significant attention from multiple communities. In this workshop, we plan to bring together researchers from the above research areas to discuss issues in multiple clustering discovery. We solicit approaches for solving emerging issues in the areas of clustering ensembles, semi-supervised clustering, subspace/projected clustering, co-clustering, and multi-view clustering. Of particular interest will be papers that draw new and insightful connections between these areas, and papers that contribute to the achievement of a unified framework that combines two or more of these problems.
TOPICS OF INTEREST
---------------------------
The panel discussions at previous MultiClust workshops and a recent tutorial on discovering multiple clustering solutions document the research interest in this exciting topic. A non-exhaustive list of topics of interest is given below:
• Clustering Ensembles
• Co-clustering Ensembles
• Subspace/Projected Clustering
• Semi-supervised Clustering
• Multiview / Alternative Clustering
• Handling Redundancy in Clustering Results
• Bayesian Learning for Clustering
• Model Selection Issues: How Many Clusters?
• Co-clustering with External Knowledge for Relational Learning
• Probabilistic Clustering with Constraints
• Kernels for Semi-supervised Clustering
• Active Learning of Constraints in Clustering Ensembles
• Constraint-based Clustering for Uncertain Data Management and Mining
• Integration of Frequent Pattern Mining in (Semi-supervised) Multi-view Clustering
• Evaluation Criteria for Multi-view Data Clustering
• Benchmark Data for Multi-view Data Clustering
• Incorporating User Feedback in Semi-supervised Clustering
• Clustering Ensembles for Uncertain Data Management and Mining
• Multiple Clusterings and Multi-view Data in Heterogeneous Information Networks
• Applications (e.g. document mining, health care, privacy and trustworthiness)
We encourage submissions describing innovative work in related fields that address the issue of multiplicity in data mining.
SUBMISSION GUIDELINES
---------------------------
We invite submission of unpublished original research papers that are not under review elsewhere. All papers will be peer reviewed. Papers may be up to 8 pages long. We also invite vision papers and descriptions of work-in-progress or case studies on benchmark data as short paper submissions of up to 4 pages. If accepted, at least one of the authors must attend the workshop to present the work.
Contributions should be submitted in pdf format using the submission site at https://cmt.research.microsoft.com/MULTICLUST2013/
The submitted papers must be written in English and formatted according to the SIGKDD 2013 submission guidelines.
If you are considering submitting to the workshop and have questions regarding the workshop scope or need further information, please do not hesitate to contact the PC chairs.
PROCEEDINGS
---------------------------
The workshop proceedings are available in the ACM Digital Library.
IMPORTANT DATES
---------------------------
Submission deadline (extended): June 3, 2013
Acceptance notification: June 25, 2013
Camera-ready deadline: July 2, 2013
Workshop date: August 11, 2013
PROGRAM CHAIRS
---------------------------
Ira Assent, Aarhus University, Denmark
Carlotta Domeniconi, George Mason University, USA
Francesco Gullo, Yahoo! Research, Spain
Andrea Tagarelli, University of Calabria, Italy
Arthur Zimek, Ludwig-Maximilians-Universität München, Germany
WORKSHOP SCHEDULE
---------------------------
Sunday, August 11, 2013 (2-5 p.m.)
2:00-2:30
Opening
Invited Talk
Michael Berthold, University of Konstanz, Germany
Parallel Universes
2:30-3:30
Session 1
Stochastic Subspace Search for Top-K Multi-View Clustering
Geng Li, Stephan Günnemann, Mohammed J. Zaki
Probabilistic Non-linear Distance Metric Learning For Constrained Clustering
Behnam Babagholami-Mohamadabadi, Ali Zarghami, Hojjat Abdollahi, Mohammad T. Manzuri-Shalmani
Variational Bayes Co-clustering with Auxiliary Information
Motoki Shiga, Hiroshi Mamitsuka
3:30-4:00
Coffee break
4:00-4:25
Invited Talk
Shai Ben-David, Waterloo University, Canada
A theoretical approach to the clustering selection problem
4:25-4:55
Session 2
Absolute and Relative Clustering
Toshihiro Kamishima, Shotaro Akaho
Spectral Graph Multisection Through Orthogonality
Huanyang Zheng, Jie Wu
4:55-5:00
Wrap-up
INVITED SPEAKERS
---------------------------
Shai Ben-David, Waterloo University, Canada
A theoretical approach to the clustering selection problem
Abstract: Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill-defined problem: given a data set, it is not clear what a "correct" clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. In contrast with other common learning tasks, like classification prediction, clustering does not have a well-defined ground truth. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm (as well as a concrete setting for the tunable parameters of the chosen algorithm). Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. Can the research community develop effective tools for helping users make informed decisions when they come to pick a clustering tool for their data? How can we help data analysts find the cluster structures they are looking for?
Several paradigms have been proposed to answer that challenge. These include semi-supervised clustering (in which the user specifies partial information about the desired clustering solution in the form of link/don't link examples) and multiple clusterings, as well as tools for visualization of clusterings. In this work, we propose a high-level approach to this challenge. The basic premise of our work is that prior domain knowledge is an indispensable component of any successful clustering paradigm. In light of this, a major research objective is the development of tools for communicating relevant prior knowledge between the data analyst, who has some understanding or intuition about the task at hand, and the tools for choosing clusterings (or clustering algorithms).
We address this objective by proposing two approaches. First, we address the choice of clustering algorithm. Our paradigm is to distill abstract properties of the input-output behaviors of different clustering paradigms. The goal is to come up with a list of such properties that capture some of the domain knowledge that users have about their tasks, while being strong enough to distinguish between different clustering algorithms. We introduce several abstract properties of clustering functions and use them to taxonomize clustering algorithmic paradigms. Second, we consider requirements for defining the quality of given clusterings. Such clustering quality measures can be viewed as another way of expressing prior domain knowledge.
Michael Berthold, University of Konstanz, Germany
Parallel Universes
PROGRAM COMMITTEE
---------------------------
James Bailey, University of Melbourne, Australia
Ricardo J. G. B. Campello, University of São Paulo, Brazil
Xuan-Hong Dang, Aarhus University, Denmark
Ines Färber, RWTH Aachen University, Germany
Wei Fan, IBM T. J. Watson Research Center and IBM CRL, USA
Ana Fred, Technical University of Lisbon, Portugal
Stephan Günnemann, CMU, USA
Dimitrios Gunopulos, University of Athens, Greece
Michael E. Houle, NII, Japan
Emmanuel Müller, KIT, Germany
Erich Schubert, LMU Munich, Germany
Grigorios Tsoumakas, Aristotle University of Thessaloniki (AUTh), Greece
Giorgio Valentini, University of Milan, Italy
Jilles Vreeken, University of Antwerp, Belgium
Thomas Seidl, RWTH Aachen University, Germany
The call for this workshop also appears at http://www.kdnuggets.com/ (Analytics, Big Data, and Data Mining Resources).