Student Presentations

EEF Summer School on Massive Data Sets

June 27-July 1, 2002, BRICS, University of Aarhus, Denmark

Thursday June 27

16.15	The Query Language TQL Giovanni Conforti This talk briefly presents TQL, a query language for semistructured data that can be used to query XML files. TQL replaces the standard path-based pattern-matching mechanism with a logic-based mechanism, in which the programmer specifies the properties of the pieces of data she is trying to extract. As a result, TQL queries are more 'declarative', or less 'operational', than queries in comparable languages. This feature makes some queries easier to express, and should allow the adoption of better optimization techniques. Through a set of examples, we show that the range of queries that can be declaratively expressed in TQL is quite wide. The implementation of the TQL binding mechanism requires the adoption of non-standard techniques, and some of its aspects are still open. I will implicitly report on the current status of the implementation by writing all queries in the version of TQL that has been implemented, which can be freely downloaded from http://tql.di.unipi.it/tql.
16.25	Parallel Computing for 3D Data Visualisation and Transmission in Grid based Applications Daniele D'Agostino Scientists in academia and industry now have unprecedented computing and instrumental capabilities for studying and simulating natural phenomena with greater accuracy, and this leads to the production of huge amounts of data. The Grid is now emerging as a very large data repository for storing and sharing these data collections. Its advantages are the enormous availability of storage resources and the sharing of data; its disadvantages are the geographic distribution of the data and the remoteness of users from the data. In particular, data transmission is a bottleneck in data visualisation across the Grid, because of the amount of data and the complexity of 3D visualisation models. The current and promised increases in phone and network bandwidth will not by themselves solve this problem. For this reason, there is increasing research activity on the design of efficient and effective data simplification and compression algorithms that make the transmission of large 3D models over the network feasible. This paper discusses the use of local parallel processing to make these algorithms even more efficient and to satisfy the real-time requirements of visualising distributed 3D data over the Grid. A specific class of algorithms is considered, and the problems and possibilities related to parallelising these algorithms on clusters of workstations are addressed. Slides [ ppt, 16 slides ]
16.35	Sparse cycle bases in graphs Franziska Berger The problem of efficiently checking the validity of Kirchhoff's voltage law in electric networks can be transformed into the solvability analysis of a system of linear equations given by a sparse basis of the cycle vector space of a graph. The size of practical electric networks requires the development of new methods that cope successfully with large data sets. We present fast heuristic methods for constructing a sparse cycle basis in graphs.
16.45	PowerForms: Declarative Client-side Form Field Validation Sunil Kothari PowerForms is a tool for extending HTML forms with client-side input validation. The validation requirements are stated declaratively: one states what should hold for certain form fields rather than how to check whether it holds. In my presentation, I give a demo of how the concept is put into practice and discuss future extensions. References: PowerForms: Declarative Client-side Form Field Validation by Claus Brabrand, Anders Møller, Mikkel Ricky, and Michael I. Schwartzbach, in World Wide Web Journal, vol. 3, no. 4. http://www.brics.dk/bigwig/powerforms/
16.55	Break
17.10	Data Mining & Association Rules Siegfried Nijssen Data mining is the process of finding new and potentially useful knowledge in data. One popular approach to achieving this goal is to search for 'association rules'. We investigate how these rules can be found (provably) efficiently and how they can be made more meaningful. To improve their applicability, we introduce a general formalism for dealing with multi-relational databases. Although our algorithm performs better than related algorithms, its running times are still too long. We are now studying more efficient ways to deal with large databases in our setup.
17.20	R-Trees with bounded query time Herman Haverkort
17.30	Analysis of micro vascular networks Steffen Prohaska The goal of the research project is to analyse vascular networks that are recorded using 3D confocal microscopy or other suitable 3D imaging techniques such as synchrotron radiation. This includes analysis of the network's topology, morphometry of individual vessels, and statistical analysis of morphometric properties over the network. To perform such an analysis, a suitable discrete representation of the vascular network must be extracted from the image data. The imaging technologies used in the project create huge data sets (synchrotron radiation: 2048^3, confocal microscopy: overlapping blocks covering an overall volume of 3000x3000x300). Algorithms for extracting centerlines and calculating quantities such as thickness have to be adapted for out-of-core processing. Interactive 3D visualization of the processing steps and the resulting graphs is another focus of the project.
17.40	The JWIG project Aske Simon Christensen JWIG is an extension of the Java language for programming interactive web services. It is based on an explicit session model, in which a series of client interactions is controlled by a sequential thread on the server. XHTML output is constructed using a flexible and efficient template mechanism. Part of the project has been aimed at developing a program analysis that can verify, for a given JWIG service, that all documents shown to the client are valid XHTML 1.0 and that all input values expected by the program are present as form input fields in the last document shown.
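
As a small illustration of the 'association rules' mentioned in the 17.10 abstract, the sketch below computes the two standard measures, support and confidence, for rules of the form {a} -> {b} over a toy list of transactions. It is not the speaker's algorithm (which targets multi-relational databases and is not described here); it is a brute-force enumeration, and the names `support`, `pair_rules`, and the sample data are assumptions made up for this example.

```python
from itertools import permutations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Enumerate rules {a} -> {b} meeting both thresholds (brute force).

    Returns tuples (a, b, support, confidence), where
    confidence = support({a, b}) / support({a}).
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        # support({a}) is never zero: every item occurs in some transaction.
        c = s / support({a}, transactions)
        if c >= min_confidence:
            rules.append((a, b, s, c))
    return rules

# Hypothetical toy data: each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for a, b, s, c in pair_rules(transactions, min_support=0.5, min_confidence=0.9):
    print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Checking every ordered pair is quadratic in the number of items and does not scale to large databases, which is the efficiency concern the abstract raises.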