Practical Research Projects - Project Description

This is a description of a Practical Research Project associated with the PREP course.

Project Title

Web data extraction frameworks

Quarter

Q2

Responsible

Jakob G. Thomsen.

Second level advisor: Erik Ernst.

Aims

The web is a large database of unstructured information. Because this information is meant to be read by humans, it is structured and rendered for visual presentation rather than for machines, which makes extracting it automatically a non-trivial task. Since web pages change over time, the mechanisms for extracting information sometimes break down or, even worse, silently extract the wrong data. Hence the extracted output must be verified before the data can be trusted. In this PREP project the student will investigate different ways of extracting data from a web page and of verifying the extracted data.

The expected outcome is an implementation of one or more techniques for extracting information and verifying the extracted information. All of these techniques are described in the existing literature.

Learning Outcome

The student will gain insight into the research process by implementing techniques described in the literature, in collaboration with the supervising Ph.D. student. Furthermore, the student will learn to read and understand technical scientific papers.

Requirements

dWebTek, AWT, or SWP/CWP (recommended)