Practical Research Projects - Project Description
Project Description
This is a description of a Practical Research Project associated with the PREP course.
Project Title
Web data extraction frameworks
Quarter
Q2
Responsible
Second level advisor: Erik Ernst.
Aims
The web is a huge collection of information, most of which is structured and rendered to be read by humans rather than processed by machines. As a result, extracting information from web pages automatically is a non-trivial task. Since web pages change over time, the mechanisms for extracting information sometimes break down or, even worse, extract the wrong data. Hence the extracted output needs to be verified before the data can be trusted. In this PREP project the student will investigate different ways of extracting data from a web page and of verifying the extracted data.
The expected outcome is an implementation of one or more techniques for extracting information and verifying the extracted information. All of the techniques are described in the literature.
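To make the extract-then-verify idea concrete, the following is a minimal sketch in Python and not one of the techniques from the literature that the project will cover. It assumes a hypothetical page that marks a price with class="price"; the names ClassTextExtractor, extract_price and verify_price, and the rule that a valid result is exactly one decimal number, are illustrative assumptions.

```python
from html.parser import HTMLParser
import re


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element carrying a given class name.

    Simplification: void elements written without a closing slash
    (e.g. <br>) inside a matching element are not handled.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.values = []    # one text entry per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1              # entering a new match
            self.values.append("")

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.values[-1] += data


def extract_price(html):
    """Extraction step: return the text of all class="price" elements."""
    parser = ClassTextExtractor("price")   # hypothetical class name
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Assumed shape of a valid price: digits, optionally with two decimals.
PRICE_PATTERN = re.compile(r"\d+(\.\d{2})?")


def verify_price(values):
    """Verification step: accept exactly one value that looks like a price."""
    return len(values) == 1 and PRICE_PATTERN.fullmatch(values[0]) is not None


# The page as the extractor was written for it.
old_page = '<div><span class="price">19.99</span></div>'
# The same page after a redesign: the class name now marks something else.
new_page = ('<div><span class="cost">19.99</span>'
            '<span class="price">In stock</span></div>')

for page in (old_page, new_page):
    values = extract_price(page)
    print(values, "trusted" if verify_price(values) else "rejected")
```

On the redesigned page the extractor still returns a value, but it is the wrong data. The verification step rejects it because the value does not have the expected shape, which is the situation described above.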
Learning Outcome
The student will gain insight into the research process by implementing techniques described in the literature, in collaboration with the supervising Ph.D. student. Furthermore, the student will learn to read and understand technical scientific papers.
Requirements
dWebTek, AWT or SWP/CWP (recommended)