Algorithm project reports (working document)
When doing an algorithm project involving implementation, it is
expected that the project work is documented by a report. The report
should include a description of the problem and the algorithms considered,
describe the decisions made during the implementation, document your
experimental findings, and include a discussion and explanation of the
experimental results obtained.
Below is a list of (more or less random) issues to be aware of
during the project work and report writing.
Report
- The front page should state the group members: name, email, student ID (årskort), and date.
- Use page numbers.
- Spell check report.
- For each plot in the report there should be a discussion of
why the experiment was performed.
- If many symbols/variables are used, then add an appendix with a
table of notation (this also helps to avoid overloading
symbols with different meanings, and to ensure consistent naming
throughout the report).
- Include a table of figures (longer reports).
- Include an index of terms used in the report and the page numbers
of the first appearance/definition of the terms (longer reports).
- The report should include the relevant references, e.g., books,
research papers, online resources (use, e.g., BibTeX).
- The report should include a list of missing parts in the
implementation. Are some parts of the code known to fail? Has some
functionality not been implemented? Are there known special cases
where the implementation does not work? ...
Figures
- Avoid overlap between the plotted data and the descriptions (labels, legends) within the plot.
- Use clearly distinguishable marks when having several data sets in one
plot (the marks should be distinguishable in both black-and-white and color printouts).
- If data points in a plot are connected by lines, then the data
points themselves should still be visible in the plot.
- Include screenshots to illustrate the running program (white
background preferred).
Implementation
- Provide a URL where the source code/executables, test data,
generated output, and the report are available. For binary executable files,
state on what machine the program is expected to run.
- Describe how to run the program(s) and the arguments to the program(s).
Preferably provide a makefile and a README file together with the sources.
- State the machine type and its characteristics (CPU, clock
frequency, cache and RAM sizes), the operating system (and version),
the implementation language, the compiler (and version number), and the
libraries used (a small sketch for recording the compiler version follows this list).
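
Part of this information can be recorded programmatically. The following is a minimal sketch (one possible approach, not the only one) that prints the language standard and, for GCC/Clang, the compiler version, using standard predefined macros:

    #include <iostream>

    int main() {
        // __cplusplus encodes the language standard, e.g. 201703L for C++17.
        std::cout << "__cplusplus = " << __cplusplus << '\n';
    #ifdef __VERSION__
        // GCC and Clang define __VERSION__ as a readable version string.
        std::cout << "compiler: " << __VERSION__ << '\n';
    #endif
        return 0;
    }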
Code tuning
- Use a program profiler (e.g., GNU gprof) to check where the time is
spent in a program.
- Check the generated assembly code: the compiler may not translate
your C++ code into the expected low-level assembly code (code removed,
loops turned around, tail recursion eliminated, code inlined, ...).
- The number of cache misses and other hardware counters can be accessed
using, e.g., the PAPI library (requires a special Linux kernel); see the
sketch after this list.
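
As an illustration, the following sketch counts level-1 data-cache misses around a piece of code using PAPI's low-level interface (assuming PAPI is installed and the program is linked with -lpapi; check the documentation of your PAPI version for details):

    #include <papi.h>
    #include <cstdio>

    int main() {
        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            return 1;                           // PAPI initialization failed
        int eventset = PAPI_NULL;
        PAPI_create_eventset(&eventset);
        PAPI_add_event(eventset, PAPI_L1_DCM);  // level-1 data-cache misses

        PAPI_start(eventset);
        // ... run the code to be measured here ...
        long long misses;
        PAPI_stop(eventset, &misses);

        printf("L1 data cache misses: %lld\n", misses);
        return 0;
    }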
Experiments
- Repeat experiments. Use the average over several runs of the
program. Avoid using the time of the first run, since it
typically has extra overhead, e.g., loading the program into memory, a cold
cache, etc. (see the timing sketch after this list).
- Use a program like Gnuplot
to plot experimental data sets.
- In plots, use logarithmic scales whenever appropriate, in
particular to study asymptotic behavior.
- To check if data points f(n) follow, e.g., an
n log n curve (hard to identify in a plot), try to plot
the data as f(n)/(n log n) and see if it
converges to a constant for large n; the timing sketch after this list prints this ratio.
- Try to approximate the plotted data points with a curve. Does the
curve follow the theoretical expected behavior?
- Measure time, number of comparisons, tree rotations, cache
misses, branch mispredictions, ... Do they support the theory? Do hardware parameters like
cache size influence the running time? Try to explain the observed
behavior through theoretical arguments.
- Generate many plots - move similar plots to an appendix, and
include in the main report a representative for a group of similar
plots.
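
The points above about averaging over repeated runs and about plotting f(n)/(n log n) can be combined in a small timing driver. The sketch below uses std::sort as a stand-in for the algorithm being measured: for doubling input sizes it discards the first run, averages the remaining runs, and prints the average time and the time divided by n log2 n (if the last column converges to a constant, the measurements are consistent with an n log n running time):

    #include <algorithm>
    #include <chrono>
    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937 gen(42);                    // fixed seed: reproducible runs
        const int RUNS = 5;
        for (int n = 1 << 10; n <= 1 << 20; n *= 2) {
            double total = 0.0;
            for (int run = 0; run <= RUNS; run++) {
                std::vector<int> v(n);
                for (int &x : v) x = (int)gen();
                auto start = std::chrono::steady_clock::now();
                std::sort(v.begin(), v.end());   // algorithm under test
                auto stop = std::chrono::steady_clock::now();
                if (run > 0)                     // discard the first (warm-up) run
                    total += std::chrono::duration<double>(stop - start).count();
            }
            double avg = total / RUNS;
            printf("%8d %.6f %.3e\n", n, avg, avg / (n * std::log2((double)n)));
        }
        return 0;
    }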
Testing
- Use invariants/assertions in programs to check that the program makes
the expected computations, that the inputs are as expected, ...
- Use a set of tests for the different parts of the code, e.g., to test if all
back-pointers are correctly set.
- Test if the output is correct, e.g., by testing if different algorithms
generate the same output (see the sketch after this list), using assertions
in the code, adding code to check various conditions, visualizing the output,
and automating the check that the output is valid.
- Automate your test cases (e.g., using a set of scripts) so that you
can easily rerun tests and automatically update the generated plots.
- Describe in the report the different test cases used.
- If generating random input, then describe precisely how you generate
the input and what random distributions you use (and the
library functions used in the implementation for generating random
numbers); see the second sketch after this list.
- Do you handle all degenerate cases, e.g., three points on a line,
four points on a circle, identical points, overlapping line segments, ... ?
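
A cheap way to test correctness, as mentioned above, is to compare the output of your implementation against a trusted reference on many random inputs. A minimal sketch, where the hypothetical my_sort stands in for the implementation under test (compile without -DNDEBUG so the assertion is active):

    #include <algorithm>
    #include <cassert>
    #include <random>
    #include <vector>

    // Placeholder for the implementation being tested.
    void my_sort(std::vector<int> &v) { std::sort(v.begin(), v.end()); }

    int main() {
        std::mt19937 gen(1);
        for (int test = 0; test < 1000; test++) {
            std::vector<int> input(100);
            for (int &x : input) x = (int)(gen() % 1000);
            std::vector<int> a = input, b = input;
            my_sort(a);                     // implementation under test
            std::sort(b.begin(), b.end());  // trusted reference
            assert(a == b);                 // outputs must agree on every test
        }
        return 0;
    }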
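
To make the description of random input generation precise, state the generator, the distribution, and the seed, all of which are explicit when using the C++ <random> library. A sketch with a hypothetical helper random_points:

    #include <random>
    #include <utility>
    #include <vector>

    // Generates n points uniformly at random in the unit square.
    // For the report: generator std::mt19937 (Mersenne Twister) with the
    // given seed, coordinates from std::uniform_real_distribution(0.0, 1.0).
    std::vector<std::pair<double, double>> random_points(int n, unsigned seed) {
        std::mt19937 gen(seed);
        std::uniform_real_distribution<double> coord(0.0, 1.0);
        std::vector<std::pair<double, double>> pts(n);
        for (auto &p : pts) p = { coord(gen), coord(gen) };
        return pts;
    }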
Common LaTeX typos
- $O(log n)$ should be $O(\log n)$
(the functions max, min, log, exp, sin, cos, etc., should be written as
\max, \min, \log, \exp, \sin, \cos).
- Don't write "Århus Universitet". The name is "Aarhus Universitet"
or "Aarhus University".
- Use "~" to avoid unfortunate linebreaks,
e.g. "Brodal~\cite{Brodal95}", "$n$~lines", "the point~$p$".
- Do not start a sentence with mathematics; e.g., "$S$ is the set of
points" should be restated as something like "The set of points is denoted $S$".
- "theorem 7", "figure 2" should be "Theorem 7", "Figure 2"
(captital since name of a theorem or a figure)
- When writing sets "{ ... | ... }", the "|" should be typeset using \mid (see the example after this list).
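
A small LaTeX fragment showing correct forms of the examples above:

    The running time is $O(n \log n)$, as shown by Brodal~\cite{Brodal95}.
    The set of points is denoted~$S$, where $S = \{\, x \mid x \geq 0 \,\}$.
    Figure~2 shows the $n$~lines and the point~$p$; see Theorem~7.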
Page maintained by Gerth S. Brodal