2007.09.14
| Date | Tue Sep 25 |
| Time | 14:15 — 15:15 |
| Location | DI-Turing-014 |
A probabilistic study of the RESTART scheme for failure recovery.
Søren Asmussen, IMF, AU
Lester Lipsky, Dept. of Computer Science, Storrs, CT
Consider a task like the execution of a computer program
or a file transfer that may fail before being completed.
Standard schemes for failure recovery are RESUME (continue
after repair), REPLACE (start a new task of the same type)
and RESTART (start the same task from the beginning),
but also ideas like checkpointing have been discussed.
Failure times are treated as random, and often the ideal task times are as well.
In both cases, the total actual task time is random.
RESUME and REPLACE have been analyzed in work by Trivedi, Kulkarni
and others, whereas RESTART has resisted detailed analysis until recently.
We present a discussion of the various schemes and more detailed
results on the structure of the total task time for RESTART.
Some applications to parallel computing are also outlined.
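
As a rough illustration of the schemes described above, the following Python sketch simulates the total task time under RESTART and REPLACE. It is not taken from the talk: the exponential failure times, the zero repair time, and all function names are assumptions made for this example. Under these assumptions RESUME is trivial (the total time equals the ideal task time), so it is omitted.

```python
import math
import random


def restart_total_time(task_time, failure_rate, rng):
    """Total time to finish one task under RESTART.

    Assumptions (not from the abstract): failures arrive after
    exponentially distributed times with rate `failure_rate`, and
    repair takes no time. After each failure the SAME task is
    started again from the beginning.
    """
    total = 0.0
    while True:
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= task_time:
            # The task completes before the next failure.
            return total + task_time
        # Work done so far is lost; only the elapsed time is kept.
        total += time_to_failure


def replace_total_time(draw_task_time, failure_rate, rng):
    """Total time under REPLACE: after each failure a NEW task of the
    same type is drawn by the hypothetical callback `draw_task_time`."""
    total = 0.0
    while True:
        task_time = draw_task_time(rng)
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= task_time:
            return total + task_time
        total += time_to_failure


def mean_restart_time(task_time, failure_rate, samples, seed=0):
    """Monte Carlo estimate of the mean total time under RESTART."""
    rng = random.Random(seed)
    draws = (restart_total_time(task_time, failure_rate, rng)
             for _ in range(samples))
    return sum(draws) / samples


if __name__ == "__main__":
    t, beta = 1.0, 1.0
    estimate = mean_restart_time(t, beta, samples=20000)
    # For a fixed task time t and exponential failures with rate beta,
    # a direct calculation gives the mean (exp(beta * t) - 1) / beta.
    exact = (math.exp(beta * t) - 1.0) / beta
    print(f"simulated mean: {estimate:.3f}, closed form: {exact:.3f}")
```

With a fixed ideal task time the two schemes coincide; they differ only when the ideal task time is itself random, because RESTART keeps retrying the same (possibly long) task while REPLACE draws a fresh one.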
Lester Lipsky will also give a seminar at IMF on Sept. 27; see www.thiele.au.dk/news.html