Lars Arge and Mikkel Thorup. RAM-Efficient External Memory Sorting
Abstract: In recent years a large number of problems have been considered in external memory models of computation, where the complexity measure is the number of blocks of data that are moved between slow external memory and fast internal memory (also called I/Os). In practice, however, internal memory time often dominates the total running
time once I/O-efficiency has been obtained. In this paper we study algorithms for fundamental problems that are simultaneously I/O-efficient and internal memory efficient in the RAM model of computation.
Mina Ghashami, Jeff Phillips and Feifei Li. Continues Matrix Approximation on Distributed Data
Abstract: This paper considers the problem of "tracking low-rank matrix approximation" in the distributed streaming model. In this model, there are m
distributed sites each observing a distinct stream of data and has a communication
channel with a coordinator node. The goal is to track an
epsilon-approximation to the norm of the matrix along any direction. To that end,
we present novel algorithms to address this
problem. Our algorithms maintain a smaller matrix B, as an
approximation to a distributed streaming matrix A, such that for any
unit vector x:| |A x|^2 - |B x|^2 | <= epsilon |A|_F^2. Our
algorithms work in streaming fashion and incur small communication,
which is critical for distributed computation. Our best method is
deterministic and uses only O((m epsilon) log(beta N)) communication,
where N is the size of stream (at the time of the query) and beta
is an upper-bound on the squared norm of any row of the matrix.
In addition to proving all algorithmic properties
theoretically, extensive experiments with real large datasets
demonstrate the efficiency of these protocols.
Dominik Köppl. Dynamic Skyline Computation with the Skyline Breaker Algorithm
Abstract: Given a sequential data input, we tackle parallel dynamic skyline computation of the read data by means of a spatial tree structure for indexing fine-grained feature vectors.
For this purpose, we modified the Skyline Breaker algorithm that solves skyline computation with multiple local split decision trees in a parallel manner.
With this approach, we propose an algorithm for dynamic skyline computation that inherits the robustness against the ``dimension curse'' and different data distributions.
Pooya Davoodi, Jeremy Fineman, John Iacono and Ozgur Ozkan. Cache-Oblivious Persistence
Abstract: Partial persistence is a general transformation that takes a data structure and allows queries to
be executed on any past state of the structure.
The cache-oblivious model is the leading model of a modern multi-level memory hierarchy.
We present the first general transformation for making cache-oblivious model data structures partially persistent.
Zhewei Wei and Ke Yi. On Range Summary Queries
Abstract: Database queries can be broadly classified into two categories: reporting
queries and aggregation queries. The former retrieves a collection of
records from the database that match the query's conditions, while the
latter returns an aggregate, such as count, sum, average, or max (min), of
a particular attribute of these records. Aggregation queries are
especially useful in business intelligence and data analysis applications
where users are interested not in the actual records, but some statistics
of them. They can also be executed much more efficiently than reporting
queries, by embedding properly precomputed aggregates into an index.
However, reporting and aggregation queries provide only two extremes for
exploring the data. Data analysts often need more insight into the data
distribution than what those simple aggregates provide, and yet certainly
do not want the sheer volume of data returned by reporting queries. In
this paper, we design indexing techniques that allow for extracting a
statistical summary of all the records in the query. The summaries we
support include frequent items, quantiles, various sketches, and wavelets,
all of which are of central importance in massive data analysis. Our
indexes require linear space and extract a summary with the optimal or
near-optimal query cost.
Francesco Silvestri. Subgraph Enumeration in Massive Graphs
Abstract: We consider the problem of enumerating all instances of a given sample graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let $E$ be the number of edges in the data graph, $k=\BO{1}$ be the number of vertexes in the sample graph, $B$ be the block length, and $M$ be the main memory size. The main result of the paper is a randomized algorithm that enumerates all instances of the sample graph in $\BO{E^{k/2}/\left(BM^{k/2-1}\right)}$ expected I/Os if the maximum vertex degree of the data graph is $\sqrt{EM}$. Under some assumptions, the same bound also applies with high probability.
Our algorithm is I/O optimal, in the worst-case, when the sample graph belongs to the Alon class, which includes cliques, cycles and every graph with a perfect matching: indeed, we show that any algorithm enumerating $T$ instances must always use $\BOM{T/\left(BM^{k/2-1}\right)}$ I/Os and there are graphs for which $T=\BOM{E^{k/2}}$. Finally, we propose a parallelization of the randomized algorithm in the MapReduce framework.
Peyman Afshani and Nodari Sitchinava. I/O-efficient Range Minima Queries
Abstract: In this paper we study the offline (batched) range minima query (RMQ) problem
in the external memory (EM) and cache-oblivious (CO) models. In the
static RMQ problem, given an array "A", a query "rmq_A(i,j)" returns
the smallest element in the range "A[i,j]".
If "B" is the size of the block and "m" is the number of blocks that fit in the
internal memory in the EM and CO models, we show that "Q" range minima queries
on an array of size "N" can be answered in "O(scan{N} + Q/B log_m min{Q/B,N/B}" I/Os
in both models using linear space. Our cache-oblivious result is new and our external memory
result is an improvement of the previously known bound.
We also show that these bounds is tight by proving a matching lower bound.
Our lower bound holds even if the queries are presorted in any predefined order.
In the batched dynamic RMQ problem, the queries must be answered in the
presence of the updates (insertions/deletions) to the array. We show that in
the EM model we can solve this problem in "O(sort{N} + sort{Q}log_m N/B)" I/Os,
again improving the best previously known bound.
Erika Duriakova, Neil Hurley, Deepak Ajwani and Alessandra Sala. Analysis of the Semi-synchronous approach to Large-scale Parallel Community Finding
Abstract: Community-finding in graphs is the process of identifying highly
cohesive vertex subsets. Recently the vertex-centric approach
has been found effective for scalable graph processing and is
implemented in systems such as GraphLab and Pregel. In the
vertex-centric approach, the analysis is decomposed into a set
of local computations at each vertex of the graph, with results
propagated to neighbours along the vertex’s edges. Many community
finding algorithms are amenable to this approach as they
are based on the optimisation of an objective through a process
of iterative local update (ILU), in which vertices are successively
moved to the community of one of their neighbours in
order to achieve the highest local gain in the quality of the objective.
The sequential processing of such iterative algorithms
generally benefits from an asynchronous approach, where a vertex
update uses the most recent state as generated by the previous
update of vertices in its neighbourhood. When vertices
are distributed over a parallel machine, the asynchronous approach
can encounter race conditions that impact on its performance
and destroy the consistency of the results. Alternatively,
a semi-synchronous approach ensures that only non-conflicting
vertices are updated simultaneously. In this paper we study
the semi-synchronous approach to ILU algorithms for community
finding on social networks. Because of the heavy-tailed
vertex distribution, the order in which vertex updates are applied
in asynchronous ILU can greatly impact on both convergence
time and quality of the found communities. We study
the impact of ordering on the distributed label propagation and
modularity maximisation algorithms implemented on a sharedmemory
multicore architecture. We demonstrate that the semisynchronous
ILU approach is competitive in time and quality
with the asynchronous approach, while allowing the analyst to
maintain consistent control over update ordering. Thus, our
implementation results in a more robust and predictable performance
and provides control over the order in which the node labels
are updated, which is crucial to obtaining the correct tradeoff
between running time and quality of communities on many
graph classes.
Zengfeng Huang, Bozidar Radunovic, Milan Vojnovic and Qin Zhang. Communication Complexity of Approximate Maximum Matching in Distributed Graph Data
Abstract: We consider the problem of computing an approximate maximum matching in a graph that consists of $n$ vertices in a distributed system where edges of the graph are partitioned across $k$ sites. An important problem to scale up large-scale computations is to design efficient algorithms with respect to the communication complexity, which is of primary concern in data centers where the communication bandwidth is a precious resource. Our main result is that any algorithm for finding an alpha-approximate maximum matching has the communication complexity of Omega(alpha^2 k n) bits. We show that this lower bound is tight by showing that a simple sequential algorithm has the communication complexity that is equal to the lower bound up to a logarithmic factor. Our lower bound for approximate maximum matching also implies lower bounds for other important distributed combinatorial optimization problems such as max-flow and graph sparsification. The lower bound is established by a new approach for multi-party randomized communication complexity for graph problems that is of wide applicability and thus of interest in its own right.