Lars Arge and Mikkel Thorup. RAM-Efficient External Memory Sorting

**Abstract: **In recent years a large number of problems have been considered in external memory models of computation, where the complexity measure is the number of blocks of data that are moved between slow external memory and fast internal memory (also called I/Os). In practice, however, internal memory time often dominates the total running

time once I/O-efficiency has been obtained. In this paper we study algorithms for fundamental problems that are simultaneously I/O-efficient and internal memory efficient in the RAM model of computation.

Mina Ghashami, Jeff Phillips and Feifei Li. Continues Matrix Approximation on Distributed Data

**Abstract: **This paper considers the problem of "tracking low-rank matrix approximation" in the distributed streaming model. In this model, there are m

distributed sites each observing a distinct stream of data and has a communication

channel with a coordinator node. The goal is to track an

epsilon-approximation to the norm of the matrix along any direction. To that end,

we present novel algorithms to address this

problem. Our algorithms maintain a smaller matrix B, as an

approximation to a distributed streaming matrix A, such that for any

unit vector x:| |A x|^2 - |B x|^2 | <= epsilon |A|_F^2. Our

algorithms work in streaming fashion and incur small communication,

which is critical for distributed computation. Our best method is

deterministic and uses only O((m epsilon) log(beta N)) communication,

where N is the size of stream (at the time of the query) and beta

is an upper-bound on the squared norm of any row of the matrix.

In addition to proving all algorithmic properties

theoretically, extensive experiments with real large datasets

demonstrate the efficiency of these protocols.

Dominik Köppl. Dynamic Skyline Computation with the Skyline Breaker Algorithm

**Abstract: **Given a sequential data input, we tackle parallel dynamic skyline computation of the read data by means of a spatial tree structure for indexing fine-grained feature vectors.

For this purpose, we modified the Skyline Breaker algorithm that solves skyline computation with multiple local split decision trees in a parallel manner.

With this approach, we propose an algorithm for dynamic skyline computation that inherits the robustness against the ``dimension curse'' and different data distributions.

Pooya Davoodi, Jeremy Fineman, John Iacono and Ozgur Ozkan. Cache-Oblivious Persistence

**Abstract: **Partial persistence is a general transformation that takes a data structure and allows queries to

be executed on any past state of the structure.

The cache-oblivious model is the leading model of a modern multi-level memory hierarchy.

We present the first general transformation for making cache-oblivious model data structures partially persistent.

Zhewei Wei and Ke Yi. On Range Summary Queries

**Abstract: **Database queries can be broadly classified into two categories: reporting

queries and aggregation queries. The former retrieves a collection of

records from the database that match the query's conditions, while the

latter returns an aggregate, such as count, sum, average, or max (min), of

a particular attribute of these records. Aggregation queries are

especially useful in business intelligence and data analysis applications

where users are interested not in the actual records, but some statistics

of them. They can also be executed much more efficiently than reporting

queries, by embedding properly precomputed aggregates into an index.

However, reporting and aggregation queries provide only two extremes for

exploring the data. Data analysts often need more insight into the data

distribution than what those simple aggregates provide, and yet certainly

do not want the sheer volume of data returned by reporting queries. In

this paper, we design indexing techniques that allow for extracting a

statistical summary of all the records in the query. The summaries we

support include frequent items, quantiles, various sketches, and wavelets,

all of which are of central importance in massive data analysis. Our

indexes require linear space and extract a summary with the optimal or

near-optimal query cost.

Francesco Silvestri. Subgraph Enumeration in Massive Graphs

**Abstract: **We consider the problem of enumerating all instances of a given sample graph in a large data graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let $E$ be the number of edges in the data graph, $k=\BO{1}$ be the number of vertexes in the sample graph, $B$ be the block length, and $M$ be the main memory size. The main result of the paper is a randomized algorithm that enumerates all instances of the sample graph in $\BO{E^{k/2}/\left(BM^{k/2-1}\right)}$ expected I/Os if the maximum vertex degree of the data graph is $\sqrt{EM}$. Under some assumptions, the same bound also applies with high probability.

Our algorithm is I/O optimal, in the worst-case, when the sample graph belongs to the Alon class, which includes cliques, cycles and every graph with a perfect matching: indeed, we show that any algorithm enumerating $T$ instances must always use $\BOM{T/\left(BM^{k/2-1}\right)}$ I/Os and there are graphs for which $T=\BOM{E^{k/2}}$. Finally, we propose a parallelization of the randomized algorithm in the MapReduce framework.

Peyman Afshani and Nodari Sitchinava. I/O-efficient Range Minima Queries

**Abstract: **In this paper we study the offline (batched) range minima query (RMQ) problem

in the external memory (EM) and cache-oblivious (CO) models. In the

static RMQ problem, given an array "A", a query "rmq_A(i,j)" returns

the smallest element in the range "A[i,j]".

If "B" is the size of the block and "m" is the number of blocks that fit in the

internal memory in the EM and CO models, we show that "Q" range minima queries

on an array of size "N" can be answered in "O(scan{N} + Q/B log_m min{Q/B,N/B}" I/Os

in both models using linear space. Our cache-oblivious result is new and our external memory

result is an improvement of the previously known bound.

We also show that these bounds is tight by proving a matching lower bound.

Our lower bound holds even if the queries are presorted in any predefined order.

In the batched dynamic RMQ problem, the queries must be answered in the

presence of the updates (insertions/deletions) to the array. We show that in

the EM model we can solve this problem in "O(sort{N} + sort{Q}log_m N/B)" I/Os,

again improving the best previously known bound.

Erika Duriakova, Neil Hurley, Deepak Ajwani and Alessandra Sala. Analysis of the Semi-synchronous approach to Large-scale Parallel Community Finding

**Abstract: **Community-finding in graphs is the process of identifying highly

cohesive vertex subsets. Recently the vertex-centric approach

has been found effective for scalable graph processing and is

implemented in systems such as GraphLab and Pregel. In the

vertex-centric approach, the analysis is decomposed into a set

of local computations at each vertex of the graph, with results

propagated to neighbours along the vertex’s edges. Many community

finding algorithms are amenable to this approach as they

are based on the optimisation of an objective through a process

of iterative local update (ILU), in which vertices are successively

moved to the community of one of their neighbours in

order to achieve the highest local gain in the quality of the objective.

The sequential processing of such iterative algorithms

generally benefits from an asynchronous approach, where a vertex

update uses the most recent state as generated by the previous

update of vertices in its neighbourhood. When vertices

are distributed over a parallel machine, the asynchronous approach

can encounter race conditions that impact on its performance

and destroy the consistency of the results. Alternatively,

a semi-synchronous approach ensures that only non-conflicting

vertices are updated simultaneously. In this paper we study

the semi-synchronous approach to ILU algorithms for community

finding on social networks. Because of the heavy-tailed

vertex distribution, the order in which vertex updates are applied

in asynchronous ILU can greatly impact on both convergence

time and quality of the found communities. We study

the impact of ordering on the distributed label propagation and

modularity maximisation algorithms implemented on a sharedmemory

multicore architecture. We demonstrate that the semisynchronous

ILU approach is competitive in time and quality

with the asynchronous approach, while allowing the analyst to

maintain consistent control over update ordering. Thus, our

implementation results in a more robust and predictable performance

and provides control over the order in which the node labels

are updated, which is crucial to obtaining the correct tradeoff

between running time and quality of communities on many

graph classes.

Zengfeng Huang, Bozidar Radunovic, Milan Vojnovic and Qin Zhang. Communication Complexity of Approximate Maximum Matching in Distributed Graph Data

**Abstract: **We consider the problem of computing an approximate maximum matching in a graph that consists of $n$ vertices in a distributed system where edges of the graph are partitioned across $k$ sites. An important problem to scale up large-scale computations is to design efficient algorithms with respect to the communication complexity, which is of primary concern in data centers where the communication bandwidth is a precious resource. Our main result is that any algorithm for finding an alpha-approximate maximum matching has the communication complexity of Omega(alpha^2 k n) bits. We show that this lower bound is tight by showing that a simple sequential algorithm has the communication complexity that is equal to the lower bound up to a logarithmic factor. Our lower bound for approximate maximum matching also implies lower bounds for other important distributed combinatorial optimization problems such as max-flow and graph sparsification. The lower bound is established by a new approach for multi-party randomized communication complexity for graph problems that is of wide applicability and thus of interest in its own right.