Queue-based computing is a simple paradigm we stumbled upon while working on the voidsearch* real-time data analysis engine.
Let Q = {q_1..q_n} represent a dynamic set of queues q_i, each of fixed size s_i. We define an aggregation A_i as an arbitrary convolution function of queue elements such that A_i(t) = f(A_i(t-1), q_i), meaning that the updated value of the aggregation for a new queue entry can be computed from the previous value of the aggregation and the value of the new entry.
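As a minimal sketch of this update rule, the snippet below maintains a fixed-size queue and a running-sum aggregate updated purely as f(previous value, new entry). The queue size, the choice of sum as the convolution function, and the helper names are illustrative assumptions, not part of the definition above.

```python
from collections import deque

# Fixed-size queue q_i of size s_i; the oldest entry falls off when full.
q = deque(maxlen=4)

# Hypothetical convolution-style aggregate: running sum, updated as
# A(t) = f(A(t-1), new_entry) without rescanning the queue contents.
def update_sum(prev_agg, new_entry):
    return prev_agg + new_entry

agg = 0
for x in [3, 1, 4, 1, 5]:
    q.append(x)
    agg = update_sum(agg, x)  # O(1) work per arriving entry
```

Note that any aggregate expressible this way is observable at every step, since each new entry triggers a constant-time update rather than a recomputation over the whole queue.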
We now define an arbitrary analysis task on a set of n-field data entries D = {d_1..d_k} as the task of providing a proper mapping M = {dq_1..dq_k} | (dq : d_i -> q_i) and a proper set of aggregation functions A.
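One simple way to realize such a mapping dq is to route each entry to a queue keyed by one of its fields. The field name "sensor", the queue size, and the dict-based entries below are hypothetical choices made for the sake of the sketch.

```python
from collections import deque

queues = {}  # the dynamic set Q, grown as new keys arrive

def dq(entry, size=8):
    """Hypothetical mapping dq: route entry d_i to a queue q_i by its 'sensor' field."""
    key = entry["sensor"]
    if key not in queues:
        queues[key] = deque(maxlen=size)  # queues are created on demand
    queues[key].append(entry["value"])
    return key

for d in [{"sensor": "a", "value": 1.0},
          {"sensor": "b", "value": 2.0},
          {"sensor": "a", "value": 3.0}]:
    dq(d)
```

Because Q is dynamic, the mapping need not know the full set of queues in advance; it can grow the set as previously unseen keys appear in the data.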
Finally, (Q,D,A,M) represents a complete description of a queue computation for the given data D, mapped via the set of functions M to the queues of the set Q, each equipped with an aggregation function from A. The data D represent the input to the problem, while the values of the aggregates represent its output.
What makes this paradigm different from similar parallel data processing techniques is the constraint on the convolutional nature of the aggregate function. This means that, given a (Q,D,A,M) tuple as described above, we are immediately able to track the values of the aggregates (updated as new data arrive at the queues) and even observe interesting patterns such as convergence, which can be of particular interest in data analysis tasks. Additionally, this paradigm is especially well suited to the analysis of continuous flows of data (data streams), which are not handled particularly well by the standard batch-processing approach to data analysis.
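The pieces above can be combined into a small end-to-end sketch of a (Q,D,A,M) computation over a stream. Here the aggregate is assumed to be a running mean carried as (count, mean) state, updated in convolution style from the previous state and the new entry; the field names, queue size, and sample stream are all illustrative assumptions.

```python
from collections import deque

def update_mean(state, x):
    """Convolution-style update: new aggregate state from previous state and new entry."""
    count, mean = state
    count += 1
    return count, mean + (x - mean) / count

queues = {}      # Q: dynamic set of fixed-size queues
aggregates = {}  # current values A_i, one per queue

def process(entry, size=16):
    """M: route entry d_i by its 'key' field, then apply the aggregate update."""
    key = entry["key"]
    q = queues.setdefault(key, deque(maxlen=size))
    q.append(entry["value"])
    aggregates[key] = update_mean(aggregates.get(key, (0, 0.0)), entry["value"])

stream = [{"key": "t", "value": v} for v in (10, 12, 11, 11, 11)]
for d in stream:
    process(d)
    # aggregates["t"] is up to date after every entry, so its convergence
    # can be watched while the stream is still flowing
```

Because the aggregate is updated on every arrival, there is no batch boundary: the output of the problem is available continuously, which is the property the paragraph above contrasts with batch processing.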
to be continued...