What are general optimization techniques for the efficient computation of data cubes?

The following are general optimization techniques for the efficient computation of data cubes.

Technique One (sorting, hashing, and grouping)β€”

Sorting, hashing, and grouping operations should be applied to the dimension attribute in order to reorder and cluster related tuples. In cube computation, aggregation is performed on the tuples that share the same set of dimension values.

Thus it is important to explore sorting, hashing, and grouping operations to access and group such data together to facilitate computation of such aggregates.

For example, to compute total sales by branch, day, and item, it is more efficient to sort tables or cells by branch, and then by day, and then group them according to the item name. Efficient implementations of such operations in large data sets have been extensively studied in the. database research community. Such implementations can be extended to data cube computation.

Technique Two (simultaneous aggregation and caching intermediate results) β€”

In cube computation, it is efficient to compute higher-level aggregates from previously computed lower-level aggregates rather than from the base fact table. Moreover, simultaneous aggregation from cached intermediate computation results may lead to the reduction of expansive disk I/O operation.

For example, to compute sales by branch, we can use the intermediate results derived from the computation of a lower-level cuboid, such as sales by branch and day. This technique can be further extended to perform amortized scans.

Technique Third (aggregation from the smallest child, when there exist multiple child cuboids) β€”

When there exist multiple child cuboids, it is usually more efficient to compute the desired parent cuboid from the smallest, previously computed child cuboid.

For example, to compute a sates cuboid, C- Branch, when there exist two previously computed cuboids, C{branch, year}, and C{branch, item), it is obviously more efficient to compute C- Branch, from the former than from the latter if there are many more distinct items than distinct years.

Technique Four (the Apriori pruning method) β€”

The Apriori property, in_ the context of data cubes, states as β€” “If a given cell does not satisfy minimum support, then no descendant (i.e., more specialized or detailed version) of the cell will satisfy minimum support either.” This property can be used to substantially reduce the computation of iceberg cubes.

Leave a Reply

%d bloggers like this: