Data Cube Basics PivotData C# examples
PivotData library implements in-memory multidimensional data structure
(OLAP cube) for fast data aggregation and grouping in C# code.
You may thought about it as lightweight embedded OLAP engine that doesn't need OLAP server or MDX queries.
SDK is divided into two parts:
- OLAP library
NReco.PivotData.dll - Generic data cube implementation (
PivotData
class), aggregate functions, in-memory OLAP operations (SliceQuery
class). - Toolkit components
NReco.PivotData.Extensions.dll - Advanced components: connectors to SQL/CSV/JSON data sources, pivot table HTML rendering, exports, formatting and calculations (percentage, difference, running total, heatmap), factory components needed for web pivot table builder functionality and many others.
What is a data cube
Formally data cube is a simple relation between complex key (represented by N dimension keys) and the aggregate function result:
This relation can be built as result of grouping tabular data by N columns and applying aggregate function to the rows inside groups:
|
→ |
|
Data cube is configured by the list of dimensions (that correspond to the input dataset columns) and aggregate function(s).
Cubes can answer on the analytical queries like "how many" / "how much"; for example, sample cube from above can answer on questions:
- How many orders in Canada? (2)
- How much orders' amount for Product1? (1060)
OLAP cubes can calculate values for pivot tables (cross-tabs), pivot charts: financial/logistics/sales/marketing reports, cohort analysis reports, KPI BI dashboards etc.
OLAP Library API Overview
Class/Interface | Purpose |
---|---|
PivotData
|
generic IPivotData implementation of in-memory data cube.
Can aggregate data from any IEnumberable source or IDataReader .
|
IAggregatorFactory |
represents summary function. Used as parameter for PivotData . |
SliceQuery |
Performs OLAP operations as transformation of input IPivotData to new PivotData . |
PivotTable |
Pivot table data model (rows x columns) based on underlying IPivotData data cube. |
PivotDataState |
Represents IPivotData instance serializable data, implements fast binary serialization/deserialization. |
How to build a data cube
PivotData
is configured with the list of dimensions and aggregate function:
var cube = new PivotData(new[] {"Product", "Country" }, new CountAggregatorFactory() );
Several measures may be collected in one cube with help of
CompositeAggregatorFactory
:
var cube = new PivotData( new[] {"Product", "Country" }, // list of input columns/fields for dimensions new CompositeAggregatorFactory( new CountAggregatorFactory(), // count is a measure #0 new SumAggregatorFactory("Amount") // sum of amount is a measure #1 ) );
It is possible to perform data aggregation with ProcessData method:
DataTable ordersTable; // this is our sample facts table cube.ProcessData( new DataTableReader(ordersTable) ); // accepts any IDataReader implementation
It processed input data as a stream, and suitable for handling datasets with millions of records.
ProcessData has an overload
that accepts any IEnumerable
facts source; in this case in addition to the collection it needs
special values accessor delegate. You can call ProcessData
several times if needed (this is useful for updating the cube with
new portion of the facts).
NOTE: it is possible to perform data aggregation outside PivotData library (say, with database query in ROLAP way) and use
aggregation results to initialize PivotData
instance: how to load pre-aggregated data.
Accessing cube values
Cube values are represented by IAggregator objects and accessed with indexer by dimension keys:
IAggregator m = cube["Product1", "USA"]; // number of keys should match cube dimensions object mValues = m.Value; // type of m.Value depends on the aggregate function var countValue = m.AsComposite().Aggregators[0].Value; // access composite aggregator value #0 var sumValue = m.AsComposite().Aggregators[1].Value; // access composite aggregator value #1
IAggregator.Value
property returns a value of the aggregator and IAggregator.Count
returns a number of aggregated objects.
Key.Empty is used to roll-up by some dimension(s) and get the sub-total value:
var product1Aggr = cube["Product1", Key.Empty]; var grandTotalAggr = cube[Key.Empty,Key.Empty];
PivotData implements IEnumerable
and all (D,V) pairs may be iterated with a simple foreach:
foreach (var entry in cube) { // entry.Key holds array of dimension keys var measure = entry.Value; // implements IAggregator interface Console.WriteLine("D1={0} D2={1} V={2}", entry.Key[0], entry.Key[1], measure.Value); }
It is possible to get all dimension keys collected by PivotData with GetDimensionKeys method:
var allKeys = cube.GetDimensionKeys(); // returns array of keys for each dimension var productKeys = allKeys[0]; // dimension at index=0 is "Product" var countryKeys = allKeys[1]; // dimension at index=1 is "Country"