Aggregation is a transformation step in data prep that lets you collapse multiple records into a single grouped record — for example, summing notional across all trades for a given broker and fund. It runs after Map and transform and before the Output step in the pipeline.
To use aggregation, your project must be running in flexible pipeline mode, which also enables data previews at each step and full control over the output schema.
⚠️ Important Enabling flexible pipeline mode is irreversible. Once turned on, the project cannot be reverted to classic mode. Existing snapshots remain accessible and unchanged, but the configuration experience will permanently change.
This article walks through:
- Enabling flexible pipeline mode
- Creating an aggregation rule with grouping and summary calculations
- Adding optional conditions using the natural-language rule builder
- Configuring the output schema
- Submitting data and viewing the aggregated result
Before you start
You need an existing data prep process with at least one input mapped to a unified output schema. The example below uses a process with two inputs — Broker_A.csv and Broker_B.xlsx — both mapped into a unified Trades schema, with a calculated Notional field (Trade Price × Quantity) added during Map and transform.
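As a rough illustration of what the calculated Notional field holds, the sketch below reproduces the Trade Price × Quantity formula in plain Python. It is not the product's implementation: the function name `add_notional` and the sample values are assumptions made for this example.

```python
from decimal import Decimal


def add_notional(trade: dict) -> dict:
    """Return a copy of the trade with Notional = Trade Price x Quantity."""
    enriched = dict(trade)  # leave the original record untouched
    enriched["Notional"] = trade["Trade Price"] * trade["Quantity"]
    return enriched


# Hypothetical sample record, not taken from Broker_A.csv or Broker_B.xlsx.
trade = {
    "Broker": "ABC",
    "Fund Code": "F1",
    "Trade Price": Decimal("101.50"),
    "Quantity": 200,
}
enriched_trade = add_notional(trade)
print(enriched_trade["Notional"])
```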
Setting up aggregation in data prep
1. Enabling flexible pipeline mode
Aggregation is only available in flexible pipeline mode. Turn it on in the project’s general settings.
- Open the process and go to Settings.
- Select General in the left sidebar.
- Scroll to the Flexible pipelines section and click the Flexible pipelines toggle.
- Read the confirmation dialog. It lists the new capabilities you are about to unlock — Data aggregation, Data previews, and Output field management — and warns that the action is irreversible.
- Click Confirm.
Once enabled, two new entries — Aggregation and Output — appear under each data type in the sidebar (Trades and Balances in this example).
2. Navigating to the aggregation step
- In the sidebar, under the relevant data type (for example, Trades), click Aggregation.
The flow in the top right shows the pipeline order: Aggregation is step 3, after Map and transform (step 2) and before Output (step 4).
3. Creating an aggregation rule
An aggregation rule groups records from one or more inputs by a set of fields, and applies one or more summary calculations to other fields.
- Click + New aggregation rule.
- In the Inputs field, select the inputs the rule should apply to — for this example, both Broker_A.csv and Broker_B.xlsx.
- In the Aggregate by field, choose the grouping fields — for example, Broker and Fund Code. All records that share the same combination of these values will be collapsed into a single row.
- Click the operation link (it defaults to sum up) and select an operation. The available operations include sum up, calculate the average of, calculate the weighted average of, count values of, count records, count unique values of, get minimum value of, and others.
- Choose the field to apply the operation to — for example, Notional for the sum.
- To add another calculation, click the “and” link at the end of the row and configure a second operation, for example calculate the average of Trade Price.
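The grouping behaviour described in these steps can be pictured with a short Python sketch. It only illustrates the idea of collapsing records that share the same Broker and Fund Code into one row with a sum and an average; the function name `aggregate`, the output column labels, and the sample trades are assumptions, and the product may compute results differently.

```python
from collections import defaultdict


def aggregate(records, group_by, sum_field, avg_field):
    """Collapse records sharing the same group_by values into one row each."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in group_by)
        groups[key].append(record)

    rows = []
    for key, members in groups.items():
        row = dict(zip(group_by, key))
        # "sum up" operation
        row[f"{sum_field} (sum)"] = sum(m[sum_field] for m in members)
        # "calculate the average of" operation
        row[f"{avg_field} (average)"] = (
            sum(m[avg_field] for m in members) / len(members)
        )
        rows.append(row)
    return rows


# Hypothetical sample trades.
trades = [
    {"Broker": "ABC", "Fund Code": "F1", "Trade Price": 10.0, "Notional": 1000.0},
    {"Broker": "ABC", "Fund Code": "F1", "Trade Price": 12.0, "Notional": 2400.0},
    {"Broker": "XYZ", "Fund Code": "F2", "Trade Price": 50.0, "Notional": 500.0},
]
rows = aggregate(trades, ["Broker", "Fund Code"], "Notional", "Trade Price")
for row in rows:
    print(row)
```

In this sample, the two ABC / F1 trades collapse into a single row, while the single XYZ / F2 trade forms a group of its own.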
ℹ️ Note Each input can belong to exactly one aggregation rule. If you click + New aggregation rule and the Inputs dropdown shows No options, it means every input is already assigned to an existing rule. Either remove an input from the existing rule or delete the new empty rule.
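The one-rule-per-input constraint can be sketched as a simple lookup. This is an illustration only: the rule names, the dictionary layout, and the function `unassigned_inputs` are assumptions for this example and do not reflect how the product stores its configuration.

```python
def unassigned_inputs(all_inputs, rules):
    """Return the inputs that no aggregation rule uses yet.

    `rules` maps a rule name to the list of inputs assigned to it.
    """
    assigned = {name for inputs in rules.values() for name in inputs}
    return [name for name in all_inputs if name not in assigned]


all_inputs = ["Broker_A.csv", "Broker_B.xlsx"]

# One rule already uses both inputs, so nothing is left for a new rule.
# This corresponds to the Inputs dropdown showing No options.
rules_full = {"Rule 1": ["Broker_A.csv", "Broker_B.xlsx"]}
print(unassigned_inputs(all_inputs, rules_full))

# After removing Broker_B.xlsx from the existing rule, it becomes available again.
rules_partial = {"Rule 1": ["Broker_A.csv"]}
print(unassigned_inputs(all_inputs, rules_partial))
```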
4. Adding an optional condition
By default, an aggregation rule applies to every record from the selected inputs. You can restrict it with a condition: records that match the condition are aggregated, and any records that don’t match are passed through to the output unaggregated.
- Next to Condition (Optional), click Add.
- Type the condition in plain English — for example, Broker is not XYZ. The natural-language rule builder will interpret it.
- Wait for the conversion to finish. The plain-English text is replaced with a structured rule — for example, Broker is not equal to "XYZ".
💡 Tip You can also build the condition manually with the rule editor if you prefer not to use natural language — click the pencil icon to switch into structured editing.
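To make the pass-through behaviour concrete, here is a minimal Python sketch of a conditional aggregation. It assumes a condition equivalent to Broker is not equal to "XYZ" and a single sum on Notional. The function name, the sample trades, and the order in which aggregated and unaggregated rows are returned are all assumptions, not a description of the product's internals.

```python
from collections import defaultdict


def aggregate_with_condition(records, condition, group_by, sum_field):
    """Aggregate records that match the condition; pass the rest through."""
    totals = defaultdict(float)
    passed_through = []
    for record in records:
        if condition(record):
            key = tuple(record[field] for field in group_by)
            totals[key] += record[sum_field]
        else:
            # Non-matching records are kept as they are.
            passed_through.append(record)

    aggregated = [
        {**dict(zip(group_by, key)), f"{sum_field} (sum)": total}
        for key, total in totals.items()
    ]
    return aggregated + passed_through


# Hypothetical sample trades.
trades = [
    {"Broker": "ABC", "Fund Code": "F1", "Notional": 1000.0},
    {"Broker": "ABC", "Fund Code": "F1", "Notional": 2400.0},
    {"Broker": "XYZ", "Fund Code": "F2", "Notional": 500.0},
    {"Broker": "XYZ", "Fund Code": "F2", "Notional": 700.0},
]
output = aggregate_with_condition(
    trades,
    condition=lambda r: r["Broker"] != "XYZ",
    group_by=["Broker", "Fund Code"],
    sum_field="Notional",
)
for row in output:
    print(row)
```

With this sample, the two ABC trades become one aggregated row, and both XYZ trades appear in the output unchanged.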
5. Previewing the result
Before saving, you can preview the data the rule will produce.
- At the top of the Aggregation page, click the Display preview toggle.
A panel appears on the right showing the aggregated rows, for example the sum of Notional per Broker and Fund Code for records matching the condition.
ℹ️ Note If the preview panel shows no data immediately after toggling, refresh the page. The preview is computed on demand against a sample of the latest submissions.
6. Configuring the output schema
The Output step controls which fields are included in the final result and how they appear. The output schema determines what flows into a downstream reconciliation, an API extract, or a UI extract.
- In the sidebar, click Output under the same data type.
- Click the Configuration tab on the right edge of the screen to open the field configuration panel.
In the configuration panel you can:
- Enable or disable individual fields using the toggle at the start of each row, or use Enable all / Disable all for bulk changes.
- Reorder fields by dragging the handle on the left, or click Reorder to apply a sort.
- Rename a field by editing the text in the field name input — for example, change Aggregation rule - Notional (sum) to Sum of Notional.
- Change the output data type using the dropdown on the right of each row.
To configure the output for this example:
- Disable the fields you don’t need in the output. In this example, only the aggregated and grouping fields are kept: Broker, Fund Code, and Sum of Notional.
- Adjust the order so the grouping fields come first.
- Rename Aggregation rule - Notional (sum) to Sum of Notional for a cleaner column name.
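The effect of enabling, reordering, and renaming fields can be sketched in Python as follows. This is an illustration under stated assumptions: the configuration layout, the function `apply_output_config`, and the field name Aggregation rule - Trade Price (average) are hypothetical (the last one simply follows the naming pattern shown above).

```python
def apply_output_config(rows, config):
    """Shape each row according to an ordered list of field settings.

    Each config entry has a source field, an output name, and an enabled flag.
    The order of the entries defines the column order in the output.
    """
    shaped = []
    for row in rows:
        shaped.append(
            {
                field["name"]: row[field["source"]]
                for field in config
                if field["enabled"]
            }
        )
    return shaped


# Hypothetical configuration: grouping fields first, one renamed field,
# and one disabled field.
output_config = [
    {"source": "Broker", "name": "Broker", "enabled": True},
    {"source": "Fund Code", "name": "Fund Code", "enabled": True},
    {"source": "Aggregation rule - Notional (sum)",
     "name": "Sum of Notional", "enabled": True},
    {"source": "Aggregation rule - Trade Price (average)",
     "name": "Average Trade Price", "enabled": False},
]

aggregated_rows = [
    {
        "Fund Code": "F1",
        "Broker": "ABC",
        "Aggregation rule - Notional (sum)": 3400.0,
        "Aggregation rule - Trade Price (average)": 11.0,
    },
]
result = apply_output_config(aggregated_rows, output_config)
print(result)
```

The disabled field is dropped from the output, but nothing in this sketch changes how the aggregated values were calculated.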
ℹ️ Note Fields that are used in an aggregation rule (either for grouping or as a calculation target) display a stack icon next to their toggle. You can still disable them in the output, but they will continue to drive the aggregation behaviour upstream.
Submitting data
Once the configuration is valid, exit settings and submit data to see the aggregation in action.
- Click Exit settings at the top of the sidebar.
- Click Submit data.
- Upload the source files — Broker_A.csv to the Broker A input, Broker_B.xlsx to the Broker B input — and submit.
- Click Create snapshot to trigger the pipeline.
ℹ️ Note In flexible pipeline mode, the pending data view is no longer available. The latest submissions are processed when the next snapshot is created (manually or by a snapshot trigger).
Verifying the result
Open the snapshot to view the aggregated output.
- From the process page, navigate to Snapshots in the left sidebar.
- Click the latest snapshot to open it.
- Under Trades, click Results.
The result table shows one row per group, with the configured fields and the calculated summary values. Each aggregated row is marked with a stack icon next to its row number.
Troubleshooting / FAQ
Why is the aggregation step missing from my sidebar?
Aggregation is only available when flexible pipeline mode is enabled in the project’s general settings. If the step doesn’t appear, open Settings → General and enable the Flexible pipelines toggle. This is a one-way change.
Can I revert to classic mode after enabling flexible pipelines?
No. Enabling flexible pipeline mode is irreversible. Existing snapshots remain accessible, but the project’s configuration experience cannot be returned to classic mode.
Why does the new aggregation rule show “No options” for Inputs?
Each input can belong to exactly one aggregation rule. If you’ve already created a rule that uses every input, there is nothing left to assign. Remove an input from the existing rule first, or delete the empty new rule.
What happens to records that don’t match the rule’s condition?
They pass through to the output unaggregated, alongside the aggregated rows produced by matching records.
Can I have multiple aggregation rules in one data type?
Yes — as long as each input is used by at most one rule. This is useful when different inputs need to be aggregated by different keys or with different calculations.
What does the stack icon next to a field mean in the Output panel?
It marks fields that are used in an aggregation rule, either as a grouping key or as the target of a summary calculation. The icon is a visual reminder that the field still drives the aggregation upstream, even if you disable it in the output.
Can I still see pending data the way I could in classic mode?
No. In flexible pipeline mode, the pending-data view is replaced by the snapshot model. Submitted records are processed into the next snapshot, which can be triggered manually or by a snapshot trigger.