Filters
David Megginson edited this page Jun 15, 2020 · 18 revisions
On the Recipe page, you can use forms to define a series of filters for transforming HXL-tagged data. Each filter reads the data from the previous filter (or the original source), changes it in some way, then passes it on to the next filter (or the final output). Together, the chain of filters makes up a HXL Proxy data recipe.
(These filters are also available to coders via JSON recipes.)
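The chaining idea can be illustrated with a short plain-Python sketch. It assumes each row is a dict keyed by hashtag (a simplification: a real HXL dataset has a text header row plus a hashtag row), and the step functions are hypothetical stand-ins named after the JSON filters. They are not the HXL Proxy's or libhxl's implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: one dict per row, keyed by HXL hashtag (not the real HXL data model).
Row = Dict[str, str]
Filter = Callable[[Iterable[Row]], Iterator[Row]]


def run_recipe(source: Iterable[Row], filters: List[Filter]) -> List[Row]:
    """Feed the source through each filter in order, like a recipe chain."""
    data: Iterable[Row] = source
    for step in filters:
        data = step(data)  # each filter reads the previous filter's output
    return list(data)


def with_rows(tag: str, value: str) -> Filter:
    """Hypothetical select-rows step: keep rows where `tag` equals `value`."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        return (r for r in rows if r.get(tag) == value)
    return step


def without_columns(*tags: str) -> Filter:
    """Hypothetical cut-columns step: drop the listed hashtags."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        return ({k: v for k, v in r.items() if k not in tags} for r in rows)
    return step


def sort_rows(tag: str) -> Filter:
    """Hypothetical sort-rows step: order rows by one column."""
    def step(rows: Iterable[Row]) -> Iterator[Row]:
        return iter(sorted(rows, key=lambda r: r.get(tag, "")))
    return step


# Made-up sample rows for illustration only.
source = [
    {"#org": "Org B", "#adm1": "Coast", "#sector": "WASH"},
    {"#org": "Org C", "#adm1": "Nairobi", "#sector": "Food"},
    {"#org": "Org A", "#adm1": "Coast", "#sector": "Health"},
]

result = run_recipe(source, [
    with_rows("#adm1", "Coast"),
    without_columns("#adm1"),
    sort_rows("#org"),
])
print(result)
# [{'#org': 'Org A', '#sector': 'Health'}, {'#org': 'Org B', '#sector': 'WASH'}]
```

Because every step has the same shape (rows in, rows out), steps can be reordered, added, or removed without changing the others, which is the property that makes a recipe composable.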
The following filter types are available for each step (select a filter type to see typical use cases):
Filter | Description |
---|---|
Add column filter | Add a new column with a fixed value to the left or right side of the dataset. In JSON recipes, this filter is `add_columns` (plural) and allows adding multiple columns in a single operation. |
Append datasets filter | Combine multiple source datasets into a single output dataset (even if the columns don't exactly match). |
Append datasets (external list) filter | Combine multiple source datasets into a single output dataset, using an external list of datasets. |
Clean data filter | Perform automated cleanup of dates, numbers, whitespace, and character case. |
Count rows filter | Aggregate data to produce reports and summaries (like a spreadsheet pivot table). |
Cut columns filter | Remove columns from a dataset. In JSON recipes, this is available as two separate filters: `with_columns` and `without_columns`. |
Deduplicate rows filter | Remove duplicate rows from a dataset. |
Expand lists filter | Expand in-cell lists by duplicating rows for each value combination. |
Explode data filter | Normalise data by converting "wide" data (e.g. time series) to "narrow" data (similar to the R `reshape` function). |
Fill data filter | Fill empty cells with previous values in a column. |
Implode data filter | Denormalise data by converting "narrow" data to "wide" data. |
JSONPath filter | Extract data from JSON content in a cell, using a JSONPath expression. |
Merge columns filter | Combine data from multiple datasets (similar to a SQL "join"). |
Rename column filter | Change the hashtags and headers on a dataset column. In JSON recipes, this is available as the `rename_columns` filter (plural), and allows renaming multiple columns in a single operation. |
Replace data filter | Replace data selectively, using string or regular expression patterns. |
Replace data (mapping table) filter | Replace data selectively, driven by an external data table (useful for larger collections of replacements). |
Select rows filter | Filter rows out of a dataset (e.g. every row with a date before 2015, similar to a SQL "select"). In JSON recipes, this is available as two separate filters: `with_rows` and `without_rows`. |
Sort rows filter | Sort the rows of a dataset based on one or more columns. |
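To make a few of the less obvious descriptions concrete, here is a plain-Python sketch of what count, deduplicate, fill, expand-lists, explode, and implode do to rows. It is not the HXL Proxy's code. It assumes one dict per row keyed by hashtag, and the output hashtags `#meta+count`, `#meta+header`, and `#meta+value`, as well as the `|` list separator, are illustrative assumptions rather than the Proxy's documented defaults.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence

Row = Dict[str, str]  # assumption: one dict per row, keyed by hashtag


def count_rows(rows: List[Row], tags: Sequence[str]) -> List[Row]:
    """Count: one output row per distinct combination of `tags` values."""
    counts = Counter(tuple(r.get(t, "") for t in tags) for r in rows)
    # "#meta+count" is an illustrative output hashtag
    return [{**dict(zip(tags, key)), "#meta+count": str(n)} for key, n in counts.items()]


def dedup(rows: List[Row]) -> List[Row]:
    """Deduplicate: keep the first occurrence of each identical row."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def fill_data(rows: List[Row], tag: str) -> List[Row]:
    """Fill: copy the last non-empty value of `tag` into empty cells below it."""
    last, out = "", []
    for r in rows:
        value = r.get(tag, "")
        if value.strip():
            last = value
        out.append({**r, tag: value if value.strip() else last})
    return out


def expand_lists(rows: List[Row], tags: Sequence[str], sep: str = "|") -> List[Row]:
    """Expand lists: one row per combination of the in-cell list values."""
    out = []
    for r in rows:
        choices = [[v.strip() for v in r.get(t, "").split(sep)] for t in tags]
        for combo in product(*choices):
            out.append({**r, **dict(zip(tags, combo))})
    return out


def explode(rows: List[Row], keep: Sequence[str]) -> List[Row]:
    """Explode (wide to narrow): one row per non-kept column of each input row."""
    out = []
    for r in rows:
        base = {t: r[t] for t in keep}
        for col, val in r.items():
            if col not in keep:
                out.append({**base, "#meta+header": col, "#meta+value": val})
    return out


def implode(rows: List[Row]) -> List[Row]:
    """Implode (narrow to wide): the inverse of explode() above."""
    wide: Dict[tuple, Row] = {}
    for r in rows:
        key = tuple((k, v) for k, v in r.items() if k not in ("#meta+header", "#meta+value"))
        wide.setdefault(key, dict(key))[r["#meta+header"]] = r["#meta+value"]
    return list(wide.values())


# Made-up sample data for illustration only.
activities = [
    {"#adm1": "Coast", "#sector": "WASH"},
    {"#adm1": "Coast", "#sector": "Food"},
    {"#adm1": "Nairobi", "#sector": "WASH"},
]
print(count_rows(activities, ["#adm1"]))
# [{'#adm1': 'Coast', '#meta+count': '2'}, {'#adm1': 'Nairobi', '#meta+count': '1'}]

wide = [{"#adm1": "Coast", "#affected+2019": "10", "#affected+2020": "12"}]
print(implode(explode(wide, ["#adm1"])) == wide)  # True
```

The explode/implode round trip shows why the two filters are described as inverses: explode turns each year column into its own row, and implode folds those rows back into columns.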
Learn more about the HXL standard at http://hxlstandard.org