-
Notifications
You must be signed in to change notification settings - Fork 4
Merge columns filter
The Merge columns filter on the Recipe page allows you to combine columns from multiple datasets. For example, you can take a 3W report and add population data from a different dataset that uses the same geographical codes.
There are two main concepts for merging:
-
Shared keys: the two datasets must have at least one column in common. The filter will use the value(s) in the column(s) to match rows between the two datasets.
-
Copied values: the filter will add values from the second (merge) dataset into the original, either as new columns added to the right side, or as new values in existing columns.
The same value can be copied into multiple rows in the original dataset, as long as the keys match. For example, a province's population in the merge dataset could be copied into 100 rows in a 3W dataset containing activities for that province.
The merge filter is similar to a SQL "join" command.
URL of merge dataset: (required) the location of the dataset with new values to be added to the original dataset.
Tags to copy: (required) a list of one or more tag patterns for the columns to pull from the merge dataset into the original one. The filter will copy all columns matching the pattern (for example, #population might match columns tagged #population+f, #population+m, #population+total, etc.). If there are no matching columns in the merge dataset, the filter will not add any to the output.
Shared keys: (required) a list of one or more tag patterns for the columns that contain values in common between the two datasets for matching rows (e.g. "meta+id" or "org,sector,date").
Use existing columns where possible: if checked, the HXL Proxy will use an existing column in the original dataset rather than creating a new one, if there is a column with the appropriate hashtag. Otherwise, it will always add new columns to the right side of the dataset. This works only for the first matching column for each of the patterns specified under the tags option.
Overwrite existing values: if checked, and Use existing columns is also checked, the HXL Proxy will replace existing values in the original dataset; otherwise, it will add values only when the field is empty in the original dataset.
Original dataset:
#org | #sector | #adm1+name | #adm1+code | #targeted |
---|---|---|---|---|
UNICEF | Education | Coast | X001 | 5000 |
Save the Children | Education | Plains | X002 | 300 |
IOM | CCCM | Coast | X001 | 1500 |
UNICEF | Protection | Plains | X002 | 8000 |
Merge dataset:
#adm1+code | #adm1+name | #population |
---|---|---|
X001 | Coast District | 200000 |
X002 | Plains District | 450000 |
Merging using "#adm1+code" as the key, and "#population" as the tag to copy (merged values highlighted):
#org | #sector | #adm1+name | #adm1+code | #targeted | #population |
---|---|---|---|---|---|
UNICEF | Education | Coast | X001 | 5000 | 200000 |
Save the Children | Education | Plains | X002 | 300 | 450000 |
IOM | CCCM | Coast | X001 | 1500 | 200000 |
UNICEF | Protection | Plains | X002 | 8000 | 450000 |
Merging in place (use existing columns and replacing existing values) with "#adm1+code" as the shared key, and "#adm1+name" as the tag to copy. Note how the admin1 names get replaced with the standardised versions in the merge dataset (merged values highlighted):
#org | #sector | #adm1+name | #adm1+code | #targeted |
---|---|---|---|---|
UNICEF | Education | Coast District | X001 | 5000 |
Save the Children | Education | Plains District | X002 | 300 |
IOM | CCCM | Coast District | X001 | 1500 |
UNICEF | Protection | Plains District | X002 | 8000 |
Incorporate baseline or contextual data into an existing dataset (as in the first example above). Population is an obvious baseline, but others could include data from needs assessments, inventory numbers from the logistics cluster, health disease case-reporting numbers, etc.
Correct derived values (like the human-readable district name) from a standard dataset, using a code (such as a P-code) as key.
Learn more about the HXL standard at http://hxlstandard.org