Skip to content

Merge columns filter

David Megginson edited this page May 12, 2017 · 6 revisions
Merge columns filter form

The Merge columns filter on the Recipe page allows you to combine columns from multiple datasets. For example, you can take a 3W report and add population data from a different dataset that uses the same geographical codes.

There are two main concepts for merging:

  1. Shared keys: the two datasets must have at least one column in common. The filter will use the value(s) in the column(s) to match rows between the two datasets.

  2. Copied values: the filter will add values from the second (merge) dataset into the original, either as new columns added to the right side, or as new values in existing columns.

The same value can be copied into multiple rows in the original dataset, as long as the keys match. For example, a province's population in the merge dataset could be copied into 100 rows in a 3W dataset containing activities for that province.

The merge filter is similar to a SQL "join" command.

Options

URL of merge dataset: (required) the location of the dataset with new values to be added to the original dataset.

Tags to copy: (required) a list of one or more tag patterns for the columns to pull from the merge dataset into the original one. The filter will copy all columns matching the pattern (for example, #population might match columns tagged #population+f, #population+m, #population+total, etc.). If there are no matching columns in the merge dataset, the filter will not add any to the output.

Shared keys: (required) a list of one or more tag patterns for the columns that contain values in common between the two datasets for matching rows (e.g. "meta+id" or "org,sector,date").

Use existing columns where possible: if checked, the HXL Proxy will use an existing column in the original dataset rather than creating a new one, if there is a column with the appropriate hashtag. Otherwise, it will always add new columns to the right side of the dataset. This works only for the first matching column for each of the patterns specified under the tags option.

Overwrite existing values: if checked, and Use existing columns is also checked, the HXL Proxy will replace existing values in the original dataset; otherwise, it will add values only when the field is empty in the original dataset.

Examples

Input files

Original dataset:

#org #sector #adm1+name #adm1+code #targeted
UNICEF Education Coast X001 5000
Save the Children Education Plains X002 300
IOM CCCM Coast X001 1500
UNICEF Protection Plains X002 8000

Merge dataset:

#adm1+code #adm1+name #population
X001 Coast District 200000
X002 Plains District 450000

Merged datasets

Adding information

Merging using "#adm1+code" as the key, and "#population" as the tag to copy (merged values highlighted):

#org #sector #adm1+name #adm1+code #targeted #population
UNICEF Education Coast X001 5000 200000
Save the Children Education Plains X002 300 450000
IOM CCCM Coast X001 1500 200000
UNICEF Protection Plains X002 8000 450000

Correcting information

Merging in place (use existing columns and replacing existing values) with "#adm1+code" as the shared key, and "#adm1+name" as the tag to copy. Note how the admin1 names get replaced with the standardised versions in the merge dataset (merged values highlighted):

#org #sector #adm1+name #adm1+code #targeted
UNICEF Education Coast District X001 5000
Save the Children Education Plains District X002 300
IOM CCCM Coast District X001 1500
UNICEF Protection Plains District X002 8000

Use cases

Incorporate baseline or contextual data into an existing dataset (as in the first example above). Population is an obvious baseline, but others could include data from needs assessments, inventory numbers from the logistics cluster, health disease case-reporting numbers, etc.

Correct derived values (like the human-readable district name) from a standard dataset, using a code (such as a P-code) as key.

Clone this wiki locally