Transforming Data
Karma offers two commands to transform the values in cells to create new columns:
- `PyTransform` allows you to type expressions in Python to define data transformations.
- `Transform` allows you to define the transformations by providing examples.
You invoke both commands from the menu of commands available on all columns:
##Transformations Using Python
The screenshot below illustrates the use of the `PyTransform` command.
Suppose you want to transform the `Measurements` column to extract the measurements of artwork frames. These are the parts of the string following the letter `f` (e.g., `f 14 1/2 x 32 1/4`).
You select `PyTransform` in the menu of the `Measurements` column, and Karma shows you the following dialogue where you specify how to perform the transformation:
The main part of the dialogue is the area where you enter your Python code. Here you enter code that specifies how to compute the value of a cell in the new column as a function of the values of the cells in the same row.
Using the `getValue` function you can access the value of any cell in a row.
For example, `getValue("Measurements")` gets the value of the cell in the `Measurements` column.
The sample code in the screenshot above shows how to find the index of the string `f` and then extract all characters from that index to the end of the string.
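As a minimal sketch of the transformation just described, the snippet below finds the first `f` and returns everything from that position onward. The `getValue` defined here is only a stand-in so the example runs outside Karma, and the sample row is made up for illustration; in the PyTransform dialogue you would enter just the body of `frame_measurements`, with Karma supplying `getValue`.

```python
# Stand-in for Karma's getValue(); inside Karma this function is provided.
# SAMPLE_ROW is made-up data used only for illustration.
SAMPLE_ROW = {"Measurements": "14 x 32 in. f 14 1/2 x 32 1/4"}


def getValue(column_name):
    """Return the value of the named column in the (simulated) current row."""
    return SAMPLE_ROW[column_name]


def frame_measurements():
    """Return the part of Measurements starting at the first 'f'."""
    value = getValue("Measurements")
    index = value.find("f")  # -1 when the string contains no 'f'
    if index == -1:
        return ""  # assumption: leave the new cell empty when no frame is listed
    return value[index:]


print(frame_measurements())
```

Note that `find("f")` matches the first `f` anywhere in the string, so values that contain an earlier `f` (for example in a word such as "feet") would need a stricter pattern.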
You can test your code by clicking on the `Preview Results for Top 5 Rows` button.
If your code has errors, you can click on the `View Errors` button to see the errors that the Python interpreter generates for the top 5 rows.
You can access the values of cells in all columns by calling `getValue` using the name of the column.
For example, you can easily combine the values of two columns, e.g., `first` and `last`, to create a new column `name` using the code `return getValue("first")+" "+getValue("last")`.
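The combination of two columns can be sketched as follows. As before, `getValue` is a stand-in defined only so the example runs outside Karma, and the row contents are made up; the trailing `strip()` is an optional addition that avoids a dangling space when one of the two cells is empty.

```python
# Stand-in for Karma's getValue(); inside Karma this function is provided.
# SAMPLE_ROW is made-up data used only for illustration.
SAMPLE_ROW = {"first": "Vincent", "last": "van Gogh"}


def getValue(column_name):
    """Return the value of the named column in the (simulated) current row."""
    return SAMPLE_ROW[column_name]


def full_name():
    """Join the first and last columns with a single space."""
    return (getValue("first") + " " + getValue("last")).strip()


print(full_name())
```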
Hierarchical Sources (e.g., XML or JSON): if your source has nested columns, there is a restriction on which columns you can access using the `getValue` function.
For example, if you invoke `PyTransform` on the `relatedArtworksTitle` column, you can access the value of the `id` column using `getValue("id")` because it is in the same nested table as `relatedArtworksTitle`.
You can access the value of the `nationality` column using `getValue("nationality")` because it is in an enclosing table.
Now, if you invoke `PyTransform` on the `nationality` column, you cannot access the value of the `id` or `relatedArtworksTitle` columns because they are in tables one level of nesting deeper than the `nationality` column. You can, of course, access values in columns at the same level as `nationality` or above.
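The access rule above can be modelled in a few lines of plain Python. This is only an illustration of the rule, not Karma's implementation: the `COLUMN_TABLE` mapping, the nested table name `relatedArtworks`, and the `can_access` helper are all hypothetical.

```python
# Hypothetical model: each column maps to the path of the nested table that
# contains it. The empty tuple stands for the top-level table.
COLUMN_TABLE = {
    "nationality": (),
    "id": ("relatedArtworks",),
    "relatedArtworksTitle": ("relatedArtworks",),
}


def can_access(current_column, target_column):
    """True when target is in the same table as current or in an enclosing one."""
    current = COLUMN_TABLE[current_column]
    target = COLUMN_TABLE[target_column]
    # The target's table path must be a prefix of the current column's path.
    return current[:len(target)] == target


print(can_access("relatedArtworksTitle", "id"))           # same nested table
print(can_access("relatedArtworksTitle", "nationality"))  # enclosing table
print(can_access("nationality", "id"))                    # deeper table
```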
`getValueFromNestedColumnByIndex` can be used to collect values from a column that is not a sibling of the column on which you are running `PyTransform`:
`getValueFromNestedColumnByIndex(<ColumnName>, <Path to nested child>, <Index for combining>)`
- `<ColumnName>`: Name of the column from where to start the traversal.
- `<Path to nested child>`: Path down from `<ColumnName>` to the column from which the values should be collected.
- `<Index for combining>`: If you need the function to copy the same value from the column to all rows of the other column, set this to 0 (or the index of the value to copy). If you need the function to combine the values row by row, use the built-in function `getRowIndex()`.
Here are some examples of the `getValueFromNestedColumnByIndex` function:
- Columns are siblings with the same number of values in each array. In this case we want to iterate over both arrays in the PyTransform.
Click on A/name/values and select ‘PyTransform’. Select “Name of new Column” and enter “combined”. Enter the following as the function: `return getValueFromNestedColumnByIndex("A", "note/values", getRowIndex()) + ": " + getValue("values")`
- Columns are not direct siblings and their arrays have different lengths (one array has a single value). In the PyTransform we want to iterate over the larger array while reusing the single value from the array of size one.
Click on A/note/values and select ‘PyTransform’. Select “Name of new Column” and enter “combined”. Enter the following as the function: `return getValue("values") + ": " + getValueFromNestedColumnByIndex("B", "name/values", 0)`
- Columns are not direct siblings and their arrays have different lengths (one array has more values than the other). In the PyTransform we want to iterate over both arrays until the shorter array reaches its end, and then stop.
Click on B/name/values and select ‘PyTransform’. Select “Name of new Column” and enter “combined”. Enter the following as the function: `return getValueFromNestedColumnByIndex("A", "note/values", getRowIndex()) + ": " + getValue("values")`
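To make the role of the index argument concrete, here is a small pure-Python model of the first two examples. It is a sketch under stated assumptions, not Karma's implementation: the `SOURCE` data is made up, and `nested_values` and `model_get_by_index` are hypothetical helpers that imitate the lookup described in the parameter list. What Karma does when the index runs past the end of the shorter array is not modelled here.

```python
# Made-up nested data standing in for a hierarchical source.
SOURCE = {
    "A": {
        "name": {"values": ["Alice", "Bob"]},
        "note": {"values": ["first note", "second note"]},
    },
    "B": {"name": {"values": ["shared"]}},
}


def nested_values(column_name, path):
    """Walk down `path` (slash-separated) from `column_name`; return the list found."""
    node = SOURCE[column_name]
    for key in path.split("/"):
        node = node[key]
    return node


def model_get_by_index(column_name, path, index):
    """Hypothetical model of the lookup: pick one value by its position."""
    return nested_values(column_name, path)[index]


def combine_row_by_row():
    """Example 1: the row index pairs each name with the note at the same position."""
    names = nested_values("A", "name/values")
    return [
        model_get_by_index("A", "note/values", row_index) + ": " + name
        for row_index, name in enumerate(names)
    ]


def combine_with_constant():
    """Example 2: index 0 copies the single value of B/name/values to every row."""
    notes = nested_values("A", "note/values")
    return [note + ": " + model_get_by_index("B", "name/values", 0) for note in notes]


print(combine_row_by_row())
print(combine_with_constant())
```

In this model, `enumerate` plays the part of `getRowIndex()`: the position of the current row selects the matching value in the other array, whereas a constant index repeats one value for every row.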
If you are not familiar with Python, you can learn the basics in a few minutes at http://www.learnpython.org.
##Transformations Using Examples
Karma also provides an experimental feature where you can define transformations by providing examples of how you want to transform the data (use the `Transform` command in the column menu).
This capability is useful for small data sets where you can verify that your data was transformed correctly. The feature is experimental and we do not recommend using it in production settings where you are using Karma to generate RDF for large datasets.