It must be in JSON format. At the top level it has one key per table in the dataset, for example:
```json
{
  "Customers": {
    "MetaData": [
      {
        "AmountOfRegisters": 15570
      }
    ],
    "Data": [
      {
        "mode": "NULLABLE",
        "name": "customer_id",
        "type": "STRING",
        "ispk": "true",
        "format": "??###",
        "letters": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      },
      {
        "mode": "NULLABLE",
        "name": "customer_code",
        "type": "STRING",
        "faketype": "bothify",
        "format": "??###",
        "letters": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      }
    ]
  }
}
```
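A minimal Python sketch of how this top-level structure could be loaded and checked. It assumes the script reads the configuration with the standard json module; the inline CONFIG_TEXT string stands in for the configuration file (whose name is not given here), and load_config is a hypothetical helper, not the script's own function.

```python
import json

# Stand-in for the configuration file; mirrors the structure of the example above.
CONFIG_TEXT = """
{
  "Customers": {
    "MetaData": [{"AmountOfRegisters": 15570}],
    "Data": [
      {"mode": "NULLABLE", "name": "customer_id", "type": "STRING",
       "ispk": "true", "format": "??###",
       "letters": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
    ]
  }
}
"""


def load_config(text):
    """Parse the configuration and check that every table has MetaData and Data."""
    config = json.loads(text)
    for table_name, table in config.items():
        missing = {"MetaData", "Data"} - set(table)
        if missing:
            raise ValueError(f"table {table_name!r} is missing {sorted(missing)}")
    return config


config = load_config(CONFIG_TEXT)
for table_name, table in config.items():
    # One top-level key per table: print its name, record count and column count.
    print(table_name, table["MetaData"][0]["AmountOfRegisters"], len(table["Data"]))
    # Customers 15570 1
```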
Each table key contains two keys: MetaData and Data.
Inside MetaData there is a list holding a single dictionary with one key, AmountOfRegisters, whose value is the number of records to generate for that table.
- Each table has its own quantity.
```json
"MetaData": [
  {
    "AmountOfRegisters": 285
  }
]
```
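As a sketch, a hypothetical helper amount_of_registers could read this value. The extra checks (exactly one dictionary, a non-negative integer) are assumptions rather than rules stated for the script, and the "Orders" table is invented for the example.

```python
def amount_of_registers(table):
    """Return the number of records to generate for one table.

    MetaData is a list holding a single dictionary whose only key is
    AmountOfRegisters (the validation below is an assumption).
    """
    metadata = table["MetaData"]
    if len(metadata) != 1 or set(metadata[0]) != {"AmountOfRegisters"}:
        raise ValueError("MetaData must be a list with one {'AmountOfRegisters': n} entry")
    amount = metadata[0]["AmountOfRegisters"]
    if not isinstance(amount, int) or isinstance(amount, bool) or amount < 0:
        raise ValueError(f"AmountOfRegisters must be a non-negative integer, got {amount!r}")
    return amount


# Each table has its own quantity ("Orders" is a hypothetical second table).
tables = {
    "Customers": {"MetaData": [{"AmountOfRegisters": 285}], "Data": []},
    "Orders": {"MetaData": [{"AmountOfRegisters": 1000}], "Data": []},
}
print({name: amount_of_registers(t) for name, t in tables.items()})
# {'Customers': 285, 'Orders': 1000}
```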
Inside Data there is a list of dictionaries, one per column, each describing the characteristics of that column of the table:
```json
"Data": [
  {
    "mode": "NULLABLE",
    "name": "customer_id",
    "type": "STRING",
    "ispk": "true",
    "format": "??###",
    "letters": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  }
]
```
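Since Data is a plain list of dictionaries, a table's columns can be indexed by their "name" key. The helper below is hypothetical, and rejecting duplicate column names is an assumption rather than a documented rule.

```python
def columns_by_name(table):
    """Index a table's Data list by column name (assumes names are unique)."""
    index = {}
    for column in table["Data"]:
        name = column["name"]
        if name in index:
            raise ValueError(f"duplicate column name {name!r}")
        index[name] = column
    return index


table = {
    "MetaData": [{"AmountOfRegisters": 285}],
    "Data": [
        {"mode": "NULLABLE", "name": "customer_id", "type": "STRING",
         "ispk": "true", "format": "??###",
         "letters": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"},
        {"mode": "NULLABLE", "name": "customer_city", "type": "STRING",
         "faketype": "city", "parameters": []},
    ],
}
# Each dictionary describes one column: its name, type and mode.
for name, column in columns_by_name(table).items():
    print(name, column["type"], column["mode"])
```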
The explanation below focuses on the keys inside these dictionaries that the script needs in order to work correctly.
If the column is a Primary Key (PK from now on, to simplify terms), it MUST have the key "ispk" set to "true", as shown here:
```json
"ispk": "true"
```
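This section does not say how PK values are generated. As a hedged sketch, assuming a PK column uses the "format"/"letters" pattern from the first example ('?' for a letter, '#' for a digit), unique values could be drawn with rejection sampling after checking that the pattern can produce enough distinct values. Both helpers are hypothetical and use only the standard library.

```python
import random
import string


def pattern_capacity(fmt, letters):
    """Count the distinct values a '?' (letter) / '#' (digit) pattern can produce."""
    capacity = 1
    for ch in fmt:
        if ch == "?":
            capacity *= len(set(letters))
        elif ch == "#":
            capacity *= 10
    return capacity


def unique_pk_values(fmt, letters, amount, rng=random):
    """Draw `amount` distinct values for a PK column (hypothetical helper)."""
    if amount > pattern_capacity(fmt, letters):
        raise ValueError(f"pattern {fmt!r} cannot yield {amount} unique values")
    values, seen = [], set()
    while len(values) < amount:
        value = "".join(
            rng.choice(letters) if ch == "?"
            else rng.choice(string.digits) if ch == "#"
            else ch
            for ch in fmt
        )
        if value not in seen:  # retry on collisions so the PK stays unique
            seen.add(value)
            values.append(value)
    return values


print(pattern_capacity("??###", string.ascii_uppercase))  # 676000
print(unique_pk_values("??###", string.ascii_uppercase, 5, random.Random(0)))
```

Rejection sampling slows down as the requested amount approaches the pattern's capacity, so a longer "format" keeps collisions rare.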
If the column is a Foreign Key (FK from now on, to simplify terms), it must have the key "isfk" set to "true" and must also contain two additional keys indicating the table and the column it refers to:
- "sourceTable": the table from which the PK/FK value is taken.
- "sourceField": the column of that table from which the value is taken.
```json
{
  "mode": "NULLABLE",
  "name": "customer_id",
  "type": "STRING",
  "isfk": "true",
  "sourceTable": "Customers",
  "sourceField": "customer_id"
}
```
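A minimal sketch of how an FK column could be filled, assuming the referenced table is generated first and its values are kept in a hypothetical `generated` dictionary shaped as {table: {column: [values]}}. Sampling with replacement lets several rows point to the same parent row; the script's actual strategy is not described here.

```python
import random


def fk_values(column, generated, amount, rng=random):
    """Fill an FK column by sampling from the referenced table's column.

    `generated` is a hypothetical store of already generated data:
    {table_name: {column_name: [values, ...]}}.
    """
    source = generated[column["sourceTable"]][column["sourceField"]]
    if not source:
        raise ValueError("the referenced column has no values to sample from")
    # Sampling with replacement: many rows may reference the same parent row.
    return [rng.choice(source) for _ in range(amount)]


generated = {"Customers": {"customer_id": ["AB123", "CD456", "EF789"]}}
orders_customer_id = {
    "mode": "NULLABLE", "name": "customer_id", "type": "STRING",
    "isfk": "true", "sourceTable": "Customers", "sourceField": "customer_id",
}
print(fk_values(orders_customer_id, generated, 5, random.Random(42)))
```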
A column must not be both PK and FK at the same time: this generates an error and the script fails. Decide beforehand whether each column is a PK, an FK, or neither.
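The exclusivity rule can be checked before generating anything. This hypothetical check_key_flags function treats only the string "true" as set, matching the examples, and additionally (as an assumption) verifies that an FK's sourceTable and sourceField exist.

```python
def check_key_flags(table_name, column, tables):
    """Reject columns marked as both PK and FK, and FKs with a broken reference."""
    is_pk = column.get("ispk") == "true"
    is_fk = column.get("isfk") == "true"
    label = f"{table_name}.{column['name']}"
    if is_pk and is_fk:
        raise ValueError(f"{label}: a column cannot be both PK and FK")
    if is_fk:
        source_table = column.get("sourceTable")
        source_field = column.get("sourceField")
        if source_table not in tables:
            raise ValueError(f"{label}: unknown sourceTable {source_table!r}")
        source_names = {c["name"] for c in tables[source_table]["Data"]}
        if source_field not in source_names:
            raise ValueError(f"{label}: unknown sourceField {source_field!r}")


tables = {
    "Customers": {"MetaData": [{"AmountOfRegisters": 285}],
                  "Data": [{"name": "customer_id", "ispk": "true"}]},
}
fk_column = {"name": "customer_id", "isfk": "true",
             "sourceTable": "Customers", "sourceField": "customer_id"}
check_key_flags("Orders", fk_column, tables)  # valid FK: no exception
```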
If the column holds plain data, it does not have the keys mentioned above. Instead, it has a key called "faketype" whose value is the name of the Faker method used to generate the data for that column. Some Faker methods take parameters and others do not; the following examples illustrate both cases.
```json
{
  "mode": "NULLABLE",
  "name": "customer_city",
  "type": "STRING",
  "faketype": "city",
  "parameters": []
}
```
In this example, the "faketype" is "city", so the script uses Faker's city method, which takes no parameters, to generate the data for the column "customer_city". The "parameters" key must always be present, even when its value is an empty list ([]).
```json
{
  "mode": "NULLABLE",
  "name": "customer_age",
  "type": "NUMERIC",
  "faketype": "pyint",
  "parameters": [
    0,
    100,
    1
  ]
}
```
In this example, the "faketype" is "pyint", a method that needs parameters to work correctly, so the dictionary needs a "parameters" key whose value is a list of 1 to 3 parameters, as the method requires.
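A hedged sketch of validating the "parameters" key before calling the method. call_arguments is hypothetical; the requirement that the key always be present and hold at most three values comes from the rules above.

```python
def call_arguments(column):
    """Return the positional arguments for a "faketype" column.

    "parameters" must be present and must be a list of at most three values
    (an empty list for methods without parameters).
    """
    if "parameters" not in column:
        raise KeyError(f"column {column['name']!r} has no 'parameters' key")
    params = column["parameters"]
    if not isinstance(params, list) or len(params) > 3:
        raise ValueError(f"column {column['name']!r}: 'parameters' must be a list of 0 to 3 values")
    return tuple(params)


age_column = {
    "mode": "NULLABLE", "name": "customer_age", "type": "NUMERIC",
    "faketype": "pyint", "parameters": [0, 100, 1],
}
print(call_arguments(age_column))  # (0, 100, 1), i.e. pyint(0, 100, 1)
```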
In the particular case of the "word" method, its only parameter is a list of words (they can also be phrases, declared as strings, but always inside a list). That is why the following example has a list inside another list:
```json
"faketype": "word",
"parameters": [
  [
    "LOSE",
    "WIN",
    "TIE"
  ]
]
```
That inner list counts as a single parameter (one list object) within "parameters".
The bothify method also takes parameters, but the correct way to use it in this script is to declare two keys:
- "format": the pattern of the generated data, with a '?' for each letter and a '#' for each digit to generate, in whatever quantity and order you need.
- "letters": a string containing all the letters the script may pick from at random to generate the data.
```json
{
  "mode": "NULLABLE",
  "name": "internal_code",
  "type": "STRING",
  "faketype": "bothify",
  "format": "??###",
  "letters": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
}
```
In this example, "format" has the value "??###", which corresponds to two letters ("??") followed by three digits ("###"), and "letters" is a string of uppercase letters. This means bothify will generate two uppercase letters followed by three digits, e.g. "FF007".
You can vary the quantity and order of '?' and '#' in "format", and add more characters to the "letters" string.
Faker methods that can be used as "faketype" values include:
- city: generates a city name.
- state: generates the name of a state or province.
- country: generates a country name.
- address: generates an address.
- random_uppercase_letter: generates a single uppercase letter.
- name: generates a full name.
- first_name: generates a first name.
- first_name_female: generates a female first name.
- first_name_male: generates a male first name.
- first_name_nonbinary: generates a non-binary first name.
- language_name: generates a language name.
- pybool: generates a boolean value (True or False).
- pyfloat: generates a decimal number. Its "parameters" value must contain 3 parameters, e.g. [4, 2, true]: the first is the number of digits to the left of the decimal point (integer part), the second is the number of digits to the right of it (decimal part), and the third indicates whether the number must be positive (true); with false, the sign may be positive or negative.
- pyint: generates an integer. Its "parameters" value must contain 3 parameters, e.g. [0, 1000, 1]: the minimum value, the maximum value, and the step between possible values (1 for every integer, 2 for every second integer, and so on).
For more information, see the official Faker documentation.