EMLdataset2SQL #1

Open
gastil opened this issue Oct 22, 2019 · 0 comments

I suggest adding another tool, EMLdataset2SQL, that would compose the SQL to load data from an EML dataset into a relational database. Generically it would target any SQL database, but Postgres in particular.

I suggest these DDL items:

  • CREATE TABLE for each entity
  • attributeName for column names
  • data type derived from storageType if present, otherwise from the measurementScale domain
  • enumerations could go into a CHECK constraint (just codes) or even a parent table.
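A minimal sketch of the proposed DDL generation, assuming attribute metadata has already been parsed out of the EML attributeList. The TYPE_MAP and the dict shape here are illustrative assumptions, not anything defined by EML:

```python
# Hypothetical sketch: compose a CREATE TABLE statement from attribute
# metadata assumed to be already extracted from EML. TYPE_MAP is an
# assumed storageType-to-Postgres mapping, not a standard.
TYPE_MAP = {
    "float": "double precision",
    "integer": "integer",
    "string": "text",
    "date": "date",
}

def create_table_sql(entity_name, attributes):
    """attributes: dicts with 'name', 'storage_type', and optional 'codes'."""
    cols = []
    for att in attributes:
        col = f'"{att["name"]}" {TYPE_MAP.get(att["storage_type"], "text")}'
        codes = att.get("codes")
        if codes:
            # enumeratedDomain codes become a CHECK constraint on the column
            allowed = ", ".join(f"'{c}'" for c in codes)
            col += f' CHECK ("{att["name"]}" IN ({allowed}))'
        cols.append(col)
    return f'CREATE TABLE "{entity_name}" (\n  ' + ",\n  ".join(cols) + "\n);"

print(create_table_sql("algae_cover", [
    {"name": "site", "storage_type": "string", "codes": ["LTER1", "LTER2"]},
    {"name": "percent_cover", "storage_type": "float"},
]))
```

The alternative mentioned above, a parent table of codes, would replace the CHECK with a foreign key to a small lookup table.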

And this DML item:

  • bulk upload with COPY for each table

The DDL portion can generate a CREATE TABLE statement from EML, and this could also serve the Quality Engine's data-load check. However, the column definitions could use much tighter data types. The entityName could be converted into a suitable table name by substituting an underscore for each non-alphanumeric character.
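The entityName-to-table-name substitution might look like this (a sketch; collapsing runs of separators into a single underscore and lowercasing are extra niceties beyond the plain substitution described):

```python
import re

def table_name(entity_name):
    # Replace each run of non-alphanumeric characters with a single
    # underscore, trim stray underscores, and lowercase for Postgres.
    return re.sub(r"[^0-9A-Za-z]+", "_", entity_name).strip("_").lower()

print(table_name("MCR LTER: Algae Cover (2019)"))  # mcr_lter_algae_cover_2019
```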

The EML <constraint> element is not a commonly used pattern in LTER datasets. However, some EML datasets do include constraints, for example:

knb-lter-mcr.10
knb-lter-mcr.1034
knb-lter-mcr.1037
knb-lter-mcr.1038
knb-lter-mcr.1039
knb-lter-mcr.12
knb-lter-mcr.13
knb-lter-mcr.19
knb-lter-mcr.2
knb-lter-mcr.2002
knb-lter-mcr.2003
knb-lter-mcr.2004
knb-lter-mcr.2006
knb-lter-mcr.21
knb-lter-mcr.3
knb-lter-mcr.4
knb-lter-mcr.4001
knb-lter-mcr.4003
knb-lter-mcr.4005
knb-lter-mcr.5003
knb-lter-mcr.5004
knb-lter-mcr.5005
knb-lter-mcr.6
knb-lter-mcr.6001
knb-lter-mcr.7
knb-lter-mcr.8

When present, the EML <constraint> element could be used to create primary keys, or even foreign keys between tables.
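Assuming key information has already been extracted from the EML <constraint> elements (the plain-list arguments below stand in for that parsed structure and are an assumption), the keys could be emitted as ALTER TABLE statements:

```python
# Sketch: emit key constraints as ALTER TABLE statements, given column
# lists assumed to be parsed from EML <constraint> elements.
def primary_key_sql(table, columns):
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'ALTER TABLE "{table}" ADD PRIMARY KEY ({cols});'

def foreign_key_sql(table, columns, ref_table, ref_columns):
    cols = ", ".join(f'"{c}"' for c in columns)
    refs = ", ".join(f'"{c}"' for c in ref_columns)
    return (f'ALTER TABLE "{table}" ADD FOREIGN KEY ({cols}) '
            f'REFERENCES "{ref_table}" ({refs});')

print(primary_key_sql("algae_cover", ["site", "sample_date"]))
print(foreign_key_sql("algae_cover", ["site"], "sites", ["site_code"]))
```

Emitting the keys after the CREATE TABLE and COPY steps also keeps the bulk load fast, since the indexes are built once at the end.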

Bulk data load with COPY is much faster than issuing an INSERT statement for each row of data, as the Quality Engine does. There, the per-row INSERT serves to check each row; a bulk COPY is all or nothing.
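The COPY command for each table could be composed similarly. Streaming FROM STDIN (via psql's \copy, or a driver's copy API such as psycopg2's copy_expert) avoids needing server-side file access; the option defaults below are assumptions to adjust per entity's physical format in the EML:

```python
# Sketch: compose a client-side bulk-load COPY command for one table.
def copy_sql(table, delimiter=",", header=True):
    # COPY ... FROM STDIN streams client-side data into the table in bulk;
    # it succeeds or fails as a whole, unlike per-row INSERTs.
    opts = ["FORMAT csv", f"DELIMITER '{delimiter}'"]
    if header:
        opts.append("HEADER true")
    return f'COPY "{table}" FROM STDIN WITH ({", ".join(opts)});'

print(copy_sql("algae_cover"))
# COPY "algae_cover" FROM STDIN WITH (FORMAT csv, DELIMITER ',', HEADER true);
```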
