incorporate Bloomfield's texts #24

dwhieb · 2021-02-10T00:44:08Z

Add Bloomfield's texts as an additional corpus.

dwhieb · 2021-03-17T15:33:09Z

@katieschmirler informs me these are ready for import into Korp!

aarppe · 2024-07-30T18:57:42Z

@fbanados A Korp version of the Bloomfield texts can be found here: altlab/crk/generated/bloomfield_fst+cg+gloss.vrt

This is created with the following invocation:

cat corpora/bloomfield.korp-vrt \
  | gawk '{ if(match($0, ".+~$")!=0) sub("~$",""); print; }' \
  | bin/fst-cg-analyze-vrt.sh analyser-gt-strict.hfstol /Users/arppe/gt/lang-crk/src/cg3/disambiguator.cg3 analyser-gt-relaxed.hfstol /Users/arppe/gt/lang-crk/src/cg3/functions.cg3 generator-gt-strict.hfstol \
  | bin/vrt2korp.sh > generated/bloomfield_fst+cg+gloss.vrt
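The gawk step at the start of the pipeline removes one trailing `~` from any line that has at least one other character before it; a line consisting of a lone `~` passes through unchanged. The issue does not say what the tilde marks in corpora/bloomfield.korp-vrt, so the sketch below only illustrates the mechanics. The function name `strip_trailing_tilde` and the input lines are hypothetical, and plain POSIX awk is used in place of gawk (the program uses only `match()` and `sub()`).

```shell
#!/bin/sh
# Same awk program as the pipeline's preprocessing step: if a line matches
# ".+~$" (at least one character followed by a final "~"), drop that "~".
# Hypothetical helper name; the original invocation runs this inline with gawk.
strip_trailing_tilde() {
  awk '{ if (match($0, ".+~$") != 0) sub("~$", ""); print; }'
}

# Hypothetical input lines, not taken from the corpus.
printf '%s\n' 'token1~' 'token2' '~' | strip_trailing_tilde
# token1
# token2
# ~
```

Only the final tilde is removed, so a line ending in `~~` keeps one `~`.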

This is largely the same as the Ahenakew-Wolfart corpus, except that it has only three levels: <corpus>, <subcorpus> (two values), and <text> (dozens of individual texts). The lang field is defined at the corpus level, whereas in the A-W corpus it is defined for each text; this may need to be changed.
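One way to check the three-level structure, and at which level `lang` is set, is to tally the opening structural tags in the VRT file. This is a minimal sketch, assuming each structural tag starts its own line (the usual VRT convention) and that the attribute is literally written `lang="..."`; the helper name `vrt_structure` and the sample attributes (`id="bloomfield"`, `lang="crk"`) are hypothetical and may not match the real file.

```shell
#!/bin/sh
# Summarize the structural levels of a VRT stream on stdin: for each
# element name, print how many opening tags it has, and append "lang"
# if any of its opening tags carries a lang="..." attribute.
vrt_structure() {
  awk '
    /^<[a-z]+[ >]/ {
      tag = $0
      sub(/^</, "", tag)        # drop the leading "<"
      sub(/[ >].*$/, "", tag)   # keep only the element name
      if (!(tag in count)) order[++n] = tag   # remember first-seen order
      count[tag]++
      if ($0 ~ / lang="/) haslang[tag] = 1
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s %d%s\n", order[i], count[order[i]], ((order[i] in haslang) ? " lang" : "")
    }
  '
}

# Hypothetical miniature corpus mirroring the three levels described above.
printf '%s\n' \
  '<corpus id="bloomfield" lang="crk">' \
  '<subcorpus id="a">' '<text id="1">' 'token' '</text>' \
  '<text id="2">' 'token' '</text>' '</subcorpus>' \
  '<subcorpus id="b">' '<text id="3">' 'token' '</text>' '</subcorpus>' \
  '</corpus>' | vrt_structure
# corpus 1 lang
# subcorpus 2
# text 3
```

Running it as `vrt_structure < generated/bloomfield_fst+cg+gloss.vrt` should show `lang` on the corpus line; on the A-W corpus the marker would presumably appear on the text line instead.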

Eventually, I would probably want to make more use of the underlying XML sources (e.g. the word-specific as well as the sentence-specific translations, which would add fields to the linguistic analyses), but incorporating this version could be a good start.

dwhieb added the enhancement and corpora labels and removed the enhancement label on Feb 10, 2021

fbanados mentioned this issue on Jul 29, 2024 in UAlbertaALTLab/korp-config#1, "Word attributes missing in .vrt file" (open, 4 tasks)