Problems in Serbian #11

LinguList · 2022-02-15T20:23:56Z

There are some mismappings: the data has something like six words for DEER. We were informed by somebody who wrote to Joshua Jackson, who then wrote to me:

“Cow would be what is translated as Deer.
Krava = Cow
Vo = Ox
Bik = Bull
Jelen = Deer
Jelena is one of the common names in Serbia (likely related to Helen rather than Jelen, though)
Right beneath is KRV, which is BLOOD and definitely not the meat. MESO is meat. 
Above that KORA = it is a bark, but leather is KOŽA
Am I missing something important? 
Jegulja is an EEL, not a snake, ZMIJA is a snake. 
Konj is a male horse, kobila is a mare. 
Jare is NOT a lamb. Jagnje (janje) is a lamb, jare is a baby goat, not a sheep. 
Jagoda is a strawberry, not a grape. Grožđe is a term for the grapes, grozd is singular.”

I suggest we manually correct these cases via Lexemes. I would also inform the DIACL editors about this.

Or, @chrzyki, @xrotwang, is it possible that the error (something swapped here) is on the side of the pylexibank script?


LinguList · 2022-02-16T01:43:36Z

BTW: checking with German, we have the same problems for DEER.

https://clics.clld.org/languages/diacl-41700

LinguList · 2022-02-16T01:49:29Z

If one checks diacl, it becomes clear that they have mapped a huge number of partly related terms to one master concept.

https://diacl.ht.lu.se/WordList/Index

This problem is also present, though less severely, in the Swadesh collection.

The problem is that DIACL did, in some sense, do a kind of Concepticon mapping, but to their internal concept lists, which are often much broader than what we'd use in Concepticon. Since all words in the database have meaning strings, one could circumvent this by making a master list of all meaning glosses we find in the data and mapping those instead.
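A minimal sketch of the master-list idea, assuming each entry can be read as a row with a form, the DIACL master concept, and the meaning gloss. The field names `form`, `concept`, and `gloss` are hypothetical and not DIACL's actual schema, and the sample rows only restate the Serbian examples reported above, with their link to DEER assumed for illustration:

```python
from collections import defaultdict

# Illustrative rows only. The field names are hypothetical, and the values
# restate the Serbian examples from this issue (link to DEER is assumed).
ROWS = [
    {"form": "krava", "concept": "DEER", "gloss": "cow"},
    {"form": "vo", "concept": "DEER", "gloss": "ox"},
    {"form": "bik", "concept": "DEER", "gloss": "bull"},
    {"form": "jelen", "concept": "DEER", "gloss": "deer"},
]


def gloss_master_list(rows):
    """Collect every distinct meaning gloss together with the forms carrying it."""
    master = defaultdict(list)
    for row in rows:
        gloss = row["gloss"].strip().lower()  # normalise before grouping
        master[gloss].append(row["form"])
    return dict(master)


def broad_concepts(rows, threshold=1):
    """Return master concepts linked to more than `threshold` distinct glosses."""
    glosses = defaultdict(set)
    for row in rows:
        glosses[row["concept"]].add(row["gloss"].strip().lower())
    return {c: sorted(g) for c, g in glosses.items() if len(g) > threshold}
```

The keys of `gloss_master_list(ROWS)` would then be the list to map to Concepticon by hand, while `broad_concepts(ROWS)` flags master concepts such as DEER that bundle several distinct glosses.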

In the current form, however, it is unclear if the data is well aggregated into CLICS.

chrzyki · 2022-02-16T05:07:51Z

Good catch and thanks for relaying the issue. Given the relatively specific relations I would hope that there isn't too much of an effect on CLICS-based analyses (i.e. most of the mappings will be very rare), but I fully agree: In this state it's not something that should be used in CLICS & Co. I think your suggestion (i.e. list of all meaning glosses, map) sounds good!

LinguList · 2022-02-16T10:34:32Z

So for CLICS4, we would either have to fix this issue by re-mapping, or not include the dataset there, since this kind of mapping upsets people who know the languages, and we would like to avoid that. DIACL has the meaning glosses, but uses its concepts differently than we do in CLICS, so we should only aggregate from DIACL when we know that the data corresponds to our models.

FredericBlum · 2024-07-12T08:50:31Z

In addition to the concept problems, we also have no segmentation because there are no orthography profiles. Since it is unlikely that we can do a full remapping before the LB 2.0 release, and no student assistants have capacity at the moment, I'd vote to retire the dataset from Lexibank. @LinguList @chrzyki Would you agree with this?

FredericBlum · 2024-07-12T08:59:27Z

I didn't realize that diacl was never part of the LB release in the first place. Thanks @chrzyki for clarifying.

LinguList · 2024-07-12T09:07:31Z

We skipped it after we found too many problems in CLICS3. They just link any concept to any gloss. So they may end up having a term "butterfly" and link it to "insect" in their internal concepticon (!).

LinguList added the bug label on Feb 15, 2022.
