I just copied this from my JSdict NOTES.md file, since it uses the same parser.

    <entry>
      <ent_seq>1000000</ent_seq>
      <r_ele>
        <reb>ヽ</reb>
      </r_ele>
      <r_ele>
        <reb>くりかえし</reb>
      </r_ele>
      <sense>
        <pos>&n;</pos>
        <gloss>repetition mark in katakana</gloss>
      </sense>
    </entry>

becomes...

    {
      "ent_seq": "1000000",
      "r_ele": [
        {
          "reb": "ヽ"
        },
        {
          "reb": "くりかえし"
        }
      ],
      "sense": {
        "pos": "&n;",
        "gloss": "repetition mark in katakana"
      }
    }
The entry objects are collected in an array under a top-level "dictionary" member:

    {
      "dictionary": [
        [...]
      ]
    }
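A minimal Python sketch of this wrapper, assuming the entries have already been converted to dicts. `wrap_entries` is a hypothetical helper name used only for illustration, not something taken from the parser described here. Passing `ensure_ascii=False` to `json.dumps` keeps the kana readable in the output instead of turning it into `\u` escapes.

```python
import json


def wrap_entries(entries):
    """Place already-converted entry dicts under a top-level "dictionary" key."""
    return {"dictionary": list(entries)}


entries = [
    {"ent_seq": "1000000", "r_ele": [{"reb": "ヽ"}, {"reb": "くりかえし"}]},
]

# ensure_ascii=False writes kana as-is rather than as \uXXXX escapes.
text = json.dumps(wrap_entries(entries), ensure_ascii=False, indent=2)
print(text)
```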
Child tags that contain only text become string members of the entry object.

    <entry>
      <ent_seq>1000000</ent_seq>

becomes...

    {
      "ent_seq": "1000000",
If a child tag contains child tags of its own, its value is converted to an object with those tags as its members (just like entry objects).

    <entry>
      [...]
      <sense>
        <pos>&n;</pos>
        <gloss>repetition mark in katakana</gloss>
      </sense>

becomes...

    {
      [...]
      "sense": {
        "pos": "&n;",
        "gloss": "repetition mark in katakana"
      }
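The nesting rule can be sketched as a short recursive function. This is an illustration in Python using the standard `xml.etree.ElementTree` module, not the actual parser; `to_value` is a hypothetical name. The `<pos>&n;</pos>` line is left out of the sample because ElementTree raises an error on entities it has no declaration for, and repeated sibling tags are deliberately ignored here.

```python
import xml.etree.ElementTree as ET


def to_value(element):
    """Return text for a leaf tag, or a dict for a tag that has child tags."""
    children = list(element)
    if not children:
        return element.text or ""
    # Recurse: each child tag becomes a member of the nested object.
    # Repeated sibling tags are not handled in this sketch (the last one wins).
    return {child.tag: to_value(child) for child in children}


xml_text = """<entry>
<ent_seq>1000000</ent_seq>
<sense>
<gloss>repetition mark in katakana</gloss>
</sense>
</entry>"""

entry = to_value(ET.fromstring(xml_text))
print(entry)
```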
If the same tag appears more than once, its values are collected into an array.

    <entry>
      [...]
      <r_ele>
        <reb>ヽ</reb>
      </r_ele>
      <r_ele>
        <reb>くりかえし</reb>
      </r_ele>
      [...]
    </entry>

becomes...

    {
      [...]
      "r_ele": [
        {
          "reb": "ヽ"
        },
        {
          "reb": "くりかえし"
        }
      ],
      [...]
    }
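One way to get this grouping, sketched in Python: count how often each tag name occurs among the siblings, then append to a list only for names that occur more than once. `group_members` is a hypothetical helper, and it assumes the child values have already been converted. A tag that appears once stays a plain member, which matches the single `sense` object in the first example.

```python
from collections import Counter


def group_members(pairs):
    """Build an object from (tag, value) pairs; repeated tags become a list."""
    counts = Counter(tag for tag, _ in pairs)
    result = {}
    for tag, value in pairs:
        if counts[tag] > 1:
            # Repeated tag: collect every value, in document order.
            result.setdefault(tag, []).append(value)
        else:
            # Tag that occurs once: keep it as a plain member.
            result[tag] = value
    return result


pairs = [
    ("ent_seq", "1000000"),
    ("r_ele", {"reb": "ヽ"}),
    ("r_ele", {"reb": "くりかえし"}),
    ("sense", {"gloss": "repetition mark in katakana"}),
]
print(group_members(pairs))
```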
Empty tags are kept as members with an empty string as the value.

    <r_ele>
      <reb>あかん</reb>
    </r_ele>
    <r_ele>
      <reb>アカン</reb>
      <re_nokanji/>
    </r_ele>

becomes...

    "r_ele": [
      {
        "reb": "あかん"
      },
      {
        "reb": "アカン",
        "re_nokanji": ""
      }
    ],
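For comparison, here is how the empty-tag case looks with Python's `xml.etree.ElementTree` (an illustration only, not the parser used here). ElementTree reports the text of an empty tag as `None`, so the sketch maps that to an empty string to get the same shape as above.

```python
import xml.etree.ElementTree as ET

r_ele = ET.fromstring("<r_ele><reb>アカン</reb><re_nokanji/></r_ele>")

# An empty tag has text None in ElementTree; store it as "" instead.
converted = {child.tag: (child.text or "") for child in r_ele}
print(converted)
```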
Because there's really no 1:1 way to represent a tag with both attributes and a body in JSON, I've opted to represent every tag that has attributes as a full-blown JSON object, no matter what the body of the tag is. Each attribute becomes a member of the object. If the tag has a body, it is added to the object as its "body" member. If the body contains further child tags, it is fully parsed (though this is irrelevant to this project).

    <lsource xml:lang="por">espada</lsource>

becomes...

    "lsource": {
      "lang": "por",
      "body": "espada"
    },
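A rough Python sketch of the attribute rule, using the standard `xml.etree.ElementTree` module rather than the parser these notes describe. `local_name` and `attributed_tag_to_object` are hypothetical names. ElementTree stores `xml:lang` under a key with a `{namespace-uri}` prefix, so the sketch strips that prefix to end up with the plain `lang` member shown in the example. Child tags inside the body are not handled here.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Drop the '{namespace-uri}' prefix ElementTree puts on qualified names."""
    return name.rsplit("}", 1)[-1]


def attributed_tag_to_object(element):
    """Attributes become members; non-empty text goes under "body"."""
    obj = {local_name(key): value for key, value in element.attrib.items()}
    if element.text and element.text.strip():
        obj["body"] = element.text
    return obj


lsource = ET.fromstring('<lsource xml:lang="por">espada</lsource>')
print({"lsource": attributed_tag_to_object(lsource)})
```

Note that an attribute that is itself named `body` would collide with the "body" member; this sketch does nothing to prevent that.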