This repository has been archived by the owner on Jan 9, 2019. It is now read-only.

NOTES.md

I just copied this from my JSdict NOTES.md file, since it uses the same parser.

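Characters (<character></character>) are converted to generic JSON objects that populate the dictionary array. The examples below were carried over from JSdict, so they show <entry> elements; since the parser is the same, the same rules apply to characters.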

<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<r_ele>
<reb>くりかえし</reb>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>repetition mark in katakana</gloss>
</sense>
</entry>

becomes...

  {
   "ent_seq": "1000000",
   "r_ele": [
    {
     "reb": "ヽ"
    },
    {
     "reb": "くりかえし"
    }
   ],
   "sense": {
    "pos": "&n;",
    "gloss": "repetition mark in katakana"
   }
  }

The dictionary array is stored in the "dictionary" field of the root JSON object.

{
 "dictionary": [
  [...]
 ]
}
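
As a small illustration of reading that root object, here is a sketch in Python. The `sample` string and the `load_entries` helper are made up for this note and are not part of the project; a real file would be read from disk instead.

```python
import json

# Hypothetical sample in the documented shape (one element in the array).
sample = '{"dictionary": [{"ent_seq": "1000000"}]}'

def load_entries(text):
    """Return the list stored under the root "dictionary" field."""
    root = json.loads(text)
    entries = root["dictionary"]
    if not isinstance(entries, list):
        raise ValueError('"dictionary" should be an array')
    return entries

entries = load_entries(sample)
print(len(entries), entries[0]["ent_seq"])
```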

Child tags are converted into named members of the character object.

<entry>
<ent_seq>1000000</ent_seq>

becomes...

{
 "ent_seq": "1000000",

If a child tag itself contains child tags, its value is converted to an object whose members are those nested tags (just like entry objects).

<entry>
[...]
<sense>
<pos>&n;</pos>
<gloss>repetition mark in katakana</gloss>
</sense>

becomes...

{
 [...]
   "sense": {
    "pos": "&n;",
    "gloss": "repetition mark in katakana"
   }
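
The two rules so far (child tags become named members, nested tags become nested objects) can be sketched as a short recursive function. This is not the project's parser, only an illustration using Python's standard `xml.etree.ElementTree`; the name `element_to_value` is hypothetical. The `<pos>&n;</pos>` line is left out of the sample because ElementTree rejects entities it has no definition for, and repeated tags are not handled here.

```python
import xml.etree.ElementTree as ET

def element_to_value(elem):
    """Leaf tags become strings; tags with children become nested dicts."""
    children = list(elem)
    if not children:
        return elem.text or ""
    # Sketch only: a repeated tag name would overwrite the earlier one here.
    return {child.tag: element_to_value(child) for child in children}

xml = """<entry>
<ent_seq>1000000</ent_seq>
<sense>
<gloss>repetition mark in katakana</gloss>
</sense>
</entry>"""

entry = element_to_value(ET.fromstring(xml))
print(entry)
```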

If the same child tag appears more than once under one parent, the corresponding member is converted to an array holding each occurrence.

<entry>
[...]
<r_ele>
<reb>ヽ</reb>
</r_ele>
<r_ele>
<reb>くりかえし</reb>
</r_ele>
[...]
</entry>

becomes...

{
 [...]
 "r_ele": [
  {
   "reb": "ヽ"
  },
  {
   "reb": "くりかえし"
  }
 ],
 [...]
}
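
One way to get this behavior is to promote a member to an array the second time its name shows up. The sketch below assumes that approach; `add_member` is a hypothetical helper, and the real parser may do it differently.

```python
def add_member(obj, name, value):
    """Store value under name; a repeated name turns the member into a list."""
    if name not in obj:
        obj[name] = value                  # first occurrence: plain member
    elif isinstance(obj[name], list):
        obj[name].append(value)            # third and later occurrences
    else:
        obj[name] = [obj[name], value]     # second occurrence: promote to list

entry = {}
add_member(entry, "ent_seq", "1000000")
add_member(entry, "r_ele", {"reb": "ヽ"})
add_member(entry, "r_ele", {"reb": "くりかえし"})
print(entry)
```

A side effect worth keeping in mind when reading the output: the same member can be a plain object in one entry and an array in another, depending on how many times the tag occurred.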

Self-closing tags are added as empty members, represented as an empty string.

<r_ele>
<reb>あかん</reb>
</r_ele>
<r_ele>
<reb>アカン</reb>
<re_nokanji/>
</r_ele>

becomes...

   "r_ele": [
    {
     "reb": "あかん"
    },
    {
     "reb": "アカン",
     "re_nokanji": ""
    }
   ],
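
Because the value is an empty string, code that reads the JSON should test for the member's presence and not its truthiness. A minimal Python sketch, with `has_flag` as a hypothetical helper name:

```python
r_ele = [
    {"reb": "あかん"},
    {"reb": "アカン", "re_nokanji": ""},
]

def has_flag(obj, name):
    """True when the member exists, even though its value is the empty string."""
    return name in obj

flags = [has_flag(r, "re_nokanji") for r in r_ele]
print(flags)  # [False, True]
```

A check like `if r.get("re_nokanji"):` would miss the flag, since `""` is falsy in Python.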

Tags with attributes are weird.

Because there's really no 1:1 way to represent a tag with both attributes and a body in JSON, I've opted to represent every tag with attributes as a full-blown JSON object, no matter what the body of the tag is. Each attribute becomes a member of the object. If the tag has a body, it is added to the object as its "body" member. If the body contains further child tags, it is fully parsed (though this is irrelevant to this project). In the example below, the attribute name is stored without its xml: prefix.

<lsource xml:lang="por">espada</lsource>

becomes...

     "lsource": {
      "lang": "por",
      "body": "espada"
     },
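
The attribute rule can be sketched the same way. This is an illustration with Python's `xml.etree.ElementTree`, not the project's code; `local_name` and `element_to_value` are hypothetical names. ElementTree reports `xml:lang` as `{http://www.w3.org/XML/1998/namespace}lang`, so the sketch strips that part to match the "lang" member shown above. Repeated child tags are not handled here.

```python
import xml.etree.ElementTree as ET

def local_name(name):
    """Drop a leading '{namespace}' part, as ElementTree writes it for xml:lang."""
    return name.rsplit("}", 1)[-1]

def element_to_value(elem):
    children = list(elem)
    if children:
        body = {child.tag: element_to_value(child) for child in children}
    else:
        body = elem.text or ""
    if not elem.attrib:
        return body                      # no attributes: plain string or object
    obj = {local_name(k): v for k, v in elem.attrib.items()}
    if body != "":
        obj["body"] = body               # only added when the tag has a body
    return obj

lsource = element_to_value(ET.fromstring('<lsource xml:lang="por">espada</lsource>'))
print(lsource)
```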