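I copied these notes from my JSdict NOTES.md file, since this project uses the same parser. The examples still show the `<entry>` tags from that project; the same rules apply to `<character>` tags.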

Characters (`<character></character>`) are generic JSON objects that populate the dictionary array.

<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<r_ele>
<reb>くりかえし</reb>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>repetition mark in katakana</gloss>
</sense>
</entry>

becomes...

  {
   "ent_seq": "1000000",
   "r_ele": [
    {
     "reb": "ヽ"
    },
    {
     "reb": "くりかえし"
    }
   ],
   "sense": {
    "pos": "&n;",
    "gloss": "repetition mark in katakana"
   }
  }

The dictionary array is kept in the "dictionary" field of the root JSON object.

{
 "dictionary": [
  [...]
 ]
}
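
A consumer of the file would start from that root field. Here is a minimal sketch in Python, assuming the JSON text is already in memory. The `load_dictionary` name and the inline sample are illustrative only and are not part of the project.

```python
import json

# Illustrative sample that follows the root layout described above.
SAMPLE = '{"dictionary": [{"ent_seq": "1000000"}]}'

def load_dictionary(text):
    """Return the list of objects stored under the root "dictionary" field."""
    root = json.loads(text)
    return root["dictionary"]

entries = load_dictionary(SAMPLE)
for entry in entries:
    print(entry["ent_seq"])
```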

Child tags are converted into named members of the character object.

<entry>
<ent_seq>1000000</ent_seq>

becomes...

{
 "ent_seq": "1000000",

If a child tag contains child tags of its own, its value becomes an object with a member for each of them (just like entry objects).

<entry>
[...]
<sense>
<pos>&n;</pos>
<gloss>repetition mark in katakana</gloss>
</sense>

becomes...

{
 [...]
   "sense": {
    "pos": "&n;",
    "gloss": "repetition mark in katakana"
   }

If the same child tag appears more than once, that member becomes an array with one value per occurrence.

<entry>
[...]
<r_ele>
<reb>ヽ</reb>
</r_ele>
<r_ele>
<reb>くりかえし</reb>
</r_ele>
[...]
</entry>

becomes...

{
 [...]
 "r_ele": [
  {
   "reb": "ヽ"
  },
  {
   "reb": "くりかえし"
  }
 ],
 [...]
}
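
One consequence of this rule is that the same member can be a single object in one entry and an array in another, so code reading the JSON may want to normalize it first. The sketch below shows one way to do that in Python. `as_list` and `readings` are hypothetical helper names, not part of the parser.

```python
def as_list(value):
    """Return value as a list: lists pass through, None becomes [], anything else is wrapped."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def readings(entry):
    """Collect every "reb" string, whether "r_ele" is one object or an array."""
    return [r_ele["reb"] for r_ele in as_list(entry.get("r_ele"))]

# One <r_ele> stays an object; two or more become an array.
single = {"r_ele": {"reb": "あかん"}}
multiple = {"r_ele": [{"reb": "ヽ"}, {"reb": "くりかえし"}]}

print(readings(single))
print(readings(multiple))
```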

Self-closing tags are added as an empty member, represented as an empty string.

<r_ele>
<reb>あかん</reb>
</r_ele>
<r_ele>
<reb>アカン</reb>
<re_nokanji/>
</r_ele>

becomes...

   "r_ele": [
    {
     "reb": "あかん"
    },
    {
     "reb": "アカン",
     "re_nokanji": ""
    }
   ],
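
Because the member's value is an empty string, a truthiness check would treat the flag as absent. It is safer to test whether the key is present. This is a small Python sketch of that idea, and `has_flag` is a hypothetical helper name.

```python
r_ele = [
    {"reb": "あかん"},
    {"reb": "アカン", "re_nokanji": ""},
]

def has_flag(obj, name):
    """True when the self-closing tag was present, even though its value is ""."""
    return name in obj

flags = [has_flag(reading, "re_nokanji") for reading in r_ele]
print(flags)
```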

Tags with attributes are weird.

Because there's no 1:1 way to represent a tag with both attributes and a body in JSON, I've opted to represent every tag with attributes as a full JSON object, no matter what its body is. Each attribute becomes a member of the object. If the tag has a body, it is added to the object as its "body" member. If the body contains further child tags, it is fully parsed (though this is irrelevant to this project). Note that the namespace prefix is dropped from the attribute name, so `xml:lang` becomes "lang" in the example below.

<lsource xml:lang="por">espada</lsource>

becomes...

     "lsource": {
      "lang": "por",
      "body": "espada"
     },
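
To make the rules in these notes concrete, here is a rough Python sketch of a converter that follows them. It is not the project's actual parser. The `convert`, `convert_children`, and `local_name` names are made up for illustration, and putting the parsed children under "body" for a tag that has both attributes and child tags is my reading of the rule above. The sample stitches together fragments from these notes and is not a real dictionary entry. It also avoids entities such as `&n;`, because `xml.etree.ElementTree` rejects entities it has no definition for.

```python
import xml.etree.ElementTree as ET

def local_name(name):
    """Drop a '{namespace}' prefix, so xml:lang becomes 'lang'."""
    return name.rsplit("}", 1)[-1]

def convert_children(children):
    """Turn child elements into members; repeated tags become an array."""
    obj = {}
    for child in children:
        key = local_name(child.tag)
        value = convert(child)  # always a str or dict, never a list
        if key not in obj:
            obj[key] = value
        elif isinstance(obj[key], list):
            obj[key].append(value)
        else:
            obj[key] = [obj[key], value]
    return obj

def convert(elem):
    """Convert one element according to the rules in these notes."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not elem.attrib:
        # Nested tags become an object; leaf and self-closing tags become a string.
        return convert_children(children) if children else text
    # Tags with attributes always become an object, one member per attribute.
    obj = {local_name(key): value for key, value in elem.attrib.items()}
    if children:
        obj["body"] = convert_children(children)  # assumption: parsed body
    elif text:
        obj["body"] = text
    return obj

# Made-up sample combining fragments from the notes above.
SAMPLE = """<entry>
<r_ele><reb>あかん</reb></r_ele>
<r_ele><reb>アカン</reb><re_nokanji/></r_ele>
<sense><lsource xml:lang="por">espada</lsource></sense>
</entry>"""

result = convert(ET.fromstring(SAMPLE))
print(result)
```

Since `convert` only ever returns a string or an object, the `isinstance(..., list)` check is enough to tell whether a member has already been turned into an array.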
