The main architectural differences are the following:
- structure
- the document in JSON is a single value that can be an atom or a
data structure; data structures can be heterogenous lists or
mappings from names to values (and complex structures use nested
mappings and arrays)
- BSON forces the top-level value to be a mapping
- the document in MsgPack and CBOR is a length-carrying data structre
- a BULK stream is zero, one or several values (and complex structures use nested lists)
- the document in JSON is a single value that can be an atom or a
data structure; data structures can be heterogenous lists or
mappings from names to values (and complex structures use nested
mappings and arrays)
- identifiers
- JSON uses text names, effectively creating a tree of directories
with named entries
- true for mappings in BSON and MsgPack
- CBOR and BULK don’t mandate the use of text names
- a stream could be made almost entirely of data, except for a few marker bytes for nesting and primitive types, when a prior agreement on a schema exists
- names in BULK are usually 136bits collision-free words, represented in the stream as 16bits markers
- JSON uses text names, effectively creating a tree of directories
with named entries
Now, why don’t JSON users already use number as identifiers for compactness? I see two reasons.
First, it’d make JSON data far less readable, although you could easily use a schema to transform any number-id JSON document to a text-id equivalent. But an error in the use of the schema in the document makes it impossible to render correctly (this is related to the next issue).
Second and probably more important: naming collisions. You cannot store number-id JSON documents and keep the ability to insert a document within another, because their ids are likely to clash, thus yielding an ambiguous sturcture. Naming collisions are also possible with text, but they are far less likely and immensely easier to debug.
It is important to note that naming collisions are possible and are bound to happen if you try to use JSON to store arbitatray complex aggregated data. But it is neither a capability nor an intended use case of JSON. JSON itself cannot store arbitrary binary data without some encoding like base64. BSON is actually even limited in the size of documents (to 4Gb).
Contrary to JSON, BSON, MsgPack or CBOR, a BULK stream doesn’t need a schema to be transformed to a human-readable structure, because number-ids are unique (you just need a mapping between those numbers and readable names, but that mapping is of no concern to applications using the data for anything else than presenting the wire format to humans). So you actually have both compactness and the potential readability of JSON.
JSON | BULK | CBOR | MsgPack | BSON | |
---|---|---|---|---|---|
decentralized | no | yes | no | no | no |
streamable | no | yes | yes* | no | no |
file size limit | no | no | no | no | 4G bytes |
container size limit | no | no | 4G items | 4G items | 4G bytes |
smallest padding | 1 byte | 1 byte | 1 byte | 1 byte | 2 bytes |
most probable padding | 0x20 | 0x00 | 0x00? | 0x00? | ? |
future extension slots | n/a | unlimited | 254* | 128 | 235 |
- streamable means being able to be sent before all data is known
and that it can be parsed before the end of stream is reached,
without inspecting partially sent structures
- CBOR was not initially streamable, but RFC 8742 adds a new parsing rule that makes it streamable
- CBOR has actually 1.8e19 names avalaible for extension, but only if the size of wire data is allowed to get bigger. Some names will take 3, 5 or 9 bytes and the first extensions will get the shorter names.
- MsgPack and CBOR may use 0x00 as padding, but it represents the zero integer
Let’s compare a few simple examples. The data in the table is the size of the example in bytes.
format | 1 | 2 | 3 |
---|---|---|---|
BULK | 58 | 26 | 16n+8 |
CBOR | 100 | 49 | 27n+18 |
MsgPack | 106 | 49 | 33n+19 |
Naive BULK | 123 | 52 | 36n+21 |
JSON | 145 | 60 | 39n+24 |
BSON | 178 | 65 | 47n+27 |
- 1
{"status":"ok","data":[{"foo":"bar","baz":"quux","value":12.5},{"foo":"bar","baz":"quuux","value":0},{"foo":"fubar","baz":"quuuux","value":-40}]}
- 2
{"status":"error","code":735,"message":"delegation expired"}
- 3
{"status":"ok","data":[{"foo":"bar","baz":"quux","value":12.5} repeated n times]}
- 1
( example:ok ( ( example:foobaz/f "bar" "quux" #[4] 0x41480000 ) ( example:foobaz/i "bar" "quuux" 0 ) ( example:foobaz/i "fubar" "quuuux" #[1] 0xD8 ) ) )
- 2
( example:error 735 "delegation expired" )
- 3
( example:ok ( bulk:prefix example:foobaz/f "bar" "quux" #[4] 0x41480000 example:foobaz/f "bar" "quux" #[4] 0x41480000 … ) )
These examples are intended to be used with the following profile (this is not a schema, as it is not needed to parse the data):
( bulk:ns 0x1800 #[16] 0x4C89D12F-1D71-4132-92E9-6335797659EE )
( bulk:ns-mnemonic 0x1800 "example" )
( bulk:mnemonic/def 0x1800 "ok" "successful message containing data list" )
( bulk:mnemonic/def 0x1801 "error" "error message with code and message" )
( bulk:mnemonic/dev 0x1802 "foobaz" "foobaz data item" )
( bulk:mnemonic/dev 0x1803 "foobaz/i" "foobaz data item with int" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:signed-int ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/dev 0x1804 "foobaz/f" "foobaz data item with float" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:binfloat ( bulk:arg 2 ) ) ) ) )
( bulk:arity 3 example:foobaz example:foobaz/i example:foobaz/f )
If the message format must be flexible to removing and adding fields, this example could be used with the following profile where the forms are evaluated into JSON-like maps:
( bulk:ns 0x1800 #[16] 0x4C89D12F-1D71-4132-92E9-6335797659EE )
( bulk:ns-mnemonic 0x1800 "example" )
( bulk:mnemonic/def 0x1800 "ok" "successful message containing data list" ( bulk:subst ( example:map "status" "ok" "data" ( bulk:arg 0 ) ) ) )
( bulk:mnemonic/def 0x1801 "error" "error message with code and message" ( bulk:subst ( example:map "status" "error" "code" ( bulk:arg 0 ) "message" ( bulk:arg 1 ) ) ) )
( bulk:mnemonic/dev 0x1802 "foobaz" "foobaz data item" ( bulk:subst ( example:map "foo" ( bulk:arg 0 ) "baz" ( bulk:arg 1 ) "value" ( bulk:arg 2 ) ) ) )
( bulk:mnemonic/dev 0x1803 "foobaz/i" "foobaz data item with int" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:signed-int ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/dev 0x1804 "foobaz/f" "foobaz data item with float" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:binary-float ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/dev 0x1805 "map" "like JSON objects" )
( bulk:arity 3 example:foobaz example:foobaz/i example:foobaz/f )
This makes no use of BULK’s ability to abstract common patterns, just encodes JSON maps and arrays as-is.
- 1
( example:map "status" "ok" "data" ( ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) ( example:map "foo" "bar" "baz" "quuux" "value" 0 ) ( example:map "foo" "fubar" "baz" "quuuux" "value" -40 ) ) )
- 2
( example:map "status" "error" "code" 735 "message" "delegation expired" )
- 3
( example:map "status" "ok" "data" ( ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) … ) )