Comparison of JSON, BULK, CBOR, MsgPack and BSON

Architecture

The main architectural differences are the following:

structure
- the document in JSON is a single value that can be an atom or a data structure; data structures can be heterogeneous lists or mappings from names to values (and complex structures use nested mappings and arrays)
  - BSON forces the top-level value to be a mapping
- the document in MsgPack and CBOR is a length-carrying data structure
- a BULK stream is zero, one or several values (and complex structures use nested lists)
identifiers
- JSON uses text names, effectively creating a tree of directories with named entries
  - the same is true for mappings in BSON and MsgPack
- CBOR and BULK don’t mandate the use of text names
  - when a prior agreement on a schema exists, a stream could be made almost entirely of data, except for a few marker bytes for nesting and primitive types
  - names in BULK are usually 136-bit collision-free words, represented in the stream as 16-bit markers

Now, why don’t JSON users already use numbers as identifiers for compactness? I see two reasons.

First, it would make JSON data far less readable, although you could easily use a schema to transform any number-id JSON document into a text-id equivalent. But an error in how the document uses the schema makes it impossible to render correctly (this is related to the next issue).

Second, and probably more important: naming collisions. You cannot store number-id JSON documents and keep the ability to insert one document within another, because their ids are likely to clash, yielding an ambiguous structure. Naming collisions are also possible with text, but they are far less likely and immensely easier to debug.
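
The clash can be sketched in a few lines of Python. The two schemas, the field names and the `render` helper are hypothetical, made up for illustration; nothing here is part of JSON itself.

```python
# Two hypothetical schemas that each assign small integers to field names.
SCHEMA_A = {1: "status", 2: "data"}
SCHEMA_B = {1: "temperature", 2: "unit"}


def render(doc, schema):
    """Replace numeric ids with names, recursing into nested mappings."""
    out = {}
    for key, value in doc.items():
        name = schema.get(int(key), f"<unknown {key}>")
        out[name] = render(value, schema) if isinstance(value, dict) else value
    return out


# JSON object keys are strings, so the ids are written as "1", "2", ...
inner = {"1": 21.5, "2": "C"}      # written against SCHEMA_B
outer = {"1": "ok", "2": inner}    # written against SCHEMA_A

# Rendering the aggregate with SCHEMA_A silently mislabels the inner document.
rendered = render(outer, SCHEMA_A)
print(rendered)  # {'status': 'ok', 'data': {'status': 21.5, 'data': 'C'}}
```

Nothing in the aggregated document says where one schema stops and the other starts, so the reader has to know that out of band.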

It is important to note that naming collisions are possible, and are bound to happen, if you try to use JSON to store arbitrary complex aggregated data. But that is neither a capability nor an intended use case of JSON. JSON itself cannot store arbitrary binary data without some encoding like base64. BSON even limits the size of documents (to 4 GB).
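
As a small illustration of the binary data point, here is a sketch using only the Python standard library. The `blob` key and the payload bytes are arbitrary choices for the example.

```python
import base64
import json

payload = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary binary data

# json.dumps(payload) would raise TypeError: bytes are not JSON serializable,
# so the bytes are first turned into base64 text.
encoded = base64.b64encode(payload).decode("ascii")
document = json.dumps({"blob": encoded})

# The receiver has to know, out of band, that "blob" holds base64 text.
decoded = base64.b64decode(json.loads(document)["blob"])
print(encoded, decoded == payload)
```

Here 4 bytes of data become 8 characters of text, before counting the quotes and the key.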

Contrary to JSON, BSON, MsgPack or CBOR, a BULK stream doesn’t need a schema to be transformed into a human-readable structure, because number-ids are unique (you just need a mapping between those numbers and readable names, but that mapping is of no concern to applications using the data for anything other than presenting the wire format to humans). So you actually get both compactness and the potential readability of JSON.

Feature grid

	JSON	BULK	CBOR	MsgPack	BSON
decentralized	no	yes	no	no	no
streamable	no	yes	yes*	no	no
file size limit	no	no	no	no	4G bytes
container size limit	no	no	4G items	4G items	4G bytes
smallest padding	1 byte	1 byte	1 byte	1 byte	2 bytes
most probable padding	0x20	0x00	0x00?	0x00?	?
future extension slots	n/a	unlimited	254*	128	235

streamable means that data can be sent before all of it is known, and that it can be parsed before the end of the stream is reached, without inspecting partially sent structures
- CBOR was not initially streamable, but RFC 8742 adds a new parsing rule that makes it streamable
CBOR actually has 1.8e19 names available for extension, but only if the size of wire data is allowed to get bigger. Some names will take 3, 5 or 9 bytes and the first extensions will get the shorter names.
MsgPack and CBOR may use 0x00 as padding, but it represents the integer zero
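
The 3, 5 and 9 byte figures follow from how CBOR encodes the argument of a data item head. A minimal sketch, assuming the extension names in the note above are CBOR tag numbers (major type 6, as defined in RFC 8949); the function name is introduced here for illustration.

```python
def cbor_tag_head_size(tag: int) -> int:
    """Number of bytes taken by the head of a CBOR tag with this number."""
    if not 0 <= tag < 2**64:
        raise ValueError("tag numbers are unsigned 64-bit integers")
    if tag < 24:
        return 1  # the number fits in the initial byte
    if tag < 2**8:
        return 2  # initial byte + 1-byte argument
    if tag < 2**16:
        return 3  # initial byte + 2-byte argument
    if tag < 2**32:
        return 5  # initial byte + 4-byte argument
    return 9      # initial byte + 8-byte argument


for tag in (23, 255, 256, 65536, 2**32):
    print(tag, cbor_tag_head_size(tag))
```

The 2**64 possible tag numbers are where the 1.8e19 figure comes from.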

Wire data

Let’s compare a few simple examples. The data in the table is the size of the example in bytes.

format	1	2	3
BULK	58	26	16n+8
CBOR	100	49	27n+18
MsgPack	106	49	33n+19
Naive BULK	123	52	36n+21
JSON	145	60	40n+24
BSON	178	65	47n+27

JSON

1: {"status":"ok","data":[{"foo":"bar","baz":"quux","value":12.5},{"foo":"bar","baz":"quuux","value":0},{"foo":"fubar","baz":"quuuux","value":-40}]}
2: {"status":"error","code":735,"message":"delegation expired"}
3: {"status":"ok","data":[{"foo":"bar","baz":"quux","value":12.5} repeated n times]}
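
The JSON sizes can be checked mechanically. This is a minimal sketch, assuming compact separators (no optional whitespace) and UTF-8 output; the helper names `compact` and `example3` are introduced here for illustration.

```python
import json


def compact(value) -> bytes:
    """Serialize without optional whitespace, as in the examples above."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


item = {"foo": "bar", "baz": "quux", "value": 12.5}

example1 = {"status": "ok", "data": [
    item,
    {"foo": "bar", "baz": "quuux", "value": 0},
    {"foo": "fubar", "baz": "quuuux", "value": -40},
]}
example2 = {"status": "error", "code": 735, "message": "delegation expired"}


def example3(n: int) -> dict:
    """Example 3: the same item repeated n times."""
    return {"status": "ok", "data": [item] * n}


print(len(compact(example1)))  # 145
print(len(compact(example2)))  # 60
print(len(compact(item)))      # 39
```

For n ≥ 1, each item of example 3 costs 39 bytes, plus a one-byte comma after every item but the last, so `len(compact(example3(n)))` is 40n + 24.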

BULK

1: ( example:ok ( ( example:foobaz/f "bar" "quux" #[4] 0x41480000 ) ( example:foobaz/i "bar" "quuux" 0 ) ( example:foobaz/i "fubar" "quuuux" #[1] 0xD8 ) ) )
2: ( example:error 735 "delegation expired" )
3: ( example:ok ( bulk:prefix example:foobaz/f "bar" "quux" #[4] 0x41480000 example:foobaz/f "bar" "quux" #[4] 0x41480000 … ) )
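
The raw byte values in these examples can be reproduced with Python’s `struct` module. This sketch assumes that the 4-byte array is a big-endian IEEE 754 single-precision float and that the 1-byte array is a two’s-complement signed integer, which is what the values suggest.

```python
import struct

# 12.5 as a big-endian 32-bit float
float_bytes = struct.pack(">f", 12.5)
print(float_bytes.hex())  # 41480000

# -40 as a signed 8-bit integer
int_bytes = struct.pack(">b", -40)
print(int_bytes.hex())  # d8

# And back again
print(struct.unpack(">f", bytes.fromhex("41480000"))[0])  # 12.5
print(int.from_bytes(b"\xd8", "big", signed=True))        # -40
```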

Profile

These examples are intended to be used with the following profile (this is not a schema, as it is not needed to parse the data):

( bulk:ns 0x1800 #[16] 0x4C89D12F-1D71-4132-92E9-6335797659EE )
( bulk:ns-mnemonic 0x1800 "example" )
( bulk:mnemonic/def 0x1800 "ok" "successful message containing data list" )
( bulk:mnemonic/def 0x1801 "error" "error message with code and message" )
( bulk:mnemonic/def 0x1802 "foobaz" "foobaz data item" )
( bulk:mnemonic/def 0x1803 "foobaz/i" "foobaz data item with int" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:signed-int ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/def 0x1804 "foobaz/f" "foobaz data item with float" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:binary-float ( bulk:arg 2 ) ) ) ) )
( bulk:arity 3 example:foobaz example:foobaz/i example:foobaz/f )

If the message format must tolerate fields being added or removed, this example could be used with the following profile, where the forms are evaluated into JSON-like maps:

( bulk:ns 0x1800 #[16] 0x4C89D12F-1D71-4132-92E9-6335797659EE )
( bulk:ns-mnemonic 0x1800 "example" )
( bulk:mnemonic/def 0x1800 "ok" "successful message containing data list" ( bulk:subst ( example:map "status" "ok" "data" ( bulk:arg 0 ) ) ) )
( bulk:mnemonic/def 0x1801 "error" "error message with code and message" ( bulk:subst ( example:map "status" "error" "code" ( bulk:arg 0 ) "message" ( bulk:arg 1 ) ) ) )
( bulk:mnemonic/def 0x1802 "foobaz" "foobaz data item" ( bulk:subst ( example:map "foo" ( bulk:arg 0 ) "baz" ( bulk:arg 1 ) "value" ( bulk:arg 2 ) ) ) )
( bulk:mnemonic/def 0x1803 "foobaz/i" "foobaz data item with int" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:signed-int ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/def 0x1804 "foobaz/f" "foobaz data item with float" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:binary-float ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/def 0x1805 "map" "like JSON objects" )
( bulk:arity 3 example:foobaz example:foobaz/i example:foobaz/f )
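
To give an idea of what evaluating these forms into JSON-like maps could look like, here is a rough Python sketch. It is not BULK’s evaluation model: the `Arg` class, the `TEMPLATES` table and the `expand` function are hypothetical stand-ins, tuples stand for named forms, Python lists stand for plain nested lists, and the foobaz/i and foobaz/f wrappers are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    """Placeholder for the i-th argument, standing in for ( bulk:arg i )."""
    index: int


# Hypothetical templates mirroring the bulk:subst bodies of the profile above:
# alternating keys and values, as in ( example:map key value ... ).
TEMPLATES = {
    "ok": ["status", "ok", "data", Arg(0)],
    "error": ["status", "error", "code", Arg(0), "message", Arg(1)],
    "foobaz": ["foo", Arg(0), "baz", Arg(1), "value", Arg(2)],
}


def expand(form):
    """Expand (name, *args) tuples into dicts; lists are expanded item by item."""
    if isinstance(form, list):
        return [expand(item) for item in form]
    if not isinstance(form, tuple):
        return form  # atoms (strings, numbers) are left untouched
    name, *args = form
    args = [expand(a) for a in args]
    flat = [args[x.index] if isinstance(x, Arg) else x for x in TEMPLATES[name]]
    return dict(zip(flat[0::2], flat[1::2]))


error = expand(("error", 735, "delegation expired"))
ok = expand(("ok", [("foobaz", "bar", "quux", 12.5), ("foobaz", "bar", "quuux", 0)]))
print(error)
print(ok)
```

With this kind of expansion, adding a field only changes the template, not the shape of the expanded maps that applications consume.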

Naive BULK

This makes no use of BULK’s ability to abstract common patterns; it just encodes JSON maps and arrays as-is.

1: ( example:map "status" "ok" "data" ( ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) ( example:map "foo" "bar" "baz" "quuux" "value" 0 ) ( example:map "foo" "fubar" "baz" "quuuux" "value" -40 ) ) )
2: ( example:map "status" "error" "code" 735 "message" "delegation expired" )
3: ( example:map "status" "ok" "data" ( ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) … ) )

BSON

https://euandreh.github.io/cl-BSON/api.html
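
The BSON sizes of examples 1 and 2 can also be cross-checked without a library. The following is a minimal sketch of a BSON encoder covering only the types these examples need (double, string, embedded document, array, int32); it assumes floats are stored as doubles and integers as int32, and it is not a complete implementation.

```python
import struct


def cstring(text: str) -> bytes:
    """UTF-8 bytes followed by a NUL terminator."""
    return text.encode("utf-8") + b"\x00"


def element(name: str, value) -> bytes:
    """Encode one element: type byte, key as cstring, then the value."""
    if isinstance(value, float):
        return b"\x01" + cstring(name) + struct.pack("<d", value)
    if isinstance(value, str):
        data = cstring(value)
        return b"\x02" + cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + cstring(name) + document(value)
    if isinstance(value, list):
        # Arrays are documents whose keys are "0", "1", ...
        as_doc = {str(i): v for i, v in enumerate(value)}
        return b"\x04" + cstring(name) + document(as_doc)
    if isinstance(value, int):
        return b"\x10" + cstring(name) + struct.pack("<i", value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def document(mapping: dict) -> bytes:
    """int32 total size, the elements, then a NUL terminator."""
    body = b"".join(element(k, v) for k, v in mapping.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


example1 = {"status": "ok", "data": [
    {"foo": "bar", "baz": "quux", "value": 12.5},
    {"foo": "bar", "baz": "quuux", "value": 0},
    {"foo": "fubar", "baz": "quuuux", "value": -40},
]}
example2 = {"status": "error", "code": 735, "message": "delegation expired"}

print(len(document(example1)))  # 178
print(len(document(example2)))  # 65
```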

MsgPack

https://msgpack.org/
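
As a cross-check of the MsgPack sizes of examples 1 and 2, here is a minimal sketch of an encoder limited to the formats these examples need. It assumes floats are written as float 64 and handles only short strings, arrays and maps; anything else raises an error.

```python
import struct


def pack(value) -> bytes:
    """Encode a small subset of MessagePack."""
    if isinstance(value, int):
        if 0 <= value < 128:
            return bytes([value])                       # positive fixint
        if -32 <= value < 0:
            return struct.pack("b", value)              # negative fixint
        if -128 <= value < -32:
            return b"\xd0" + struct.pack("b", value)    # int 8
        if 128 <= value < 256:
            return b"\xcc" + bytes([value])             # uint 8
        if 256 <= value < 65536:
            return b"\xcd" + struct.pack(">H", value)   # uint 16
        raise ValueError("integer out of range for this sketch")
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)       # float 64
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data     # fixstr
        raise ValueError("string too long for this sketch")
    if isinstance(value, list) and len(value) < 16:
        return bytes([0x90 | len(value)]) + b"".join(pack(v) for v in value)
    if isinstance(value, dict) and len(value) < 16:
        items = b"".join(pack(k) + pack(v) for k, v in value.items())
        return bytes([0x80 | len(value)]) + items       # fixmap
    raise ValueError(f"unsupported value: {value!r}")


example1 = {"status": "ok", "data": [
    {"foo": "bar", "baz": "quux", "value": 12.5},
    {"foo": "bar", "baz": "quuux", "value": 0},
    {"foo": "fubar", "baz": "quuuux", "value": -40},
]}
example2 = {"status": "error", "code": 735, "message": "delegation expired"}

print(len(pack(example1)))  # 106
print(len(pack(example2)))  # 49
```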

CBOR

https://cbor.me/
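
The CBOR sizes of examples 1 and 2 can be reproduced with a small encoder as well. This is a minimal sketch of definite-length CBOR encoding for integers, floats, text strings, arrays and maps, assuming floats use the shortest width that preserves the value; it is not a full RFC 8949 implementation.

```python
import struct


def head(major: int, n: int) -> bytes:
    """Initial byte plus argument for a major type and an unsigned value n."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 2**8:
        return bytes([major << 5 | 24, n])
    if n < 2**16:
        return bytes([major << 5 | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([major << 5 | 26]) + struct.pack(">I", n)
    return bytes([major << 5 | 27]) + struct.pack(">Q", n)


def encode(value) -> bytes:
    if isinstance(value, int):
        return head(0, value) if value >= 0 else head(1, -1 - value)
    if isinstance(value, float):
        # Try half, then single precision; fall back to double.
        for code, fmt in ((0xF9, ">e"), (0xFA, ">f")):
            try:
                packed = struct.pack(fmt, value)
            except OverflowError:
                continue
            if struct.unpack(fmt, packed)[0] == value:
                return bytes([code]) + packed
        return b"\xfb" + struct.pack(">d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(value, list):
        return head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        items = b"".join(encode(k) + encode(v) for k, v in value.items())
        return head(5, len(value)) + items
    raise TypeError(f"unsupported type: {type(value).__name__}")


example1 = {"status": "ok", "data": [
    {"foo": "bar", "baz": "quux", "value": 12.5},
    {"foo": "bar", "baz": "quuux", "value": 0},
    {"foo": "fubar", "baz": "quuuux", "value": -40},
]}
example2 = {"status": "error", "code": 735, "message": "delegation expired"}

print(len(encode(example1)))  # 100
print(len(encode(example2)))  # 49
```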
