This is the result of a series of benchmarks on each implementation to validate the assumptions about design decision of each. Bellow is a formatted version of the results with my own conclusions, you can see the raw results at bench.txt file.
To run the benchmarks your self, just do
make bench
You can generate the formatted output as bellow with [scripts/benchtable.go], the command is provided
in the Makefile as well (must run after generating the bench.txt
with previous command):
make doc-bench
In the following subsections it is shown the results of each benchmark. The tables are autogenerated from bench.txt file, and the implementation with better performance is marked with a 🏆 in front of its name. It is recommended that you run the benchamarks for yourself though, since the results may vary, see "Conclusion" bellow each table for the authors opinion.
Just put many values in the map, outside of benchmark, and then iterate through the map to check time taken for full iteration.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map | 3,936 | 1,399,521 | 0 | 0 | baseline |
Builtin | 303 | 19,250,757 | 201 | 4 | -92.73 % |
Simple | 500 | 12,606,482 | 16 | 1 | -88.90 % |
Linked | 7,786 | 848,627 | 16 | 1 | 64.92 % |
LinkedHash 🏆 | 7,083 | 844,225 | 16 | 1 | 65.78 % |
Sync | 2,134 | 2,827,290 | 48 | 2 | -50.50 % |
Conclusion: omap implementations using linked list (Linked and LinkedHash) are faster to iterate than builtin map, since they have a data struct well design and optimized for that.
Go to BenchmarkIteration
source code.
Raw output:
BenchmarkIteration/map-8 3936 1399521 ns/op 0 B/op 0 allocs/op
BenchmarkIteration/Builtin-8 303 19250757 ns/op 201 B/op 4 allocs/op
BenchmarkIteration/Simple-8 500 12606482 ns/op 16 B/op 1 allocs/op
BenchmarkIteration/Linked-8 7786 848627 ns/op 16 B/op 1 allocs/op
BenchmarkIteration/LinkedHash-8 7083 844225 ns/op 16 B/op 1 allocs/op
BenchmarkIteration/Sync-8 2134 2827290 ns/op 48 B/op 2 allocs/op
Calls json.Marshal to convert a single map of (string, int) to JSON.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map | 1,164 | 5,449,427 | 1,010,914 | 20,006 | baseline |
Builtin | 962 | 6,347,832 | 1,134,340 | 20,010 | -14.15 % |
Simple | 1,009 | 6,163,324 | 857,160 | 39,762 | -11.58 % |
Linked 🏆 | 1,328 | 4,773,561 | 860,148 | 39,761 | 14.16 % |
LinkedHash | 1,280 | 5,021,098 | 865,523 | 39,762 | 8.53 % |
Sync | 1,123 | 5,801,274 | 990,457 | 39,764 | -6.06 % |
Conclusion: the results here are similar to BenchmarkIteration, since all cases have to iterate while building the final JSON.
Go to BenchmarkMarshalJSON
source code.
Raw output:
BenchmarkMarshalJSON/map-8 1164 5449427 ns/op 1010914 B/op 20006 allocs/op
BenchmarkMarshalJSON/Builtin-8 962 6347832 ns/op 1134340 B/op 20010 allocs/op
BenchmarkMarshalJSON/Simple-8 1009 6163324 ns/op 857160 B/op 39762 allocs/op
BenchmarkMarshalJSON/Linked-8 1328 4773561 ns/op 860148 B/op 39761 allocs/op
BenchmarkMarshalJSON/LinkedHash-8 1280 5021098 ns/op 865523 B/op 39762 allocs/op
BenchmarkMarshalJSON/Sync-8 1123 5801274 ns/op 990457 B/op 39764 allocs/op
Put values into the map with a short key length, pre-generating the keys before the benchmarks, so key size is not accounted in memory.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map | 219 | 29,021,513 | 8,088,043 | 3,999 | baseline |
Builtin 🏆 | 214 | 27,240,396 | 8,087,120 | 3,997 | 6.54 % |
Simple | 100 | 52,066,508 | 17,028,371 | 4,018 | -44.26 % |
Linked | 132 | 55,644,086 | 12,887,104 | 103,997 | -47.84 % |
LinkedHash | 78 | 65,659,897 | 16,530,445 | 404,046 | -55.80 % |
Sync | 96 | 57,953,905 | 12,888,188 | 104,003 | -49.92 % |
Conclusion: since all implementations of omap have to build a separate data structure on Put, it is expected that they are slower than builtin map, the trade-off seems acceptable if you you need to iterate (or serialize) the map or if have few keys.
Go to BenchmarkShortStrKeysPut
source code.
Raw output:
BenchmarkShortStrKeysPut/map-8 219 29021513 ns/op 8088043 B/op 3999 allocs/op
BenchmarkShortStrKeysPut/Builtin-8 214 27240396 ns/op 8087120 B/op 3997 allocs/op
BenchmarkShortStrKeysPut/Simple-8 100 52066508 ns/op 17028371 B/op 4018 allocs/op
BenchmarkShortStrKeysPut/Linked-8 132 55644086 ns/op 12887104 B/op 103997 allocs/op
BenchmarkShortStrKeysPut/LinkedHash-8 78 65659897 ns/op 16530445 B/op 404046 allocs/op
BenchmarkShortStrKeysPut/Sync-8 96 57953905 ns/op 12888188 B/op 104003 allocs/op
Put large string keys in map of int value, pre-generating the keys before the benchmarks, so key size is not accounted in memory.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map 🏆 | 6 | 915,548,530 | 8,090,290 | 4,010 | baseline |
Builtin | 6 | 933,213,966 | 8,093,778 | 4,029 | -1.89 % |
Simple | 5 | 1,148,190,670 | 17,030,152 | 4,027 | -20.26 % |
Linked | 5 | 1,033,555,280 | 12,891,305 | 104,017 | -11.42 % |
LinkedHash | 6 | 1,009,584,296 | 16,533,028 | 404,057 | -9.31 % |
Sync | 5 | 1,106,828,341 | 12,884,448 | 103,985 | -17.28 % |
Conclusion: the trade-off here is very similar to BenchmarkShortStrKeysPut, with the advantage that using a large key actually improve the relative performance, compared to short key.
Go to BenchmarkLargeStrKeysPut
source code.
Raw output:
BenchmarkLargeStrKeysPut/map-8 6 915548530 ns/op 8090290 B/op 4010 allocs/op
BenchmarkLargeStrKeysPut/Builtin-8 6 933213966 ns/op 8093778 B/op 4029 allocs/op
BenchmarkLargeStrKeysPut/Simple-8 5 1148190670 ns/op 17030152 B/op 4027 allocs/op
BenchmarkLargeStrKeysPut/Linked-8 5 1033555280 ns/op 12891305 B/op 104017 allocs/op
BenchmarkLargeStrKeysPut/LinkedHash-8 6 1009584296 ns/op 16533028 B/op 404057 allocs/op
BenchmarkLargeStrKeysPut/Sync-8 5 1106828341 ns/op 12884448 B/op 103985 allocs/op
Put large string keys in map of int value, but unlike BenchmarkShortStrKeysPut, this benchmark generates the key inside the benchmark, so both key generation time and key memory is accounted in the result.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map | 33 | 172,837,803 | 401,134,506 | 40,082 | baseline |
Builtin 🏆 | 30 | 171,893,459 | 401,133,821 | 40,080 | 0.55 % |
Simple | 30 | 205,032,312 | 401,819,511 | 40,101 | -15.70 % |
Linked | 28 | 179,232,175 | 401,614,090 | 50,081 | -3.57 % |
LinkedHash | 28 | 183,706,741 | 401,999,293 | 80,112 | -5.92 % |
Sync | 32 | 181,802,519 | 401,614,492 | 50,084 | -4.93 % |
Conclusion: when the time of large keys generation is accounted in the benchmark, the relative performance loss compared to BenchmarkLargeStrKeysPut is actually better.
Go to BenchmarkLargeStrKeysPutGen
source code.
Raw output:
BenchmarkLargeStrKeysPutGen/map-8 33 172837803 ns/op 401134506 B/op 40082 allocs/op
BenchmarkLargeStrKeysPutGen/Builtin-8 30 171893459 ns/op 401133821 B/op 40080 allocs/op
BenchmarkLargeStrKeysPutGen/Simple-8 30 205032312 ns/op 401819511 B/op 40101 allocs/op
BenchmarkLargeStrKeysPutGen/Linked-8 28 179232175 ns/op 401614090 B/op 50081 allocs/op
BenchmarkLargeStrKeysPutGen/LinkedHash-8 28 183706741 ns/op 401999293 B/op 80112 allocs/op
BenchmarkLargeStrKeysPutGen/Sync-8 32 181802519 ns/op 401614492 B/op 50084 allocs/op
Generate a map of large string keys, same as BenchmarkShortStrKeysPut, and then run the benchmark only to get the values of a random key. All sub-benchmarks use same random seed.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map | 1,470,026 | 3,609 | 0 | 0 | baseline |
Builtin | 1,769,174 | 3,978 | 0 | 0 | -9.28 % |
Simple | 1,911,117 | 3,221 | 0 | 0 | 12.05 % |
Linked | 1,826,122 | 3,221 | 0 | 0 | 12.05 % |
LinkedHash | 1,000,000 | 5,294 | 16 | 1 | -31.83 % |
Sync 🏆 | 1,747,059 | 3,164 | 0 | 0 | 14.06 % |
Conclusion: except for LinkedHash, the implementations basically map the Get operation to a builtin map, so it is expected that the difference is minor, and due to random factors. LinkedHash is more complex, so it is expected to be slower. All good here.
Go to BenchmarkLargeStrKeysGet
source code.
Raw output:
BenchmarkLargeStrKeysGet/map-8 1470026 3609 ns/op 0 B/op 0 allocs/op
BenchmarkLargeStrKeysGet/Builtin-8 1769174 3978 ns/op 0 B/op 0 allocs/op
BenchmarkLargeStrKeysGet/Simple-8 1911117 3221 ns/op 0 B/op 0 allocs/op
BenchmarkLargeStrKeysGet/Linked-8 1826122 3221 ns/op 0 B/op 0 allocs/op
BenchmarkLargeStrKeysGet/LinkedHash-8 1000000 5294 ns/op 16 B/op 1 allocs/op
BenchmarkLargeStrKeysGet/Sync-8 1747059 3164 ns/op 0 B/op 0 allocs/op
Generate a map of large string keys, same as BenchmarkShortStrKeysPut, and then run the benchmark to iterate over all key/value pairs.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
Builtin | 345 | 17,452,293 | 200 | 4 | baseline |
Simple | 31 | 200,420,724 | 16 | 1 | -91.29 % |
Linked | 17,079 | 332,097 | 16 | 1 | 5155.18 % |
LinkedHash 🏆 | 21,812 | 265,814 | 16 | 1 | 6465.60 % |
Sync | 2,899 | 1,939,563 | 48 | 2 | 799.81 % |
Conclusion: the performance iteration with large keys is even better than short keys.
Go to BenchmarkLargeStrKeysIterate
source code.
Raw output:
BenchmarkLargeStrKeysIterate/Builtin-8 345 17452293 ns/op 200 B/op 4 allocs/op
BenchmarkLargeStrKeysIterate/Simple-8 31 200420724 ns/op 16 B/op 1 allocs/op
BenchmarkLargeStrKeysIterate/Linked-8 17079 332097 ns/op 16 B/op 1 allocs/op
BenchmarkLargeStrKeysIterate/LinkedHash-8 21812 265814 ns/op 16 B/op 1 allocs/op
BenchmarkLargeStrKeysIterate/Sync-8 2899 1939563 ns/op 48 B/op 2 allocs/op
Generate a map of large strings keys and int value, and get all values one by one.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map 🏆 | 9 | 597,743,002 | 8,089,458 | 4,006 | baseline |
Builtin | 8 | 704,752,021 | 8,082,226 | 3,973 | -15.18 % |
Simple | 7 | 813,976,451 | 17,035,262 | 4,051 | -26.57 % |
Linked | 7 | 817,070,427 | 12,883,924 | 103,981 | -26.84 % |
LinkedHash | 6 | 925,364,101 | 18,131,789 | 504,052 | -35.40 % |
Sync | 7 | 786,297,455 | 12,886,646 | 103,995 | -23.98 % |
Go to BenchmarkLargeStrKeysPutGet
source code.
Raw output:
BenchmarkLargeStrKeysPutGet/map-8 9 597743002 ns/op 8089458 B/op 4006 allocs/op
BenchmarkLargeStrKeysPutGet/Builtin-8 8 704752021 ns/op 8082226 B/op 3973 allocs/op
BenchmarkLargeStrKeysPutGet/Simple-8 7 813976451 ns/op 17035262 B/op 4051 allocs/op
BenchmarkLargeStrKeysPutGet/Linked-8 7 817070427 ns/op 12883924 B/op 103981 allocs/op
BenchmarkLargeStrKeysPutGet/LinkedHash-8 6 925364101 ns/op 18131789 B/op 504052 allocs/op
BenchmarkLargeStrKeysPutGet/Sync-8 7 786297455 ns/op 12886646 B/op 103995 allocs/op
Generate a map of large strings keys and int value, and get all values one by one.
Implemenation | Nruns | ns/op | B/op | allocs/op | % perf relative |
---|---|---|---|---|---|
map 🏆 | 36 | 157,263,414 | 410,275,740 | 10,211 | baseline |
Builtin | 24 | 241,444,763 | 410,358,049 | 10,217 | -34.87 % |
Simple | 9 | 612,520,670 | 1,935,593,816 | 10,237 | -74.33 % |
Linked | 16 | 340,107,150 | 819,876,043 | 20,215 | -53.76 % |
LinkedHash | 28 | 198,635,101 | 820,759,129 | 40,308 | -20.83 % |
Sync | 14 | 381,237,194 | 819,876,630 | 20,221 | -58.75 % |
Conclusion: this test is designed specifically for LinkedHash implementation, and is actually the only use-case where this implementation is a good fit, and as expected it is the fastest of omap implementations, although still slower than builtin map. Albeit, it seems a very specific and unusual use case.
Go to BenchmarkLargeObjectKey
source code.
Raw output:
BenchmarkLargeObjectKey/map-8 36 157263414 ns/op 410275740 B/op 10211 allocs/op
BenchmarkLargeObjectKey/Builtin-8 24 241444763 ns/op 410358049 B/op 10217 allocs/op
BenchmarkLargeObjectKey/Simple-8 9 612520670 ns/op 1935593816 B/op 10237 allocs/op
BenchmarkLargeObjectKey/Linked-8 16 340107150 ns/op 819876043 B/op 20215 allocs/op
BenchmarkLargeObjectKey/LinkedHash-8 28 198635101 ns/op 820759129 B/op 40308 allocs/op
BenchmarkLargeObjectKey/Sync-8 14 381237194 ns/op 819876630 B/op 20221 allocs/op