# Benchmarks

This is the result of a series of benchmarks run against each implementation to validate the assumptions behind its design decisions. Below is a formatted version of the results with my own conclusions; you can see the raw results in the bench.txt file.

To run the benchmarks yourself, just do:

make bench

You can generate the formatted output shown below with [scripts/benchtable.go]; the command is provided in the Makefile as well (it must be run after bench.txt has been generated by the previous command):

make doc-bench

The following subsections show the results of each benchmark. The tables are autogenerated from the bench.txt file, and the implementation with the best performance is marked with a 🏆 in front of its name. It is recommended that you run the benchmarks yourself though, since the results may vary; see the "Conclusion" below each table for the author's opinion.

## Benchmark Iteration

Put many values into the map, outside of the measured section, and then iterate through the map to measure the time taken for a full iteration.
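
The general shape of that benchmark is roughly the sketch below, shown for the builtin map case (the key count and key format here are illustrative, not the values used by the suite):

```go
package bench

import (
	"strconv"
	"testing"
)

// Minimal sketch of an iteration benchmark: the map is populated once, outside
// the measured section, and only the full iteration over it is timed.
func BenchmarkIterationMap(b *testing.B) {
	m := make(map[string]int, 100_000)
	for i := 0; i < 100_000; i++ {
		m[strconv.Itoa(i)] = i
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		sum := 0
		for _, v := range m {
			sum += v
		}
		_ = sum // keep the compiler from optimizing the loop away
	}
}
```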

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map | 3,936 | 1,399,521 | 0 | 0 | baseline |
| Builtin | 303 | 19,250,757 | 201 | 4 | -92.73 % |
| Simple | 500 | 12,606,482 | 16 | 1 | -88.90 % |
| Linked | 7,786 | 848,627 | 16 | 1 | 64.92 % |
| LinkedHash 🏆 | 7,083 | 844,225 | 16 | 1 | 65.78 % |
| Sync | 2,134 | 2,827,290 | 48 | 2 | -50.50 % |

Conclusion: the omap implementations backed by a linked list (Linked and LinkedHash) are faster to iterate than the builtin map, since they keep a data structure designed and optimized specifically for iteration.
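
As a rough illustration of why that holds, here is a conceptual sketch of the kind of structure a linked-list-backed ordered map keeps (not the library's actual types): iteration walks the list directly, in insertion order, without scanning hash buckets.

```go
package bench

// Conceptual sketch: insertion order is kept in a linked list, so iteration
// just walks the list and never has to scan the hash table's buckets.
type node[K comparable, V any] struct {
	key   K
	value V
	next  *node[K, V]
}

type orderedMap[K comparable, V any] struct {
	index map[K]*node[K, V] // key -> list node, for lookups
	head  *node[K, V]       // first inserted entry
	tail  *node[K, V]       // last inserted entry
}

func (m *orderedMap[K, V]) each(fn func(K, V)) {
	for n := m.head; n != nil; n = n.next {
		fn(n.key, n.value)
	}
}
```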

Go to BenchmarkIteration source code.

Raw output:
BenchmarkIteration/map-8         	    3936	   1399521 ns/op	       0 B/op	       0 allocs/op
BenchmarkIteration/Builtin-8     	     303	  19250757 ns/op	     201 B/op	       4 allocs/op
BenchmarkIteration/Simple-8      	     500	  12606482 ns/op	      16 B/op	       1 allocs/op
BenchmarkIteration/Linked-8      	    7786	    848627 ns/op	      16 B/op	       1 allocs/op
BenchmarkIteration/LinkedHash-8  	    7083	    844225 ns/op	      16 B/op	       1 allocs/op
BenchmarkIteration/Sync-8        	    2134	   2827290 ns/op	      48 B/op	       2 allocs/op

## Benchmark MarshalJSON

Calls json.Marshal to convert a single map of (string, int) to JSON.
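
The measured call is simply json.Marshal on each implementation; a minimal sketch for the builtin map case is shown below (it assumes, as ordered maps typically do, that the omap implementations provide their own MarshalJSON to keep key order in the output):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	// The benchmark marshals a much larger map; this only shows the call
	// being measured. encoding/json sorts builtin map keys, while an ordered
	// map's MarshalJSON can emit keys in insertion order instead.
	m := map[string]int{"b": 2, "a": 1}
	data, err := json.Marshal(m)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data)) // {"a":1,"b":2}
}
```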

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map | 1,164 | 5,449,427 | 1,010,914 | 20,006 | baseline |
| Builtin | 962 | 6,347,832 | 1,134,340 | 20,010 | -14.15 % |
| Simple | 1,009 | 6,163,324 | 857,160 | 39,762 | -11.58 % |
| Linked 🏆 | 1,328 | 4,773,561 | 860,148 | 39,761 | 14.16 % |
| LinkedHash | 1,280 | 5,021,098 | 865,523 | 39,762 | 8.53 % |
| Sync | 1,123 | 5,801,274 | 990,457 | 39,764 | -6.06 % |

Conclusion: the results here are similar to BenchmarkIteration, since all cases have to iterate while building the final JSON.

Go to BenchmarkMarshalJSON source code.

Raw output:
BenchmarkMarshalJSON/map-8       	    1164	   5449427 ns/op	 1010914 B/op	   20006 allocs/op
BenchmarkMarshalJSON/Builtin-8   	     962	   6347832 ns/op	 1134340 B/op	   20010 allocs/op
BenchmarkMarshalJSON/Simple-8    	    1009	   6163324 ns/op	  857160 B/op	   39762 allocs/op
BenchmarkMarshalJSON/Linked-8    	    1328	   4773561 ns/op	  860148 B/op	   39761 allocs/op
BenchmarkMarshalJSON/LinkedHash-8         	    1280	   5021098 ns/op	  865523 B/op	   39762 allocs/op
BenchmarkMarshalJSON/Sync-8               	    1123	   5801274 ns/op	  990457 B/op	   39764 allocs/op

## Benchmark ShortStrKeysPut

Put values into the map using short string keys, pre-generating the keys before the benchmark so that key generation is not accounted for in the memory results.
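
The pre-generation happens before the benchmark timer starts, so only the Put loop is measured; a sketch of that shape for the builtin map case (key count and format are illustrative):

```go
package bench

import (
	"strconv"
	"testing"
)

// Sketch of the Put benchmark shape: keys are built before ResetTimer, so
// neither their generation time nor their memory shows up in ns/op, B/op or
// allocs/op; only the Put loop itself is measured.
func BenchmarkShortStrKeysPutMap(b *testing.B) {
	keys := make([]string, 1000)
	for i := range keys {
		keys[i] = strconv.Itoa(i)
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		m := make(map[string]int)
		for i, k := range keys {
			m[k] = i
		}
	}
}
```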

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map | 219 | 29,021,513 | 8,088,043 | 3,999 | baseline |
| Builtin 🏆 | 214 | 27,240,396 | 8,087,120 | 3,997 | 6.54 % |
| Simple | 100 | 52,066,508 | 17,028,371 | 4,018 | -44.26 % |
| Linked | 132 | 55,644,086 | 12,887,104 | 103,997 | -47.84 % |
| LinkedHash | 78 | 65,659,897 | 16,530,445 | 404,046 | -55.80 % |
| Sync | 96 | 57,953,905 | 12,888,188 | 104,003 | -49.92 % |

Conclusion: since all omap implementations have to maintain a separate data structure on Put, it is expected that they are slower than the builtin map. The trade-off seems acceptable if you need to iterate over (or serialize) the map, or if you have few keys.

Go to BenchmarkShortStrKeysPut source code.

Raw output:
BenchmarkShortStrKeysPut/map-8            	     219	  29021513 ns/op	 8088043 B/op	    3999 allocs/op
BenchmarkShortStrKeysPut/Builtin-8        	     214	  27240396 ns/op	 8087120 B/op	    3997 allocs/op
BenchmarkShortStrKeysPut/Simple-8         	     100	  52066508 ns/op	17028371 B/op	    4018 allocs/op
BenchmarkShortStrKeysPut/Linked-8         	     132	  55644086 ns/op	12887104 B/op	  103997 allocs/op
BenchmarkShortStrKeysPut/LinkedHash-8     	      78	  65659897 ns/op	16530445 B/op	  404046 allocs/op
BenchmarkShortStrKeysPut/Sync-8           	      96	  57953905 ns/op	12888188 B/op	  104003 allocs/op

## Benchmark LargeStrKeysPut

Put large string keys into a map of int values, pre-generating the keys before the benchmark so that key generation is not accounted for in the memory results.

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map 🏆 | 6 | 915,548,530 | 8,090,290 | 4,010 | baseline |
| Builtin | 6 | 933,213,966 | 8,093,778 | 4,029 | -1.89 % |
| Simple | 5 | 1,148,190,670 | 17,030,152 | 4,027 | -20.26 % |
| Linked | 5 | 1,033,555,280 | 12,891,305 | 104,017 | -11.42 % |
| LinkedHash | 6 | 1,009,584,296 | 16,533,028 | 404,057 | -9.31 % |
| Sync | 5 | 1,106,828,341 | 12,884,448 | 103,985 | -17.28 % |

Conclusion: the trade-off here is very similar to BenchmarkShortStrKeysPut, with the advantage that using large keys actually improves the relative performance compared to short keys.

Go to BenchmarkLargeStrKeysPut source code.

Raw output:
BenchmarkLargeStrKeysPut/map-8            	       6	 915548530 ns/op	 8090290 B/op	    4010 allocs/op
BenchmarkLargeStrKeysPut/Builtin-8        	       6	 933213966 ns/op	 8093778 B/op	    4029 allocs/op
BenchmarkLargeStrKeysPut/Simple-8         	       5	1148190670 ns/op	17030152 B/op	    4027 allocs/op
BenchmarkLargeStrKeysPut/Linked-8         	       5	1033555280 ns/op	12891305 B/op	  104017 allocs/op
BenchmarkLargeStrKeysPut/LinkedHash-8     	       6	1009584296 ns/op	16533028 B/op	  404057 allocs/op
BenchmarkLargeStrKeysPut/Sync-8           	       5	1106828341 ns/op	12884448 B/op	  103985 allocs/op

## Benchmark LargeStrKeysPutGen

Put large string keys into a map of int values, but unlike BenchmarkShortStrKeysPut, this benchmark generates the keys inside the measured section, so both key generation time and key memory are accounted for in the result.
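
The only difference from the sketch in BenchmarkShortStrKeysPut is where the key is built (key length and count here are illustrative):

```go
package bench

import (
	"strconv"
	"strings"
	"testing"
)

// Sketch of the "Gen" variant: the key is built inside the timed loop, so both
// its generation time and its memory are included in the reported numbers.
func BenchmarkLargeStrKeysPutGenMap(b *testing.B) {
	for n := 0; n < b.N; n++ {
		m := make(map[string]int)
		for i := 0; i < 1000; i++ {
			key := strings.Repeat("k", 1024) + strconv.Itoa(i)
			m[key] = i
		}
	}
}
```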

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map | 33 | 172,837,803 | 401,134,506 | 40,082 | baseline |
| Builtin 🏆 | 30 | 171,893,459 | 401,133,821 | 40,080 | 0.55 % |
| Simple | 30 | 205,032,312 | 401,819,511 | 40,101 | -15.70 % |
| Linked | 28 | 179,232,175 | 401,614,090 | 50,081 | -3.57 % |
| LinkedHash | 28 | 183,706,741 | 401,999,293 | 80,112 | -5.92 % |
| Sync | 32 | 181,802,519 | 401,614,492 | 50,084 | -4.93 % |

Conclusion: when the time to generate the large keys is included in the benchmark, the relative performance loss is actually smaller than in BenchmarkLargeStrKeysPut.

Go to BenchmarkLargeStrKeysPutGen source code.

Raw output:
BenchmarkLargeStrKeysPutGen/map-8         	      33	 172837803 ns/op	401134506 B/op	   40082 allocs/op
BenchmarkLargeStrKeysPutGen/Builtin-8     	      30	 171893459 ns/op	401133821 B/op	   40080 allocs/op
BenchmarkLargeStrKeysPutGen/Simple-8      	      30	 205032312 ns/op	401819511 B/op	   40101 allocs/op
BenchmarkLargeStrKeysPutGen/Linked-8      	      28	 179232175 ns/op	401614090 B/op	   50081 allocs/op
BenchmarkLargeStrKeysPutGen/LinkedHash-8  	      28	 183706741 ns/op	401999293 B/op	   80112 allocs/op
BenchmarkLargeStrKeysPutGen/Sync-8        	      32	 181802519 ns/op	401614492 B/op	   50084 allocs/op

## Benchmark LargeStrKeysGet

Generate a map of large string keys, the same as in BenchmarkShortStrKeysPut, and then run the benchmark to get the value of a random key. All sub-benchmarks use the same random seed.
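
A sketch of that shape for the builtin map case (the seed value, key count and key format are illustrative):

```go
package bench

import (
	"math/rand"
	"strconv"
	"testing"
)

// Sketch of the Get benchmark: the map and the key slice are built up front,
// and a fixed seed makes every sub-benchmark request the same key sequence,
// so all implementations are compared on identical work.
func BenchmarkLargeStrKeysGetMap(b *testing.B) {
	keys := make([]string, 1000)
	m := make(map[string]int, len(keys))
	for i := range keys {
		keys[i] = strconv.Itoa(i)
		m[keys[i]] = i
	}
	rnd := rand.New(rand.NewSource(42))
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		_ = m[keys[rnd.Intn(len(keys))]]
	}
}
```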

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map | 1,470,026 | 3,609 | 0 | 0 | baseline |
| Builtin | 1,769,174 | 3,978 | 0 | 0 | -9.28 % |
| Simple | 1,911,117 | 3,221 | 0 | 0 | 12.05 % |
| Linked | 1,826,122 | 3,221 | 0 | 0 | 12.05 % |
| LinkedHash | 1,000,000 | 5,294 | 16 | 1 | -31.83 % |
| Sync 🏆 | 1,747,059 | 3,164 | 0 | 0 | 14.06 % |

Conclusion: except for LinkedHash, the implementations basically delegate the Get operation to a builtin map, so it is expected that the differences are minor and due to random factors. LinkedHash is more complex, so it is expected to be slower. All good here.
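
The delegation mentioned above looks roughly like this, continuing the conceptual orderedMap sketch from the BenchmarkIteration conclusion (not the library's actual code):

```go
// Get is a thin wrapper over a builtin map lookup, so its cost is essentially
// the cost of the builtin map itself.
func (m *orderedMap[K, V]) Get(key K) (V, bool) {
	n, ok := m.index[key]
	if !ok {
		var zero V
		return zero, false
	}
	return n.value, true
}
```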

Go to BenchmarkLargeStrKeysGet source code.

Raw output:
BenchmarkLargeStrKeysGet/map-8            	 1470026	      3609 ns/op	       0 B/op	       0 allocs/op
BenchmarkLargeStrKeysGet/Builtin-8        	 1769174	      3978 ns/op	       0 B/op	       0 allocs/op
BenchmarkLargeStrKeysGet/Simple-8         	 1911117	      3221 ns/op	       0 B/op	       0 allocs/op
BenchmarkLargeStrKeysGet/Linked-8         	 1826122	      3221 ns/op	       0 B/op	       0 allocs/op
BenchmarkLargeStrKeysGet/LinkedHash-8     	 1000000	      5294 ns/op	      16 B/op	       1 allocs/op
BenchmarkLargeStrKeysGet/Sync-8           	 1747059	      3164 ns/op	       0 B/op	       0 allocs/op

## Benchmark LargeStrKeysIterate

Generate a map of large string keys, the same as in BenchmarkShortStrKeysPut, and then run the benchmark to iterate over all key/value pairs.

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| Builtin | 345 | 17,452,293 | 200 | 4 | baseline |
| Simple | 31 | 200,420,724 | 16 | 1 | -91.29 % |
| Linked | 17,079 | 332,097 | 16 | 1 | 5155.18 % |
| LinkedHash 🏆 | 21,812 | 265,814 | 16 | 1 | 6465.60 % |
| Sync | 2,899 | 1,939,563 | 48 | 2 | 799.81 % |

Conclusion: iteration performance with large keys is even better than with short keys.

Go to BenchmarkLargeStrKeysIterate source code.

Raw output:
BenchmarkLargeStrKeysIterate/Builtin-8    	     345	  17452293 ns/op	     200 B/op	       4 allocs/op
BenchmarkLargeStrKeysIterate/Simple-8     	      31	 200420724 ns/op	      16 B/op	       1 allocs/op
BenchmarkLargeStrKeysIterate/Linked-8     	   17079	    332097 ns/op	      16 B/op	       1 allocs/op
BenchmarkLargeStrKeysIterate/LinkedHash-8 	   21812	    265814 ns/op	      16 B/op	       1 allocs/op
BenchmarkLargeStrKeysIterate/Sync-8       	    2899	   1939563 ns/op	      48 B/op	       2 allocs/op

## Benchmark LargeStrKeysPutGet

Generate a map of large string keys and int values, and get all values one by one.

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map 🏆 | 9 | 597,743,002 | 8,089,458 | 4,006 | baseline |
| Builtin | 8 | 704,752,021 | 8,082,226 | 3,973 | -15.18 % |
| Simple | 7 | 813,976,451 | 17,035,262 | 4,051 | -26.57 % |
| Linked | 7 | 817,070,427 | 12,883,924 | 103,981 | -26.84 % |
| LinkedHash | 6 | 925,364,101 | 18,131,789 | 504,052 | -35.40 % |
| Sync | 7 | 786,297,455 | 12,886,646 | 103,995 | -23.98 % |

Go to BenchmarkLargeStrKeysPutGet source code.

Raw output:
BenchmarkLargeStrKeysPutGet/map-8         	       9	 597743002 ns/op	 8089458 B/op	    4006 allocs/op
BenchmarkLargeStrKeysPutGet/Builtin-8     	       8	 704752021 ns/op	 8082226 B/op	    3973 allocs/op
BenchmarkLargeStrKeysPutGet/Simple-8      	       7	 813976451 ns/op	17035262 B/op	    4051 allocs/op
BenchmarkLargeStrKeysPutGet/Linked-8      	       7	 817070427 ns/op	12883924 B/op	  103981 allocs/op
BenchmarkLargeStrKeysPutGet/LinkedHash-8  	       6	 925364101 ns/op	18131789 B/op	  504052 allocs/op
BenchmarkLargeStrKeysPutGet/Sync-8        	       7	 786297455 ns/op	12886646 B/op	  103995 allocs/op

## Benchmark LargeObjectKey

Generate a map of large object keys and int values, and get all values one by one.

| Implementation | Nruns | ns/op | B/op | allocs/op | % perf relative |
| --- | ---: | ---: | ---: | ---: | ---: |
| map 🏆 | 36 | 157,263,414 | 410,275,740 | 10,211 | baseline |
| Builtin | 24 | 241,444,763 | 410,358,049 | 10,217 | -34.87 % |
| Simple | 9 | 612,520,670 | 1,935,593,816 | 10,237 | -74.33 % |
| Linked | 16 | 340,107,150 | 819,876,043 | 20,215 | -53.76 % |
| LinkedHash | 28 | 198,635,101 | 820,759,129 | 40,308 | -20.83 % |
| Sync | 14 | 381,237,194 | 819,876,630 | 20,221 | -58.75 % |

Conclusion: this test is designed specifically for the LinkedHash implementation, and is actually the only use case where this implementation is a good fit; as expected it is the fastest of the omap implementations, although still slower than the builtin map. That said, it seems a very specific and unusual use case.
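
The intuition behind that trade-off, as an assumption about the approach rather than the library's actual code, is that a "linked hash" style map can index entries by a small hash of the key instead of the key object itself, which only pays off when keys are large:

```go
package bench

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// hashKey is a hypothetical helper: it derives a small, fixed-size index key
// from an arbitrarily large key object, so the internal lookup map's keys stay
// cheap to hash and compare even when the original key is a large object.
func hashKey(key any) (string, error) {
	raw, err := json.Marshal(key) // any stable serialization would do
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}
```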

Go to BenchmarkLargeObjectKey source code.

Raw output:
BenchmarkLargeObjectKey/map-8             	      36	 157263414 ns/op	410275740 B/op	   10211 allocs/op
BenchmarkLargeObjectKey/Builtin-8         	      24	 241444763 ns/op	410358049 B/op	   10217 allocs/op
BenchmarkLargeObjectKey/Simple-8          	       9	 612520670 ns/op	1935593816 B/op	   10237 allocs/op
BenchmarkLargeObjectKey/Linked-8          	      16	 340107150 ns/op	819876043 B/op	   20215 allocs/op
BenchmarkLargeObjectKey/LinkedHash-8      	      28	 198635101 ns/op	820759129 B/op	   40308 allocs/op
BenchmarkLargeObjectKey/Sync-8            	      14	 381237194 ns/op	819876630 B/op	   20221 allocs/op