\documentclass[11pt]{book}
\usepackage[a4paper, hmargin={4cm, 4cm}]{geometry}
\usepackage{eso-pic} % \AddToShipoutPicture
\usepackage{graphicx} % \includegraphics
\usepackage{minted}
\usepackage{listings} % Source code printer for LaTeX
\usepackage{caption}
\usepackage{parcolumns}
\usepackage{fancyhdr}
\usepackage{titlesec}
\usepackage{multirow}
\usepackage{biblatex}
\usepackage{appendix}
\usepackage{tocbibind}
\usepackage{hyperref}
\hypersetup{
colorlinks,
citecolor=black,
filecolor=black,
linkcolor=black,
urlcolor=black
}
% Here is where you add your own .bib file
\addbibresource{sources.bib}
\pagestyle{fancy}
\fancyhf{}
\fancyhead[RO, LE]{\thepage}
\fancyhead[LO]{\textsl{\leftmark}}
\fancyhead[RE]{\textsl{\rightmark}}
\renewcommand{\sectionmark}[1]{%
\markright{\thesection \ #1}{}}
\renewcommand{\chaptermark}[1]{%
\markboth{\thechapter \ #1}{}}
\titleformat{\chapter}[display]
{\normalsize \huge \color{black}}%
{\flushleft\normalsize %
\MakeUppercase{\fontfamily{}\fontsize{24}{24}\selectfont\chaptertitlename}\hspace{2ex}%
{\fontfamily{}\fontsize{40}{40}\selectfont\thechapter}}%
{10 pt}%
{\hrule\vspace{10pt}\raggedleft\bfseries\huge}%
%% Change `ku-farve` to `nat-farve` to use SCIENCE's old colors or
%% `natbio-farve` to use SCIENCE's new colors and logo.
\def \ColourPDF {include/natbio-farve}
%% Change `ku-en` to `nat-en` to use the `Faculty of Science` header
\def \TitlePDF {include/nat-en} % University of Copenhagen
\title{
\vspace{3cm}
\Huge{A WebAssembly Backend for Futhark} \\
\Large{MSc Thesis}
}
\author{
\Large{Philip Lassen}
\\ \texttt{philiplassen@gmail.com} \\
}
\date{
\today
}
\usepackage{afterpage}
\newcommand\blankpage{%
\null
\thispagestyle{empty}%
\addtocounter{page}{-1}%
\newpage}
\newenvironment{longlisting}{\captionsetup{type=listing}}{}
\newcommand{\textBf}[1]{\textbf{#1}}
\AtBeginEnvironment{longlisting}{%
\renewcommand{\fcolorbox}[4][]{#4}}
\begin{document}
\AddToShipoutPicture*{\put(0,0){\includegraphics*[viewport=0 0 700 600]{\ColourPDF}}}
\AddToShipoutPicture*{\put(0,602){\includegraphics*[viewport=0 600 700 1600]{\ColourPDF}}}
\AddToShipoutPicture*{\put(0,0){\includegraphics*{\TitlePDF}}}
\clearpage\maketitle
\thispagestyle{empty}
\setcounter{page}{0}
\pagenumbering{roman}
\afterpage{\blankpage}
\chapter*{Abstract}
\addcontentsline{toc}{chapter}{Abstract}
Futhark is a high-performance purely functional data-parallel array programming language targeting parallel compute hardware. Futhark has backends for several compute architectures, and this thesis adds browsers as a target by compiling to WebAssembly and threaded WebAssembly. These browser technologies map more directly than JavaScript to the underlying hardware of modern devices, including multicore CPUs.
A JavaScript API is developed for easily calling compiled Futhark Web\-Assembly libraries in the browser.
The implementation and generated Web\-Assembly code are benchmarked in both browsers and Node.js against the Futhark sequential C and multicore C backends. The sequential Web\-Assembly backend performs close to sequential C speeds. The parallel execution of threaded Web\-Assembly speeds up some example programs by a factor equal to the number of physical CPU cores.
%A large percentage of today's software is run in the browser, and the underlying hardware of the laptops and mobile phones that run these browser have increasingly parallel compute with GPU's and multicore CPU's. As such there is heavy development of standardized cross browser Web API's to harness these parallel architectures. One of these technologies is threaded WebAssembly. This thesis develops WebAssembly and threaded WebAssembly backends for the Futhark programming language, leveraging both parallel computing and browsers as a target architecture.
\tableofcontents
\csname @openrightfalse\endcsname
\chapter{Introduction}
\setcounter{page}{1}
\pagenumbering{arabic}
%\section{FIRST ATTEMPT}
%Futhark is a high-performance purely functional data-parallel array programming language targeting parallel compute hardware.
%The last two decades have seen the proliferation of consumer devices---laptops, tablets, phones---with parallel computing capabilities with GPUs and multicore CPUs. This is an interesting application space for Futhark.
%A large fraction of the software that runs on these devices runs in the browser. This presents an avenue for executing Futhark on consumer devices if the Futhark compiler can generate code that can run in browsers.
%The WebAssembly browser technology has emerged as a particularly effective compilation target that runs efficiently in the browser.
%Furthermore, WebAssembly has an experimental multithreaded extension called threaded WebAssembly that supports parallel execution on multicore CPUs.
%This thesis develops backends to compile Futhark to WebAssembly and threaded WebAssembly for efficient, parallel execution in web browsers on multicore CPUs, together with a convenient and efficient JavaScript API for Futhark interoperation with browser applications.
%In this way this thesis contributes towards extending high level and high performance parallel programming to modern consumer devices.
%This thesis is a contribution towards extending high level and high performance parallel programming to modern consumer devices by compiling the functional data-parallel programming language Futhark to WebAssembly and threaded WebAssembly for efficient execution in web browsers on multicore CPUs.
%\section{SECOND ATTEMPT}
%As such there is heavy development of standardized browser APIs to harness these parallel architectures more efficiently than JavaScript. One of the technologies that is getting traction and adoption is WebAssembly which offers near-native single threaded execution speeds in browsers. It is supported by all modern browsers and is the target of compilers for many major programming languages. WebAssembly has experimental support for multithreaded execution with the parallel extension called threaded WebAssembly. It has working compiler and browser support.
The last two decades have seen a proliferation of consumer devices---laptops, tablets, phones---with parallel computing capabilities in the form of GPUs and multicore CPUs. A large portion of the software running on these devices runs in the browser.
%, with JavaScript being the standard.
%Emerging technologies are improving how programmers can utilize devices' parallel computing capabilities in the browser.
%Emerging technologies are augmenting standard JavaScript to provide programming tools to utilize devices' parallel computing capabilities in the browser.
%However, the programming tools to utilize devices' parallel computing capabilities in the browser are only just emerging.
%JavaScript is the standard way to
For decades, JavaScript has been the standard tool for programming in the browser.
JavaScript is ubiquitous but it suffers from inefficient execution. Moreover, there is an increasing interest in compiling other programming languages to run in the browser and JavaScript is not optimal as a compilation target in terms of code size and execution speed. Attempts to solve these issues have led to the development of first asm.js and then WebAssembly, which acts as a kind of assembly language for the browser. WebAssembly is compact and executes at near native speeds and is now supported as a compilation target for many major programming languages, including C, C++, and Rust.
However, WebAssembly is single threaded and doesn't utilize the parallel computing capabilities of the underlying hardware.
Development into utilizing parallelism is ongoing with efforts being made both in the context of GPUs and multicore CPUs. One is WebGPU, a proposed web standard and JavaScript API for calling GPUs in the browser. Another is threaded WebAssembly, an experimental extension to WebAssembly that supports parallel execution on multicore CPUs using Web Workers and provides an interesting compilation target for parallel programming languages to run in the browser.
%It should be noted that WebAssembly is not intended to be used by developers, rather as a compilation target for higher level languages.
Futhark is a high-performance purely functional data-parallel array programming language targeting parallel compute hardware, primarily GPUs. More recently, the Futhark compiler has gained an additional backend targeting multicore CPUs.
%As a high level programming language, Futhark hides the intricacies of implementing and handling the threads, which would need to be explicitly addressed in a lower level language like C.
% Parallel is hard
% Futhark makes easy
% Compiling to browser allows you to compile futhark to any device
%The emerging support for parallel execution in the browser provides an avenue to extend the types of devices that can run futhark code.
This thesis develops backends for compiling Futhark to WebAssembly and threaded WebAssembly for efficient, parallel execution in web browsers, together with a convenient and efficient JavaScript API for Futhark interoperation with browser applications. In this way this thesis contributes towards extending high level and high performance parallel programming to modern consumer devices.
\subsubsection*{Objectives}
The objectives of this thesis are to:
\begin{itemize}
\item Survey technologies for high performance and parallel programming in the browser.
\item Design an API for interoperation between Futhark and browser applications.
\item Implement Futhark backends that generate libraries that run efficiently and in parallel in browsers.
\item Evaluate the performance of these backends.
\end{itemize}
%The last two decades have seen the proliferation of devices---laptops, tablets, phones---with parallel computing capabilities with GPUs and multicore CPUs.
%A large fraction of today's software runs in the browser.
%{\color{red} TODO}: discuss the challenge of programming browsers to take advantage of the hw
%{\color{red} TODO}: mention the ubiquity and inefficiency of JS
%TODD: mention that WebAssembly is low level, not meant for coding by hand, and therefore a higher level PL is required
%{\color{red} TODO}: argue why Futhark is an interesting source language
%GOALS
%The goal of this thesis is to extend high level and high performance parallel programming to modern consumer devices. (TODO...)
%by compiling the functional data-parallel programming language Futhark to WebAssembly and threaded WebAssembly for efficient execution in web browsers on multicore CPUs.
%This thesis is a contribution towards extending high level and high performance parallel programming to modern consumer devices by compiling the functional data-parallel programming language Futhark to WebAssembly and threaded WebAssembly for efficient execution in web browsers on multicore CPUs.
%A large percentage of today's software is run in the browser, and the underlying hardware of the laptops and mobile phones that run these browser have increasingly parallel compute with GPU's and multicore CPU's. As such there is heavy development of standardized cross browser Web API's to harness these parallel architectures. One of these technologies is threaded WebAssembly. %This thesis develops WebAssembly and threaded WebAssembly backends for the Futhark programming language, leveraging both parallel computing and browsers as a target architecture.
\subsubsection*{Thesis Structure}
This thesis is organized with the following structure:
\begin{itemize}
\item Chapter 2, Background:
Overview of the related work that this thesis builds on.
\item Chapter 3, WebAssembly:
Explanation and analysis of WebAssembly as a programming language and as a target language for the browser.
\item Chapter 4, API Design:
We develop an API for calling Futhark Web\-Assembly libraries from JavaScript.
\item Chapter 5, WebAssembly Backend:
Description of the implementation of the WebAssembly backend, along with an overview of its performance benchmarked against the native C backend.
\item Chapter 6, Parallel Execution in the Browser:
Analysis of the facilities and paradigms for parallel programming in the browser through JavaScript and WebAssembly.
\item Chapter 7, WebAssembly Multicore Backend:
Description of the implementation of the multicore WebAssembly backend, along with an overview of its performance benchmarked against the native multicore C backend and the WebAssembly backend developed in Chapter 5.
\item Chapter 8, Conclusion:
Summary of the implementations and performance of the backends and API developed, along with a brief discussion of future developments for Futhark targeting parallel compute in the browser.
\end{itemize}
\chapter{Background}
This chapter describes relevant background for the work in this thesis, namely browser programming facilities and the Futhark programming language.
\section{Programming Languages in the Browser}
%\section{Futhark}
%\section{Browsers}
%The first web browsers started as Graphical User Interfaces for rendering static web pages. Web Browsers have evolved to keep up with the high paced innovation in web technologies and with modern day browsers having strong similarities with full blown operating systems.
%\subsection{JavaScript}
%JavaScript for a long period of time was the only supported programming language for the browser. As such it became one of the largest programming languages in the world. With browsers being one of the most ubiquitous coding platforms, programmers were restricted to JavaScript for any functionality in their websites that run client side. As a result many languages added backends to target JavaScript to that programmers could write code in their respective languages and generate JavaScript that would run in the browser. The problem is that JavaScript is not a particularly good target language, as it was not designed with this use case in mind.
%http://www.observationalhazard.com/2018/06/history-of-web-programming.html
% ^ This is about the programming languages in the web in the early days vbscript and java applets
In the early days of web browsers, web pages would render differently across browsers as web APIs were not standardized. Different browsers supported different programming languages, e.g. Java applets and VBScript, creating headaches for programmers who wanted their websites to behave identically everywhere. The first popular language for the browser was JavaScript, but implementations differed across browsers. Eventually the major vendors converged on JavaScript, releasing a standardized version called ECMAScript.
Huge investments have been made to increase the execution speed of JavaScript in the browser.
All the major browser vendors have optimized the performance of their JavaScript engines, in particular V8 in Chrome and SpiderMonkey in Firefox. Many approaches have been taken to make JavaScript faster, but they have all been fundamentally limited by the language design.
%\subsection{ASM.js}
One of the approaches taken by the browser vendors was to define asm.js, a subset of JavaScript with a convention for type hints, designed for efficient execution by leveraging types and compiler tricks to allow ahead-of-time compilation. It was intended as a target language for compiling statically typed programming languages. Emscripten \cite{emcc} was developed as a C/C++ to asm.js compiler.
Google also introduced Google Native Client (NaCl) \cite{nacl} as a way to bridge the speed gap between running code in the browser and running it natively. NaCl is a sandbox for running compiled C and C++ code in the browser efficiently and securely, independent of the user's operating system \cite{nacl}. However, it struggled to gain adoption due to its lack of portability, as it was only supported by Chromium based browsers.
%\subsection{WebAssembly}
The major browser vendors collaboratively designed WebAssembly (Wasm) \cite{wasm_og} to more comprehensively address the limitations of JavaScript as a target language for the web. It is a portable low level byte code, designed for compact representation, efficient compilation, and near native execution speeds. Web\-Assembly is gaining adoption and has been used for a variety of applications especially as a target for compilation from C, C++ and Rust.
Google Earth \cite{google-earth-history} is an example of a major application that is adopting WebAssembly. Google Earth renders a 3D representation of Earth based primarily on satellite imagery. It started out as a desktop application, but in 2013 was ported to the web. It originally only ran in Chrome, as it was built on NaCl. The developers tried to build it with asm.js, but found the binary sizes of compiling over a million lines of code with Emscripten to be infeasibly large. However with the creation of WebAssembly they were able to make a high performance cross browser implementation of Google Earth due to the speed and small binary sizes of WebAssembly \cite{google-earth}.
Another example is TensorFlow \cite{tensorflow}, an open source machine learning library, originally written in C++. Due to the large ecosystem of JavaScript developers and the ability to run in the browser, the developers introduced TensorFlow.js \cite{tensorflowjs}. They have multiple backends including a WebAssembly backend, which is 10-30x faster than their plain JavaScript backend \cite{tf-speed}. What is interesting about TensorFlow.js is that it shows how WebAssembly can be used to create high performance libraries that can be called from the web.
%\section{LLVM}
One of the technologies that has greatly helped the adoption of WebAssembly as a target language is the LLVM \cite{llvm} compiler tool chain. Writing a full compiler from scratch that supports multiple targets is a huge undertaking. Building high performance backends for different target architectures such as x86 and ARM requires knowledge of many low level details of each respective target. An alternative approach is for the compiler frontend of the source language to translate the source code to the LLVM intermediate representation (IR). The LLVM tool chain can then generate high performance code for all the most common computer architectures. Many of the biggest languages are built with, or have compiler implementations using, LLVM.
LLVM compiles from its IR to WebAssembly and therefore languages that generate LLVM IR have an easy path to WebAssembly code generation.
Emscripten uses LLVM and can generate WebAssembly in addition to asm.js. It is a widely used compiler for generating WebAssembly.
\section{Parallel Programming in the Browser}
While WebAssembly has progressed the state of the art of single threaded computation speed in browsers, another avenue for execution speed is parallelism. Browsers have facilities for parallel programming. JavaScript supports two different paradigms with web workers: message passing enables parallel programming without shared memory, while SharedArrayBuffer and atomics enable shared memory multithreading with thread synchronization. There is a threaded WebAssembly proposal that adds atomic operations to the language and adds support for SharedArrayBuffers, while relying on JavaScript's web workers to create and join threads. Emscripten can currently compile C with POSIX threads to threaded WebAssembly. The Chrome and Firefox browsers, along with Node.js, have experimental support for threaded WebAssembly.
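As a small illustration of the shared memory primitives mentioned above, the following JavaScript sketch shows atomic updates to a shared counter. It is runnable as-is in Node.js, which exposes the same \texttt{SharedArrayBuffer} and \texttt{Atomics} APIs as browsers; in a real program the buffer would be posted to Web Workers, which are omitted here for brevity.

```javascript
// Allocate shared memory: a SharedArrayBuffer can be passed to
// Web Workers (or worker_threads in Node.js) without copying.
const sab = new SharedArrayBuffer(4); // room for one 32-bit integer
const counter = new Int32Array(sab);

// Atomics guarantees these read-modify-write operations are not torn,
// even when several threads touch the same element concurrently.
for (let i = 0; i < 1000; i++) {
  Atomics.add(counter, 0, 1); // counter[0] += 1, atomically
}

console.log(Atomics.load(counter, 0)); // atomic read: 1000
```

With a single thread the atomics are not strictly needed; their value is that the same code remains correct once the buffer is shared between workers.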
%{\color{red} TODO} mention GPU API's for the browser
\section{Futhark}
Futhark \cite{futhark-og} \cite{futhark_phd} is a data parallel programming language that can generate high performance parallel code for both the CPU and GPU. Writing GPU code and multicore CPU code is difficult, as many low level details are required to make correct and optimal implementations. Futhark is a high level functional programming language which aims to do the heavy lifting for the user.
Futhark programs are generally written with Second-Order Array Combinators (SOACs), which are similar to the filter, map, and reduce functions commonly found in many functional programming languages. These functions can be optimized to efficient parallel code. These combinators are expressive and can be combined to encode complex programs.
%
%This thesis is not going to explain the Futhark language. Most of the Futhark code snippets scattered throughout the thesis are short and easy to understand without prior familiarity to the language.
%
To see this in action below is a Futhark implementation of matrix multiply:
\begin{verbatim}
let matmul [n][p][m] (xss: [n][p]f64) (yss: [p][m]f64): [n][m]f64 =
let dotprod xs ys = reduce (+) 0 (map2 (*) xs ys)
in map (\xs -> map (dotprod xs) (transpose yss)) xss
\end{verbatim}
Here \texttt{dotprod xs ys} computes the dot product of two vectors \texttt{xs} and \texttt{ys} by computing the pairwise products \texttt{zs = map2 (*) xs ys} and then summing the products with \texttt{reduce (+) 0 zs}.
In the last line the innermost \texttt{map} in \texttt{map (dotprod xs) (transpose yss)} computes a row vector of the product matrix and the outermost \texttt{map} generates all the rows.
This serves as an illustration of how a relatively involved operation can be written using SOACs. Not only is the implementation short, it's also fast.
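For readers more familiar with imperative languages, the same computation can be sketched in plain JavaScript, with arrays of arrays standing in for Futhark's multidimensional arrays and the helper names (\texttt{dotprod}, \texttt{transpose}) mirroring the Futhark code above:

```javascript
// dotprod xs ys = reduce (+) 0 (map2 (*) xs ys)
const dotprod = (xs, ys) =>
  xs.reduce((acc, x, i) => acc + x * ys[i], 0);

// transpose yss: column j of yss becomes row j of the result.
const transpose = (yss) =>
  yss[0].map((_, j) => yss.map((row) => row[j]));

// matmul: each row of xss is dotted with each column of yss.
const matmul = (xss, yss) => {
  const cols = transpose(yss);
  return xss.map((xs) => cols.map((ys) => dotprod(xs, ys)));
};

console.log(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]));
// [ [ 19, 22 ], [ 43, 50 ] ]
```

Unlike this sequential sketch, the Futhark compiler is free to execute the maps and the reduction in parallel, because the SOACs expose the parallel structure directly.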
This thesis is not going to explain the Futhark language in further detail. Only very short Futhark functions will be used in examples and do not require deeper understanding of the language.
Currently the Futhark compiler has C backends generating CUDA, OpenCL, and sequential C code. Recently a multicore C backend was added that generates parallel code using POSIX threads (pthreads) \cite{multicore}. Futhark also has two Python backends, one sequential and one using PyOpenCL. All the backends can compile to libraries, making it possible to call Futhark from C or Python applications. For this thesis we build on the sequential C backend and the multicore C backend.
%The plain C code backend comes in two variants, sequential C and multicore C. The latter is a recent addition and generates parallel code using POSIX threads (pthreads) \cite{multicore}. These backends are important for this thesis. Emscripten compiles sequential C to plain WebAssembly, and C with pthreads to threaded WebAssembly. This allows us to bootstrap these two backends to generate efficient code that can run in the browser.
%TODO: explain which of the alternatives we chose
\chapter{WebAssembly}
This chapter gives an overview of the design and structure of the WebAssembly programming language. We illustrate the instruction set with two simple hand written WebAssembly modules. We show how functions and memory work, and how to call a module from JavaScript. Most importantly we show how to generate WebAssembly and JavaScript glue code with Emscripten and how to work with Emscripten's JavaScript API.
%This will be driven through an example program, and some performance evaluations and literature review.
%Its important to note that WebAssembly doesn't aim to be a complete replacement for JavaScript. WebAssembly cannot access the DOM for example. WebAssembly was instead designed to have good interoperation with JavaScript. In practice a developer wanting to get high performance code running in the browser would first write their library in a language such as C or Rust. Then they would compile to WebAssembly, and use JavaScript to call their WebAssembly module in the browser.
%WebAssembly defines a portable binary format that can be run in the web with high performance.
\section{WebAssembly Module Structure}
This section describes some of the lower level details of the WebAssembly programming language. They illustrate some of WebAssembly's characteristics but they are not critical for understanding the rest of the thesis.
A WebAssembly file is commonly referred to as a module, and given a \textit{.wasm} file extension. WebAssembly also defines a text format that serves as a human readable version of the underlying binary format, much in the same way assembly provides a human readable format for machine code. The \texttt{wat2wasm} program from The WebAssembly Binary Toolkit\footnote{https://github.com/WebAssembly/wabt}
compiles the textual format to a binary module.
WebAssembly modules are segmented into sections. The segmentation of these sections is done so that loading a WebAssembly file is as efficient as possible. The sections are structured such that the byte code can be compiled in a single pass, and in parallel. Furthermore the code can be parsed and compiled before the complete WebAssembly file has been downloaded, reducing the instantiation time of a WebAssembly module.
WebAssembly supports four number types: 32-bit and 64-bit integers, \texttt{i32} and \texttt{i64}, and 32-bit and 64-bit floats, \texttt{f32} and \texttt{f64}. These don't map cleanly to JavaScript's number types, but they match the number types of languages like C/C++ and Rust, for which WebAssembly aims to be a compilation target.
A WebAssembly function has the following structure.
\begin{verbatim}
( func <signature> <locals> <body> )
\end{verbatim}
The signature gives the function name, parameter types, and return types. The locals are the local variables that will be used in the execution of the function, and the body is the actual implementation of the function. A simple WebAssembly example is the add function.
\begin{verbatim}
(module
(export "add" (func $add))
(func $add (param $a i32) (param $b i32) (result i32)
local.get $a
local.get $b
i32.add
)
)
\end{verbatim}
The function signature of add specifies the two arguments \texttt{\$a} and \texttt{\$b} as \texttt{i32} and specifies the return type as \texttt{i32}. There are no locals. The function body has three instructions. The instructions \texttt{local.get \$a} and \texttt{local.get \$b} push the two arguments onto the stack. The instruction \texttt{i32.add} pops the two elements off the stack and pushes their sum. The function returns the number left on the stack.
The body of the function could be replaced with:
\begin{verbatim}
(i32.add (local.get $a) (local.get $b))
\end{verbatim}
That is, WebAssembly allows the programmer to use a notation where arguments to instructions are passed as parameters instead of manually being placed on the stack.
%{\color{red} TODO} link and ref mdn https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format
\section{Memory}
A notion of memory is needed for writing more complex programs. In a language like C, it is common practice to use pointers to locations in memory, or to write an array of values. Memory from the perspective of WebAssembly is just an array of bytes that can be read from and written to. WebAssembly has two essential instructions for interacting with this array, namely \texttt{i32.load} and \texttt{i32.store}, for reading from and writing to the array of bytes respectively.
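This byte-addressed load/store model can be mimicked from JavaScript with a \texttt{DataView} over an \texttt{ArrayBuffer}. The sketch below uses a plain \texttt{ArrayBuffer} as a stand-in for a module's linear memory; WebAssembly memory is little-endian, hence the \texttt{true} endianness flag.

```javascript
// An ArrayBuffer plays the role of WebAssembly linear memory:
// a flat array of bytes addressed by byte offset.
const buffer = new ArrayBuffer(16);
const view = new DataView(buffer);

// Analogue of (i32.store (i32.const 0) (i32.const 42)):
view.setInt32(0, 42, true); // true = little-endian, as in Wasm

// Analogue of (i32.load (i32.const 0)):
console.log(view.getInt32(0, true)); // 42

// Offsets are in bytes, so consecutive i32 values are 4 bytes apart.
view.setInt32(4, 7, true);
console.log(view.getInt32(4, true)); // 7
```

The byte-granular offsets are the point: WebAssembly addresses are byte offsets into this flat array, exactly as C pointers are byte addresses into the process's address space.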
As a motivating example, the following C code gives a simple implementation of an in-place prefix sum. (Again, this example is just for illustration and not needed to understand the rest of the thesis.)
\begin{verbatim}
void prefix_sum(int* arr, int size) {
for (int i = 1; i < size; i++) {
arr[i] += arr[i-1];
}
}
\end{verbatim}
The following code is a WebAssembly implementation of in-place prefix sum.
\begin{verbatim}
1 (module
2 (import "env" "memory" (memory $memory 1))
3 (export "prefixSum" (func $prefixSum))
4 (func $prefixSum (param $size i32)
5 (local $offs i32)
6 (local $acc i32)
7 (local $last i32)
8 (local.set $offs (i32.const 4))
9 (local.set $last (i32.mul (local.get $size) (i32.const 4)))
10 (local.set $acc (i32.load (i32.const 0)))
11 loop $forloop
12 (local.set $acc (i32.add (local.get $acc)
(i32.load (local.get $offs))))
13 (i32.store (local.get $offs) (local.get $acc))
14 (local.set $offs (i32.add (local.get $offs) (i32.const 4)))
15 (br_if $forloop (i32.ne (local.get $offs)
(local.get $last)))
16 end $forloop
17 )
18 )
\end{verbatim}
% {\color{red} TODO} brief description of WebAssembly
% {\color{red} TODO} talk about 64k pages
For the implementation, the local accumulator variable \texttt{\$acc} is initialized to the first value of the array (at address 0), and the offset \texttt{\$offs} is set to 4, the byte address of the second element. A loop is then entered, where the element at the offset is loaded with \texttt{i32.load} and added to the accumulator \texttt{\$acc}. The result is stored back at the same offset with \texttt{i32.store}. The offset is increased by 4 bytes and then compared to the local variable \texttt{\$last}. If they are not equal, \texttt{br\_if} branches back to the top of the loop on line 11, and the body repeats.
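For comparison, the same accumulator-and-offset structure can be sketched in JavaScript. This snippet is purely illustrative and not part of any generated code:

```javascript
// Illustrative sketch: the same in-place prefix sum, mirroring the
// accumulator ($acc) and loop structure of the WebAssembly version.
function prefixSum(arr) {
  let acc = arr[0];                // $acc starts at the first element
  for (let i = 1; i < arr.length; i++) {
    acc += arr[i];                 // load the element, add to the accumulator
    arr[i] = acc;                  // store the running sum back in place
  }
  return arr;
}

console.log(prefixSum([1, 2, 3, 4])); // [ 1, 3, 6, 10 ]
```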
The most important details of the function implementation for understanding WebAssembly's interaction with memory are the load and store operations, which use byte offsets to address memory. The memory is never explicitly referenced in the function, because a WebAssembly module has only one memory declaration in its memory section, making the array of memory implicit. Memory can either be imported from JavaScript, in which case the memory is created in JavaScript and passed to WebAssembly on instantiation, or, alternatively, the memory can be exported from WebAssembly, in which case the memory is created in WebAssembly on instantiation and can be accessed from JavaScript afterwards. Here the memory is imported on line 2:
\begin{verbatim}
(import "env" "memory" (memory $memory 1))
\end{verbatim}
The final argument, 1, sets the size of the memory to 1 page. In WebAssembly, one page corresponds to 64 kilobytes (65,536 bytes). Memory is always sized as a whole number of pages.
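The page-based sizing can be observed directly from JavaScript. The following sketch (not part of the thesis code) creates a one-page memory and grows it by another page:

```javascript
// Sketch: WebAssembly memory is sized in 64 KiB pages.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page
console.log(memory.buffer.byteLength); // 65536 bytes = 64 KiB

memory.grow(1); // grow by one page; sizes are always whole pages
console.log(memory.buffer.byteLength); // 131072
```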
\section{WebAssembly and JavaScript Interaction}
%WebAssembly defines a small set of instructions, which runtimes can translate to native code quickly. However because of this it is unable to complete many of the basic functions of JavaScript.
Listing \ref{lst:add.html} shows how to load, instantiate, and call the \textit{add.wasm} module from JavaScript in a web page.
\begin{listing}[h]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{HTML}{code/examples/wasm/add.html}
\caption{Invocation of \textit{add.wasm} in web page}
\label{lst:add.html}
\end{listing}
By design a WebAssembly module only interfaces with its environment through function inputs and outputs, and through memory.
The most clear-cut example of this is the lack of access to the DOM.
WebAssembly does not have direct access to Web APIs, but it can call imported JavaScript functions, and through these it can interact with Web APIs. This simple design feature is powerful: it is what enables threaded WebAssembly, since WebAssembly can leverage JavaScript's ability to spawn web workers. This will be discussed in greater detail in Chapter 6. %OBS
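As a rough sketch of this mechanism, a JavaScript function is handed to a module through the import object at instantiation time. The bytes below are just the empty-module header, and the \texttt{env.log} name is hypothetical; a real module would declare and call the import:

```javascript
// Sketch: passing a JavaScript function to WebAssembly via the import object.
// The 8 bytes below form the smallest valid (empty) WebAssembly module; a
// real module would declare (import "env" "log" (func ...)) and call it.
const moduleBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(WebAssembly.validate(moduleBytes)); // true

const imports = { env: { log: (x) => console.log("from wasm:", x) } };
WebAssembly.instantiate(moduleBytes, imports).then(({ instance }) => {
  // instance.exports would expose the module's exported functions here.
});
```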
%For most practical use cases WebAssembly modules are used as libraries.
\section{Emscripten}
This section describes how to generate WebAssembly and JavaScript glue code with Emscripten and how to work with Emscripten's JavaScript API. This material is critical for understanding the backend implementation in chapter 5.
As described in chapter 2, the popular
Emscripten toolchain compiles C/C++ to a WebAssembly module. It also generates JavaScript ``glue code'', a phrase we use with the specific meaning of code that Emscripten generates to encapsulate module instantiation and access through Emscripten's JavaScript API.
Technically the library user can load the WebAssembly module directly as in listing \ref{lst:add.html}. However, this is impractical for a couple of reasons. The module is constructed by Emscripten to import a series of system library functions for accessing files, allocating memory, and so on. Emscripten's glue code supplies all of these at instantiation time and hides them from the library user.
%When compiling C to WebAssembly as a library the Emscripten compiler generates two files. One is a WebAssembly module and the other is JavaScript glue code. This allows web programmers to access all the important elements of WebAssembly from a set of JavaScript functions that the Emscripten runtime defines. This is beneficial for multiple reasons. Firstly working directly with WebAssembly modules is a layer of detail most web developers would like to avoid. Secondly this approach allows a programmer to simply compile their C library and call it from JavaScript.
%It is impractical to work with WebAssembly as a modules directly. There are many low level details in terms of interacting with memory and instantiating a WebAssembly module that programmers would like to avoid. For this reason Emscripten also generates JavaScript glue code.
For many use cases, compiling code to a library is desirable. This way a programmer can write source code in C/C++ that contains functions for compute intensive parts of their application. They can then generate WebAssembly modules that can be called from JavaScript, and offload the computation to WebAssembly.
It is relatively straightforward to call simple C functions from JavaScript when they are compiled with Emscripten. We just need to explicitly export the functions at compile time.
Suppose we have a simple C file \textit{add.c} that contains an add function:
\begin{verbatim}
int add(int a, int b) { return a + b; }
\end{verbatim}
We compile it with Emscripten to a library, making sure to export the function:
\begin{verbatim}
emcc add.c -o add.js -s MODULARIZE -s EXPORTED_FUNCTIONS=[_add]
\end{verbatim}
We also use the MODULARIZE flag to make the WebAssembly module easier to import. This is helpful because WebAssembly is loaded asynchronously; with the MODULARIZE flag the generated loader gives us the module instance as a promise. The following code runs in Node.js:
\begin{verbatim}
var load_module = require('./add.js');
load_module().then((instance) => {
console.log(instance._add(4, 6));
});
\end{verbatim}
\texttt{load\_module} is a factory function. Once the WebAssembly module is loaded, the code in the callback runs. The actual logic for running the C function is just the one line:
\begin{verbatim}
console.log(instance._add(4, 6));
\end{verbatim}
The example library can also be loaded as a script in a web page and invoked from JavaScript in the browser. This will be shown later in the Mandelbrot example in Section \ref{sec:mandelbrot}.
Library functions compiled with Emscripten are easy to use when they take integer arguments and have integer return types. However, much C/C++ code works with pointers. Emscripten models these as locations in a single memory region attached to the WebAssembly module, called the heap. Emscripten offers multiple views into the heap, with a typed array for each primitive element type, namely those supported by JavaScript typed arrays; see Table \ref{table:heapviews}. These typed arrays all share the same underlying ArrayBuffer memory.
%Note that pointers default to 32-bit signed integers in WebAssembly and are therefore written to and read from memory using the HEAP32 heap view.\footnote{{\color{red} TODO}: wasm64}
\begin{table}[H]
\centering
\begin{tabular}{|l|r|}
\hline
\textbf{Heap view} & \textbf{JavaScript type} \\ \hline
\texttt{HEAP8} & \texttt{Int8Array} \\ \hline
\texttt{HEAP16} & \texttt{Int16Array} \\ \hline
\texttt{HEAP32} & \texttt{Int32Array} \\ \hline
\texttt{HEAP64} & \texttt{BigInt64Array} \\ \hline
\texttt{HEAPU8} & \texttt{Uint8Array} \\ \hline
\texttt{HEAPU16} & \texttt{Uint16Array} \\ \hline
\texttt{HEAPU32} & \texttt{Uint32Array} \\ \hline
\texttt{HEAPU64} & \texttt{BigUint64Array} \\ \hline
\texttt{HEAPF32} & \texttt{Float32Array} \\ \hline
\texttt{HEAPF64} & \texttt{Float64Array} \\ \hline
\end{tabular}
\caption{Emscripten's heap views and their JavaScript types}
\label{table:heapviews}
\end{table}
Typed arrays will play an important role in this thesis, both for interacting with the Emscripten heap and for array parameters to compiled Futhark functions. Typed arrays are useful because they represent number arrays compactly and efficiently, and the ability for typed arrays to act as different views of the same underlying ArrayBuffer can be used to avoid memory copies for parameter passing in many cases.
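The view mechanism can be demonstrated with plain typed arrays. The following sketch is illustrative only:

```javascript
// Sketch: two typed-array views over one ArrayBuffer, in the same way that
// Emscripten's HEAP32/HEAPU8 views share the module's heap. No bytes are copied.
const buffer = new ArrayBuffer(8);
const asInt32 = new Int32Array(buffer); // 2 four-byte elements
const asUint8 = new Uint8Array(buffer); // 8 one-byte elements

asInt32[0] = 0x01020304;
// The write is visible through the other view. WebAssembly (and the common
// platforms it runs on) is little-endian, so the low byte comes first.
console.log(asUint8[0]); // 4
console.log(asUint8[3]); // 1
```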
To illustrate how to pass pointer arguments, consider the simple C string function in listing \ref{lst:concat-c}, which takes two strings and returns their concatenation in allocated memory that the caller must free.
\begin{listing}[h!]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{C}{code/examples/emcc/concat.c}
\caption{C concat function }
\label{lst:concat-c}
\end{listing}
To use this function the caller needs the \texttt{malloc} and \texttt{free} system library functions, so we export those alongside \texttt{concat} when we compile to a library with Emscripten:
\begin{verbatim}
emcc concat.c -o concat.js -s MODULARIZE \
-s EXPORTED_FUNCTIONS=[_malloc,_free,_concat]
\end{verbatim}
The Node.js program in listing \ref{lst:hello-world.js} calls \texttt{concat}.
\begin{listing}[H]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{C}{code/examples/emcc/hello-world.js}
\caption{Node.js call to Emscripten compiled concat function }
\label{lst:hello-world.js}
\end{listing}
The pointers \texttt{a} and \texttt{b} passed to \texttt{\_concat} are locations in the Emscripten heap, so the caller must first copy the strings to the heap for \texttt{\_concat} to access them. To find room in the heap for them, we call \texttt{malloc} twice and then use the \texttt{HEAPU8} heap view to write the strings into the allocated memory.
We zero-terminate the \texttt{a} string with:
\begin{verbatim}
instance.HEAPU8[a + alen] = 0;
\end{verbatim}
and write into the first \texttt{alen} bytes through:
\begin{verbatim}
instance.HEAPU8.subarray(a, a + alen)
\end{verbatim}
This expression evaluates to another Uint8Array, a view of the subarray of the heap that holds the first \texttt{alen} bytes, up to and excluding the terminating zero.
The returned pointer, \texttt{c}, again points into the heap and we find the heap location of the terminating zero with:
\begin{verbatim}
var cend = instance.HEAPU8.indexOf(0, c);
\end{verbatim}
and then we get a view of the concatenated string with:
\begin{verbatim}
instance.HEAPU8.subarray(c, cend)
\end{verbatim}
Finally, all the allocated strings are freed with calls to \texttt{\_free}.
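The heap-string pattern just described can be sketched with a plain Uint8Array standing in for \texttt{instance.HEAPU8}. The helper \texttt{writeCString} is hypothetical and not part of Emscripten:

```javascript
// Illustrative sketch: writing and reading zero-terminated strings in a
// byte array, mimicking the HEAPU8 pattern used around the concat example.
const heap = new Uint8Array(64); // stand-in for instance.HEAPU8

// Hypothetical helper: copy a string into the "heap" and zero-terminate it.
function writeCString(str, ptr) {
  const bytes = new TextEncoder().encode(str);
  heap.set(bytes, ptr);           // write the string bytes
  heap[ptr + bytes.length] = 0;   // terminating zero
  return bytes.length;
}

const ptr = 0;
writeCString("hello", ptr);
const end = heap.indexOf(0, ptr);            // find the terminating zero
const view = heap.subarray(ptr, end);        // view into the heap, no copy
console.log(new TextDecoder().decode(view)); // "hello"
```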
Emscripten's function and heap APIs and compiler flags will play important roles in the implementation of the WebAssembly backends for Futhark in chapters 5 and 7.
%\section{WebAssembly outside the browser}
%There are also a number of applications for WebAssembly outside of the browser. The most standard engine for running WebAssembly outside the browser is Node. Node also uses the V8 engine that is present in Chrome, and other browsers to support WebAssembly. As a result it is a high degree of performance parity with WebAssembly that is run in the browser.
%Running Node locally is convenient for rapid prototyping as well as testing. Making tests that run in the browser takes a deal of configuration, and isn't trivial. It is possible to run Chrome, and Firefox as headless browsers for testing purposes, but often it is desirable to test part of WebAssembly functionality
%WebAssembly executes within a sand-boxed stack-based environment, which means that running foreign code can only effect the virtual environment that executes the WebAssembly module. This compounded with its competitive performance to native code makes it an ideal candidate for any applications that need to run foreign code.
\chapter{API Design}
The WebAssembly backends that we are going to develop will compile Futhark functions to libraries for use in the browser. This chapter designs a JavaScript API for programs in the browser to call the compiled library functions.
%to call the WebAssembly modules from the browser, an efficient, convenient, and concise JavaScript API was developed and illustrated with examples
%This chapter describes the implementation of an additional Futhark backend that generates WebAssembly and JavaScript code such that Futhark programs can be compiled to libraries that can be run in the browser. The implementation includes an API to invoke the compiled libraries from JavaScript. The implementation is benchmarked in the Chrome browser, and in Node.js and against the C backend.
%Firstly we design the JavaScript API. We take inspiration from the Futhark C and Python APIs. Our design factors in the characteristics and limitations of JavaScript and WebAssembly. The API is illustrated with an example.%OBS
%Secondly we implement the backend. We take outset in the existing futhark Sequential C backend and pass the generated C code to the Emscripten compiler for the final compiler pass. The Emscripten generates a WebAssembly module along with JavaScript glue code. We generate additional JavaScript code to wrap our API around the JavaScript code generated by Emscripten.
%To illustrate the API and backend implementation working in practice we use the WebAssembly backend to efficiently generate graphics for the Mandelbrot set in a webpage.
%Finally we benchmark the Futhark backend to quantify its performance. The benchmarks come from the futhark benchmark suite\footnote{https://github.com/diku-dk/futhark-benchmarks}, which are standardized benchmarks to test the performance of Futhark backends against industry standards. The benchmarks show the WebAssembly, both when run in the Browser and run on Node locally, performs between 12\% to 65\% slower than the generated C running locally.
%\section{API Motivation}
In order to make Futhark a practical language for writing libraries that can be called in the browser it is important that it has a simple and efficient API. As discussed in chapter 3 code that is compiled to WebAssembly modules typically comes packaged with a JavaScript file. This file contains the library functions that are exposed to the programmer, where internally these functions handle the interaction with WebAssembly. This allows users of the library to be oblivious of the fact that WebAssembly is being used under the hood, much in the same way that some Python developers are oblivious that numpy is running compiled C under the covers.
A successful API for calling Futhark in the browser will have the following properties:
\begin{enumerate}
\item It should be convenient, seamlessly integrating with the most commonly used JavaScript number and array data types.
\item The API should be efficient both with respect to memory usage and runtime speed.
\item Finally the API should minimize boilerplate code.
\end{enumerate}
\section{Comparing Futhark APIs}
It is insightful to look at the APIs of other languages so that we can copy the elements of the APIs that are effective and avoid the parts that are cumbersome or inefficient. These decisions are constrained by the limitations and capabilities of the target language, which for us is JavaScript and WebAssembly.
An additional reason why it is important to examine the C API is that the WebAssembly backend builds on the sequential C backend. For this reason our design choices for the API are limited to how we can wrap the C API function calls behind JavaScript classes and functions.
%% MAYBE Section not CHAPTER BE MINDFUL YOUNG CHILD
The exposition will be example-based.
The first example is the minimal program in listing \ref{lst:simp}, which takes a 32-bit integer and returns its successor.
\begin{listing}[H]
\begin{minted}{Haskell}
entry increment (a : i32) = a + 1
\end{minted}
\caption{Futhark increment function, \textit{increment.fut}}
\label{lst:simp}
\end{listing}
The second example in listing \ref{lst:scale} works with multidimensional array inputs and outputs as well as a scalar input:
\begin{listing}[H]
\begin{minted}{Haskell}
entry scale (scalar : f32) (matrix : [][]f32) =
map (map (scalar *)) matrix
\end{minted}
\caption{Futhark scale function, \textit{scale.fut}}
\label{lst:scale}
\end{listing}
The scale function multiplies each element of a matrix by a scalar, and returns the scaled matrix.
%The implementation isn't important as this code is generated by Futhark.
%What is important is the way we can pass arguments and capture the return values from this function in the target language.
%{\color{red} TODO}: also introduce simple increment example
\subsection{C API}
%{\color{red} TODO}: do increment example first
When \textit{increment.fut} is compiled as a C library, the files \textit{increment.c} and \textit{increment.h} are generated: the implementation and header files, respectively. Listing \ref{lst:c-api-incr} is a C program that interfaces with the generated library.
\begin{listing}[h!]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{C}{code/compiler/api_examples/increment.c}
\caption{C code for interacting with the C API of the compiled program \textit{increment.fut} }
\label{lst:c-api-incr}
\end{listing}
At the top level the API works off of a context and a configuration, \texttt{futhark\_context} and \texttt{futhark\_context\_config}. The configuration stores settings such as whether debugging is enabled. The context manages global information and bookkeeping, including logging and profiling, and holds a mutex to guarantee thread safety when the library is used from multiple threads. The API has functions to create these at the beginning and free them at the end.
\begin{verbatim}
futhark_context_config_new
futhark_context_new
futhark_context_free
futhark_context_config_free
\end{verbatim}
Futhark's primitive types (bool, integers and floats) are represented by corresponding C types.
For arrays, Futhark generates a C type for every array type that appears as an argument or result type in any entry point function in the library. It also generates functions to create, access, and free arrays. For example, for the Futhark array type \texttt{[][]f32}, the generated header file \textit{scale.h} declares a type \texttt{futhark\_f32\_2d} and functions \texttt{futhark\_new\_f32\_2d} and \texttt{futhark\_free\_f32\_2d} for creating and freeing arrays of that type. Moreover, the API generates functions \texttt{futhark\_shape\_f32\_2d} and \texttt{futhark\_values\_f32\_2d} for getting the shape (dimensions) and values of the Futhark array respectively.
For each entry point function in the library a C function is generated, which takes the Futhark context, an output parameter for each type in the result tuple, and an input parameter for each argument. The function signature of the generated C function for \textit{scale.fut} can be seen below.
\begin{verbatim}
int futhark_entry_scale(
struct futhark_context *ctx,
struct futhark_f32_2d **out0,
const float in0, const struct futhark_f32_2d *in1);
\end{verbatim}
Listing \ref{lst:c-api} is a C program that calls \texttt{futhark\_entry\_scale}.
\begin{listing}[h!]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{C}{code/compiler/api_examples/example.c}
\caption{C code for interacting with the C API of the compiled program \textit{scale.fut} }
\label{lst:c-api}
\end{listing}
Memory management is manual. Both the input array created with \texttt{futhark\_new\_f32\_2d} and the output array returned from \texttt{futhark\_entry\_scale} are freed manually.
\subsection{Python API}
When \textit{scale.fut} is compiled as a Python library, a single Python file \textit{scale.py} is generated. It contains a class that can be used to interact with the compiled entry points.
\begin{listing}[H]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{Python}{code/compiler/api_examples/example.py}
\caption{Python code for interacting with the Python API of the compiled program \textit{scale.fut}}
\label{lst:int128}
\end{listing}
To invoke the Python library we import the file and instantiate the generated class by calling \texttt{scale()}. Finally we invoke the entry point method with a scalar input and a 2-dimensional numpy array. All primitive Futhark types conveniently map to primitive numpy types. Numpy ndarrays are made up of these primitive types, so passing them in as arguments to the API provides sufficient information. We make sure that the numpy array has values of the correct type by calling numpy's \texttt{astype} with \texttt{float32}, the numpy equivalent of the Futhark primitive \texttt{f32}. The entry point function logically returns a numpy ndarray whose elements have type \texttt{float32}. Interestingly, the Python API returns numpy scalars if the return type is a scalar; however, the entry point function accepts scalars both as normal Python numbers and as numpy scalars.
\subsection{API Comparison}
One of the features that makes the Python API simpler is that it wraps the context and configuration in a class. With the C backend, the context must be passed to every call of a Futhark-specific function. The other big advantage of the Python API is that it utilizes the numpy library, which provides convenient classes for representing multidimensional arrays. C has no clean representation of multidimensional arrays, which makes the C API more verbose.
\section{JavaScript API}
When designing the JavaScript API we consider both usability and performance.
%\subsection{Futhark Types to JavaScript Types}
JavaScript has one standard number type, a 64-bit floating point number, which is typically used to encode all numbers. Of the Futhark types \texttt{u8}, \texttt{u16}, \texttt{u32}, \texttt{i8}, \texttt{i16}, \texttt{i32}, \texttt{f32}, \texttt{f64}, and \texttt{bool}, all except \texttt{u64} and \texttt{i64} can be encoded by a 64-bit float. Fortunately JavaScript recently introduced \texttt{BigInt} in ES2020\footnote{ES2020 is the 2020 version of the JavaScript language specification \cite{ecma}} to address this shortcoming of the language. With this there is a way to represent every Futhark primitive type with either a standard JavaScript \texttt{number}, a \texttt{boolean}, or a \texttt{BigInt}. %OBS
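The precision limit that motivates \texttt{BigInt} is easy to demonstrate with an illustrative snippet:

```javascript
// Sketch: 64-bit floats represent integers exactly only up to 2^53 - 1,
// so Futhark's i64/u64 values need BigInt rather than number.
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991, i.e. 2^53 - 1
console.log(2 ** 63 === 2 ** 63 + 1);  // true: the double cannot tell them apart
console.log(2n ** 63n - 1n);           // 9223372036854775807n, exact as a BigInt
```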
One problem with JavaScript is that it has no widely adopted standard package for scientific computing that provides an efficient encoding of n-dimensional arrays. Standard JavaScript arrays are more reminiscent of lists, with semantics similar to Python's list type. A further problem is that there is no efficient way to validate the shape of an n-dimensional array or the types of its elements. %OBS
\subsection*{TypedArrays}
The approach that many scientific JavaScript libraries take is to accept typed arrays as arguments, together with arguments specifying the shape and dimensions of the typed arrays. Typed arrays were briefly discussed in chapter 3, but as a reminder, typed arrays are standard JavaScript types. There are 10 different typed array types, one for each of the 10 common numeric types. These happen to correspond to Futhark's scalar types, with the exception of booleans. Typed arrays have a number of advantages:
\begin{itemize}
\item Type enforcement: typed arrays only store the element type specified by their constructor.
\item Memory efficiency: typed arrays store types with less than 64-bit precision more compactly, since they do not need a 64-bit float per element. For example, a typed array of 8-bit integers takes an eighth of the space of a regular array storing numbers.
\item Ubiquity: typed arrays are present in all standard JavaScript runtimes, such as Node.js and Chrome. They do not need to be imported, and they are often the return types of scientific libraries, such as those for image or graphics processing.
\end{itemize}
To maintain the most flexibility in our API design, we choose to support working with the API both through plain JavaScript arrays and through typed arrays. Typed arrays carry a little extra baggage, since shapes must be passed along separately.
%\section{JavaScript API}
%\subsection{Types}
All Futhark primitive types except \texttt{i64}, \texttt{u64}, and \texttt{bool} are represented by the standard JavaScript number type. \texttt{i64} and \texttt{u64} are represented by the \texttt{BigInt} number type, and \texttt{bool} is represented by the standard JavaScript \texttt{boolean} type. The type conversion between Futhark types and JavaScript types can be seen in listing \ref{table:types}.
All array values share the same structure API. They can be passed in as n-dimensional arrays of their associated JavaScript type. Note that these arrays must be regular; it is on the caller to enforce this.
Alternatively we provide a FutharkArray class, which can be instantiated from typed arrays. FutharkArray instances are also accepted by entry point functions. Users who care about space and speed efficiency should prefer these over regular JavaScript arrays: data already held in the more efficient typed-array representation is not forced through a less efficient one.
Finally there are opaque types, which have no clean mapping from Futhark to JavaScript. Instead they are represented by a FutharkOpaque class. This class has no meaning outside of the context of being passed from the output of one entry point to the input of another entry point with the same type.
\begin{listing}[H]
\begin{table}[H]
\centering
\begin{tabular}{|c|c|}
\hline
\textbf{Futhark Type} & \textbf{JavaScript Type} \\ \hline
\texttt{i8} & \texttt{number} \\ \hline
\texttt{i16} & \texttt{number} \\ \hline
\texttt{i32} & \texttt{number} \\ \hline
\texttt{i64} & \texttt{BigInt} \\ \hline
\texttt{u8} & \texttt{number} \\ \hline
\texttt{u16} & \texttt{number} \\ \hline
\texttt{u32} & \texttt{number} \\ \hline
\texttt{u64} & \texttt{BigInt} \\ \hline
\texttt{f32} & \texttt{number} \\ \hline
\texttt{f64} & \texttt{number} \\ \hline
\texttt{bool} & \texttt{boolean} \\ \hline
\texttt{[]i32} & \texttt{FutharkArray} \\ \hline
\texttt{[][]f64} & \texttt{FutharkArray} \\ \hline
opaque & \texttt{FutharkOpaque} \\ \hline
\end{tabular}
\end{table}
\caption{Type conversion table from Futhark types to primitive JavaScript types}
\label{table:types}
\end{listing}
\subsection*{FutharkContext}
FutharkContext is a class that contains information about the context and configuration from the C API. It has methods for invoking the Futhark entry points and creating FutharkArrays on the WebAssembly heap.
It is instantiated with the empty constructor:
\begin{verbatim}
var fc = new FutharkContext();
\end{verbatim}
\texttt{FutharkContext} has methods for creating FutharkArrays, one method for each Futhark array type that appears as an argument to any entry point function. Each method has the following signature:
\begin{verbatim}
new_<type>_<n>d(typedArray, dim_1, ..., dim_n)
\end{verbatim}
Working off of the example \textit{scale.fut}, to create a FutharkArray for a matrix of type \texttt{[][]f32} we would make the following call:
\begin{verbatim}
var fc = new FutharkContext();
var typedArray = new Float32Array([0.5, 0.4, 0.3, 0.2, 0.1, 0.0]);
var futharkArray = fc.new_f32_2d(typedArray, 3, 2);
\end{verbatim}
\subsubsection*{Entry Points}
Each entry point function in the compiled module has a corresponding method on the FutharkContext. Scalar parameters are either \texttt{BigInt}s, regular JavaScript \texttt{number}s, or booleans. Array parameters can be either FutharkArrays or, for convenience, JavaScript arrays.
The method returns an array of values if the Futhark return type is a tuple; otherwise it returns a single value.
Invoking the entry points of \textit{scale.fut} looks something like this:
\begin{verbatim}
var fc = new FutharkContext();
var typed_array = new Float32Array([0.5, 0.4, 0.3, 0.2, 0.1, 0.0]);
var fut_arr = fc.new_f32_2d(typed_array, 3, 2);
var fut_arr_result = fc.scale(0.5, fut_arr);
console.log(fut_arr_result.toTypedArray(), fut_arr_result.shape());
\end{verbatim}
As stated earlier, entry points also allow JavaScript arrays as inputs, provided they have the correct types and dimensions and are regular. An alternative approach with JavaScript arrays would look like this:
\begin{verbatim}
var fc = new FutharkContext();
var matrix = [[0.5, 0.4],
[0.3, 0.2],
[0.1, 0.0]];
var fut_arr_result = fc.scale(0.5, matrix);
console.log(fut_arr_result.toArray());
\end{verbatim}
The second implementation is simpler but also less efficient, as regular JavaScript arrays require multiple levels of type conversion under the hood.
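To give an idea of the conversion cost, flattening a nested array into a typed array plus a shape (the kind of work an entry point must do internally for plain-array inputs) looks roughly like this hypothetical helper, which is not part of the actual API:

```javascript
// Illustrative sketch (not the API internals): flattening a regular
// 2-dimensional JavaScript array into a Float32Array plus its shape.
function flatten2d(matrix) {
  const rows = matrix.length;
  const cols = matrix[0].length;
  const flat = new Float32Array(rows * cols);
  for (let i = 0; i < rows; i++) {
    flat.set(matrix[i], i * cols); // copy one row at a time
  }
  return { flat, shape: [rows, cols] };
}

const { flat, shape } = flatten2d([[0.5, 0.4], [0.3, 0.2], [0.1, 0.0]]);
console.log(shape);       // [ 3, 2 ]
console.log(flat.length); // 6
```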
The \texttt{fut\_arr\_result} in both examples is a FutharkArray. FutharkArray has methods for getting the underlying data out into basic JavaScript, as either a typed array or a JavaScript array.
We have omitted calls to free. Memory management will be discussed in the next section.
\subsubsection*{FutharkArray}
The FutharkArray class allows us to get the underlying data from the WebAssembly heap into JavaScript. It provides the following methods:
\begin{itemize}
\item \textbf{toArray()} :
Returns a nested JavaScript array with the correct dimensions.
\item \textbf{toTypedArray()} :
Returns a flat typed array of the underlying data.
\item \textbf{shape()} :
Returns the shape of the FutharkArray as an array of \texttt{BigInt}.
\item \textbf{futharkType()} :
Returns the Futhark type of the elements as a string, i.e. \texttt{bool}, \texttt{u32}, \texttt{i8}, etc.
\end{itemize}
\subsubsection*{FutharkOpaque}
For handling opaque types we introduce the FutharkOpaque class. This class has no methods other than \texttt{free}, which will be discussed in the next section. A FutharkOpaque has no meaning to the user outside of the FutharkContext it was created in; it is only useful when one entry point function on the context returns an opaque type and another entry point function on the same context accepts that opaque type as input.
\section{Memory Management}
Unfortunately WebAssembly does not have garbage collection, so the programmer is responsible for freeing memory manually. The JavaScript API introduced three classes: FutharkContext, FutharkArray, and FutharkOpaque. Each has a \texttt{free} method, which the API user must call manually to free the underlying memory on the Emscripten heap.
With this last piece, memory management, we have the full picture of the JavaScript API. A final look at the \textit{scale.fut} example gives us:
\begin{verbatim}
var fc = new FutharkContext();
var matrix = [[0.5, 0.4],
[0.3, 0.2],
[0.1, 0.0]];
var fut_arr_result = fc.scale(0.5, matrix);
console.log(fut_arr_result.toArray());
fut_arr_result.free();
fc.free();
\end{verbatim}
\section{Summary}
%We provide an efficient way of using the API through typed arrays. This adds some complexity. The motivating reason for this, is the API user can always introduce their own layers of abstraction on top of our functions. However if we
Our API is relatively concise, but we do make some tradeoffs between conciseness and efficiency. In particular, we return arrays from entry points as FutharkArrays rather than as nested JavaScript arrays. We also provide an additional factory method to create FutharkArrays from flat typed arrays. These design choices favor efficiency at the expense of some complexity. The justification is that an efficient but complex API can be made simpler, whereas a simple but inefficient API cannot be made efficient.
One drawback of our API design is that the caller has to free memory by calling the free methods on FutharkContext, FutharkArray, and FutharkOpaque objects. However, this is a general pain point of working with WebAssembly, as it does not provide garbage collection.
%The API we developed in this chapter guides our implementation in chapter 5. However designing and implementing an API is a partially circular process. The implementation also guides the design by
%This motivates our decision to provide two methods of passing in array arguments. Namely by wrapping typed arrays with FutharkArrays, and passing in regular JavaScript arrays. Programmers with a focus on efficiency would use the former. Others can use the latter.
\chapter{WebAssembly Backend}
This chapter describes the implementation of an additional Futhark backend that generates WebAssembly and JavaScript code, such that Futhark programs can be compiled to libraries that can be run in the browser. The implementation is benchmarked in the Chrome browser and in Node.js, against the C backend.
Firstly we implement the backend. We take the existing Futhark sequential C backend as our starting point and pass the generated C code to the Emscripten compiler for the final compilation pass. Emscripten generates a WebAssembly module along with JavaScript glue code. We generate additional JavaScript code to wrap our API around the JavaScript code generated by Emscripten.
To illustrate the backend implementation working in practice we use the WebAssembly backend to efficiently generate graphics for the Mandelbrot set in a web page.
Finally we benchmark the Futhark backend to quantify its performance. The benchmarks come from the Futhark benchmark suite\footnote{https://github.com/diku-dk/futhark-benchmarks}, a standardized set of benchmarks for testing the performance of Futhark backends against industry standards. The benchmarks show that the WebAssembly backend, both when run in the browser and when run on Node.js locally, performs between 12\% and 65\% slower than the generated C running locally.
%This chapter details the design and implementation of a new backend for Futhark with WebAssembly as the target. We begin by describing WebAssembly code generation. We then describe the implementation of library generation, using the example of \textit{scale.fut} for illustration. Next the compiler pipeline is described. The WebAssembly backend is then demonstrated by generating and visualizing a Mandelbrot set in the browser. Finally the WebAssembly Backend is benchmarked against the sequential C backend.
\section{WebAssembly Code Generation}
We will now describe the implementation of the WebAssembly backend for Futhark.
We considered three choices for generating WebAssembly:
%The most used WebAssembly backends, namely Rust and C/C++ use LLVM to generate WebAssembly code. The reason they are able to do this successfully is because the languages already have compiler frontends that go from the source language to the LLVM IR. With respect to the Futhark compiler, instead of emitting a standard compiler intermediate representation, it elects to generate C code that can in turn be compiled with any standard C compiler. This has the advantage of portability in the sense that it doesn't depend on LLVM which doesn't come standard on all Windows, and Linux distributions. ({\color{red} TODO}: rework this paragraph to deal with overlap with item 2 below.)
%For our implementation we have three main choices for generating WebAssembly:
\begin{itemize}
\item Generating WebAssembly directly:
This is the least attractive of the options. Writing low-level code directly is time consuming and also inefficient, in the sense that it would be a huge investment to capture the optimizations expected of a modern compiler.
\item LLVM IR: This is the most standard approach among compilers. However, since Futhark doesn't already generate LLVM IR, utilizing this approach would effectively require a redesign of the compiler architecture. This would be an interesting project, with possible positive implications in more places than just a WebAssembly backend. However it would be quite a large task, as the Futhark frontend wasn't designed with this specific intermediate representation in mind.
\item Emscripten: Emscripten can simply take the C source code generated by the Futhark compiler. This approach requires the fewest modifications to the compiler. Futhark also aims to emit high performance C code, and Emscripten aims to translate C to high performance WebAssembly code, so connecting the two compilers is the most logical approach.%OBS
\end{itemize}
We chose to go with the Emscripten approach. We started by running Futhark's generated C code through Emscripten, which surfaced a handful of issues that we describe in the following.
Futhark's C backend generates over 2200 lines of C code even when compiling a minimal function like the increment function from listing \ref{lst:simp}. This is because the generated code contains logic for option parsing, logic for parsing input and formatting output, and library functions for mathematical operations.
One issue was in platform-specific code used for timing, but on further inspection this code turned out to be a relic that was not actually used anywhere in the compiler, and could therefore simply be deleted.
Listing \ref{lst:int128} shows the other function implementation that Emscripten couldn't compile.
\begin{listing}[h]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{C}{code/compiler/int128_c.c}
\caption{64 bit multiplication with 128 bit casting}
\label{lst:int128}
\end{listing}
The issue is that Emscripten and WebAssembly don't support \texttt{uint128} types. Instead an alternative implementation is used, which calculates the high-order bits of a 64-bit multiplication without casting to \texttt{uint128} numbers. ({\color{red} TODO} say where we get this implementation from)
\begin{listing}[H]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{C}{code/compiler/int128_wasm.c}
\caption{64 bit multiplication without 128 bit casting}
\label{lst:integrate-js}
\end{listing}
With the small modifications described, the generated C can be compiled with the Emscripten compiler, and Futhark programs can be compiled to WebAssembly and run as executables with Node.js. Note that this is specifically for executables and not C libraries. However, we are primarily concerned with getting Futhark running in the browser as a library, such that web developers can design high performance libraries in Futhark and then call them from JavaScript in the browser.
%\subsection{JavaScript and WebAssembly number types}
\section{Library Implementation}
In order to use the JavaScript and WebAssembly code that is generated by Emscripten as a library, the C library functions need to be exported during the Emscripten compilation process. The functions that need to be exported are the ones that the Futhark C backend emits into the C header file when the Futhark program is compiled as a library. Again, this whole process works almost out of the box when connecting Futhark's C library code generator with the appropriate Emscripten command. One issue, however, is that \texttt{int64} types are hard to work with when they are provided as argument types to functions. Numbers in JavaScript are 64-bit floating-point values. This means that all 32-bit integers can be represented exactly in JavaScript, but not all 64-bit integers can be represented with full precision. By default Emscripten solves this by passing two arguments to JavaScript functions, one for the low-order 32 bits and another for the high-order 32 bits. Instead we take an alternative approach offered by Emscripten through the \texttt{WASM\_BIGINT} compiler flag. This gets 64-bit integers working in C, with the caveat that when the function is called from JavaScript, the argument must be provided as a JavaScript BigInt.
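The precision problem, and both workarounds, can be demonstrated in plain JavaScript (the 64-bit value below is purely illustrative):

```javascript
// Numbers in JavaScript are IEEE-754 doubles: integers above 2^53
// silently lose precision, so a 64-bit integer cannot round-trip.
const asNumber = 2 ** 53 + 1;
console.log(asNumber === 2 ** 53); // true: the + 1 was lost

// BigInt values have arbitrary precision, so with WASM_BIGINT the
// full 64-bit argument survives the JavaScript/WebAssembly boundary.
const asBigInt = 2n ** 53n + 1n;
console.log(asBigInt === 2n ** 53n); // false: the values stay distinct

// Emscripten's default workaround instead splits the value into
// two 32-bit halves, passed as two separate arguments:
const v = 0x123456789abcdef0n;
const lo = Number(v & 0xffffffffn); // low-order 32 bits
const hi = Number(v >> 32n);        // high-order 32 bits
console.log(lo.toString(16), hi.toString(16)); // 9abcdef0 12345678
```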
With these fixes a programmer familiar with the semantics of the Emscripten heap has a workable WebAssembly library. Emscripten generates two files when compiling C code. One is a WebAssembly module, which contains the instructions for computing the functions described in the C source code. The other file is JavaScript glue code, which is used to take care of loading the WebAssembly file, as well as handling any JavaScript calls that the WebAssembly module might use.
The WebAssembly module has methods for each function that appears in our .c file, plus some system library functions like malloc and free. To call these functions from JavaScript we must tell Emscripten to export them, using the \texttt{-s EXPORTED\_FUNCTIONS} compiler flag. The exported method names are all prefixed with an underscore.
\subsection{Interacting with C functions}
Given all this we can now work with Futhark from JavaScript and do the equivalent of listing \ref{lst:c-api-incr}, which showed how to call Futhark's generated C library for the \textit{increment.fut} example. After compiling the C library to WebAssembly with Emscripten, the library can be called with the JavaScript equivalent shown in listing \ref{lst:raw_incr}.
\begin{listing}[H]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{JavaScript}{code/compiler/api_examples/raw_incr.js}
\caption{Working with raw Emscripten for the increment function}
\label{lst:raw_incr}
\end{listing}
This is quite cumbersome for a function that simply takes a single integer argument and returns a single integer.
The program is essentially a one-to-one match of the C equivalent from listing \ref{lst:c-api-incr}, including passing around pointers, which is unsafe and not idiomatic in JavaScript.
The allocation and freeing of \texttt{out\_ptr} in lines 5 and 8 are even worse than the equivalent C. They are needed because the 32-bit integer output parameter can only be written to the Emscripten heap, so the caller must explicitly allocate and free the 4 bytes, and copy the result out of the heap using the HEAP32 heap view.
This isn't the case in C, where the caller and the library function share an address space, allowing the caller to pass a pointer to a stack-allocated result variable.
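This mechanism can be modeled self-contained with a plain ArrayBuffer standing in for the real Emscripten heap (the pointer value and the written result below are made up for illustration):

```javascript
// Model of the Emscripten heap: one buffer, with typed-array views
// over it. HEAP32 indexes 32-bit words, so a byte pointer must be
// shifted right by 2 before indexing.
const buffer = new ArrayBuffer(64);
const HEAP32 = new Int32Array(buffer);

// The caller "allocates" 4 bytes for the output parameter ...
const out_ptr = 8; // stand-in for a pointer returned by Module._malloc(4)

// ... the callee writes its i32 result at that byte address ...
HEAP32[out_ptr >> 2] = 6; // what the entry point would do for input 5

// ... and the caller copies the result out before freeing the bytes.
const result = HEAP32[out_ptr >> 2];
console.log(result); // 6
```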
The picture gets even worse when the Futhark entry points accept and return array arguments.
These necessitate copying array contents to and from the Emscripten heap, exacerbating the C-like manual memory management that is a poor fit in JavaScript.
We now illustrate how to work with the more complex \textit{scale.fut} from listing \ref{lst:c-api}. As a reminder, the Futhark function takes a 2D array of floats and a scalar float as input, and returns a 2D array of floats scaled element-wise by the scalar.
\begin{listing}[H]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{JavaScript}{code/compiler/api_examples/raw.js}
\caption{Working with raw Emscripten }
\label{lst:raw}
\end{listing}
The JavaScript is as tedious as the original C code with respect to manual memory management and, like the previous example, needs to allocate and free memory on the Emscripten heap to retrieve the \texttt{out\_ptr} output parameter. In this example the output parameter is an array pointer; pointers default to 32-bit signed integers in WebAssembly.
%\footnote{{\color{red} TODO}: wasm64}
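The array marshalling involved can be sketched self-contained: the 2D input is flattened row-major into a typed array, copied into a Float32 view of the heap, and the output is later read back and re-nested. The heap and pointer below are simulated; real code uses the heap views and \texttt{\_malloc} from the Emscripten module.

```javascript
// Flatten the 2D input row-major into a Float32Array.
const matrix = [[0.5, 0.4], [0.3, 0.2], [0.1, 0.0]];
const rows = matrix.length, cols = matrix[0].length;
const flat = Float32Array.from(matrix.flat());

// Copy it into the (here: simulated) Emscripten heap.
const HEAPF32 = new Float32Array(64); // stand-in for the module's HEAPF32 view
const in_ptr = 32;                    // stand-in for a _malloc(flat.byteLength) result
HEAPF32.set(flat, in_ptr >> 2);

// Reading values back out of the heap and re-nesting them. Note the
// float32 rounding: 0.4 reads back as roughly 0.40000000596.
const out = HEAPF32.subarray(in_ptr >> 2, (in_ptr >> 2) + rows * cols);
const nested = [];
for (let i = 0; i < rows; i++) {
  nested.push(Array.from(out.subarray(i * cols, (i + 1) * cols)));
}
console.log(nested.length); // 3
```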
Yet another complication is error handling. In listings \ref{lst:raw_incr} and \ref{lst:raw} we omitted checking the error code returned by the entry point functions and extracting the error message.
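Extracting the message amounts to reading a NUL-terminated C string out of the heap at a pointer returned by the context's error function. A minimal self-contained sketch, with the heap contents fabricated for illustration:

```javascript
// Simulated heap containing a NUL-terminated error message at byte 8.
const HEAPU8 = new Uint8Array(64);
const msg = "index out of bounds";
for (let i = 0; i < msg.length; i++) HEAPU8[8 + i] = msg.charCodeAt(i);
// HEAPU8 is zero-initialized, so the byte after the message is already NUL.

// Scan to the terminator and decode the bytes as UTF-8.
function readCString(ptr) {
  let end = ptr;
  while (HEAPU8[end] !== 0) end++;
  return new TextDecoder().decode(HEAPU8.subarray(ptr, end));
}
console.log(readCString(8)); // "index out of bounds"
```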
These pain points were motivating reasons for the design of the API. Working with the raw functions from Emscripten is only accessible to programmers with a strong understanding of memory in WebAssembly. This isn't reasonable, as interoperation between Futhark and JavaScript is a determining factor in whether Futhark is a good language for writing code to run in the browser. For Futhark to be practical, the implementation details of the raw pointers into the heap should be hidden behind a layer of abstraction. This is precisely what the API designed in Chapter 4 accomplishes.
In the following we will illustrate with examples how our JavaScript API wraps the WebAssembly module and JavaScript glue code generated by Emscripten.
The simplest case is when all entry point inputs and outputs are scalar, as in the increment function. In this case the following FutharkContext class meets the specification of the API.
\begin{listing}[H]
\inputminted[fontsize=\small,baselinestretch=0.5,linenos]{JavaScript}{code/compiler/api_examples/FutharkContext_incr.js}
\caption{FutharkContext class for the futhark increment function}
\label{lst:FutharkContext_incr}
\end{listing}
The class has two fields, \texttt{cfg} and \texttt{ctx}, which point to the config and context created in the constructor by calling functions from the C API. Observe how the increment method successfully hides all the tedious details needed to call the underlying exported WebAssembly function \texttt{\_futhark\_entry\_increment}. It can do so because it has access to the \texttt{ctx} class field; without a class, the programmer would be forced to pass this around explicitly. This is why the API was designed to expose Futhark entry point functions as methods on the FutharkContext class. The other important benefit of the wrapper is that results are simply returned, instead of requiring the cumbersome heap allocation and copying needed to call the WebAssembly function directly.
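The wrapper pattern can be sketched self-contained by stubbing out the Emscripten exports. Everything on \texttt{Module} below is a stub standing in for the real underscore-prefixed WebAssembly exports, so the example runs on its own:

```javascript
// Stubbed Emscripten exports, standing in for the real WebAssembly
// module. The "pointers" returned here are arbitrary integers.
const HEAP32 = new Int32Array(16);
const Module = {
  _futhark_context_config_new: () => 1,
  _futhark_context_new: (cfg) => 2,
  _malloc: (n) => 8,   // stub: always hands out byte offset 8
  _free: (ptr) => {},
  _futhark_entry_increment: (ctx, out_ptr, x) => {
    HEAP32[out_ptr >> 2] = x + 1; // write the result to the out-parameter
    return 0;                     // no error
  },
};

// The wrapper hides the context handle and all the heap traffic.
class FutharkContext {
  constructor() {
    this.cfg = Module._futhark_context_config_new();
    this.ctx = Module._futhark_context_new(this.cfg);
  }
  increment(x) {
    const out_ptr = Module._malloc(4);
    Module._futhark_entry_increment(this.ctx, out_ptr, x);
    const result = HEAP32[out_ptr >> 2];
    Module._free(out_ptr);
    return result;
  }
}

console.log(new FutharkContext().increment(41)); // 42
```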
\subsection{FutharkArray}
The FutharkArray class represents any array regardless of dimension and element type, somewhat like NumPy's ndarray in Python. Listing \ref{lst:FutharkArray} below shows our implementation.