This is a collection of research projects and software portfolios, categorized by topics relevant to the field of artificial intelligence and machine learning. Also included are coding exercises that demonstrate algorithmic problem-solving skills.
- About Me
- Mostly Used Languages
- AI/ML Research Projects
- Extrapolative behavior of ML models
- Automated end-to-end causal inference application
- Human extrapolative behavioral experiment via web application
- The human part of the extrapolation project, which provides comparative measures for investigating human-like behavior in ML algorithms such as random forest and deep neural networks. To deliver the behavioral experiments to 150 participants recruited on AWS, a web app was created and hosted on Heroku.
- Probabilistic linkage of real-world clinical data
- To link multiple datasets acquired from different sections of the JHU medicine network, this data management pipeline was built in support of the collective mission of CDEM. The pipeline uses the `fastLink` R package to automate probabilistic linkage of multiple datasets, which is based on the expectation-maximization algorithm.
- Multivariate time-series hologram biometric signal parsing
- Coding
[Add contents here]
The projects that involve machine learning and artificial intelligence are listed here.
Github Repo: https://github.com/jshinm/inductive-bias-experiment
One purpose of machine learning models is to predict trends based on patterns in the given data. The issue is that these ML models are interpolative by nature and do not perform well as extrapolators. Despite that, ML models are widely used to forecast uncharted territory. This project examines the extrapolative behavior of ML algorithms such as random forest, neural networks, and support vector machines, and measures their performance against humans.
![](https://github.com/jshinm/inductive-bias-experiment/raw/main/figures/%5B20210820_generate_pub_figures%5D_figure1_sxor_2021-09-08.jpg?raw=true)
These ML algorithms are trained on non-linear simulation datasets (Gaussian XOR shown above), and their posterior probability distributions are evaluated on a grid for comparison. The line plot shows the Hellinger distance increasing more for neural net posteriors than for both random forest and humans as we move further away from the origin, which suggests that random forest is more similar to humans than neural nets are in this experiment.
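To make the comparison above concrete, here is a minimal sketch of the Hellinger distance between two binary-class posteriors, applied cell by cell to a pair of posterior grids. The function names and the nested-list grid layout are assumptions for illustration; the repository may compute the distance differently.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def posterior_hellinger(p1, q1):
    """Distance between two Bernoulli posteriors, each given as P(class 1 | x)."""
    return hellinger((p1, 1.0 - p1), (q1, 1.0 - q1))


def grid_hellinger(grid_a, grid_b):
    """Cell-wise distance between two posterior grids (nested lists, same shape)."""
    return [
        [posterior_hellinger(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


# Hypothetical 2x2 posterior grids for two estimators (made-up values).
model_grid = [[0.9, 0.1], [0.2, 0.8]]
human_grid = [[0.7, 0.3], [0.2, 0.6]]
distances = grid_hellinger(model_grid, human_grid)
```

The distance is 0 where two posteriors agree exactly and 1 where they put all their mass on opposite classes, so averaging the grid over rings of increasing radius would give one curve per estimator pair.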
![](https://github.com/jshinm/inductive-bias-experiment/raw/main/figures/%5B20210518_matching_grid%5D_fullplot_animated_spiral_2021-11-02.gif?raw=true)
As these non-linear datasets are not space-invariant, we can assess the posterior in a piece-wise manner. A linear evaluation as a function of angle reveals a more drastic difference between neural nets and random forest: neural nets reach the limit of the posterior much faster than random forest, indicating that neural nets not only misrepresent the spiral simulation but are also more confident in their decisions.
Github Repo: --
< content here >
Github Repo: https://github.com/jshinm/deepnet-behavioral
< content here >
Github Repo: https://github.com/jshinm/probabilistic-linkage
< content here >
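As an illustration of the idea behind EM-based probabilistic linkage, the following is a minimal sketch of a Fellegi-Sunter style two-class mixture fitted with expectation-maximization. It is not the `fastLink` API and not this pipeline's code; the function name, the binary agreement-pattern input, and the toy data are all assumptions.

```python
def fellegi_sunter_em(patterns, n_iter=100, p_match=0.1, eps=1e-6):
    """Fit a match / non-match mixture to binary field-agreement patterns.

    patterns: list of equal-length 0/1 tuples, one per candidate record pair,
              where 1 means the pair agrees on that field (hypothetical format).
    Returns (p_match, m, u, weights); weights[i] estimates P(pair i is a match).
    """
    n, k = len(patterns), len(patterns[0])
    m = [0.9] * k  # initial guess for P(field agrees | match)
    u = [0.1] * k  # initial guess for P(field agrees | non-match)
    weights = [0.0] * n

    def clamp(x):
        # Keep probabilities strictly inside (0, 1) to avoid division by zero.
        return min(max(x, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior match probability of each pair under current parameters.
        for i, pattern in enumerate(patterns):
            like_m, like_u = p_match, 1.0 - p_match
            for j, agree in enumerate(pattern):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            weights[i] = like_m / (like_m + like_u)
        # M-step: re-estimate the match rate and per-field agreement rates.
        total = sum(weights)
        p_match = clamp(total / n)
        for j in range(k):
            agree_m = sum(w * g[j] for w, g in zip(weights, patterns))
            agree_u = sum((1.0 - w) * g[j] for w, g in zip(weights, patterns))
            m[j] = clamp(agree_m / max(total, eps))
            u[j] = clamp(agree_u / max(n - total, eps))
    return p_match, m, u, weights


# Toy agreement patterns over three fields (made-up counts).
pairs = (
    [(1, 1, 1)] * 30 + [(1, 1, 0)] * 3 + [(0, 1, 1)] * 3
    + [(1, 0, 0)] * 20 + [(0, 1, 0)] * 20 + [(0, 0, 1)] * 20
    + [(0, 0, 0)] * 200
)
p_hat, m_hat, u_hat, match_prob = fellegi_sunter_em(pairs)
```

Pairs that agree on every field end up with a match probability near 1 and pairs that agree on none end up near 0; a threshold on `match_prob` then decides which records to link.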
Github Repo: https://github.com/jshinm/hologram-biometric-signal-parsing
< content here >
Current and past software projects.
Project | Description | Code |
---|---|---|
WebApp for machine versus human extrapolation experiment | Desc1 | Code1 |
Web Scraper for data mining on GitHub repository | Desc2 | Code2 |
P&L generator for tax report | Desc2 | Code2 |
Pandarize | Desc2 | Code2 |
Amortized loan simulator | Desc2 | Code2 |
Omega Messenger | Desc2 | Code2 |
FlipScope | Desc2 | Code2 |
KeeWee | Desc2 | Code2 |
Some data science side projects for Kaggle challenges
Project | Code |
---|---|
Item1 | Code1 |
Item2 | Code2 |
The following programming exercises cover various algorithms and data structures. There is a dedicated section for sorting. Also included are SQL and bash/shell coding challenges.
Name | Example |
---|---|
Heap | Note |
Sorting (simulation notebook)
Name | Best TC | Average TC | Worst TC | Worst SC | Stability |
---|---|---|---|---|---|
Bubble Sort | Ω(N) | Θ(N^2) | O(N^2) | O(1) | Stable |
Selection Sort | Ω(N^2) | Θ(N^2) | O(N^2) | O(1) | |
Insertion Sort | Ω(N) | Θ(N^2) | O(N^2) | O(1) | Stable |
Shell Sort | Ω(N log N) | Θ(N log^2 N) | O(N log^2 N) | O(1) | |
Heap Sort | Ω(N log N) | Θ(N log N) | O(N log N) | O(1) | |
Merge Sort | Ω(N log N) | Θ(N log N) | O(N log N) | O(N) | Stable |
Quick Sort | Ω(N log N) | Θ(N log N) | O(N^2) | O(log N) | |
Counting Sort | Ω(N+K) | Θ(N+K) | O(N+K) | O(K) | Stable |
Tree Sort | Ω(N log N) | Θ(N log N) | O(N^2) | O(N) | |
Tim Sort | Ω(N) | Θ(N log N) | O(N log N) | O(N) | Stable |
Smooth Sort | Ω(N) | Θ(N log N) | O(N log N) | O(1) | |
Radix Sort | Ω(NK) | Θ(NK) | O(NK) | O(N+K) | Stable |
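
To illustrate two columns of the table, here is a minimal insertion sort sketch that counts key comparisons (showing the Ω(N) best case and the O(N^2) worst case) and preserves the order of equal keys (stability). The function name and the comparison counter are illustrative choices, not taken from the simulation notebook.

```python
def insertion_sort(items, key=lambda x: x):
    """Return a sorted copy of items and the number of key comparisons made."""
    result = list(items)
    comparisons = 0
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            # "<=" stops at equal keys, so equal elements keep their order.
            if key(result[j]) <= key(current):
                break
            result[j + 1] = result[j]  # shift the larger element right
            j -= 1
        result[j + 1] = current
    return result, comparisons


# Best case: already sorted input needs N - 1 comparisons.
_, best = insertion_sort([1, 2, 3, 4, 5])
# Worst case: reversed input needs N * (N - 1) / 2 comparisons.
_, worst = insertion_sort([5, 4, 3, 2, 1])
# Stability: records with equal keys keep their original relative order.
records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]
by_number, _ = insertion_sort(records, key=lambda r: r[1])
```

Changing `<=` to `<` in the inner comparison would still sort correctly but would move equal keys past each other, which is what makes a sort unstable.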
Database [SQL (syntax note) || Python]
Problem Name | Platform | Language |
---|---|---|
Combine Two Tables | LeetCode | SQL |
Second Highest Salary | LeetCode | SQL |
Nth Highest Salary | LeetCode | SQL |
Rank Scores | LeetCode | SQL |
Consecutive Numbers | LeetCode | SQL |
Employees Earning More Than Their Managers | LeetCode | SQL |
Duplicate Emails | LeetCode | SQL |
Customers Who Never Order | LeetCode | SQL |
Department Highest Salary | LeetCode | SQL |
Department Top Three Salaries | LeetCode | SQL |
Delete Duplicate Emails | LeetCode | SQL |
Rising Temperature | LeetCode | SQL |
Trips and Users | LeetCode | SQL |
Big Countries | LeetCode | SQL |
Classes More Than 5 Students | LeetCode | SQL |
Human Traffic of Stadium | LeetCode | SQL |
Not Boring Movies | LeetCode | SQL |
Exchange Seats | LeetCode | SQL |
Swap Salary | LeetCode | SQL |
Reformat Department Table | LeetCode | SQL |
SqlEventsDelta | Codility | SQL |
SqlWorldCup | Codility | SQL |
Weather Observation | HackerRank | SQL |
SQL Project Planning | HackerRank | SQL |
Interviews | HackerRank | SQL |
15 Days of SQL | HackerRank | SQL |
Japanese Population | HackerRank | SQL |
Aggregation | HackerRank | SQL |
Acceptance Rate By Date | StrataScratch | Python |
Highest Energy Consumption | StrataScratch | Python |
Finding User Purchases | StrataScratch | Python |
Popularity Percentage | StrataScratch | Python |
Highest Cost Orders | StrataScratch | Python |
Users By Avg Session time | StrataScratch | Python |
Top 5 States With 5 Star Businesses | StrataScratch | Python |
Finding Updated Records | StrataScratch | Python |
Risky Projects | StrataScratch | Python |
Number Of Bathrooms And Bedrooms | StrataScratch | Python |
Customer Details | StrataScratch | Python |
SMS Confirmations From Users | StrataScratch | Python |
Customer Revenue In March | StrataScratch | Python |
Find the rate of processed tickets for each type | StrataScratch | Python |
Find the overall friend acceptance count for a given date | StrataScratch | Python |
Daily Interactions By Users Count | StrataScratch | Python |
Successfully Sent Messages | StrataScratch | Python |
Popularity of Hack | StrataScratch | Python |
Most Active Users On Messenger | StrataScratch | Python |
Average Salaries | StrataScratch | Python |
Spam Posts | StrataScratch | Python |
Total Cost Of Orders | StrataScratch | Python |
Classify Business Type | StrataScratch | Python |
Top Cool Votes | StrataScratch | Python |
Order Details | StrataScratch | Python |
Workers With The Highest Salaries | StrataScratch | Python, SQL |
Reviews of Categories | StrataScratch | Python, SQL |
Highest Salary in Department | StrataScratch | Python, SQL |
Distances Traveled | StrataScratch | Python, SQL |
Gender with Generous Reviews | StrataScratch | Python, SQL |
Rank Variance Per Country | StrataScratch | SQL |
Users By Average Session Time | StrataScratch | SQL |
Finding User Purchases | StrataScratch | SQL |
Highest Cost Orders | StrataScratch | SQL |
Total Cost of Orders | StrataScratch | SQL |
Ranking Most Active Guests | StrataScratch | SQL |
Algorithm Performance | StrataScratch | SQL |
Problem Name | Platform |
---|---|
Comparing Numbers | HackerRank |
Comparing Strings | HackerRank |
Loop and Skip | HackerRank |
Arithmetic Operations | HackerRank |
Compute Average | HackerRank |
Cut Command | HackerRank |
Text Processing | HackerRank |