Sharpen your data science skills with this hands-on workshop on nonparametric techniques in R.
Classical statistical analysis assumes either that the underlying distribution of the variable of interest is known or that the sample size is large enough (by a common rule of thumb, at least 30) for the Central Limit Theorem to apply. This type of analysis is termed parametric statistics because all that needs to be done is to estimate the distribution parameters and draw inferences from those estimates. If, however, nothing is known about the underlying distribution and the sample size is small (less than 30), there are no parameters to estimate, and a completely different methodology has to be applied for statistical inference. The collection of methods used in this case is called nonparametric statistics.
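As a rough illustration of the distinction described above, the sketch below runs a parametric two-sample t-test and its nonparametric counterpart, the Wilcoxon rank-sum test, on two small samples. The data are simulated purely for illustration (the sample sizes, rates, and seed are assumptions, not workshop data); both test functions come from the stats package that ships with R.

```r
# Simulated, hypothetical data: two small, right-skewed samples (n = 10 each).
set.seed(2022)
group_a <- rexp(10, rate = 1)    # exponential with mean 1
group_b <- rexp(10, rate = 0.5)  # exponential with mean 2

# Parametric: Welch two-sample t-test, which relies on the sample means
# being approximately normally distributed.
param_result <- t.test(group_a, group_b)

# Nonparametric: Wilcoxon rank-sum test, which uses only the ranks
# of the observations, not their actual values.
nonparam_result <- wilcox.test(group_a, group_b)

cat("t-test p-value:           ", param_result$p.value, "\n")
cat("Wilcoxon rank-sum p-value:", nonparam_result$p.value, "\n")
```

With samples this small and this skewed, the two tests can give noticeably different p-values, which is the situation the nonparametric methods in this workshop are meant to handle.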
In this presentation, we will go over several of the most widely used nonparametric techniques. For each technique, just enough theory will be explained, followed by a thorough example or two, and then the audience will work on an exercise to reinforce understanding of the material.
The workshop is designed to be hands-on. Participants are required to bring laptops and be ready to write R code, analyze data, and interpret results. For each model, we present an example with complete R code, followed by an exercise to work on. Workshop participants should be familiar with algebraic expressions of different probability distributions and have a fundamental knowledge of simple linear regression: normally distributed random error, and continuous and categorical independent variables (which require creating dummy variables).
Dr. Olga Korosteleva is a professor of Statistics at the Department of Mathematics and Statistics at California State University, Long Beach (CSULB). She received her Bachelor’s degree in Mathematics in 1996 from Wayne State University in Detroit, and a Ph.D. in Statistics from Purdue University in West Lafayette, Indiana, in 2002. Since then she has been teaching mostly Statistics courses in the Master’s program in Applied Statistics at CSULB, and loving it!
Dr. Olga is an undergraduate advisor for students majoring in Mathematics with an option in Statistics. She is also the faculty supervisor for the Statistics Student Association and the Treasurer of the Southern California Chapter of the American Statistical Association (SCASA). Dr. Olga is the editor-in-chief of SCASA’s monthly eNewsletter and the author or co-author of five statistical books.
When: February 26, 2022
- Saturday: 8:30 AM - 4:30 PM
Where:
University of California, Irvine -- Paul Merage School of Business
4293 Pereira Drive
Irvine, CA 92617
- Google Maps
- Directions & Parking Information. You pay for parking at the entrance. To get a full-day pass, go to the second screen on the kiosk. Paying by the hour is quite expensive.
- Room
- Building: SB1, Room 2100
Registration
- Cost: $10
- Register through Eventbrite
- All participants must register for the event and have a valid ticket to attend. If there is space you can register at the door.
- All participants must abide by the SoCal RUG Code of Conduct, including the R Consortium and the R Community Code of Conduct.
- Connect to SSID: UCInet Mobile
- Go to https://oit.uci.edu/reg
- Register your device as a guest
If you have problems, please call the OIT support line at (949) 824-2222, option 3, or the Merage IT support line at (949) 824-0852.
You have two options for working with the code examples and exercises for the workshop:
- Download and install R and RStudio (if you haven't already)
- Download the examples and exercises code from the workshop GitHub repository: https://github.com/ocrug/regression_models_2021-02-09
- If you don't know how to use Git, download the course files by clicking the green "Code" button and select "Download ZIP".
- If you do know how to use Git, clone the repo to your computer
- Unzip the files (or go to the directory where you cloned the repository), and double-click the file called `project.Rproj`. This will start RStudio, and you can see the examples and exercises code in the two folders called examples and exercises.
- Install the following packages:
bootstrap
devtools
exactRankTests
fANCOVA
magrittr
rstudioapi
stats
stringr
tibble
tidyverse
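One possible way to install the packages listed above is sketched below. It is not the workshop's official setup script: the helper name `missing_packages` is hypothetical, and the sketch assumes a default CRAN mirror is configured. Note that stats ships with base R, so it never needs to be installed separately.

```r
# Packages listed for the workshop. "stats" is part of base R,
# so it will always be reported as already installed.
workshop_pkgs <- c("bootstrap", "devtools", "exactRankTests", "fANCOVA",
                   "magrittr", "rstudioapi", "stats", "stringr",
                   "tibble", "tidyverse")

# Hypothetical helper: return the subset of `pkgs` that is not
# installed in any library on the current library path.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(workshop_pkgs)

# Install only what is missing, and only in an interactive session,
# so sourcing this file in a script never triggers a download.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Running the snippet again after installation should leave `to_install` empty, which is a quick way to confirm that everything on the list is available.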
- Create a free account on RStudio Cloud: https://rstudio.cloud
- Go to the workshop project: https://rstudio.cloud/project/3583424
- At the top of the project window, click "Save a Permanent Copy" (it's next to the flashing red "Temporary Project" sign).
- The project and all its files will now be in your own Personal workspace. You have 25 free hours per month using RStudio Cloud.
SoCal RUG GitHub Repo: https://github.com/socalrug/
You will be able to access all the slides, code, and other course material in the course repository. You can download the content as a ZIP file, but with this method you will not receive later updates unless you download the content again. The alternative is to clone the Git repo. To do this you will need to install Git. You can find instructions on doing that here.
Please install Git and clone the following repo before the event, then pull just before the event starts to pick up any updates.
Command:
git clone https://github.com/socalrug/nonparametric_analysis_2022-01-22.git
Event Repo: https://github.com/socalrug/nonparametric_analysis_2022-01-22
Since this event depends on you having a functional R setup with the correct version of R and the required packages, we highly recommend that you run check_setup.r before the event. If you have issues, please reach out to us in the Slack channel (see below) to get help.
A Slack channel has been set up for the event. It will be used for general announcements, but it is also a great place to ask other participants questions.
If you have not created an account on our Slack group, create one using the following link:
Slack Group Sign-up: https://tinyurl.com/socalrug-slack-signup
Once you have an account, sign in (you can do so in a web browser or by downloading the app on your phone or desktop).
Slack channel: https://tinyurl.com/socalrug-slack
The channel for the course is nonparametric-2022.
Due to COVID restrictions, we will not be able to provide lunch. The University Town Center (UTC) is within a 5-minute walk of the classroom, and there are many restaurants there. There will be a 1.5-hour lunch break.
Start | End | Activity |
---|---|---|
08:30 | 09:00 | Sign-in |
09:00 | 09:30 | Introduction and computer setup |
09:30 | 10:30 | Wilcoxon Rank-sum test |
10:30 | 11:10 | Kolmogorov-Smirnov test |
11:10 | 11:25 | Break |
11:25 | 12:00 | Kruskal-Wallis H-test |
12:00 | 12:30 | Spearman Rank Correlation Coefficient |
12:30 | 02:00 | Lunch |
02:00 | 02:40 | Fisher Exact test |
02:40 | 03:20 | Loess Regression |
03:20 | 04:15 | Bootstrap method of estimation |
04:15 | 04:30 | Wrap-up |