This workshop is a collaboration between the Southern California R Users Group (SoCal RUG) and the Master of Science in Business Analytics (MSBA) program at the UCI Paul Merage School of Business.
This workshop presents a basic introduction to Python, covering the fundamentals of Python programming and practical data science skills using the pandas library.
- 2022-07-28 5 PM to 8 PM, Pacific Time
- 2022-07-29 1 PM to 4 PM, Pacific Time
Meeting ID: 946 6181 1499
Passcode: 398566
- Sign up for a free RStudio Cloud account.
- 3 hours each day, 2 days total
- Divided into 40 min sessions
- 15 min instruction
- 15 min practice in breakout rooms
- Teaching assistants will be assigned to breakout rooms
- 10 min review & questions
- 4 sessions per day (160 min for sessions + 10 min break + 10 min wrap-up)
- Eight sessions total for both days
- RStudio Cloud Project. This is where you will do your coding.
The two-day workshop will be presented as 4 sessions each day, where each session is roughly divided into 15 minutes of instruction, 15 minutes of practice and exercises, and 10 minutes of review (~40 min per session).
Day 1 will focus on Python as a language, drawing from the Python docs.
Instructor: Bryan Smith, Ph.D., Data Scientist at Google.
- Session 1: Using JupyterLab Notebooks
- What is a notebook, and why is it useful?
- JupyterLab interface
- Working with cells (creating, executing, cell types, etc.)
- Tips and best practices
- Session 2: Review of Python Fundamentals
- Importance of spacing
- Expressions and variables
- Math operations
- Data types (numbers, strings, booleans)
- Lists
- Session 3: Control Flow
- Conditional statements
- Loops
- Session 4: Functions
- What are they and why are they important
- Function syntax
- How to write your own functions
- Tips and best practices
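As a rough preview of how the Session 2 to 4 topics fit together, here is a minimal sketch combining a list, a loop, a conditional statement, and a user-defined function. The function name `describe_number` and the sample values are hypothetical and are not taken from the workshop materials.

```python
def describe_number(n):
    """Return "even" or "odd" for an integer (hypothetical helper)."""
    # Indentation (spacing) defines which lines belong to the function
    # and to each branch of the conditional.
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


# A list of numbers, plus a loop that calls the function on each item.
numbers = [1, 2, 3, 4]
labels = []
for n in numbers:
    labels.append(describe_number(n))

# A simple math operation on the whole list.
total = sum(numbers)

print(labels)
print(total)
```

Wrapping the even/odd check in a function keeps the loop body short and lets the same logic be reused elsewhere.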
Day 2 will focus on pandas as the entry point into data-science-specific tasks, drawing from the getting started tutorials.
- Session 5: Introduction to pandas
- Why tabular data tables are useful for data science (compare to Excel)
- Series and DataFrames
- How to create Series and DataFrames
- How to read Series and DataFrames from files
- Session 6: Subsetting DataFrames
- Selecting columns
- Filtering rows
- The various ways of indexing DataFrames (by labels, slices, conditional expressions): loc and iloc
- Session 7: Reshaping and Merging DataFrames
- Wide vs. long formats and converting between the two: pivot and melt
- Grouped summaries: groupby
- Concatenating tables by column and row: concat
- Joining data tables: merge
- Session 8: Data Visualization with pandas
- Basic plotting from pandas: plot, scatter, box, hist, etc.
- Examples of more complex plots, coloring and grouping by variables
- Tuning plot parameters (sizes, colors, layouts)
- Saving plots (e.g., for use in presentations)
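To give a feel for the operations named in Sessions 5 to 7, the sketch below builds a small DataFrame and applies column selection, row filtering, loc and iloc indexing, melt and pivot, groupby, merge, and concat. All of the data, column names, and variable names here are made up for illustration and do not come from the workshop materials. Plotting is left out because it also requires matplotlib to be installed.

```python
import pandas as pd

# Hypothetical example data (not from the workshop materials).
scores = pd.DataFrame({
    "student": ["Ana", "Ben", "Cy", "Dee"],
    "team": ["red", "blue", "red", "blue"],
    "quiz1": [8, 6, 9, 7],
    "quiz2": [7, 9, 10, 5],
})

# Selecting a column returns a Series.
names = scores["student"]

# Filtering rows with a conditional expression.
high_quiz1 = scores[scores["quiz1"] >= 8]

# loc indexes by labels or boolean masks; iloc indexes by integer position.
red_quiz2 = scores.loc[scores["team"] == "red", ["student", "quiz2"]]
first_two = scores.iloc[:2, :2]

# Wide to long with melt, then back to wide with pivot.
long_df = scores.melt(
    id_vars=["student", "team"],
    value_vars=["quiz1", "quiz2"],
    var_name="quiz",
    value_name="score",
)
wide_df = long_df.pivot(index="student", columns="quiz", values="score")

# Grouped summary: mean score per team.
team_means = long_df.groupby("team")["score"].mean()

# Joining a second (hypothetical) table on a shared key column.
coaches = pd.DataFrame({"team": ["red", "blue"], "coach": ["Rae", "Bo"]})
merged = scores.merge(coaches, on="team", how="left")

# Concatenating two pieces by row.
stacked = pd.concat([scores.iloc[:2], scores.iloc[2:]], ignore_index=True)

print(team_means)
print(merged)
```

The long format produced by melt is usually the more convenient input for grouped summaries, which is why the groupby step above runs on `long_df` rather than on the original wide table.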