Update analyze-us-census-data-with-scipy.mdx
sonnynomnom authored Jan 16, 2025
1 parent bc2ffdf commit de7865b
Showing 1 changed file with 11 additions and 6 deletions.
@@ -120,7 +120,7 @@ For the categories listed, each dataset contains the following columns, which ar
- **Bachelor's Degree**: the total number of individuals who have earned a Bachelor's Degree, typically after completing four years of undergraduate education at a university or college
- **Graduate or Professional Degree**: the total number of individuals who have earned a Master's Degree, Doctoral Degree (Ph.D.), or other professional degrees such as a Law Degree (J.D.) or Medical Degree (M.D.)

-In this tutorial, we'll use SciPy to run some analysis and find out whether there are statistically significant differences in relocation patterns for each group - but first, let’s review the basics.
+In this tutorial, we'll use SciPy to run some analysis and find out whether there are statistically significant differences in relocation patterns for each group but first, let’s review the basics.

## Some Basic Stats

jules-kris Jan 21, 2025

Contributor

line 133: For this tutorial, we'll be running t-tests.

I think it would be helpful to write a quick sentence explaining what a t-test is, or link to a reference!
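
As a possible starting point for that sentence: a two-sample t-test asks whether the difference between two group means is large relative to the variability within the groups. The sketch below is not from the tutorial. It uses only the standard library so the arithmetic is visible, the function name `welch_t` and the sample numbers are made up for illustration, and it computes only the t statistic (Welch's form), not the p-value that `stats.ttest_ind` also returns.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Standard error of the difference between the two means
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / std_err


# Hypothetical numbers, not taken from the census dataset
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [9.0, 10.0, 8.0, 11.0, 12.0]

t_stat = welch_t(group_a, group_b)
print(round(t_stat, 3))  # 3.0
```

The further the statistic is from zero, the less plausible it is that both groups share the same mean. SciPy's `stats.ttest_ind` performs this kind of comparison and converts the statistic into a p-value.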

@@ -189,19 +189,18 @@ state["Relocated Between States"] = variant.groupby("State")["Total Population"]

state.head()
```

Comparing California residents to those from New York only, **is there a significant difference in mobility between those that relocated within the same** area (in this case, state) **versus those that moved across state lines?**

-We'll use the .loc method to search for the two states and extract the summed values that we calculated in the exercise above.
+We'll use the `.loc[]` method to search for the two states and extract the summed values that we calculated in the exercise above.

```python
cny = state.loc[["California", "New York"]]

cny
```
-<RoundedImage
-  link="https://i.imgur.com/IR9CX8c.png"
-  description="U.S. Census Data Analysis"
-/>
+
+![U.S. Census Data Analysis](https://i.imgur.com/IR9CX8c.png)
+
```python
t_stat, p_value = stats.ttest_ind(cny["Relocated Within State"], cny["Relocated Between States"])
@@ -313,3 +312,9 @@ So what have we learned?? We've learned that:
Why does this matter? It matters because it demonstrates that there's actually a sound and scientific method for answering these questions when they come up. Feel free to try your hand at doing the same the next time you run into an interesting dataset.

jules-kris Jan 21, 2025

Contributor

It would be great to add some ideas for how this data could be used in a project! How could this data be visually represented? Or, how do data scientists work with this kind of data to create visual output and communicate info to broader audiences? No need to actually create that output in this PT, but I think it would be great to briefly touch on what learners could do with these types of datasets in the future or even in their careers! This would contextualize this tutorial in the larger world of data science.


Thanks for coding with us!
+
+### More Resources
+
+- [Source Code](https://colab.research.google.com/drive/1ujk1u0TWqlNolFwv9-rUNMjaghZuLLZK)
+- [NumPy course](https://www.codedex.io/numpy)
+- [A/B Testing](https://www.oracle.com/cx/marketing/what-is-ab-testing)
