Project: Predicting Usage of a Video Game Research Server
This year we have a real data science project with real stakeholders (new this semester!)
- A group in CS at UBC is interested in understanding how people play video games
- They've set up a server running Minecraft and are recording play sessions
- Running the server is not easy: it requires the right hardware resources, software licenses, recruiting efforts, etc.
- The data:
  - Player skill level, demographic information
  - Past play sessions
Your task
Formulate and answer a predictive question about the data. Present the full analysis, from reading the data to communicating results.
- What is classification?
  - Predict a class/category for a new observation/measurement
  - Using past observations with known class/category
  - Learn more in lectures 6 & 7!
- What is regression?
  - Predict a numerical value for a new observation/measurement
  - Using past observations with known numerical value
  - Learn more in lectures 8 & 9! (you will have to read ahead if you want to do this!)
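To make the difference concrete, here is a minimal sketch in plain Python. It assumes a nearest-neighbour rule as the prediction method, which is only one possible choice and is not named in these slides. The observations, the variable names (`hours_played`, `skill`, `next_session_minutes`) and the functions are all made up for illustration and do not come from the real dataset.

```python
from collections import Counter
from statistics import mean

# Made-up past observations: (hours_played, skill, next_session_minutes).
# These names and numbers are hypothetical, not the real project data.
past = [
    (1.0, "beginner", 20.0),
    (2.0, "beginner", 30.0),
    (8.0, "expert", 90.0),
    (9.0, "expert", 110.0),
    (10.0, "expert", 100.0),
]

def nearest(x, k=3):
    """Return the k past observations whose hours_played is closest to x."""
    return sorted(past, key=lambda row: abs(row[0] - x))[:k]

def classify(x, k=3):
    """Classification: predict the most common class among the neighbours."""
    labels = [row[1] for row in nearest(x, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(x, k=3):
    """Regression: predict the average numerical value of the neighbours."""
    return mean(row[2] for row in nearest(x, k))

print(classify(8.5))  # a category
print(regress(8.5))   # a number
```

Both functions use the same past observations; the only difference is whether the known value being predicted is a category or a number.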
Deliverables
- Team Contract
- Project Planning Stage (Individual)
  - All the project details are in this item on Canvas!
- Final Project Report
See dates on Canvas.
One Important Note
This is new, untested real data. So the final conclusion of your project might be "we couldn't get anything out of this data."
That's totally fine and is just as valuable as a report that says "wow look at all the cool things we learned!".
Just make sure you:
- Critically analyze what happened: why things worked, or why they didn't
- Come up with suggestions for next steps ("collect this other data to actually see a useful signal!" or "this data has a lot of info left over, maybe try a fancier model like X!")
Create Repo with Template
Template URL: https://github.com/UBC-DSCI/dsci-100-project_template
Click on "Use this template"
Why are we doing this?
- Jupyter creates temporary "checkpoint" files (backups in case something goes wrong) in a folder named .ipynb_checkpoints
- We don't want to version control this folder because it's not the actual work you want to track
- You only want to version control files that matter, not temporary and backup files
The .gitignore file will "disappear" in JupyterLab (it is a hidden file), but you will see it on GitHub.
Using Git + GitHub is optional for the project, but it's by far the easiest way to collaborate effectively.
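As a rough illustration of what ignoring the checkpoint folder means, the sketch below mimics a single Git rule: a .gitignore pattern with no slash matches that name at any depth in the repository. It assumes the template's .gitignore contains the pattern `.ipynb_checkpoints`, which has not been checked here, and the file names are hypothetical. It is not how Git itself is implemented.

```python
from pathlib import PurePosixPath

# Assumption: the template's .gitignore contains this one pattern.
IGNORED_NAME = ".ipynb_checkpoints"

def is_ignored(path):
    """True if any component of the path is the ignored folder name."""
    return IGNORED_NAME in PurePosixPath(path).parts

# Hypothetical files in a project folder.
files = [
    "analysis.ipynb",
    ".ipynb_checkpoints/analysis-checkpoint.ipynb",
    "data/players.csv",
    "data/.ipynb_checkpoints/players-checkpoint.csv",
]

tracked = [f for f in files if not is_ignored(f)]
print(tracked)
```

Only the notebook and the data file remain, which are the files that matter for the project.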
Give the project a name
Project Template Setup Recap
- Go to the template repository
- Click "Use this template" and create a repository from the template
- Each team member clones the repository from the owner's repo into their own JupyterHub
  - https://datasciencebook.ca/version-control.html#cloning-a-repository-using-jupyter
  - Note: Put this in your home folder. DO NOT clone it into your dsci-100-student folder. If you do, move it out.
  - Multiple people can have the same repo name, but within any given user account, all the repo names need to be unique
Activity #1: Explore Datasets - Preliminary
Project assignment details are on Canvas.
- Using what you have learnt in weeks 1-4, read the dataset, take a look at it, and write a short description of the dataset.
- Some questions you should try to answer:
  - What is the dataset about?
  - How many variables are there?
  - How many observations are there?
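One way to answer the last two questions is sketched below using only Python's standard library. The in-memory CSV text stands in for the real file, and its column names and values are made up; the tools taught in weeks 1-4 would normally be used instead.

```python
import csv
import io

# Made-up stand-in for the real data file; column names are hypothetical.
raw = io.StringIO(
    "player_id,skill,age,played_hours\n"
    "p1,beginner,19,1.5\n"
    "p2,expert,24,30.0\n"
    "p3,beginner,21,0.2\n"
)

def describe(handle):
    """Return (column names, number of observations) for an open CSV file."""
    reader = csv.DictReader(handle)
    rows = list(reader)  # one dict per observation
    return reader.fieldnames, len(rows)

columns, n_obs = describe(raw)
print(f"{len(columns)} variables: {columns}")
print(f"{n_obs} observations")
```

With a real file, the same function could be called on a handle opened with `open(path, newline="")`, where `path` is wherever the dataset was saved.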
Activity #2: Explore Datasets Part 2 - Outcome Variable
- Try to answer these questions now:
  - Identify the main outcome/categorical/label variable in the dataset.
  - How many values/groups are in this variable?
  - How many observations are there in each value/group?
- Tip: Think about how you are organising your workbook: add more code and markdown cells (and arrange them!) to keep your notebook neat
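Counting groups and observations per group can be sketched as follows. The list of values is invented and stands in for whichever categorical outcome column is chosen; nothing here reflects the real data.

```python
from collections import Counter

# Made-up values of a hypothetical categorical outcome column.
outcome = ["beginner", "expert", "beginner", "regular", "beginner", "expert"]

counts = Counter(outcome)  # maps each group to its number of observations
n_groups = len(counts)     # number of distinct values/groups

print(f"{n_groups} groups")
for group, n in counts.most_common():
    print(f"{group}: {n}")
```

Very uneven group sizes are worth noting in the description, since they can affect how a classifier behaves.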
Activity #3: Explore Datasets Part 3 - Visualisations!
- Make some visualisations of the outcome variable:
  - What does the distribution of the variable look like?
  - What relationship does it have with some of the other variables?
- Tip: Try using a range of box plots, scatterplots, bar charts, line graphs, etc.
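For a first look at the distribution before building proper plots, a text-only bar chart can be sketched with the standard library. This is a rough stand-in, not a replacement for the plotting tools used in the course; the function name `text_bar_chart` and the values are made up for illustration.

```python
from collections import Counter

def text_bar_chart(values, width=20):
    """Return one line per group, with a bar scaled to the largest count."""
    counts = Counter(values)
    biggest = max(counts.values())
    lines = []
    for group, n in counts.most_common():
        # Scale each bar so the largest group spans `width` characters.
        bar = "#" * max(1, round(width * n / biggest))
        lines.append(f"{group:<10} {bar} {n}")
    return lines

# Made-up values of a hypothetical categorical outcome column.
outcome = ["beginner", "expert", "beginner", "regular", "beginner", "expert"]
print("\n".join(text_bar_chart(outcome)))
```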