
Hw10 is ready for grading #9

Open
Mathnstein opened this issue Dec 5, 2017 · 3 comments

Comments

@Mathnstein
Owner

Mathnstein commented Dec 5, 2017

@vincenzocoia @gvdr @ksedivyhaley @JoeyBernhardt @mynamedaike @pgonzaleze @derekcho

@hsmohammed

Hello,

Good job on homework 10. I liked how you extracted rating data from the IMDB website and used the output to examine the relationship between a movie's year and its rating, and between the length of its title and its rating. Your analysis shows that good movies were produced in almost every year. Good job on the scraping and on using the gsub() function to clean your data. My only comment is that you put a PDF in your repo instead of a Markdown file; in my opinion, Markdown displays better on GitHub. Overall, very good job, and I was happy to review your work.

Thank you,
Hossameldin Mohammed

@emilymistick

Hi @Mathnstein,

Nice job on this homework!

I was able to download your .Rmd and reproduce the analysis.

Your scraping method is correct and concise: you extract the movie title, ranking, and year from the HTML of the target URL. You save the data as a .csv, then load it back in for a small plotting analysis. The analysis is quite simple but sufficient for the assignment, and the results were interesting. It would also be nice to include a subset of the data table in the report, so the reader could see the top few movies without having to open the .csv.

Overall, nice work with web scraping! I've never used gsub() before; it looks quite useful, and I'll try to remember it in the future.

Thanks,
Emily

@derekcho

derekcho commented Dec 21, 2017

Hi @Mathnstein, here are some comments about your hw10:

Task(s) selected: Scrape data
Data stored as file ready for downstream analysis: Yes
Basic Exploration: Yes
Reflection: Yes

  • It isn’t clear what the results of your scraping are. You should show an example of the cleaned data in a table in the report! That said, the scraping of the web data looks like it was successful.
  • Interesting plots, but your conclusions could be incorrect without further exploration. Although there appear to be more highly rated movies in recent years, there could simply be more movies in recent years overall! I'm also not sure why the length of a movie title would have any effect on its rating.
  • Your assignment hits the required elements; however, it feels like you could have dug a little deeper with the scraping. For example, you could find other datasets with additional variables for these movies, such as box office earnings. The small dataset really limits how much meaningful exploration you can do.
  • Good work overall in STAT 545A and STAT 547M!

Your grade will be emailed to you at a later date.
