The OCRUG Hackathon 2019, hosted by the Orange County R Users Group and the UCI Paul Merage School of Business, is a two-day event where we will "hack" a data set for fun, education, and prizes. The focus of the event is on education and teamwork, with the main goal of taking a data set from its "raw" form all the way through to a final "product" (e.g. visualization, model, insight). To frame this process, we will have a light competitive aspect where teams will present their work at the end of the event to a panel of judges, with prizes awarded in several categories (see below).
The event will start with a series of practical educational tutorials to get you started with fundamental data analysis with the R programming language, followed by working sessions where teams will explore and analyze the data set in preparation for the team presentations. Participants will work in small teams (2 - 5 people). Teams can either be pre-arranged by participants themselves, or will be assigned at the start of the event.
This event is open to data scientists, enthusiasts and hackers of all levels, from the beginner to the highly experienced. If you are a beginner, it may be helpful to do some preparatory learning before the event — see the suggested resources below. If you are an experienced user, we look forward to you sharing your expertise with others. Assisting others, both within and between teams, is highly encouraged.
- The hackathon is primarily an educational event, not a competition. However, the hackathon is framed in the context of a light competition to provide overall structure, including team-based collaboration, the presentation of final work, judging, and prizes.
- Novice Users: provide an opportunity to work with real-world data sets from start (acquire the data) to finish (produce a final “report” on the findings from their work).
- Experienced Users: provide an opportunity to practice data analysis skills in a structured environment, interact with others, and assist new users.
When: May 18 and 19, 2019
- Saturday: 8:30 AM - 10:00 PM
- Sunday: 8:30 AM - 4:00 PM
Where: University of California, Irvine -- Paul Merage School of Business
- Google Maps
- Directions & Parking Information
- Rooms
- SB1 2321 - Main event room
- SB1 3100 - break out room - meeting room
- SB1 3104 - break out room - meeting room
- SB1 3107 - break out room - quiet room
- SB1 3313 - break out room - storage
- SB1 3rd floor patio - meals
Registration
- Cost: $30
- Register through Eventbrite
Registration is currently full, but please sign up for the waitlist!
Time | Event |
---|---|
08:30 AM | Registration starts |
08:30 AM - 09:00 AM | Breakfast |
09:00 AM – 09:45 AM | Tutorial - Data manipulation with tidy tools |
09:45 AM - 10:30 AM | Tutorial - Data visualization with ggplot2 |
10:30 AM - 10:45 AM | Break |
10:45 AM - 11:30 AM | Tutorial - Data modelling with caret |
11:30 AM - 12:30 PM | Tutorial - Using the AWS Console |
12:45 PM – 01:15 PM | Lunch |
01:00 PM | Registration closes |
01:15 PM – 01:45 PM | Welcome talk, data set overview, groups formed |
01:45 PM – 05:45 PM | Working Session |
05:45 PM – 06:30 PM | Dinner |
06:30 PM – 07:30 PM | Discussion Session |
07:00 PM - 10:00 PM | Working Session |
10:00 PM | Building automatically locks |
Time | Event |
---|---|
08:30 AM - 09:00 AM | Breakfast |
09:00 AM – 12:00 PM | Working session |
12:00 PM | Most Helpful Person Award voting opens |
12:00 PM – 01:00 PM | Lunch |
01:00 PM – 02:00 PM | Groups prepare presentations |
02:00 PM | Voting for Most Helpful Person Award closes |
02:00 PM – 03:00 PM | Group presentations |
03:00 PM – 03:30 PM | Judges discuss and select winners |
03:30 PM – 04:00 PM | Award presentation & wrap-up |
- All participants must register for the event and have a valid ticket to attend.
- All participants must abide by the OCRUG Code of Conduct, including the R Consortium and the R Community Code of Conduct.
- Participants are free to come and go during the event. However, any participant who has not checked in, in person, by 01:00 PM on Saturday will be considered a "no-show" and their spot may be given to someone else.
- Though this is an R-focused event, participants are free to use any programming language or tool for their work.
- Participants are free to work on their projects both onsite at the Hackathon and offsite, though we highly encourage participants to attend all working sessions to maximize team and group interactions.
- We ask that the final submissions from the teams are a result of work performed during the event. Please do not use any previous work you or others may have produced as part of team submissions.
- Connect to SSID: UCInet Mobile
- Go to https://oit.uci.edu/reg
- Register your device as a guest
If you have problems, please call the OIT support line at (949) 824-2222, option 3.
OCRUG GitHub Repo: https://github.com/ocrug/
Please install git and clone the following repo before the event, then pull the latest changes just before the event starts.
command:
git clone git@github.com:ocrug/hackathon-2019.git
Hackathon Repo: https://github.com/ocrug/hackathon-2019
A Slack channel has been set up for the hackathon. It will be used for general announcements, but it is also a great place to ask other participants questions.
If you have not created an account on our Slack group, create one using the following link:
Slack Group Sign-up: https://ocrug-slack.herokuapp.com
Once you have an account, sign in (you can do it on a web browser or download an app on your phone or desktop).
Slack channel: https://ocrug.slack.com
The channel for the hackathon is hackathon-2019
Please follow us on Twitter (oc_rug) and tweet about the event with the hashtag #OCRUG.
- All participants will work in teams of 2 to 5 people.
- Participants are free to form their own teams prior to the event.
- We will assist in team formation at the beginning of the event for any participants who do not already have a team.
- Teams will select a team name.
- Assisting others within and between teams is highly encouraged.
See the presentation guidelines for the requirements. The team prizes will be determined by a panel of judges using the following judging guidelines. The judges' decision is final.
Below is a list of the awards and prizes that may be given out.
- Most Helpful Person
- Personal prize
- 1-year membership to SuperDataScience
- Best Model
- Team prize
- Data Science Go conference tickets
- Best Insight
- Team prize
- 1-year membership to SuperDataScience
- Best Visualization
- Team prize
- Books
- Everyone
- $100 discount code on 1-year membership for SuperDataScience
- Stickers
- $30 of AWS credits
- Early-bird registration
- 1-month membership to DataCamp
The Most Helpful Person Award will be decided using a cumulative voting system. In this system, each participant is given 10 votes that they can award to other participants for being helpful. You can split your votes across several people and give more than one vote to the same person. Voting for oneself or one's team members is prohibited. The idea is to award votes to individuals on other teams. The person with the most overall votes wins. When voting opens, you will receive an email with a link to a website. Use the link to cast your votes. You will need to vote before voting closes at 2:00 PM on Sunday.
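The voting rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the voting website's actual implementation: the participant names, team names, and the `tally_votes` and `winners` helpers are hypothetical, and the sketch assumes that a participant may cast fewer than 10 votes and that a tie returns every top scorer (the rules do not say how ties are broken).

```python
from collections import Counter

VOTES_PER_PERSON = 10  # from the rules: each participant is given 10 votes


def tally_votes(ballots, team_of):
    """Validate each ballot against the rules and sum votes per recipient.

    ballots: {voter: {recipient: number of votes}}
    team_of: {participant: team name} (must include every voter and recipient)
    """
    totals = Counter()
    for voter, ballot in ballots.items():
        if any(votes < 0 for votes in ballot.values()):
            raise ValueError(f"{voter}: negative votes are not allowed")
        if sum(ballot.values()) > VOTES_PER_PERSON:
            raise ValueError(f"{voter}: more than {VOTES_PER_PERSON} votes cast")
        for recipient, votes in ballot.items():
            # Voting for oneself or one's own team members is prohibited.
            if recipient == voter or team_of[recipient] == team_of[voter]:
                raise ValueError(f"{voter}: cannot vote for {recipient}")
            totals[recipient] += votes
    return totals


def winners(totals):
    """Return everyone tied for the most votes (assumption: ties are shared)."""
    if not totals:
        return []
    top = max(totals.values())
    return sorted(name for name, votes in totals.items() if votes == top)


# Hypothetical participants and teams, for illustration only.
team_of = {"ana": "A", "bo": "A", "cy": "B", "di": "B", "eli": "C"}
ballots = {
    "ana": {"cy": 6, "eli": 4},
    "cy": {"ana": 3, "eli": 7},
    "eli": {"cy": 4, "ana": 6},
}

totals = tally_votes(ballots, team_of)
print(winners(totals))  # ['eli']
```

Here `eli` receives 4 + 7 = 11 votes, `cy` receives 10, and `ana` receives 9, so `eli` wins. A ballot such as `{"ana": {"bo": 1}}` is rejected because `ana` and `bo` are on the same team.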
There will be a discussion session Saturday between 6:00 PM and 7:00 PM. The goal is to allow sharing of ideas, knowledge and inspiration between groups. At the top of the hour each team will be presented with a "bingo card" with the names of all the teams on it. The objective is to visit with as many teams as possible to allow cross-pollination of ideas and knowledge. Team members will visit other teams to hear about what they are doing and share thoughts. Not all team members need to visit all teams, but you should try to visit as many as possible. Having multiple team members visit the same team is a good idea.
To make this session run as smoothly as possible, please:
- Have at least one person stay back to talk to incoming participants from other teams. You can have multiple presenters from your team show your work.
- Plan a 2-3 min talk explaining what you have been doing. Share your ideas, insights and thoughts. Also, share your problems and ask for solutions.
- Allow multiple participants to take part in the conversation at any given group discussion.
- Follow the Pac-man Rule so that others can be included.
- Remember that the hackathon is primarily an educational event and sharing of ideas is highly encouraged.
-
- 1-page note sheets covering data science fundamentals and useful R packages.
-
- Comprehensive book on the complete data science workflow, including data importing/cleaning, visualization, and data analysis
- Focus on tidyverse packages
- Accessible for beginners who have a basic grasp of R
-
- This is the hub website for the core tidyverse packages
- Check out the Packages section and associated links for helpful information on using the packages.
-
- This book digs into the details of R.
- A great resource for more advanced users wanting to learn more about R under the hood.
- There is also a 1st Edition of the book.
-
- Useful when you need to look up more info on specific geoms, stats, scales, etc.
- Check out the examples in the details pages for each function.
-
- Gallery of various types of charts and the code needed to create them.
-
- A practical guide that provides more than 150 recipes
-
Mistakes, we’ve drawn a few: Learning from our errors in data visualisation
- From the Economist about mistakes they've made with published data visualizations, and how they'd fix the problems.
- Note: even professionals make mistakes!
-
- Good overview of caret with code examples
- In particular, check out the table of available models
-
DALEX R Package -- Descriptive mAchine Learning EXplanations
- Provides a set of tools that help you to understand how complex models are working
- Helps you visualize what's going on
- Check out the cheatsheet
Food, drinks and snacks will be provided throughout the event. We will have vegetarian options available. Please feel free to bring any additional food for yourself if you would like to supplement the meals or if you have other specific dietary constraints.
- Saturday
- Breakfast: coffee, light breakfast
- Lunch: sandwiches and salad
- Dinner: Mexican (tacos, rice & beans, chips & salsa)
- Sunday
- Breakfast: coffee, light breakfast
- Lunch: pizza and salad
- Snacks and Drinks
- Coffee
- Soft drinks
- Water
- Various snacks, TBD (e.g. fruit, chips, nuts, granola bars)
OCRUG, in partnership with UCI, was able to secure a limited number of credits for the hackathon. You will be personally responsible for any AWS fees that you incur above the value of the credits that you receive, or for services that are not covered by the credits.
If you do not have an AWS account, you will have to create one. You can monitor your costs in the console and set up alarms. The following references may also be helpful:
- Cost Calculator: https://calculator.aws
- EC2 On-demand pricing: https://aws.amazon.com/ec2/pricing/on-demand/
The codes will expire on August 31, 2019, or when the credits are fully used up, whichever comes first. Credits cannot be transferred to a different account once applied, and the duration cannot be extended. Once redeemed to the account, credits backdate to the beginning of the month. Credits cannot be applied to any past month’s charges.
Each promotion code is worth $30.00 USD.
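As a rough way to judge how far a $30.00 credit goes, the arithmetic can be sketched as below. The hourly rates used here are hypothetical placeholders, not real AWS prices; look up the actual rate for your instance type on the EC2 on-demand pricing page. The sketch also assumes a single service billed at a flat hourly rate and ignores storage, data transfer, and any service not covered by the credits.

```python
CREDIT_USD = 30.00  # value of each promotion code, as stated above


def hours_covered(hourly_rate_usd, credit_usd=CREDIT_USD):
    """Hours of usage the credit pays for at a flat hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return credit_usd / hourly_rate_usd


def out_of_pocket(hourly_rate_usd, hours, credit_usd=CREDIT_USD):
    """Amount you would owe yourself once the credit is used up."""
    return max(0.0, hourly_rate_usd * hours - credit_usd)


# Hypothetical rates in USD per hour, for illustration only.
print(hours_covered(0.25))       # 120.0 hours at $0.25/hour
print(out_of_pocket(0.50, 100))  # 20.0 dollars owed after 100 hours at $0.50/hour
print(out_of_pocket(0.50, 48))   # 0.0, the credit covers the full cost
```

Forgetting to stop an instance after the event is the most common way to exceed the credit, so it is worth running this kind of estimate before choosing an instance size.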
You can redeem the credits by visiting https://aws.amazon.com/awscredits/ or by entering the code in the AWS account dashboard, under ‘credits’. You must agree to the AWS Credits Terms and Conditions.
You may set up billing alerts https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/monitor-charges.html in order to avoid unwanted charges. Credits are non-refundable https://aws.amazon.com/premiumsupport/knowledge-center/close-aws-account/.
If you are planning to join an organization https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html or add other accounts to your organization, be aware that by default, promotional credits are shared between all accounts in an organization. Credit sharing https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/useconsolidatedbilling-credits.html can be disabled only by the Payer (Master Account), via the Preferences tab within the main account dashboard. If you navigate to the Preferences tab of your console, you can see whether credits are currently being shared within your organization. Please review the information on credits and the billing cycle https://aws.amazon.com/premiumsupport/knowledge-center/consolidated-billing-credits/, or contact AWS support if you have questions.
Some support may be covered by your promotional credits. Please note that upfront Subscription fees are not covered by promotional credits. See below for a list of what is covered.
- AWSAppSync
- AWSBackup
- AWSBudgets
- AWSCertificateManager
- AWSCloudTrail
- AWSCodeCommit
- AWSCodeDeploy
- AWSCodePipeline
- AWSCompetency
- AWSConfig
- AWSDDoSProtection
- AWSDataTransfer
- AWSDatabaseMigrationSvc
- AWSDeveloperSupport
- AWSDeviceFarm
- AWSDirectConnect
- AWSDirectoryService
- AWSElasticBeanstalk
- AWSElementalMediaConvert
- AWSGlue
- AWSGreengrass
- AWSIoT
- AWSLambda
- AWSMobileHub
- AWSQueueService
- AWSRoboMaker
- AWSServiceCatalog
- AWSShield
- AWSStorageGateway
- AWSSupportBasic
- AWSSupportBusiness
- AWSSupportDeveloper
- AWSTransfer
- AWSXRay
- AlexaSiteThumbnail
- AlexaTopSites
- AlexaWebInfoService
- AmazonApiGateway
- AmazonAppStream
- AmazonAssociatesWebService
- AmazonAthena
- AmazonChime
- AmazonChimeCallMe
- AmazonChimeDialin
- AmazonClearBox
- AmazonCloudFront
- AmazonCloudSearch
- AmazonCloudWatch
- AmazonCloudcast
- AmazonCognitoSync
- AmazonConnect
- AmazonDAX
- AmazonDynamoDB
- AmazonEC2
- AmazonECR
- AmazonECS
- AmazonEFS
- AmazonEKS
- AmazonES
- AmazonETS
- AmazonElastiCache
- AmazonGameLift
- AmazonGlacier
- AmazonInspector
- AmazonKinesis
- AmazonKinesisFirehose
- AmazonLex
- AmazonLightsail
- AmazonML
- AmazonMSK
- AmazonMacie
- AmazonNeptune
- AmazonPolly
- AmazonQuickSight
- AmazonRDS
- AmazonRedshift
- AmazonRekognition
- AmazonRoute53
- AmazonS3
- AmazonSES
- AmazonSNS
- AmazonSWF
- AmazonSageMaker
- AmazonSimpleDB
- AmazonStates
- AmazonSumerian
- AmazonVPC
- AmazonWorkDocs
- AmazonWorkMail
- AmazonWorkSpaces
- AmazonZocalo
- CloudHSM
- CodeBuild
- ContactCenterTelecomm
- ElasticMapReduce
- IngestionServiceSnowball
- OpsWorks
- RemoteConfiguration
- ResourceAllocationService
- SnowballExtraDays
- awskms
- awswaf
- comprehend
- datapipeline
- mobileanalytics
- transcribe
- translate
Note: There may be services that your code does not apply to.
This is an open data hackathon. Any and all publicly available data can be used as long as it adheres to the rules. Listed below are a handful of datasets that provide a good starting point. All of these datasets, along with data dictionaries, are located in the data folder in this repository.
List of Active Public Water Systems (PWS) in California. It includes basic information such as population served, number of connections, county, etc. This data also has a public portal.
California county population projections by age, gender and ethnicity for 1970-2050, developed by the CA Department of Finance. Last updated 2018-05-23.
The DWR Periodic Groundwater Levels dataset contains seasonal and long-term groundwater level measurements collected by the Department of Water Resources and cooperating agencies in groundwater basins statewide. It also includes data collected through the CASGEM (California Statewide Groundwater Elevation Monitoring) Program. Most measurements are taken manually twice per year to capture the peak high and low values in groundwater elevations. However, the dataset also includes measurements recorded more frequently, monthly, weekly, or daily.
This dataset includes public water systems (PWS) compliance status regarding violations.