In this analysis, we compare temperature data for two months, June and December, over roughly seven years to assess whether opening a surf shop is viable.
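Fortunately, Pandas makes this easy: calling the .describe() method on a DataFrame returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column.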
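As a minimal sketch of that step, the snippet below builds a one-column DataFrame and calls .describe() on it. The temperatures are invented for illustration. In the actual analysis they would come from the June query against the Measurement table. The column name "June Temps" is also just an assumption.

```python
import pandas as pd

# Invented June temperatures (degrees F), for illustration only.
june_temps = [72.0, 74.0, 75.0, 76.0, 78.0]
june_df = pd.DataFrame({"June Temps": june_temps})

# describe() reports count, mean, std, min, 25%, 50%, 75%, and max per column.
summary = june_df.describe()
print(summary)
```

The same call on a December DataFrame gives a second summary that can be read side by side with this one.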
Here we can see the start and end points for the December dataset:
And the June dataset:
The data is not perfect (what data is?!): June's datapoints start earlier and end later than December's, by about a year overall.
But it's enough to analyze.
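One way to confirm how far apart the two datasets' start and end points are is to take the MIN and MAX date for each month. The sketch below uses Python's built-in sqlite3 module and an in-memory table instead of the SQLAlchemy session from the analysis. The table name `measurement`, its columns, and all rows are hypothetical stand-ins.

```python
import sqlite3

# Hypothetical miniature stand-in for the Measurement table: (station, date, temp).
rows = [
    ("USC00519281", "2010-06-01", 73.0),
    ("USC00519281", "2010-12-01", 70.0),
    ("USC00519281", "2016-12-31", 68.0),
    ("USC00519281", "2017-06-30", 76.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (station TEXT, date TEXT, temp REAL)")
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)

def date_range(month: str) -> tuple:
    """Return (first date, last date) for rows whose two-digit month matches."""
    # ISO-formatted date strings sort chronologically, so MIN/MAX work on text.
    return conn.execute(
        "SELECT MIN(date), MAX(date) FROM measurement WHERE strftime('%m', date) = ?",
        (month,),
    ).fetchone()

june_range = date_range("06")
december_range = date_range("12")
print("June:", june_range)
print("December:", december_range)
```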
- Curiously enough, the average temperatures for June and December weren't that different! I highly doubt that tried-and-true surfers would make much fuss about a 4-degree difference, especially when the temperature is still in the 70s:
- However, the minimums tell a different story: December's lowest temperatures can be nearly 30 degrees colder than June's highest.
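Both observations come straight from comparing two .describe() summaries: the gap between the means, and the gap between December's minimum and June's maximum. Here is a small sketch with invented temperatures. The numbers are illustrative and are not the analysis results.

```python
import pandas as pd

# Invented sample temperatures (degrees F), for illustration only.
june = pd.Series([70.0, 74.0, 76.0, 80.0], name="June Temps")
december = pd.Series([52.0, 70.0, 72.0, 74.0], name="December Temps")

june_stats = june.describe()
december_stats = december.describe()

# Difference between the monthly averages.
mean_gap = june_stats["mean"] - december_stats["mean"]

# Spread between the coldest December reading and the hottest June reading.
extreme_gap = june_stats["max"] - december_stats["min"]

print(f"Difference in means: {mean_gap:.1f} degrees")
print(f"December minimum vs. June maximum: {extreme_gap:.1f} degrees")
```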
Finally, it would be worth analyzing surfing interest in sub-60-degree weather. Another factor at play is the surfers' temperament: if it was under 60 degrees yesterday, do I really want to take a chance and head out to the water today, even if the forecast says it will be warmer?
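A first pass at the "cold yesterday" question could measure two things. The first is how often a day falls below 60 degrees. The second is how often a cold day follows another cold day. The sketch below uses invented daily readings with one value per day. The `shift(1)` approach is one possible method and is not something the original analysis did.

```python
import pandas as pd

# Invented daily temperatures (degrees F), one reading per day, in date order.
temps = pd.Series(
    [58.0, 62.0, 57.0, 59.0, 66.0, 70.0],
    index=pd.date_range("2016-12-01", periods=6, freq="D"),
)

cold = temps < 60          # True on sub-60-degree days
share_cold = cold.mean()   # fraction of days under 60

# Was the previous day cold? shift(1) moves each value forward one day;
# the first day has no predecessor, so it is filled with False.
prev_cold = cold.shift(1, fill_value=False).astype(bool)

# Among days that follow a cold day, the fraction that are also cold.
cold_after_cold = cold[prev_cold].mean()

print(f"Share of sub-60 days: {share_cold:.2f}")
print(f"Share of cold days after a cold day: {cold_after_cold:.2f}")
```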
To summarize, the current datasets do not line up perfectly (they are off by about a year), but they give us a good snapshot of average temperatures over 6 to 7 years. Notably, the average temperature doesn't vary by more than four degrees! That should be no problem if you're willing to wade through the ocean in the first place.
I would recommend checking the temperatures at specific weather stations; I believe there were over half a dozen involved in the first analysis. That might look like this, tacked on to the June or December month queries:
.filter(Measurement.station == 'USC00519281')
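The filter above is SQLAlchemy ORM syntax. To try the idea without the original database, the following sketch runs the equivalent plain SQL (a month filter plus a station filter) through Python's built-in sqlite3 module. The table name `measurement`, the `temp` column, the second station "EXAMPLE_B", and all values are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the Measurement table: (station, date, temp); values invented.
rows = [
    ("USC00519281", "2016-06-01", 75.0),
    ("USC00519281", "2016-06-02", 77.0),
    ("USC00519281", "2016-12-01", 69.0),
    ("EXAMPLE_B", "2016-06-01", 72.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (station TEXT, date TEXT, temp REAL)")
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)

# June readings from a single station: the month filter plus the station filter.
june_one_station = conn.execute(
    "SELECT date, temp FROM measurement "
    "WHERE strftime('%m', date) = '06' AND station = ? "
    "ORDER BY date",
    ("USC00519281",),
).fetchall()
print(june_one_station)
```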
Furthermore, each station has a different number of observations (ranging from 511 to 2772 datapoints, a significant range). I would strongly suggest reading these month comparison charts in light of how much data each weather station contributes. That query might look like the following:
session.query(Measurement.station, func.count(Measurement.station)).group_by(Measurement.station).all()
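A per-station count boils down to COUNT with GROUP BY. Here is a plain-SQL sketch using Python's built-in sqlite3 module in place of the SQLAlchemy session. Sorting by count, largest first, makes the thinly covered stations easy to spot. The table, columns, station "EXAMPLE_B", and rows are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the Measurement table: (station, date, temp); values invented.
rows = [
    ("USC00519281", "2016-06-01", 75.0),
    ("USC00519281", "2016-06-02", 77.0),
    ("USC00519281", "2016-12-01", 69.0),
    ("EXAMPLE_B", "2016-06-01", 72.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (station TEXT, date TEXT, temp REAL)")
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)

# Number of observations per station, most data first.
counts = conn.execute(
    "SELECT station, COUNT(station) AS n FROM measurement "
    "GROUP BY station ORDER BY n DESC"
).fetchall()
print(counts)
```

Adding the same `strftime('%m', date)` condition as in the month queries would give per-station counts for June or December alone.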