Case study project using data provided by a Fitbit fitness tracker to answer the business question: Identify trends in how consumers use non-Bellabeat smart devices to apply insights into Bellabeat’s marketing strategy. I used R, RStudio and Tableau.

megtan1905/Bellabeat-Fitness-Tracker


Bellabeat Fitness Tracker Case Study Report

The case study consists of 6 phases: Ask, Prepare, Process, Analyse, Share and Act.

Ask

1. Introduction

Bellabeat is a high-tech manufacturer of health-focused products for women. They focus mainly on developing wearables and accompanying products that monitor biometric and lifestyle data to help women better understand how their bodies work, and as a result, make healthier lifestyle choices.

Products:

Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

2. Business task

Identify trends in how consumers use non-Bellabeat smart devices to apply insights into Bellabeat’s marketing strategy.

3. Key Stakeholders

Urška Sršen — Bellabeat’s cofounder and Chief Creative Officer.

Sando Mur — Bellabeat’s cofounder; key member of the Bellabeat executive team.

Bellabeat marketing analytics team — A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Prepare

The data source used for this case study is FitBit Fitness Tracker Data. The dataset is hosted on Kaggle and was made available through Mobius.

Accessibility and privacy of data:

Checking the metadata of the dataset, we can confirm it is open-source. The owner has dedicated the work to the public domain by waiving all of their rights to the work worldwide under copyright law, including all related and neighbouring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Credibility of the data:

The data is public, hosted on Kaggle under the name FitBit Fitness Tracker Data, and is in long format. It is described as coming from 30 Fitbit users and includes minute-level output for physical activity, heart rate, and sleep monitoring. It is split into several tables covering different aspects of the device data, with a lot of detail about user behaviour. I have no reason to doubt the integrity of its source.

Information about the dataset:

These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk between 12 April and 12 May 2016. A total of 18 CSV documents are available for the analysis. Each document represents different quantitative data tracked by Fitbit.

Does the data ROCCC?

• Reliable - Medium - It contains a large amount of data, but with only about 30 users there is a risk of sampling bias; we cannot be sure the sample represents the population as a whole.

• Original - High - It is first-party data.

• Comprehensive - High - It matches most of Bellabeat's product parameters.

• Current - Low - The data covers 12 April to 12 May 2016, so it is not real-time and spans only about one month.

• Cited - High - It is first-party data.

Given these limitations, we will give our case study an operational approach and treat its findings as guidance rather than firm conclusions.

Process

I used RStudio for the analysis because it is efficient and accessible, handles large amounts of data, and can create the visualisations.

1. Installing packages and loading libraries


install.packages("tidyverse")

install.packages("dplyr")

install.packages("ggplot2")

install.packages("lubridate")

install.packages("janitor")

install.packages("ggpubr")

install.packages("skimr")

install.packages("here")

install.packages("ggrepel")

install.packages("readr")

install.packages("scales")


library(tidyverse)

library(dplyr)

library(ggplot2)

library(lubridate)

library(janitor)

library(ggpubr)

library(skimr)

library(here)

library(ggrepel)

library(readr)

library(scales)

I chose these packages to help me with my analysis.
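Calling install.packages() on every run downloads packages that may already be present. A minimal sketch of one way to list only the missing ones follows. `missing_packages` is a hypothetical helper (not part of any package), dplyr, ggplot2 and readr are left out because they are installed with tidyverse, and the install call itself is left commented out.

```r
# Hypothetical helper: return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

pkgs <- c("tidyverse", "lubridate", "janitor", "ggpubr", "skimr",
          "here", "ggrepel", "scales")

to_install <- missing_packages(pkgs)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```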

2. Importing and preparing the data


daily_activity <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/dailyActivity_merged.csv")

daily_steps <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/dailySteps_merged.csv")

daily_sleep <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/sleepDay_merged.csv")

hourly_steps <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/hourlySteps_merged.csv")

hourly_calories <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/hourlyCalories_merged.csv")

hourly_intensities <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/hourlyIntensities_merged.csv")

weight <- read.csv("C:/Users/Computer/Desktop/12Apr-12May/weightLogInfo_merged.csv")
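The import calls above repeat the same folder for every file. As a sketch only, the folder could be stored once and the file name built with file.path(). `read_fitbit` is a hypothetical helper, and the temporary directory below merely stands in for the real data folder so the example runs anywhere.

```r
# Hypothetical helper: read "<name>_merged.csv" from a single data directory.
read_fitbit <- function(data_dir, name) {
  read.csv(file.path(data_dir, paste0(name, "_merged.csv")))
}

# Demo: a temporary directory with one small made-up file.
data_dir <- tempfile("fitbit_")
dir.create(data_dir)
write.csv(data.frame(Id = c(1, 2), TotalSteps = c(100, 200)),
          file.path(data_dir, "dailyActivity_merged.csv"),
          row.names = FALSE)

daily_activity_demo <- read_fitbit(data_dir, "dailyActivity")
```

With the real data, only `data_dir` would need to change if the files move.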

3. Data cleaning and formatting

Check the number of participants for each data set

Before starting the cleaning process, I check the number of unique users in each data frame using n_unique().


n_unique(daily_activity$Id)

n_unique(daily_sleep$Id)

n_unique(daily_steps$Id)

n_unique(hourly_calories$Id)

n_unique(hourly_intensities$Id)

n_unique(hourly_steps$Id)

n_unique(weight$Id)

All datasets have 33 participants each, except for the daily_sleep and weight datasets, which have 24 and 8 participants respectively. I will exclude the weight dataset because 8 participants are too small a sample size to draw meaningful conclusions and make recommendations.

I used rm(weight) to remove the weight table.
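The same participant count can be collected for all tables in one step by putting them in a named list. The sketch below uses base R and small made-up tables in place of the real data frames.

```r
# Made-up stand-ins for the real tables; only the Id column matters here.
tables <- list(
  daily_activity = data.frame(Id = c(1, 1, 2, 3)),
  daily_sleep    = data.frame(Id = c(1, 2, 2)),
  weight         = data.frame(Id = c(3, 3))
)

# Count distinct participants in each table.
participants <- sapply(tables, function(df) length(unique(df$Id)))
participants
```

The result is a named vector, which makes it easy to spot tables with too few participants.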

Check for duplicates and Missing Values

I will check for any duplicates in the data


sum(duplicated(daily_activity))

sum(duplicated(daily_sleep))

sum(duplicated(daily_steps))

sum(duplicated(hourly_calories))

sum(duplicated(hourly_intensities))

sum(duplicated(hourly_steps))

I found 3 duplicates in the daily_sleep table.

I used distinct() to remove them.


daily_sleep <- daily_sleep %>% distinct()

Then I removed rows with missing values from all the tables using drop_na() to ensure accuracy.


daily_activity<-daily_activity %>% drop_na()

daily_sleep<-daily_sleep %>% drop_na()

daily_steps<-daily_steps %>% drop_na()

hourly_calories<-hourly_calories %>% drop_na()

hourly_intensities<-hourly_intensities %>% drop_na()

hourly_steps<-hourly_steps %>% drop_na()
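Before dropping rows, it can be useful to see how much data would be lost. A small base R sketch on made-up values (the column names mirror the sleep table, but the numbers are illustrative):

```r
# Made-up sample with some missing values.
demo <- data.frame(
  Id                 = c(1, 2, 3),
  TotalMinutesAsleep = c(400, NA, 380),
  TotalTimeInBed     = c(420, NA, NA)
)

# Missing values per column.
na_per_column <- colSums(is.na(demo))

# Rows that contain at least one missing value (these are the rows drop_na() would remove).
rows_with_na <- sum(!complete.cases(demo))
```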

I formatted the column names of every table for consistency. Calling clean_names() without assigning its result leaves the table unchanged, so the step that actually changes the names is the lower-casing below; later code relies on the resulting names, such as totalsteps and activitydate.


# Lower-case all column names (e.g. TotalSteps becomes totalsteps)
daily_activity <- rename_with(daily_activity, tolower)

daily_sleep <- rename_with(daily_sleep, tolower)

daily_steps <- rename_with(daily_steps, tolower)

hourly_calories <- rename_with(hourly_calories, tolower)

hourly_intensities <- rename_with(hourly_intensities, tolower)

hourly_steps <- rename_with(hourly_steps, tolower)

Make date and time columns consistent


daily_activity <- daily_activity %>%
  rename(date = activitydate) %>%
  mutate(date = mdy(date))


daily_sleep <- daily_sleep %>%
  rename(date = sleepday) %>%
  # Keep only the calendar date so the column has the same type as daily_activity$date
  mutate(date = as_date(mdy_hms(date)))

hourly_calories <- hourly_calories %>% 
  rename(date_time = activityhour) %>% 
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

hourly_intensities <- hourly_intensities %>% 
  rename(date_time = activityhour) %>% 
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

hourly_steps <- hourly_steps %>% 
  rename(date_time = activityhour) %>% 
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))
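A failed parse with as.POSIXct() returns NA rather than an error, so it is worth checking the format string on a couple of values first. The sketch below uses two made-up timestamps in the same layout as the hourly files, fixes the time zone to UTC for reproducibility, and assumes a locale in which AM/PM markers are recognised.

```r
# Two sample timestamps in month/day/year 12-hour format.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM")

parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Stop early if any value could not be parsed.
stopifnot(!anyNA(parsed))

# 12:00 AM maps to hour 00 and 1:00 PM to hour 13.
format(parsed, "%Y-%m-%d %H:%M:%S")
```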

Merging data sets

I will merge the daily_activity and daily_sleep data sets together.


daily_activity_sleep <- merge(daily_activity, daily_sleep, by = c("id", "date"))
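By default merge() keeps only rows whose id and date appear in both tables, so users or days without a sleep record drop out of the merged table. A minimal sketch on made-up rows shows the effect and one way to count what is lost:

```r
# Made-up activity and sleep records.
activity <- data.frame(
  id         = c(1, 1, 2),
  date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  totalsteps = c(5000, 7000, 9000)
)
sleep <- data.frame(
  id                 = c(1, 3),
  date               = as.Date(c("2016-04-12", "2016-04-12")),
  totalminutesasleep = c(420, 390)
)

# Inner join: only id/date pairs present in both tables survive.
joined <- merge(activity, sleep, by = c("id", "date"))

# Left join: keep every activity row, then count those without sleep data.
left <- merge(activity, sleep, by = c("id", "date"), all.x = TRUE)
activity_rows_without_sleep <- sum(is.na(left$totalminutesasleep))
```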

Analyze and Share (Visualization)

I will analyze trends among Fitbit users and determine whether they can inform Bellabeat’s marketing strategy.

1. Summarize and explore each data set


daily_activity %>%  
  select(totalsteps,
         totaldistance,
         sedentaryminutes, calories) %>%
  summary()

daily_activity_sleep %>% 
  select(totalminutesasleep) %>% summary()

• Users take an average of 7638 steps a day. The recommended number of steps per day is 7500.

• Sedentary time averages 991 minutes (~16.5 hours) a day. This needs to be reduced.

• The majority of the participants are lightly active.

• Participants sleep for an average of 419 minutes (~7 hours).

• On average, 97 calories are burned per hour.
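The hour figures in the bullets come from dividing the reported minutes by 60, which can be checked directly:

```r
# Averages reported above, in minutes.
minutes <- c(sedentary = 991, asleep = 419)

# Convert to hours, rounded to one decimal place.
hours <- round(minutes / 60, 1)
hours
```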

2. Steps taken and Calories burned

I made a visualization to check if there’s a correlation between steps taken and the amount of calories burned.


ggplot(data = daily_activity, aes(x = totalsteps, y = calories)) + 
  geom_point() + geom_smooth() + labs(title ="Total Steps vs. Calories")

Refer to the attached document for the graph of "Total steps vs Calories".

The graph shows a positive correlation between total steps taken and the amount of calories burned.
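The scatter plot shows the direction of the relationship; a correlation coefficient can put a number on it. The sketch below uses made-up values, not the Fitbit data; with the real table the call would be cor(daily_activity$totalsteps, daily_activity$calories).

```r
# Made-up daily values for illustration only.
demo <- data.frame(
  totalsteps = c(2000, 5000, 8000, 11000, 14000),
  calories   = c(1700, 2000, 2300, 2500, 3000)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship.
r <- cor(demo$totalsteps, demo$calories)
```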

3. Steps per weekday

Next, I look at which days of the week users are most active.


weekday_steps <- daily_activity %>%
  mutate(weekday = weekdays(date))

weekday_steps$weekday <-ordered(weekday_steps$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday", "Sunday"))

weekday_steps <-weekday_steps %>%
  group_by(weekday) %>%summarize (daily_steps = mean(totalsteps))

Check the table in the attached document.

With the average steps for each day of the week calculated, I made a visualization in Tableau to better understand the data.

See the graph "daily steps per weekday" in the attached document.

  • Users reach the recommended 7500 steps a day except on Sundays.
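weekdays() returns day names in the language of the current locale, so the hard-coded English levels above only match on an English setup. As an alternative sketch in base R, the ISO weekday number (1 = Monday, 7 = Sunday) is locale-independent and already sorts in calendar order. The rows below are made up.

```r
# Made-up daily step counts.
demo <- data.frame(
  date       = as.Date(c("2016-04-12", "2016-04-17", "2016-04-19")),
  totalsteps = c(8000, 6000, 9000)
)

# ISO weekday number: 1 = Monday ... 7 = Sunday.
demo$iso_weekday <- as.integer(format(demo$date, "%u"))

# Mean steps per weekday, ordered Monday to Sunday.
weekday_means <- aggregate(totalsteps ~ iso_weekday, data = demo, FUN = mean)
```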

4. Sleep per weekday


weekday_sleep <- daily_sleep %>%
  mutate(weekday = weekdays(date))

weekday_sleep$weekday <-ordered(weekday_sleep$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday", "Sunday"))

weekday_sleep <-weekday_sleep %>% group_by(weekday) %>%
  summarize (daily_sleep = mean(totalminutesasleep))

Refer to the graph "Sleep per Weekday" in the attached document.

From the graph we can deduce that users do not get the recommended 8 hours of sleep.

5. Hourly intensities throughout the day

Split the datetime column into date and time columns.


hourly_intensities <- hourly_intensities %>% separate(date_time, into = c("date", "time"), sep= " ") 

head(hourly_intensities)

hourly_intensities <- hourly_intensities %>%
  group_by(time) %>%drop_na() %>% summarise(mean_total_int = mean(totalintensity))

I will make a visualization of the hourly intensities data.


ggplot(data = hourly_intensities, aes(x = time, y = mean_total_int)) + geom_col(fill = "purple") +
  theme(axis.text.x = element_text(angle = 90)) + labs(title="Average Total Intensity vs. Time")

See the graph "Average Total Intensity vs Time" in the attached document.

• People are more active between 5 am and 11 pm.

• Most of the activity happens between 5 pm and 7 pm. I suppose that when people finish work for the day, they go to the gym or take a walk. We can use this period to remind and motivate users to go for a run or walk using the Bellabeat app.
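Splitting a date-time on the space relies on how the value is converted to text, and midnight values may be rendered without a time part, which would leave the time column empty for those rows. A sketch of an alternative in base R is to format the time explicitly; the rows below are made up and use UTC.

```r
# Made-up hourly intensity records.
demo <- data.frame(
  date_time = as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 17:00:00",
                           "2016-04-13 00:00:00", "2016-04-13 17:00:00"),
                         tz = "UTC"),
  totalintensity = c(2, 20, 4, 30)
)

# Extract the time of day as text; midnight becomes "00:00:00" rather than missing.
demo$time <- format(demo$date_time, "%H:%M:%S")

# Mean intensity for each time of day.
hourly_mean <- aggregate(totalintensity ~ time, data = demo, FUN = mean)
```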

6. Hourly steps throughout the day

Separate the datetime column into date and time columns.


hourly_steps <- hourly_steps %>% separate(date_time, into = c("date", "time"), sep= " ") %>%mutate(date = ymd(date)) 

head(hourly_steps)

I will make a visualization of the hourly steps throughout the day.


hourly_steps %>%
 group_by(time) %>%
 summarize(average_steps = mean(steptotal)) %>%
 ggplot() +
 geom_col(mapping = aes(x=time, y = average_steps, fill = average_steps)) + 
 labs(title = "Hourly steps throughout the day", x="", y="") + 
 scale_fill_gradient(low = "red", high = "green")+
 theme(axis.text.x = element_text(angle = 90))

Refer to the attached document for the graph "Hourly steps throughout the day".

• Users are more active from 8 am to 7 pm, and they take the most steps from 12 pm to 2 pm and from 5 pm to 7 pm.

• I assume that most users are working women: the peaks in steps suggest a lunch break (12 pm to 2 pm) and the end of the working day (5 pm to 7 pm).

7. Type of user based on the number of days smart device was used

To see how regularly users wear their smart device, I classify the sample into three categories, knowing that the survey lasted 31 days:

• High user: used the device on 21–31 days

• Moderate user: used the device on 11–20 days

• Low user: used the device on 1–10 days

Next, I create a new data frame grouped by id, count the number of days the smart device was used, and add a column with the classification above.


daily_use <- daily_activity_sleep %>%
 group_by(id) %>%
 # One row per user per day, so n() is the number of days with data
 summarize(days_used = n()) %>%
 mutate(user_type = case_when(
   days_used >= 1 & days_used <= 10 ~ "low user",
   days_used >= 11 & days_used <= 20 ~ "moderate user",
   days_used >= 21 & days_used <= 31 ~ "high user"
 ))

head(daily_use)
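The same three bands can also be expressed with base R's cut(), which avoids writing each range twice. This is only an alternative sketch; the break points follow the classification above and the day counts are made up to hit each boundary.

```r
# Made-up day counts on the edges of each band.
days_used <- c(1, 10, 11, 20, 21, 31)

# Intervals are right-closed: (0,10], (10,20], (20,31].
user_type <- cut(days_used,
                 breaks = c(0, 10, 20, 31),
                 labels = c("low user", "moderate user", "high user"))

data.frame(days_used, user_type)
```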

Create a percentage data frame to better visualize the results in the graph.


daily_use_percent <- daily_use %>%
  group_by(user_type) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(user_type) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

daily_use_percent$user_type <- factor(daily_use_percent$user_type, levels = c("high user", "moderate user", "low user"))

head(daily_use_percent)

Make a visualization of the smart device usage per user


daily_use_percent %>%
  ggplot(aes(x = "",y = total_percent, fill = user_type)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5))+
  scale_fill_manual(values = c( "#d62d58","#db7980","#fc9fb7"),
                    labels = c("High user - 21 to 31 days",
                               "Moderate user - 11 to 20 days",
                               "Low user - 1 to 10 days"))+
  labs(title="Daily use of smart device")

Refer to the attached document for the pie chart "daily users of smart device".

• 50% of the users in our sample use their device frequently: between 21 and 31 days.

• 12% are moderate users (they use their device for 11 to 20 days).

• 38% of our sample rarely used their device.
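For reference, shares like these can be computed directly from the classification with table() and prop.table(). The counts below are illustrative only (chosen to total 24 users) and are not taken from the dataset.

```r
# Illustrative classification of 24 users.
user_type <- c(rep("high user", 12), rep("moderate user", 3), rep("low user", 9))

# Share of each user type, in percent with one decimal place.
shares <- round(100 * prop.table(table(user_type)), 1)
shares
```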

Act

Conclusion and Recommendations

Having analysed the FitBit Fitness data, I respond below to the business task of informing Bellabeat’s marketing strategy. For further analysis, I would advise using tracking data from Bellabeat’s own devices: the datasets used here come from a small sample and may be biased, since we have no demographic details about the users.

• Daily notification on steps taken

Bellabeat can encourage users to take at least 8,000 steps a day, the target recommended by the CDC. Sending daily notifications at different times will make users aware of the number of steps taken so far and encourage them to reach that target. The app can also educate users on the health benefits of walking the recommended number of steps each day.

• Daily notification on sleep

My analysis shows that users get less than the recommended amount of sleep. We can add a feature to the app that lets users set a desired bedtime and receive a notification a few minutes beforehand to prepare for sleep, or set a bedtime alarm.

• Notification based on user needs

If users want to lose weight, controlling daily calorie consumption is a good idea. The Bellabeat app can suggest low-calorie food ideas and weight-loss tips to such users.

• Menstrual Cycle tracker

Since the product is mainly targeted at women, it would be valuable to add a feature that tracks the menstrual cycle each month and provides educational tips about PMS, the menstrual cycle and pregnancy planning, empowering users to make important health choices.

• Reward system

We can develop a reward system based on users' activity levels on the app. The app could feature different stages that users progress through, determined by their daily step count. Users would need to sustain their activity level for a specified duration, such as a month, to advance to the next stage. Upon reaching each stage, users would earn a certain number of stars, which could be redeemed for discounts on other Bellabeat products.
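As a rough sketch of how such stages might be determined, the function below returns the highest stage whose daily step threshold a user met on every day of a period. The stage names and thresholds are hypothetical placeholders, not values from the analysis, and `reward_stage` is an illustrative helper.

```r
# Hypothetical stages and daily step thresholds, in ascending order.
stage_thresholds <- c(bronze = 5000, silver = 7500, gold = 10000)

# Return the highest stage sustained on every day of the period,
# or NA if even the lowest threshold was missed on some day.
reward_stage <- function(daily_steps, thresholds = stage_thresholds) {
  sustained <- names(thresholds)[min(daily_steps) >= thresholds]
  if (length(sustained) == 0) NA_character_ else tail(sustained, 1)
}

reward_stage(c(8000, 7600, 9000))
```

Using the minimum over the period reflects the requirement that the activity level be sustained; an average would be a more lenient design choice.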

• Recommendation for the online campaign

Make sure the online campaign portrays the Bellabeat app as more than just a fitness activity app. It should be seen as a guide that empowers women to balance their personal life, professional life and health habits by educating and motivating them through daily app recommendations.
