Case Study: How Can a Wellness Technology Company Play It Smart?

About Bellabeat

Bellabeat is a high-tech manufacturer of health-focused products for women. It is a successful small company with the potential to become a larger player in the global smart device market. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company's own e-commerce channel on its website.

Bellabeat Products

Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to the company's line of smart wellness products.

Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide insights into daily wellness.

Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that users are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track user hydration levels.

Bellabeat membership

Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health, beauty, and mindfulness based on their lifestyle and goals.

Business Task

To analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices, then select one Bellabeat product to apply these insights to.

Key Stakeholders

Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.

Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team.

Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Data Analysis Process

Ask

1. What are some trends in smart device usage?

2. How could these trends apply to Bellabeat customers?

3. How could these trends help influence Bellabeat marketing strategy?

Prepare

I used public data that explores smart device users’ daily habits: the Fitbit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius). This Kaggle dataset contains personal fitness tracker data from thirty eligible Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about activity, steps, weight, and heart rate that can be used to explore user habits.

Process

The zip file was downloaded locally, and the extracted files, which have a .csv extension, were copied into a new folder named Bellabeat project.

The csv files were opened in Excel, and a copy of the relevant datasets was stored in a folder on the desktop. Each dataset was then inspected.

The activity, calories, intensities, and steps datasets have no duplicates. The sleep and heart rate datasets had duplicates, which were removed. The weight dataset has no duplicates, but some of its records have IsManualReport set to FALSE; these records were filtered out, and a new column, USERTYPE, was created based on BMI classification.
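The cleaning above was done in Excel. For readers who prefer to keep every step in R, the following is a minimal sketch of the same idea in base R, not the procedure actually used here. The sample rows are made up for illustration, and the BMI cut-offs are the commonly cited WHO categories, which is an assumption about how USERTYPE was assigned.

```r
# Hypothetical rows standing in for the weight log; values are illustrative only.
weight <- data.frame(
  Id = c(1, 1, 2, 3, 3),
  BMI = c(22.65, 22.65, 27.45, 24.00, 31.20),
  IsManualReport = c(TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Drop exact duplicate rows.
weight <- weight[!duplicated(weight), ]

# Keep only the rows flagged as manual reports.
weight <- weight[weight$IsManualReport, ]

# Label each row by BMI band (assumed WHO cut-offs: 18.5, 25, 30).
# right = FALSE makes each band include its lower bound, e.g. [18.5, 25).
weight$USERTYPE <- as.character(cut(
  weight$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right = FALSE
))

print(weight)
```

Doing the cleaning in a script rather than a spreadsheet leaves a record of which rows were dropped and why, which makes the analysis easier to repeat.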

RStudio Code

# installing packages

install.packages("tidyverse")

install.packages("lubridate")

install.packages("dplyr")

install.packages("ggplot2")

install.packages("tidyr")

install.packages("here")

install.packages("skimr")

install.packages("janitor")

# loading libraries

library(tidyverse)

library(lubridate)

library(dplyr)

library(ggplot2)

library(tidyr)

library(here)

library(skimr)

library(janitor)

# Working Directory

setwd("C:/Users/H/Desktop")

> d_Activity <- read.csv("daily_Activity.csv") # 1

> d_calories <- read.csv("daily_calories.csv") # 2

> d_intensities <- read.csv("daily_intensities.csv") # 3

> d_steps <- read.csv("daily_steps.csv") # 4

> d_weight <- read.csv("cleanbmi.csv") #5

> d_sleep <- read.csv("cleansleep.csv") #6

Analyse

# working with d_sleep dataset

> str(d_sleep)

'data.frame': 410 obs. of 7 variables:

$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...

$ SleepDay : chr "04/12/2016 00:00" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...

$ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...

$ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...

$ TotalSleepHours : chr "5:27" "6:24" "6:52" "5:40" ...

$ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...

$ TotalBedHours : chr "5:46" "6:47" "7:22" "6:07" ...

> summary(d_sleep)

Id SleepDay TotalSleepRecords TotalMinutesAsleep

Min. :1.504e+09 Length:410 Min. :1.00 Min. : 58.0

1st Qu.:3.977e+09 Class :character 1st Qu.:1.00 1st Qu.:361.0

Median :4.703e+09 Mode :character Median :1.00 Median :432.5

Mean :4.995e+09 Mean :1.12 Mean :419.2

3rd Qu.:6.962e+09 3rd Qu.:1.00 3rd Qu.:490.0

Max. :8.792e+09 Max. :3.00 Max. :796.0

TotalSleepHours TotalTimeInBed TotalBedHours

Length:410 Min. : 61.0 Length:410

Class :character 1st Qu.:403.8 Class :character

Mode :character Median :463.0 Mode :character

Mean :458.5

3rd Qu.:526.0

Max. :961.0

> n_distinct(d_sleep)

[1] 410

# creating usertype based on TotalMinutesAsleep

> user<-d_sleep %>%

+ mutate(user_type=case_when(

+ TotalMinutesAsleep <360 ~ "SSS",

+ TotalMinutesAsleep >=360 & TotalMinutesAsleep <540 ~ "NORMAL",

+ TotalMinutesAsleep >=540 ~ "OVERSLEEP"

+ ))

# convert user_type chr to factor user_type

d_user <-mutate(user,user_type=as.factor(user_type))

glimpse(d_user)

Rows: 410

Columns: 8

$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039…

$ SleepDay <chr> "04/12/2016 00:00", "4/13/2016 12:00:00 AM", "4/15/20…

$ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

$ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 277…

$ TotalSleepHours <chr> "5:27", "6:24", "6:52", "5:40", "11:40", "5:04", "6:0…

$ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 323…

$ TotalBedHours <chr> "5:46", "6:47", "7:22", "6:07", "11:52", "5:20", "6:1…

$ user_type <fct> SSS, NORMAL, NORMAL, SSS, OVERSLEEP
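The sleep categories depend on how the boundary values 360 and 540 are treated. As a cross-check, here is a small base R sketch that uses cut() to make the intervals explicit. The sample values are invented, and treating exactly 540 minutes as OVERSLEEP is an assumption about the intended rule.

```r
# Illustrative values, including the two boundary cases 360 and 540.
minutes_asleep <- c(327, 360, 412, 539, 540, 700)

# right = FALSE gives half-open intervals [lower, upper),
# so 360 falls in NORMAL and 540 falls in OVERSLEEP.
user_type <- cut(
  minutes_asleep,
  breaks = c(-Inf, 360, 540, Inf),
  labels = c("SSS", "NORMAL", "OVERSLEEP"),
  right = FALSE
)

print(data.frame(minutes_asleep, user_type))

# Count rows that received no label; with contiguous breaks this should be zero.
print(sum(is.na(user_type)))
```

Because the breaks are contiguous, every value lands in exactly one category, which is a quick way to confirm that no rows end up unlabeled.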

ggplot(data = d_user)+

geom_smooth(mapping = aes(x=TotalMinutesAsleep,y=TotalTimeInBed))+

geom_point(mapping = aes(x=TotalMinutesAsleep,y=TotalTimeInBed), color="orange") # color set outside aes() so the points are actually orange

d_user %>%

group_by(user_type) %>%

summarise(total = n()) %>%

mutate(totals = sum(total)) %>%

group_by(user_type) %>%

summarise(Percent = total / totals) %>%

ggplot(aes(user_type,y=Percent, fill=user_type)) +

geom_col()+

scale_y_continuous(labels = scales::percent) +

theme(legend.position="none") +

labs(title="Usertype", x=NULL) +

theme(legend.position="none", text = element_text(size = 20),plot.title = element_text(hjust = 0.5))
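The share of each user type plotted above is the group count divided by the overall count. A compact way to check those numbers without the plotting step is sketched below in base R; the five sample labels are invented for illustration.

```r
# Hypothetical labels; in the analysis these would come from d_user$user_type.
user_type <- factor(c("SSS", "NORMAL", "NORMAL", "OVERSLEEP", "NORMAL"))

# table() counts each level; prop.table() divides by the grand total.
counts <- table(user_type)
shares <- prop.table(counts)

share_df <- data.frame(
  user_type = names(shares),
  Percent = as.numeric(shares)
)

print(share_df)
```

The Percent column sums to 1, so it can be passed to a percent-formatted axis in the same way as the piped version.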

# Working with weight data

> str(d_weight)

'data.frame': 41 obs. of 6 variables:

$ Id : num 1.50e+09 1.50e+09 2.87e+09 2.87e+09 4.32e+09 ...

$ WeightKg : num 52.6 52.6 56.7 57.3 72.4 ...

$ WeightPounds : num 116 116 125 126 160 ...

$ BMI : num 22.6 22.6 21.5 21.7 27.5 ...

$ USERTYPE : chr "normal" "normal" "normal" "normal" ...

$ IsManualReport: logi TRUE TRUE TRUE TRUE TRUE TRUE ...

> summary(d_weight)

Id WeightKg WeightPounds BMI

Min. :1.504e+09 Min. :52.60 Min. :116.0 Min. :21.45

1st Qu.:4.559e+09 1st Qu.:61.20 1st Qu.:134.9 1st Qu.:23.89

Median :6.962e+09 Median :61.50 Median :135.6 Median :24.00

Mean :6.074e+09 Mean :62.41 Mean :137.6 Mean :24.39

3rd Qu.:6.962e+09 3rd Qu.:62.10 3rd Qu.:136.9 3rd Qu.:24.24

Max. :6.962e+09 Max. :72.40 Max. :159.6 Max. :27.46

USERTYPE IsManualReport

Length:41 Mode:logical

Class :character TRUE:41

Mode :character

> n_distinct(d_weight)

[1] 22

> #change USERTYPE chr.to factor

> d_w <-mutate(d_weight,USERTYPE=as.factor(USERTYPE))

> glimpse(d_w)

Rows: 41

Columns: 6

$ Id <dbl> 1503960366, 1503960366, 2873212765, 2873212765, 431970357…

$ WeightKg <dbl> 52.6, 52.6, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, 69.9, 69.…

$ WeightPounds <dbl> 115.9631, 115.9631, 125.0021, 126.3249, 159.6147, 159.394…

$ BMI <dbl> 22.65, 22.65, 21.45, 21.69, 27.45, 27.38, 27.25, 27.46, 2…

$ USERTYPE <fct> normal, normal, normal, normal, overweight, overweight, o…

$ IsManualReport <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…

ggplot(data = d_w)+

geom_smooth(mapping = aes(x=WeightKg,y=BMI))+

geom_point(mapping = aes(x=WeightKg,y=BMI), color="orange") # color set outside aes() so the points are actually orange

d_w %>%

group_by(USERTYPE) %>%

summarise(total = n()) %>%

mutate(totals = sum(total)) %>%

group_by(USERTYPE) %>%

summarise(Total_Percent = total / totals) %>%

ggplot(aes(USERTYPE,y=Total_Percent, fill=USERTYPE)) +

geom_col()+

scale_y_continuous(labels = scales::percent) +

theme(legend.position="none") +

labs(title="USERTYPE", x=NULL) +

theme(legend.position="none", text = element_text(size = 20),plot.title = element_text(hjust = 0.5))

# working with activity, calories, intensities, steps datasets

> #How many unique participants are there in each dataframe?

> n_distinct(d_Activity$Id)

[1] 33

> n_distinct(d_calories$Id)

[1] 33

> n_distinct(d_intensities$Id)

[1] 33

> n_distinct(d_steps$Id)

[1] 33

> #How many observations are there in each dataframe?

> nrow(d_Activity)

[1] 940

> nrow(d_calories)

[1] 940

> nrow(d_intensities)

[1] 940

> nrow(d_steps)

[1] 940

str(d_Activity)

'data.frame': 940 obs. of 15 variables:

$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...

$ ActivityDate : chr "04/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...

$ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...

$ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...

$ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...

$ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...

$ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...

$ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...

$ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...

$ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...

$ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...

$ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...

$ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...

$ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...

$ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

> str(d_calories)

'data.frame': 940 obs. of 3 variables:

$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...

$ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...

$ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

> str(d_intensities)

'data.frame': 940 obs. of 10 variables:

$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...

$ ActivityDay : chr "04/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...

$ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...

$ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...

$ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...

$ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...

$ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...

$ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...

$ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...

$ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...

> str(d_steps)

'data.frame': 940 obs. of 3 variables:

$ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...

$ ActivityDay: chr "04/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...

$ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...

> #all datasets have the 'Id' field in common.

> #all datasets except d_Activity have ActivityDay in common. We can rename ActivityDate to ActivityDay

# rename d_Activity data ActivityDate col to ActivityDay col

d_Activity <- rename( d_Activity,

ActivityDay = ActivityDate)

# now we can merge the 4 datasets by Id and ActivityDay

> merge_1 <- merge(d_Activity, d_calories, by= c("Id", "ActivityDay"))

> merge_2 <- merge(d_intensities,d_steps, by= c("Id","ActivityDay"))

> All_merge <- merge(merge_1, merge_2, by = c("Id","ActivityDay","SedentaryMinutes",

+ "LightlyActiveMinutes","FairlyActiveMinutes",

+ "VeryActiveMinutes", "SedentaryActiveDistance",

+ "LightActiveDistance", "ModeratelyActiveDistance",

+ "VeryActiveDistance"))

glimpse(All_merge)

Rows: 578

Columns: 17

$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366,…

$ ActivityDay <chr> "4/13/2016", "4/14/2016", "4/15/2016", "4/16/20…

$ SedentaryMinutes <int> 776, 1218, 726, 773, 539, 1149, 775, 818, 838, …

$ LightlyActiveMinutes <int> 217, 181, 209, 221, 164, 233, 264, 205, 211, 13…

$ FairlyActiveMinutes <int> 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21, 5, 1…

$ VeryActiveMinutes <int> 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41, 39,…

$ SedentaryActiveDistance <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…

$ LightActiveDistance <dbl> 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.03, 4.24,…

$ ModeratelyActiveDistance <dbl> 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.32, 0.48,…

$ VeryActiveDistance <dbl> 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.53, 1.96,…

$ TotalSteps <int> 10735, 10460, 9762, 12669, 9705, 13019, 15506, …

$ TotalDistance <dbl> 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68,…

$ TrackerDistance <dbl> 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68,…

$ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…

$ Calories.x <int> 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786,…

$ Calories.y <int> 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786,…

$ StepTotal <int> 10735, 10460, 9762, 12669, 9705, 13019, 15506, …
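The merged frame has 578 rows although each input has 940. One possible cause, visible in the str() output above, is that the same day is written as "04/12/2016" in some files and "4/12/2016" in others, so the text keys do not match. The sketch below reproduces that effect on two tiny made-up frames and shows one way to avoid it: parse the key to a Date before merging. Whether this accounts for all of the missing rows in the real data is not verified here.

```r
# Two hypothetical frames with the same two days, written inconsistently.
activity <- data.frame(
  Id = c(1, 1),
  ActivityDay = c("04/12/2016", "4/13/2016"),
  TotalSteps = c(13162, 10735)
)
calories <- data.frame(
  Id = c(1, 1),
  ActivityDay = c("4/12/2016", "4/13/2016"),
  Calories = c(1985, 1797)
)

# Merging on raw strings: "04/12/2016" and "4/12/2016" are different keys.
raw_merge <- merge(activity, calories, by = c("Id", "ActivityDay"))
print(nrow(raw_merge))

# Parse both keys to Date first; %m and %d accept a leading zero or none.
activity$ActivityDay <- as.Date(activity$ActivityDay, format = "%m/%d/%Y")
calories$ActivityDay <- as.Date(calories$ActivityDay, format = "%m/%d/%Y")

date_merge <- merge(activity, calories, by = c("Id", "ActivityDay"))
print(nrow(date_merge))
```

Comparing nrow() before and after a merge, as done here, is a cheap check that no observations were silently dropped.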

> # convert ActivityDay chr to date format

> d_merge <- mutate(All_merge, ActivityDay = as.Date(ActivityDay, format= "%m/%d/%Y"))

> class(d_merge$ActivityDay)

[1] "Date"

> # convert date to weekday

d_merge$Day <- weekdays(d_merge$ActivityDay)

d_merge$Day <- factor(d_merge$Day,levels = c('Sunday','Monday',

'Tuesday','Wednesday','Thursday','Friday','Saturday'))
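One caveat with weekdays() is that it returns day names in the session's locale, so the English factor levels above would produce NA values on a non-English system. The following base R sketch is a locale-independent alternative, offered as an option rather than a required change; the three sample dates are arbitrary.

```r
# Arbitrary sample dates from the study period.
dates <- as.Date(c("2016-04-12", "2016-04-13", "2016-04-17"))

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday in any locale,
# so adding 1 turns it into an index into day_names.
day <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

print(day)
```

The resulting factor keeps the Sunday-to-Saturday ordering, so plots grouped by day appear in calendar order.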

glimpse(d_merge)

Rows: 578

Columns: 18

$ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366,…

$ ActivityDay <date> 2016-04-13, 2016-04-14, 2016-04-15, 2016-04-16…