Thursday, March 16, 2023

Case Study: How Can a Wellness Technology Company Play It Smart?

About Bellabeat


Bellabeat is a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but it has the potential to become a larger player in the global smart device market. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly, quickly positioning itself as a tech-driven wellness company for women. By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company's own e-commerce channel on its website.

Bellabeat Products


Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to the company's line of smart wellness products.

Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide insights into daily wellness.

Spring: This water bottle tracks daily water intake using smart technology to ensure that users are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track user hydration levels.

Bellabeat membership
Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health, beauty, and mindfulness based on their lifestyle and goals.

Business Task:

To analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices, then select a Bellabeat product to which these insights can be applied.

Key Stakeholders


Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.

Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team.

Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Data Analysis Process


Ask 


1. What are some trends in smart device usage? 
2. How could these trends apply to Bellabeat customers? 
3. How could these trends help influence Bellabeat marketing strategy? 


Prepare

I have used public data that explores smart device users’ daily habits: the Fitbit Fitness Tracker Data (CC0: Public Domain, made available through Mobius). This Kaggle dataset contains personal fitness tracker data from thirty eligible Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about activity, steps, weight, and heart rate that can be used to explore user habits.

Process

The ZIP files were downloaded locally, and a copy of the CSV files was stored in a new folder named "Bellabeat project".
The CSV files were opened using Excel, and a copy of the relevant datasets was saved to a desktop folder. Each file was then inspected.
The activity, calories, intensities, and steps datasets have no duplicates. The sleep and heart-rate datasets have duplicates, which were removed. The weight dataset has no duplicates, but some manual reports in this data are false; these false reports were filtered out, and a new column, USERTYPE, was created based on BMI classification.
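The weight-dataset cleaning described above was done in Excel; a minimal R sketch of the same steps is shown below. The column names follow the Fitbit weight log, but the `clean_weight` function name and the standard BMI cut-offs are my assumptions, since the exact spreadsheet steps aren't shown.

```r
library(dplyr)

# Sketch (assumed steps): drop duplicate rows, keep only manual reports,
# and derive USERTYPE from standard BMI classes.
clean_weight <- function(weight) {
  weight %>%
    distinct() %>%                      # remove exact duplicate rows
    filter(IsManualReport == TRUE) %>%  # keep only manual reports
    mutate(USERTYPE = case_when(        # standard BMI classification
      BMI < 18.5 ~ "underweight",
      BMI < 25   ~ "normal",
      BMI < 30   ~ "overweight",
      TRUE       ~ "obese"
    ))
}
```

Applied to the raw weight log, this reproduces the USERTYPE column seen later in `d_weight`.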

RStudio Code

# installing packages
install.packages("tidyverse")
install.packages("lubridate")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("here") 
install.packages("skimr") 
install.packages("janitor")

# loading libraries 
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)

# Working directory and data import
setwd("C:/Users/H/Desktop")
d_Activity    <- read.csv("daily_Activity.csv")     # 1
d_calories    <- read.csv("daily_calories.csv")     # 2
d_intensities <- read.csv("daily_intensities.csv")  # 3
d_steps       <- read.csv("daily_steps.csv")        # 4
d_weight      <- read.csv("cleanbmi.csv")           # 5
d_sleep       <- read.csv("cleansleep.csv")         # 6
 

Analyse


# working with d_sleep dataset
> str(d_sleep)
'data.frame': 410 obs. of  7 variables:
 $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ SleepDay          : chr  "04/12/2016 00:00" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
 $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
 $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
 $ TotalSleepHours   : chr  "5:27" "6:24" "6:52" "5:40" ...
 $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
 $ TotalBedHours     : chr  "5:46" "6:47" "7:22" "6:07" ...
> summary(d_sleep)
       Id              SleepDay         TotalSleepRecords TotalMinutesAsleep
 Min.   :1.504e+09   Length:410         Min.   :1.00      Min.   : 58.0     
 1st Qu.:3.977e+09   Class :character   1st Qu.:1.00      1st Qu.:361.0     
 Median :4.703e+09   Mode  :character   Median :1.00      Median :432.5     
 Mean   :4.995e+09                      Mean   :1.12      Mean   :419.2     
 3rd Qu.:6.962e+09                      3rd Qu.:1.00      3rd Qu.:490.0     
 Max.   :8.792e+09                      Max.   :3.00      Max.   :796.0     
 TotalSleepHours    TotalTimeInBed  TotalBedHours     
 Length:410         Min.   : 61.0   Length:410        
 Class :character   1st Qu.:403.8   Class :character  
 Mode  :character   Median :463.0   Mode  :character  
                    Mean   :458.5                     
                    3rd Qu.:526.0                     
                    Max.   :961.0                     
> n_distinct(d_sleep)
[1] 410
# creating usertype based on TotalMinutesAsleep
> user<-d_sleep %>% 
+   mutate(user_type=case_when(
+   TotalMinutesAsleep <360 ~ "SSS", 
+   TotalMinutesAsleep >=360 & TotalMinutesAsleep <540 ~ "NORMAL", 
+   TotalMinutesAsleep >540 ~ "OVERSLEEP"
+   ))
# convert user_type from chr to factor
d_user <-mutate(user,user_type=as.factor(user_type))
glimpse(d_user)
Rows: 410
Columns: 8
$ Id                 <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039…
$ SleepDay           <chr> "04/12/2016 00:00", "4/13/2016 12:00:00 AM", "4/15/20…
$ TotalSleepRecords  <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 277…
$ TotalSleepHours    <chr> "5:27", "6:24", "6:52", "5:40", "11:40", "5:04", "6:0…
$ TotalTimeInBed     <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 323…
$ TotalBedHours      <chr> "5:46", "6:47", "7:22", "6:07", "11:52", "5:20", "6:1…
$ user_type          <fct> SSS, NORMAL, NORMAL, SSS, OVERSLEEP


ggplot(data = d_user)+
  geom_smooth(mapping = aes(x=TotalMinutesAsleep,y=TotalTimeInBed))+
  geom_point(mapping = aes(x=TotalMinutesAsleep,y=TotalTimeInBed), color="orange")

d_user %>%
  group_by(user_type) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(user_type) %>%
  summarise(Percent = total / totals) %>%
  ggplot(aes(user_type,y=Percent, fill=user_type)) +
  geom_col()+
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position="none") +
  labs(title="Usertype", x=NULL) +
  theme(legend.position="none", text = element_text(size = 20),plot.title = element_text(hjust = 0.5))



 # Working with weight data
> str(d_weight)
'data.frame': 41 obs. of  6 variables:
 $ Id            : num  1.50e+09 1.50e+09 2.87e+09 2.87e+09 4.32e+09 ...
 $ WeightKg      : num  52.6 52.6 56.7 57.3 72.4 ...
 $ WeightPounds  : num  116 116 125 126 160 ...
 $ BMI           : num  22.6 22.6 21.5 21.7 27.5 ...
 $ USERTYPE      : chr  "normal" "normal" "normal" "normal" ...
 $ IsManualReport: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
> summary(d_weight)
       Id               WeightKg      WeightPounds        BMI       
 Min.   :1.504e+09   Min.   :52.60   Min.   :116.0   Min.   :21.45  
 1st Qu.:4.559e+09   1st Qu.:61.20   1st Qu.:134.9   1st Qu.:23.89  
 Median :6.962e+09   Median :61.50   Median :135.6   Median :24.00  
 Mean   :6.074e+09   Mean   :62.41   Mean   :137.6   Mean   :24.39  
 3rd Qu.:6.962e+09   3rd Qu.:62.10   3rd Qu.:136.9   3rd Qu.:24.24  
 Max.   :6.962e+09   Max.   :72.40   Max.   :159.6   Max.   :27.46  
   USERTYPE         IsManualReport
 Length:41          Mode:logical  
 Class :character   TRUE:41       
 Mode  :character                 
                                                          
> n_distinct(d_weight)
[1] 22
> #change USERTYPE chr.to factor 
> d_w <-mutate(d_weight,USERTYPE=as.factor(USERTYPE))
> glimpse(d_w)
Rows: 41
Columns: 6
$ Id             <dbl> 1503960366, 1503960366, 2873212765, 2873212765, 431970357…
$ WeightKg       <dbl> 52.6, 52.6, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, 69.9, 69.…
$ WeightPounds   <dbl> 115.9631, 115.9631, 125.0021, 126.3249, 159.6147, 159.394…
$ BMI            <dbl> 22.65, 22.65, 21.45, 21.69, 27.45, 27.38, 27.25, 27.46, 2…
$ USERTYPE       <fct> normal, normal, normal, normal, overweight, overweight, o…
$ IsManualReport <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…

ggplot(data = d_w)+
  geom_smooth(mapping = aes(x=WeightKg,y=BMI))+
  geom_point(mapping = aes(x=WeightKg,y=BMI), color="orange")

d_w %>%
    group_by(USERTYPE) %>%
    summarise(total = n()) %>%
    mutate(totals = sum(total)) %>%
    group_by(USERTYPE) %>%
    summarise(Total_Percent = total / totals) %>%
    ggplot(aes(USERTYPE,y=Total_Percent, fill=USERTYPE)) +
    geom_col()+
    scale_y_continuous(labels = scales::percent) +
    theme(legend.position="none") +
    labs(title="USERTYPE", x=NULL) +
    theme(legend.position="none", text = element_text(size = 20),plot.title = element_text(hjust = 0.5))


# working with activity, calories, intensities, steps datasets
> #How many unique participants are there in each dataframe? 
> n_distinct(d_Activity$Id)
[1] 33
> n_distinct(d_calories$Id)
[1] 33
> n_distinct(d_intensities$Id)
[1] 33
> n_distinct(d_steps$Id)
[1] 33
> #How many observations are there in each dataframe?
> nrow(d_Activity)
[1] 940
> nrow(d_calories)
[1] 940
> nrow(d_intensities)
[1] 940
> nrow(d_steps)
[1] 940
str(d_Activity)
'data.frame': 940 obs. of  15 variables:
 $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ ActivityDate            : chr  "04/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
 $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
 $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
 $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
 $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
 $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
 $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
 $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
 $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
 $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
 $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
 $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
 $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
 $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
> str(d_calories)
'data.frame': 940 obs. of  3 variables:
 $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
 $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
> str(d_intensities)
'data.frame': 940 obs. of  10 variables:
 $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ ActivityDay             : chr  "04/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
 $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
 $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
 $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
 $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
 $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
 $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
 $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
 $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
> str(d_steps)
'data.frame': 940 obs. of  3 variables:
 $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
 $ ActivityDay: chr  "04/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
 $ StepTotal  : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
> # all datasets have the 'Id' field in common.
> # all datasets except d_Activity have ActivityDay in common. We can rename ActivityDate to ActivityDay.

# rename d_Activity data ActivityDate col to ActivityDay col
d_Activity <- rename( d_Activity,
                      ActivityDay = ActivityDate)
# now we can merge 4 dataset by Id and ActivityDay                                                                                         
>    merge_1 <- merge(d_Activity, d_calories, by= c("Id", "ActivityDay"))
>    merge_2 <- merge(d_intensities,d_steps, by= c("Id","ActivityDay"))
>    All_merge <- merge(merge_1, merge_2, by = c("Id","ActivityDay","SedentaryMinutes",
+                                                "LightlyActiveMinutes","FairlyActiveMinutes",
+                                                "VeryActiveMinutes", "SedentaryActiveDistance", 
+                                                "LightActiveDistance", "ModeratelyActiveDistance", 
+                                                "VeryActiveDistance"))

glimpse(All_merge)
Rows: 578
Columns: 17
$ Id                       <dbl> 1503960366, 1503960366, 1503960366, 1503960366,…
$ ActivityDay              <chr> "4/13/2016", "4/14/2016", "4/15/2016", "4/16/20…
$ SedentaryMinutes         <int> 776, 1218, 726, 773, 539, 1149, 775, 818, 838, …
$ LightlyActiveMinutes     <int> 217, 181, 209, 221, 164, 233, 264, 205, 211, 13…
$ FairlyActiveMinutes      <int> 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21, 5, 1…
$ VeryActiveMinutes        <int> 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41, 39,…
$ SedentaryActiveDistance  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
$ LightActiveDistance      <dbl> 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.03, 4.24,…
$ ModeratelyActiveDistance <dbl> 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.32, 0.48,…
$ VeryActiveDistance       <dbl> 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.53, 1.96,…
$ TotalSteps               <int> 10735, 10460, 9762, 12669, 9705, 13019, 15506, …
$ TotalDistance            <dbl> 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68,…
$ TrackerDistance          <dbl> 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68,…
$ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Calories.x               <int> 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786,…
$ Calories.y               <int> 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786,…
$ StepTotal                <int> 10735, 10460, 9762, 12669, 9705, 13019, 15506, …

> #  convert ActivityDay chr to date format
>    d_merge <- mutate(All_merge, ActivityDay = as.Date(ActivityDay, format= "%m/%d/%Y"))
>    class(d_merge$ActivityDay)
[1] "Date"
> #  convert date to weekday 
  d_merge$Day <- weekdays(d_merge$ActivityDay)
  d_merge$Day <- factor(d_merge$Day,levels = c('Sunday','Monday',
              'Tuesday','Wednesday','Thursday','Friday','Saturday'))

glimpse(d_merge)
Rows: 578
Columns: 18
$ Id                       <dbl> 1503960366, 1503960366, 1503960366, 1503960366,…
$ ActivityDay              <date> 2016-04-13, 2016-04-14, 2016-04-15, 2016-04-16…
$ SedentaryMinutes         <int> 776, 1218, 726, 773, 539, 1149, 775, 818, 838, …
$ LightlyActiveMinutes     <int> 217, 181, 209, 221, 164, 233, 264, 205, 211, 13…
$ FairlyActiveMinutes      <int> 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21, 5, 1…
$ VeryActiveMinutes        <int> 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41, 39,…
$ SedentaryActiveDistance  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,…
$ LightActiveDistance      <dbl> 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.03, 4.24,…
$ ModeratelyActiveDistance <dbl> 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.32, 0.48,…
$ VeryActiveDistance       <dbl> 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.53, 1.96,…
$ TotalSteps               <int> 10735, 10460, 9762, 12669, 9705, 13019, 15506, …
$ TotalDistance            <dbl> 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68,…
$ TrackerDistance          <dbl> 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68,…
$ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Calories.x               <int> 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786,…
$ Calories.y               <int> 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786,…
$ StepTotal                <int> 10735, 10460, 9762, 12669, 9705, 13019, 15506, …
$ Day                      <fct> Wednesday, Thursday, Friday, Saturday, Sunday, …

# Summary statistics
>   n_distinct(d_merge$Id)
[1] 33
>   nrow(d_merge)
[1] 578
>  summary(d_merge)
       Id             ActivityDay         SedentaryMinutes LightlyActiveMinutes
 Min.   :1.504e+09   Min.   :2016-04-13   Min.   :   2     Min.   :  0.0       
 1st Qu.:2.347e+09   1st Qu.:2016-04-17   1st Qu.: 738     1st Qu.:135.0       
 Median :4.445e+09   Median :2016-04-21   Median :1070     Median :202.5       
 Mean   :4.882e+09   Mean   :2016-04-21   Mean   :1004     Mean   :198.3       
 3rd Qu.:6.962e+09   3rd Qu.:2016-04-26   3rd Qu.:1232     3rd Qu.:271.0       
 Max.   :8.878e+09   Max.   :2016-04-30   Max.   :1440     Max.   :518.0       
                                                                               
 FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
 Min.   :  0.00      Min.   :  0.00    Min.   :0.000000       
 1st Qu.:  0.00      1st Qu.:  0.00    1st Qu.:0.000000       
 Median :  7.00      Median :  5.00    Median :0.000000       
 Mean   : 13.73      Mean   : 22.06    Mean   :0.001609       
 3rd Qu.: 20.00      3rd Qu.: 31.75    3rd Qu.:0.000000       
 Max.   :113.00      Max.   :210.00    Max.   :0.110000       
                                                              
 LightActiveDistance ModeratelyActiveDistance VeryActiveDistance   TotalSteps   
 Min.   : 0.000      Min.   :0.0000           Min.   : 0.000     Min.   :    0  
 1st Qu.: 2.002      1st Qu.:0.0000           1st Qu.: 0.000     1st Qu.: 3992  
 Median : 3.430      Median :0.2600           Median : 0.270     Median : 7640  
 Mean   : 3.425      Mean   :0.5652           Mean   : 1.552     Mean   : 7787  
 3rd Qu.: 4.827      3rd Qu.:0.8175           3rd Qu.: 2.150     3rd Qu.:10778  
 Max.   :10.710      Max.   :5.1200           Max.   :21.660     Max.   :29326  
                                                                                
 TotalDistance    TrackerDistance  LoggedActivitiesDistance   Calories.x  
 Min.   : 0.000   Min.   : 0.000   Min.   :0.0000           Min.   :   0  
 1st Qu.: 2.683   1st Qu.: 2.683   1st Qu.:0.0000           1st Qu.:1862  
 Median : 5.335   Median : 5.335   Median :0.0000           Median :2138  
 Mean   : 5.592   Mean   : 5.575   Mean   :0.1101           Mean   :2340  
 3rd Qu.: 7.728   3rd Qu.: 7.718   3rd Qu.:0.0000           3rd Qu.:2794  
 Max.   :26.720   Max.   :26.720   Max.   :4.9421           Max.   :4900  
                                                                          
   Calories.y     StepTotal            Day    
 Min.   :   0   Min.   :    0   Sunday   :64  
 1st Qu.:1862   1st Qu.: 3992   Monday   :64  
 Median :2138   Median : 7640   Tuesday  :64  
 Mean   :2340   Mean   : 7787   Wednesday:97  
 3rd Qu.:2794   3rd Qu.:10778   Thursday :97  
 Max.   :4900   Max.   :29326   Friday   :97  
                                Saturday :95  
#Plotting a few explorations for d_merge dataframe
#Relation between StepTotal and TotalDistance # positive relation
  ggplot(data=d_merge)+
    geom_smooth(mapping = aes(x=StepTotal, y=TotalDistance)) +
    geom_point(mapping = aes(x=StepTotal, y=TotalDistance), color="orange")


# Relation between Day and TotalDistance
    ggplot(data = d_merge) + geom_boxplot(mapping = aes(x=Day, y=TotalDistance), color="orange")


# grouping of users into four categories based on their activity distance
  data_by_usertype_d <- d_merge %>%
    mutate(user_type = factor(case_when(
      SedentaryActiveDistance > mean(SedentaryActiveDistance) & LightActiveDistance < mean(LightActiveDistance) & ModeratelyActiveDistance < mean(ModeratelyActiveDistance) & VeryActiveDistance < mean(VeryActiveDistance) ~ "Sedentary",
      SedentaryActiveDistance < mean(SedentaryActiveDistance) & LightActiveDistance > mean(LightActiveDistance) & ModeratelyActiveDistance < mean(ModeratelyActiveDistance) & VeryActiveDistance < mean(VeryActiveDistance) ~ "Light",
      SedentaryActiveDistance < mean(SedentaryActiveDistance) & LightActiveDistance < mean(LightActiveDistance) & ModeratelyActiveDistance > mean(ModeratelyActiveDistance) & VeryActiveDistance < mean(VeryActiveDistance) ~ "Moderate",
      SedentaryActiveDistance < mean(SedentaryActiveDistance) & LightActiveDistance < mean(LightActiveDistance) & ModeratelyActiveDistance < mean(ModeratelyActiveDistance) & VeryActiveDistance > mean(VeryActiveDistance) ~ "Very"
    ), levels = c("Sedentary", "Light", "Moderate", "Very"))) %>%
    select(Id, user_type, Calories.x) %>%
    drop_na()
# viz
data_by_usertype_d %>%
    group_by(user_type) %>%
    summarise(total = n()) %>%
    mutate(totals = sum(total)) %>%
    group_by(user_type) %>%
    summarise(Total_Percent = total / totals) %>%
    ggplot(aes(user_type,y=Total_Percent, fill=user_type)) +
    geom_col()+
    scale_y_continuous(labels = scales::percent) +
    theme(legend.position="none") +
    labs(title="User Type Distribution", x=NULL) +
    theme(legend.position="none", text = element_text(size = 20),plot.title = element_text(hjust = 0.5))


Share 

Here is the link to the presentation.

Act 


Top Three Recommendations

1. Motivate users to walk farther and provide guidance on establishing a healthy sleep pattern.

2. Provide rewards and gifts for users who reach their daily goals and follow recommended diets.

3. Make the Leaf product appealing and comfortable so that women can wear it in many settings.



Limitation

The data covers only a small number of distinct users; a larger sample is needed.

Demographic information such as age, gender, and occupation is missing; it is needed for a better understanding of the users.

False manual BMI reports were filtered out; accurate manual reports are needed.



Thursday, March 9, 2023

Case Study: How Does a Bike-Share Navigate Speedy Success?

 

Introduction

Cyclistic launched a popular bike-share programme in 2016. The initiative has since expanded to a fleet of 5,824 geotracked bicycles locked into a network of 692 stations across Chicago. A bicycle can be unlocked from one station and returned to any other station in the network at any time.

Up to this point, Cyclistic's marketing approach has focused on raising public awareness and appealing to a wide range of consumer groups. One strategy that made this possible was the flexibility of its pricing plans. Cyclistic offers three pricing plans:

1. Single-ride passes
2. Full-day passes
3. Annual memberships 

Casual riders : Customers who purchase single-ride or full-day passes.
Cyclistic members: Customers who purchase annual memberships. 

 

Cyclistic’s financial analysts have concluded that annual members are much more profitable than casual riders. As a result, the greater the number of annual members, the greater the profit. In short, there is a positive correlation between Cyclistic's success and Cyclistic membership.

# SUCCESS = PROFIT = CYCLISTIC MEMBERS > CASUAL RIDERS #

Business Task 

How do annual members and casual riders use Cyclistic bikes differently? How can Cyclistic use digital media to influence casual riders to become members?

Key Stakeholders

Lily Moreno: The director of marketing. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program, which may include email, social media, and other channels.

Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data.

Cyclistic executive team: The detail-oriented executive team will decide whether to approve the recommended marketing program.

About the data source: Cyclistic is a fictional company; for the purposes of this case study, the datasets are appropriate. The data is historical trip data made available by Motivate International Inc.

Data Cleaning and Data Manipulation 

The ZIP files were downloaded locally, and a copy of the CSV files was stored in a new folder named "cyclistic project".

The CSV files were opened using Excel, and a copy of the 12 datasets was saved to desktop folders dtrip_01 through dtrip_12. Each folder was then inspected.

All 12 monthly datasets share the same column names:

ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual

No duplicates were found.

start_station_name, start_station_id, end_station_name, and end_station_id have some blank values.

rideable_type has three values: electric_bike, docked_bike, and classic_bike.

Two new columns, ride_length and days_of_week, were added to all 12 datasets:

ride_length = (ended_at - started_at), formatted as HH:MM:SS

days_of_week = WEEKDAY(C2,1), formatted as a whole number (1 = Sunday, 7 = Saturday)

The mode of day_of_week was calculated for each month.

The average ride_length for members and casual riders, and the average ride_length by day of week, were calculated using pivot tables and functions.

The rideable_type counts for each month were calculated for both casual riders and members using pivot tables.
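The spreadsheet steps above can also be sketched in R. The column names (started_at, ended_at, member_casual) come from the dataset schema listed earlier; the function names `add_ride_columns` and `avg_ride_length` are illustrative, not the actual workflow used.

```r
library(dplyr)
library(lubridate)

# Derive the two new columns described above, assuming POSIXct
# started_at / ended_at columns.
add_ride_columns <- function(trips) {
  trips %>%
    mutate(
      ride_length  = as.numeric(difftime(ended_at, started_at, units = "mins")),
      days_of_week = wday(started_at)  # 1 = Sunday ... 7 = Saturday, as in Excel
    )
}

# Pivot-table equivalent: average ride_length by user type and weekday.
avg_ride_length <- function(trips) {
  trips %>%
    group_by(member_casual, days_of_week) %>%
    summarise(avg_ride_length = mean(ride_length), .groups = "drop")
}
```

Here ride_length is kept in minutes rather than HH:MM:SS, since a numeric column is easier to average and plot.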

Visualization 

For visualization I used line charts and column charts.

Key Findings

Casual riders took longer rides than members. During FY Apr 2020–Mar 2021, April was the month with the highest average ride length for both casuals and members; January had the lowest average ride length for casuals, and February the lowest for members. In the opening and closing months of the year (January–February and November–December), Cyclistic users' ride lengths are shorter than in other months.

                                            

                                           

The most popular bikes among both casual and member cyclists are docked bikes, while classic bikes are less popular with both groups. Neither group rode classic bikes from April to October. From June to September, docked bicycles were the most in demand. In July, casual users made the most use of docked bikes; in August, it was members. In January and February, casual users used docked bikes the least, while members did not use any; instead, they switched to classic and electric bikes.

                                 

                                 

For casual riders, Sunday is the day with the longest rides throughout the year, whereas for members it is Saturday. Members primarily ride on the weekend, when they cover the most distance, while casual users' weekday rides cover the least. For both casual riders and members, the minimum ride lengths fall on Monday and Thursday, and the maximum on the weekend.

Top three recommendations

1. Docked bikes are the most popular with both casual and member cyclists, while classic bikes appeal less to both groups. The company should therefore concentrate more on docked bikes.

2. For longer rides, promotions, discounts, referrals, and packages are advised.

3. During the winter (Jan–Feb and Nov–Dec), Cyclistic could expand its range of services by offering cars for longer-distance rides.

Limitation

Since the datasets are large, SQL or R would be more efficient than Excel.
Our analysis is constrained because we have no information about users beyond ride ID and user type (casual/member).
Some columns were disregarded because the datasets have blank values.
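On the first limitation, the 12 monthly CSVs could be read and stacked in R instead of Excel. A minimal sketch, where the folder path and ".csv" file pattern are assumptions:

```r
library(dplyr)

# Read every CSV in a folder and stack them into one data frame,
# relying on the 12 monthly files sharing the same column names.
read_trips <- function(dir = "cyclistic project") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  files %>%
    lapply(read.csv) %>%
    bind_rows()
}
```

bind_rows matches columns by name, so it works as long as the monthly files keep the common schema listed above.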

    


                                     
