Starbucks is one of the biggest coffee companies and coffeehouse chains in the world. With the spread of the internet and smartphones, people’s lives have become increasingly involved with these technologies, and Starbucks has taken advantage of this by sending promotions through them. By July 2013, over 10% of in-store purchases were made on mobile devices using the Starbucks app, and the share keeps growing (Source). Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. The data for this project were simulated from the original data to mimic customer behavior on the Starbucks rewards mobile app. The goal of this project is to help Starbucks better understand its customers by answering the following two questions:
1. Which demographic groups respond best to which offer type? (Statistical Application)
2. What are the top 5 features that influence these offer responses? (Machine Learning Application)
There are three data sets in JSON format: portfolio.json (offer ids and metadata about each offer, such as duration and type), profile.json (demographic data for each customer), and transcript.json (records of transactions, offers received, offers viewed, and offers completed). These were loaded into the offer, customer, and transaction data frames, respectively, for easier manipulation.
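Reading the three files can be sketched with the standard library alone, assuming each file stores one JSON object per line (JSON Lines). The helper name `load_json_lines` and the sample rows and offer ids below are hypothetical placeholders, not records from the real portfolio.json, and plain lists of dicts stand in for the data frames.

```python
import io
import json


def load_json_lines(stream):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in stream if line.strip()]


# Hypothetical sample rows standing in for portfolio.json (not real data).
portfolio_text = io.StringIO(
    '{"id": "offer_a", "offer_type": "bogo", "difficulty": 10, "reward": 10, '
    '"duration": 7, "channels": ["email", "mobile"]}\n'
    '{"id": "offer_b", "offer_type": "informational", "difficulty": 0, "reward": 0, '
    '"duration": 3, "channels": ["email"]}\n'
)

offer = load_json_lines(portfolio_text)
print(len(offer), offer[0]["offer_type"])  # prints: 2 bogo
```

Against a real file the call would be `with open("portfolio.json") as f: offer = load_json_lines(f)`. With pandas, `pandas.read_json(path, orient="records", lines=True)` reads the same line-delimited layout directly into a data frame.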
Here are the schema and a description of each data frame:
Offer data set with 10 records:
• id (string) — offer id
• offer_type (string) — type of offer, i.e., BOGO, discount, or informational
• difficulty (int) — minimum required spend to complete an offer
• reward (int) — reward given for completing an offer
• duration (int) — time for offer to be open, in days
• channels (list of strings)
Customer data set with 17000 records:
• age (int) — age of the customer
• became_member_on (int) — date when customer created an app account
• gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
• id (str) — customer id
• income (float) — customer’s income
As Figure 1 shows, there are slightly more male customers than female customers.
Figure 2 shows that customer registrations were not stable over the years. The first jump in numbers came around 2016, after which registrations stayed flat until the middle of 2017, when a second jump occurred. In 2018, however, registrations decreased marginally. Overall, the number of customer registrations trended upward from 2013 to 2019.
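The two distributions in Figures 1 and 2 come down to simple counts. The sketch below assumes `became_member_on` encodes the date as a YYYYMMDD integer (for example 20170830), so the year is the integer divided by 10000; that format, the helper names, and the customer rows are all hypothetical.

```python
from collections import Counter

# Hypothetical customer rows; became_member_on is ASSUMED to be a YYYYMMDD integer.
customer = [
    {"id": "c1", "gender": "M", "became_member_on": 20160512},
    {"id": "c2", "gender": "F", "became_member_on": 20170830},
    {"id": "c3", "gender": "M", "became_member_on": 20170102},
    {"id": "c4", "gender": "O", "became_member_on": 20180415},
]


def gender_counts(rows):
    """Count customers per gender value (M, F, O)."""
    return Counter(r["gender"] for r in rows)


def registrations_per_year(rows):
    """Count sign-ups per year by integer-dividing a YYYYMMDD date by 10000."""
    return dict(sorted(Counter(r["became_member_on"] // 10000 for r in rows).items()))


print(gender_counts(customer))
print(registrations_per_year(customer))  # {2016: 1, 2017: 2, 2018: 1}
```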
Transaction data set with 306534 records:
• event (str) — record description (i.e., transaction, offer received, offer viewed, etc.)
• person (str) — customer id
• time (int) — time in hours since start of test. The data begins at time t=0
• value (dict of strings) — either an offer id or a transaction amount, depending on the record
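Because `value` holds different content depending on `event`, a common first step is to flatten it into separate fields. In this sketch the key names inside `value` (`"offer id"`, `"offer_id"`, `"amount"`) are assumptions for illustration, as are the helper name `flatten_value` and the sample rows. `time` is also converted from hours to days so it can be compared with an offer's `duration`.

```python
# Hypothetical transcript rows. The key names inside "value" ("offer id",
# "offer_id", "amount") are ASSUMPTIONS for illustration.
transaction = [
    {"person": "c1", "event": "offer received", "time": 0,
     "value": {"offer id": "offer_a"}},
    {"person": "c1", "event": "offer completed", "time": 30,
     "value": {"offer_id": "offer_a", "reward": 10}},
    {"person": "c1", "event": "transaction", "time": 30,
     "value": {"amount": 12.5}},
]


def flatten_value(record):
    """Split the value dict into separate offer_id and amount fields."""
    value = record["value"]
    # Accept either assumed spelling of the offer id key; None for plain transactions.
    offer_id = value.get("offer id", value.get("offer_id"))
    amount = value.get("amount")  # None for offer events
    return {
        "person": record["person"],
        "event": record["event"],
        "time_days": record["time"] / 24,  # hours since t=0 -> days, to match duration
        "offer_id": offer_id,
        "amount": amount,
    }


flat = [flatten_value(r) for r in transaction]
```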
The first question involves three aspects: demographic groups, responses, and offer types, which relate to the customer, transaction, and offer data frames, respectively. It is therefore necessary to merge these three data frames (as shown in Fig 3).
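The merge in Fig 3 can be sketched as two key lookups: `person` against the customer id and the offer id against the offer `id`. This is a standard-library stand-in for what would typically be two `pandas.DataFrame.merge` calls; the helper name `merge_tables`, the sample rows, and the choice to drop rows without an offer id (plain transactions) are assumptions of the sketch.

```python
# Hypothetical, already-flattened rows keyed by id.
offer = {"offer_a": {"offer_type": "bogo", "difficulty": 10}}
customer = {"c1": {"age": 35, "gender": "F", "income": 72000.0}}
transaction = [
    {"person": "c1", "event": "offer viewed", "offer_id": "offer_a"},
    {"person": "c1", "event": "transaction", "offer_id": None},
]


def merge_tables(transactions, offers, customers):
    """Inner-join offer events with offer metadata and customer demographics."""
    merged = []
    for row in transactions:
        offer_info = offers.get(row["offer_id"])
        customer_info = customers.get(row["person"])
        if offer_info is None or customer_info is None:
            continue  # drop plain transactions and rows with unknown ids
        merged.append({**row, **offer_info, **customer_info})
    return merged


merged = merge_tables(transaction, offer, customer)
```

An inner join is used here because the first question only concerns offer events; keeping transactions would require a left join and null handling instead.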
Both the responses and the demographics of the responding groups depend on the offer type, and the offer type does not change across the receiving, viewing, and completing stages. The analysis therefore focuses on each offer type separately: BOGO, discount, and informational. In other words, each offer type has its own responses and its own responding demographic groups. Because informational offers can only be viewed, there is no analysis of their completion. The results are shown below (Table 1).
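Splitting the merged records by offer type can be sketched as a per-type count of each stage. The helper name `split_by_offer_type` and the rows are hypothetical; note that the informational entry simply ends up with zero completions, matching the fact that informational offers can only be viewed.

```python
from collections import defaultdict

# Hypothetical merged rows: one per offer event, with its offer type attached.
merged = [
    {"person": "c1", "event": "offer received", "offer_type": "bogo"},
    {"person": "c1", "event": "offer viewed", "offer_type": "bogo"},
    {"person": "c1", "event": "offer completed", "offer_type": "bogo"},
    {"person": "c2", "event": "offer received", "offer_type": "informational"},
    {"person": "c2", "event": "offer viewed", "offer_type": "informational"},
]


def split_by_offer_type(rows):
    """Count received/viewed/completed events separately for each offer type."""
    counts = defaultdict(
        lambda: {"offer received": 0, "offer viewed": 0, "offer completed": 0}
    )
    for row in rows:
        counts[row["offer_type"]][row["event"]] += 1
    return dict(counts)


by_type = split_by_offer_type(merged)
```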
For BOGO offers, the customers who completed the offer are younger, have higher incomes, and have a more even gender distribution than the customers who viewed the offer. For discount offers, the customers who completed the offer are younger and have higher incomes than the customers who viewed the offer. Overall, the view rate is higher than the completion rate for both BOGO and discount offers.
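The comparison between viewers and completers for a single offer type can be sketched as below. The rate definitions are assumptions of this sketch (view rate = viewed events / received events, completion rate = completed events / received events), the helper name and data are made up, and events rather than unique customers are counted, so a customer who receives the same offer twice counts twice.

```python
from statistics import mean

# Hypothetical merged rows for one offer type (e.g. BOGO); values are made up.
rows = [
    {"person": "c1", "event": "offer received", "age": 30, "income": 80000.0},
    {"person": "c1", "event": "offer viewed", "age": 30, "income": 80000.0},
    {"person": "c1", "event": "offer completed", "age": 30, "income": 80000.0},
    {"person": "c2", "event": "offer received", "age": 50, "income": 60000.0},
    {"person": "c2", "event": "offer viewed", "age": 50, "income": 60000.0},
]


def response_summary(rows):
    """Compute view/completion rates and mean age/income per response stage."""
    by_event = {}
    for row in rows:
        by_event.setdefault(row["event"], []).append(row)
    received = len(by_event.get("offer received", []))
    summary = {}
    for stage in ("offer viewed", "offer completed"):
        group = by_event.get(stage, [])
        summary[stage] = {
            # ASSUMED definition: stage events divided by received events.
            "rate": len(group) / received if received else 0.0,
            "mean_age": mean(r["age"] for r in group) if group else None,
            "mean_income": mean(r["income"] for r in group) if group else None,
        }
    return summary


summary = response_summary(rows)
```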
In Fig 4, the numbers of viewed BOGO and discount offers are each much larger than the number of viewed informational offers, almost twice as large. In contrast, Fig 5 shows that the difference between the numbers of completed BOGO and discount offers is small, around 2,000 offers.
As a result, different offer types do elicit different responses from different demographic groups. For example, although the numbers of completed BOGO and discount offers are close to each other, their completion rates and gender distributions are very different. More detailed responses and demographic breakdowns for each offer type are given in Table 1.