
Market Basket Analysis With Google Analytics Data

  • Posted by datavinci
  • On February 1, 2019
  • Google Analytics, Machine Learning with Digital Analytics, Machine Learning with Google Analytics, Market Basket Analytics

Market Basket Analysis with Google Analytics data is one fusion of Digital Analytics and Machine Learning.

The data is easily available and it is fairly easy to clean.

In this post I will share the steps to do that.

I start with the code and its logic, and briefly cover some theory later.

First, getting the data

You can get the Google Analytics data into R by using the googleAnalyticsR package. Grateful to Mark Edmondson and team for creating this.

http://code.markedmondson.me/googleAnalyticsR/

https://code.markedmondson.me/googleAuthR/articles/google-authentication-types.html

# If the packages are not already installed, install them
if(!(require(googleAnalyticsR))) install.packages("googleAnalyticsR")
if(!(require(googleAuthR))) install.packages("googleAuthR")

# Load the functions of the packages
library(googleAnalyticsR)
library(googleAuthR)

# Log in with Google to grant R access to your GA data
ga_auth(new_user=TRUE)

# You can replace the above with new_user=FALSE after the initial authentication

# Get the account data structure - Accounts > Properties > Views
my_accounts <- ga_account_list()

In Market Basket Analysis we essentially want to discover how the purchase of one set of items affects the purchase of another set of items.

For this we need data on which items were bought together in each transaction.

We will use “ga:productName” and “ga:transactionId” as dimensions to get the products purchased and their respective transaction IDs.

We will use “ga:uniquePurchases” as the metric.

You also need to provide a date range for this data in “YYYY-MM-DD” format.

You need to provide the viewId, which you can find in the account structure returned by the ga_account_list() function above.

# Provide the view ID from the account structure above
# (a view ID is numeric; "UA-XXXXXXXX" is the format of a property ID)
ViewId <- "XXXXXXXX"

# Provide the start and end dates
Start <- "2018-12-01"
End <- "2018-12-31"

Table <- google_analytics(ViewId, date_range = c(Start, End), metrics = c("ga:uniquePurchases"), dimensions = c("ga:productName", "ga:transactionId"))

# Remove the entries without a product name
Table <- Table[Table$productName != "(not set)", ]

# Remove the entries without any purchases
Table <- Table[Table$uniquePurchases != 0, ]

# Remove any possible duplicates
Table <- unique(Table)

# Replace unique purchases with 1: we just want the presence of a product
# in a transaction, we do not want its volume
Table$uniquePurchases <- 1

The present structure of the Table is something like this:
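As a rough illustration (the product names and transaction IDs below are made up, not real Google Analytics output), the cleaning steps leave a long table with one row per product per transaction. The name toy_long is hypothetical and used only for this sketch:

```r
# Toy data in the same long format as the GA result (made-up values)
toy_long <- data.frame(
  productName     = c("Bananas", "Milk", "(not set)", "Bread", "Milk", "Bananas"),
  transactionId   = c("T1", "T1", "T1", "T2", "T2", "T3"),
  uniquePurchases = c(2, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Same cleaning steps as in the post
toy_long <- toy_long[toy_long$productName != "(not set)", ]
toy_long <- toy_long[toy_long$uniquePurchases != 0, ]
toy_long <- unique(toy_long)
toy_long$uniquePurchases <- 1

# Four rows remain: Bananas/T1, Milk/T1, Bread/T2, Bananas/T3
print(toy_long)
```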

But to perform the Market Basket Analysis using arules, we need the structure to be like this:
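A sketch of that target shape on made-up data. It uses base R's table() instead of reshape2, purely so the example runs without extra packages; toy_long and toy_wide are hypothetical names:

```r
# Made-up long table: one row per product per transaction
toy_long <- data.frame(
  transactionId = c("T1", "T1", "T2", "T3"),
  productName   = c("Bananas", "Milk", "Bread", "Bananas"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per transaction, one column per product
toy_wide <- unclass(table(toy_long$transactionId, toy_long$productName))

# Keep presence only (1 = product is in the transaction, 0 = it is not)
toy_wide[toy_wide > 0] <- 1

print(toy_wide)
```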

The transaction IDs go along the rows and each product name along the columns. For this we will use the reshape2 package, created by the legendary Hadley Wickham.

if(!(require(reshape2))) install.packages("reshape2")
library(reshape2)

# Create a new data frame with the above logic
dcast <- reshape2::dcast(Table, transactionId ~ productName, value.var = "uniquePurchases")

# Replace NA values with 0
dcast[is.na(dcast)] <- 0

# Create a duplicate to take row names
dcast1 <- dcast

# The apriori function accepts only product entries, so the transaction
# IDs can't be a column and need to be passed as row names instead
dcast1$transactionId <- NULL
rownames(dcast1) <- dcast$transactionId

# Free up some RAM
rm(dcast, Table)

# The input to the apriori function needs to be of datatype "transactions".
# That class comes from the arules package, so arules must be loaded first.
if(!(require(arules))) install.packages("arules")
library(arules)

dcast1 <- as.matrix(dcast1)
dcast1 <- as(dcast1, "transactions")

Now our dataset is ready. We just need to pass it to the apriori function from the arules package, which was created by Michael Hahsler and team.

if(!(require(arules))) install.packages("arules")
library(arules)

# The choice of support and confidence will depend on domain knowledge
# and the business objective
rules <- apriori(dcast1, parameter = list(support = 0.007, confidence = 0.25))

# To view the results in data table format, convert the rules to a data frame
Table1 <- DATAFRAME(rules)

# Convert the support and confidence columns to %
# We need the scales package for this, again by Hadley Wickham
if(!(require(scales))) install.packages("scales")
library(scales)

Table1$support <- percent(Table1$support)
Table1$confidence <- percent(Table1$confidence)
Table1$lift <- round(Table1$lift, 2)
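One practical note: percent() returns character strings, so any sorting or filtering on support or confidence is easier to do before that formatting step. A minimal sketch on a made-up rules table follows; the values are invented, toy_rules and strong are hypothetical names, and the LHS/RHS column names are an assumption about what DATAFRAME() returns:

```r
# Made-up rules table, still numeric (i.e. before percent() is applied)
toy_rules <- data.frame(
  LHS        = c("{Bananas}", "{Bread}", "{Milk}"),
  RHS        = c("{Milk}", "{Milk}", "{Bread}"),
  support    = c(0.020, 0.009, 0.009),
  confidence = c(0.40, 0.30, 0.26),
  lift       = c(1.6, 2.4, 0.9),
  stringsAsFactors = FALSE
)

# Keep rules with lift above 1 and list the strongest association first
strong <- toy_rules[toy_rules$lift > 1, ]
strong <- strong[order(-strong$lift), ]

print(strong)
```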

Now, let's look at the result:

We have three terms: support, confidence and lift. Let's understand each with the smart art below:

The above presents results for the chances of purchase of Milk if Bananas are bought. In general, you read the results as the chances of purchase of the items on the right-hand side if the items on the left-hand side are purchased.
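To make the three terms concrete, here is a small sketch that computes them by hand for the rule Bananas => Milk, using the standard definitions: support is the share of all transactions that contain both items, confidence is the share of Bananas transactions that also contain Milk, and lift is confidence divided by the overall share of transactions containing Milk. The basket data is made up for illustration and the variable names are hypothetical:

```r
# Made-up 0/1 incidence matrix: rows are transactions, columns are products
baskets <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    1, 0, 0,
    0, 1, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("T", 1:5), c("Bananas", "Milk", "Bread"))
)

n    <- nrow(baskets)
both <- sum(baskets[, "Bananas"] == 1 & baskets[, "Milk"] == 1)

# Support: transactions with Bananas and Milk / all transactions
support <- both / n

# Confidence: transactions with Bananas and Milk / transactions with Bananas
confidence <- both / sum(baskets[, "Bananas"])

# Lift: confidence / share of all transactions that contain Milk
lift <- confidence / (sum(baskets[, "Milk"]) / n)

print(c(support = support, confidence = confidence, lift = lift))
```

A lift above 1 suggests that buying the left-hand side makes the right-hand side more likely than average; a lift below 1, as in this toy data, suggests the opposite.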

I personally like this solution a lot because Google Analytics makes the data relatively easy to get.

It gives quick insights into which items can be bundled together, which items can be suggested on the order confirmation page or through post-purchase campaigns, and which items can be suggested as add-ons during the purchase journey.

You can get as creative as you want.

