Market Basket Analysis with Google Analytics data is one fusion of Digital Analytics and Machine Learning.
The data is easily available and fairly easy to clean.
In this post I will walk through the steps to do that.
I start with the code and its logic, and briefly cover some theory afterwards.
First, getting the data
You can get the Google Analytics data into R by using the googleAnalyticsR package. Grateful to Mark Edmondson and team for creating this.
# if the package is not already installed then install it
install.packages("googleAnalyticsR")
# load the functions of the package
library(googleAnalyticsR)
# log in with Google to grant R approval to access your GA data
ga_auth(new_user = TRUE)
# you can replace the above with new_user = FALSE after the initial authentication
# get the account data structure: Accounts > Properties > Views
ga_account_list()
Now, in Market Basket Analysis we essentially want to discover how the purchase of one set of items affects the purchase of another set of items.
For this we need data on the items bought together in various transactions.
We will use “ga:productName” and “ga:transactionId” as dimensions to get the products purchased and their respective transaction IDs.
We will use “ga:uniquePurchases” as the metric.
You also need to provide a date range for this data in “YYYY-MM-DD” format.
Finally, you need to provide the viewId, which you can get from the account structure returned by the ga_account_list() function above.
# provide the view ID from the account structure above
ViewId <- "<your view ID>"  # placeholder, replace with your own view ID
# provide the start and end dates
Start <- "2018-12-01"
End <- "2018-12-31"
Table <- google_analytics(ViewId, date_range = c(Start, End),
                          metrics = c("ga:uniquePurchases"),
                          dimensions = c("ga:productName", "ga:transactionId"))
# Remove the entries without a product name
# Remove the entries without any purchases
# Remove any possible duplicates
# Replace unique purchases with 1: we only want the presence of a product
# in a transaction, not its volume
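A minimal sketch of the four cleaning steps above, in base R. The toy `Table` is made-up data standing in for the Google Analytics result, and the column names (`productName`, `transactionId`, `uniquePurchases`) and the "(not set)" placeholder for a missing product name are assumptions about what the query returns, so check them against your own data.

```r
# Toy stand-in for the table returned by google_analytics() (hypothetical data)
Table <- data.frame(
  productName     = c("Milk", "Bananas", "(not set)", "Milk", "Milk", "Bread"),
  transactionId   = c("T1", "T1", "T1", "T2", "T2", "T3"),
  uniquePurchases = c(1, 2, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Remove the entries without a product name
# (assumption: missing names show up as NA, "" or "(not set)")
has_name <- !is.na(Table$productName) &
  !(Table$productName %in% c("", "(not set)"))
Table <- Table[has_name, ]

# Remove the entries without any purchases
Table <- Table[Table$uniquePurchases > 0, ]

# Remove any possible duplicates of the same product in the same transaction
Table <- Table[!duplicated(Table[, c("productName", "transactionId")]), ]

# Keep only presence/absence: every remaining row counts as 1
Table$uniquePurchases <- 1
```

After these steps each row says only that a product appeared in a transaction, which is all the later reshape step needs.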
The present structure of the Table is something like this:
But to perform the Market Basket Analysis using arules we need the structure to be like this:
The transaction IDs run along the rows and each product name along the columns. For this we will use the reshape2 package created by the legendary Hadley Wickham.
# Create a new data frame with the above logic
# Replace NA values with 0
# Create a duplicate to take the row names from
# The apriori function accepts only product entries, so the transaction
# IDs cannot stay as a column and need to be passed as row names instead
dcast1$transactionId <- NULL
# Free up some RAM
# The input to the apriori function needs to be of class "transactions"
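The reshape described above might look roughly like the sketch below. It uses `dcast()` from reshape2 and the matrix-to-"transactions" coercion from arules. The toy `Table`, its column names and the object name `trans` are assumptions for illustration, and the row names are set directly rather than through a duplicate data frame.

```r
library(reshape2)
library(arules)

# Toy stand-in for the cleaned GA table (hypothetical data)
Table <- data.frame(
  productName     = c("Milk", "Bananas", "Milk", "Bread"),
  transactionId   = c("T1", "T1", "T2", "T3"),
  uniquePurchases = 1,
  stringsAsFactors = FALSE
)

# One row per transaction, one column per product
dcast1 <- dcast(Table, transactionId ~ productName,
                value.var = "uniquePurchases")

# Products absent from a transaction come back as NA; mark them as 0
dcast1[is.na(dcast1)] <- 0

# Move the transaction IDs into the row names, then drop the column
rownames(dcast1) <- dcast1$transactionId
dcast1$transactionId <- NULL

# Coerce the 0/1 incidence matrix to the "transactions" class
trans <- as(as.matrix(dcast1), "transactions")
```

Calling `inspect(trans)` afterwards is a quick way to confirm that each transaction lists the products you expect.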
Now our dataset is ready; we just need to pass it to the apriori function from the arules package. The package was created by Michael Hahsler and team.
library(arules)
# the choice of support and confidence will depend on domain knowledge
# and the business objective
rules <- apriori(dcast1, parameter = list(support = 0.007, confidence = 0.25))
# To view the results in data table format you can convert the above
# Convert the support and confidence columns to %
# We will need the scales package for this, again by Hadley Wickham
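One way the conversion above could be done is sketched here on a small made-up basket. The toy transactions, the thresholds and the name `rules_df` are assumptions for illustration. The coercion `as(rules, "data.frame")` comes from arules and `percent()` from scales.

```r
library(arules)
library(scales)

# Toy transactions (hypothetical data) so the example runs on its own
trans <- as(list(c("Milk", "Bananas"),
                 c("Milk", "Bananas"),
                 c("Milk"),
                 c("Bread")),
            "transactions")

# Thresholds chosen only to suit this tiny example
rules <- apriori(trans,
                 parameter = list(support = 0.25, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Convert the rules to a plain data frame
rules_df <- as(rules, "data.frame")

# Show support and confidence as percentages (this turns them into strings)
rules_df$support    <- percent(rules_df$support, accuracy = 0.1)
rules_df$confidence <- percent(rules_df$confidence, accuracy = 0.1)
```

Because `percent()` returns text, sort or filter on the numeric columns first and format them only for display.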
Now, let's look at the result:
We have three terms: support, confidence and lift. Let's understand each with the smart art below:
The above presents results for the chances of a purchase of Milk if Bananas are bought. In general, you read the results as the chances of purchase of the items on the right-hand side if the items on the left-hand side are purchased.
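To make the three terms concrete, here is a small worked example in base R for the rule "Bananas => Milk", using the standard definitions: support is the share of transactions containing both items, confidence is that share divided by the share containing Bananas, and lift is confidence divided by the share containing Milk. The five transactions are made-up numbers, not results from the analysis above.

```r
# Five hypothetical transactions: TRUE means the item was in the basket
bananas <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
milk    <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Support: share of all transactions that contain both Bananas and Milk
support <- mean(bananas & milk)

# Confidence: of the transactions with Bananas, the share that also has Milk
confidence <- support / mean(bananas)

# Lift: how much more likely Milk is when Bananas are bought,
# compared with Milk's overall share of transactions
lift <- confidence / mean(milk)

round(c(support = support, confidence = confidence, lift = lift), 3)
```

A lift above 1 suggests the two items are bought together more often than chance alone would explain, while a lift below 1 suggests the opposite.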
I personally like this solution a lot, as the data is relatively easy to get through Google Analytics.
It gives quick insights into which items can be clubbed together as a bundle,
which items can be suggested on the order confirmation page or through post-purchase campaigns,
and which items can be suggested as add-ons during the purchase journey.
You can get as creative as you want.