The ability to draw actionable insights out of Search Engine Marketing campaigns means a better return on investment and, ultimately, customers acquired at the lowest possible cost.

Even a small-scale Search Engine Marketing account has quite a large dataset: hundreds of keywords multiplied by the various metrics (Impressions, Clicks, Conversion Rates, Click Through Rates, Cost per conversion). Hidden in these numbers are actionable insights that would lead to better performance.

We can apply simple machine learning techniques such as KMeans to sift through the data and tease out these insights.

What is KMeans?

KMeans groups similar items together by following these steps:
1) Choose a number k of clusters we think the data contains
2) Make k random guesses as to where the centers of the clusters should be (it doesn't matter if we are wrong)
3) Assign every element to the guessed center it is closest to
4) Update each guess to be the calculated center (mean) of the elements assigned to it

Repeat steps 3 and 4 until no element gets assigned to a different cluster.

In KMeans, closeness to a center is calculated with a distance formula (typically Euclidean distance) applied to the features we select for the model.
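
To make these steps concrete, here is a minimal sketch of the loop in plain NumPy (the two-feature toy data points are made up for illustration; in practice we use a library implementation such as scikit-learn's KMeans, shown later):

import numpy as np

# toy data: each row is an item, each column a feature (made up for illustration)
points = np.array([[1.0, 1.2], [0.9, 0.8], [1.1, 1.0],
                   [8.0, 8.5], [7.5, 9.0], [8.2, 7.9]])

k = 2  # step 1: the number of clusters we think the data has
rng = np.random.default_rng(0)

# step 2: k random guesses for the cluster centers
centers = points[rng.choice(len(points), size=k, replace=False)]

labels = None
while True:
    # step 3: assign each point to its closest center (Euclidean distance)
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    new_labels = distances.argmin(axis=1)

    # stop once no point gets assigned to a different cluster
    if labels is not None and np.array_equal(new_labels, labels):
        break
    labels = new_labels

    # step 4: move each center to the mean of the points assigned to it
    centers = np.array([points[labels == i].mean(axis=0) for i in range(k)])

print(labels)   # cluster index for each point
print(centers)  # final cluster centers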

Usage of KMeans in SEM

By applying KMeans to the various metrics (Impressions, Clicks, Conversion Rates, Click Through Rates, Cost per conversion) we can form clusters of keywords for further analysis.
The exciting part is that this is all automated, which makes the analysis far more scalable and rapid. We can draw insights easily and act on the campaigns much faster than with the standard manual process.

As a fictional example, we are working on an SEM campaign selling business cards with the following dataset in a Google Sheet.

After using the elbow method we find that 4 is the ideal number of clusters, and we run KMeans with the following output, which we will evaluate against the overall average stats:

CPA (Cost per conversion) $17
CTR (Click Through rate) 21.40%
CVR (Conversion Rate) 0.45%
CPC (Cost per click) $0.07

Group 1:
This keyword is an outlier that got grouped into its own cluster.
As we can see, it is a very broad keyword with an extreme level of impressions and very low CTR and CVR.
The only reason this keyword remains profitable is its extremely low CPC.

Possible Actions:
Refine it via the Search Query Report into better keywords, or add negative keywords for better performance. Lastly, if the CPA is unacceptable, pause it entirely.

Group 2:
This cluster is filled with high-CVR keywords, which suggests strong intent; while CPC is higher due to competition, CPA is kept low by the high conversion rates.
These keywords may be considered the core intent of searchers who will become customers.

Possible Actions:
Improve the volume on these keywords by giving them more attention than the rest of the keywords.
Other refinement efforts, such as ensuring extensions are well applied, using 1-to-1 ad groups and writing very refined ad copy, could bring up CTR and hence Quality Score.
These are high-potential keywords that deserve as much attention as we can manage due to the very high CVR.
A more aggressive bid strategy could also help drive more volume.

Group 3:
These are similar to the Group 2 keywords but slightly less refined. Overall CVR is still relatively high; basically, these are relatively high-quality keywords in terms of capturing the right intent.

Possible Actions:
As this group is less refined (lower CVR) than Group 2, good bid strategy management would help keep these keywords on the profitable edge of CPA; CPCs must remain suppressed to maintain good CPA performance.

Group 4:
Similar to Group 1, these are very high-CPA keywords, which means they are not well refined. CPCs are low, as competitors are most likely not bidding on these terms because they are quite low quality. However, volume is not as extreme as in Group 1 because these keywords are longer-tailed (more words per keyword).

Possible Actions:
Same as Group 1, but with lower priority.

Detailed Process

1 – Importing Data

First, we need to import the data from the sheet into a pandas DataFrame:

from google.colab import auth
import gspread
from oauth2client.client import GoogleCredentials
import numpy as np
import pandas as pd

#authenticate and grab data
auth.authenticate_user()
gc = gspread.authorize(GoogleCredentials.get_application_default())
sh = gc.open("SEMData")
worksheet = sh.worksheet("data")
rows = worksheet.get_all_values()
data = pd.DataFrame.from_records(rows)

2 – Preparing Training Set

2.1 Selecting relevant data

Next, we select only the keywords that converted and create the standard SEM metrics: CPA, CTR and CVR.

# use the first row as the header and drop it from the data
data.columns = data.iloc[0]
data = data.reindex(data.index.drop(0))
data[["Clicks", "Impr", "Cost", "Conversions"]] = data[["Clicks", "Impr", "Cost", "Conversions"]].apply(pd.to_numeric, errors="coerce")

# keep only keywords with at least one conversion and derive CPA, CTR and CVR
convertedData = data[data["Conversions"] > 0]
convertedData.loc[:, "CPA"] = convertedData["Cost"] / convertedData["Conversions"]
convertedData.loc[:, "CTR"] = convertedData["Clicks"] / convertedData["Impr"]
convertedData.loc[:, "CVR"] = convertedData["Conversions"] / convertedData["Clicks"]
convertedData = convertedData.reset_index(drop=True)

2.2 Create training set

Now that the data is more or less ready, we create a DataFrame with the relevant features we want to train on: "Clicks", "Impr", "Cost", "CPA", "CVR", "CTR"

rawTrainData = convertedData.loc[:, ["Clicks", "Impr", "Cost", "CPA", "CVR", "CTR"]]

3 – Transformation/Training

3.1 Transformation of training set

Before we proceed, note that KMeans produces more or less round clusters because it is "isotropic" in every direction of space. If we leave the data as it is, features with larger variance (such as Impressions) would dominate the distance calculation. StandardScaler helps by standardizing each feature to zero mean and unit variance:

from sklearn.preprocessing import StandardScaler
stdTrainData = StandardScaler().fit_transform(rawTrainData)

After that, we apply PCA, as suggested by this paper:

from sklearn.decomposition import PCA
pca = PCA(4)
principalComponents = pca.fit_transform(stdTrainData)
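
As an optional sanity check, we can look at how much of the total variance the 4 principal components retain:

# fraction of the total variance captured by each of the 4 components
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())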

3.2 KMeans Clustering

Finally, we reach the actual KMeans clustering that produces our groupings:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4)
result = kmeans.fit_predict(principalComponents)
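
The choice of n_clusters=4 comes from the elbow method mentioned earlier; a minimal sketch of that check (the range of k values below is an arbitrary choice) is to fit KMeans for several values of k and look for the point where the inertia curve flattens:

# within-cluster sum of squares (inertia) for a range of candidate k values;
# the "elbow" where the curve stops dropping sharply suggests a reasonable k
inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k)
    km.fit(principalComponents)
    inertias.append(km.inertia_)
print(inertias)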

4 – Output data

With all that done, we proceed to push the data back into the spreadsheet, into the predict tab.
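
The finaldata DataFrame pushed below is assumed to be the converted keywords with their assigned cluster attached; a minimal construction of it could be:

# attach each keyword's cluster label to the converted data (assumed construction of finaldata)
finaldata = convertedData.copy()
finaldata["Cluster"] = result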

import gspread_dataframe as gd

auth.authenticate_user()
gc = gspread.authorize(GoogleCredentials.get_application_default())
ws = gc.open("SEMData").worksheet("predict")

ws.clear()
gd.set_with_dataframe(ws, finaldata)