As a digital marketer a very common task is to categorize data, be it for trafficking, for further Analysis or just in preparation for a report.

A typical example: From the campaign, name extract the device and targeting type:

Campaign Name Device Category Targeting Type
124124 – desk – Contextual Mobile CTXT
124124 – Mobile – Contextual Mobile CTXT
124224 – Mob – Interest Mobile INT
124124 – Desktop – Contextual Desktop CTXT
124224 – mobile – Interest Mobile INT

Usually, this would mean a long evening ahead – maybe with a bit of tears held back.

A few ways to deal with this commonly: Filter on the campaign columns repeatedly, or a complex series of “ifs” statements with a fairly nasty nested structure or perhaps even eyeballing it.

However, there is a better way – with three GSheet Formulas: =UPPER, =REGEXTRACT, =VLOOKUP, which you would only need to set up once. We would work towards the final solution column by column.

UPPER

Upper turns all character into upper case to standardize some issues like “Mobile vs mobile”. Feel free to use LOWER if Upper case makes it feel like the campaign name is shouting at you.

Campaign Name Campaign Name (UPPER)
124124 – desk – Contextual 124124 – DESK – CONTEXTUAL
124124 – Mobile – Contextual 124124 – MOBILE – CONTEXTUAL
124224 – Mob – Interest 124224 – MOB – INTEREST
124124 – Desktop – Contextual 124124 – DESKTOP – CONTEXTUAL
124224 – mobile – Interest 124224 – MOBILE – INTEREST

REGEXEXTRACT

Regex is a pattern of characters we want to extract from the Campaign name when it matches what we are looking for.

Generally, there are two major classes of regex: pattern, which is patterns of characters to match and operators, characters that have special meaning.

In our case we would want to extract things like “- MOBILE -, – DESKTOP- , – DESK -, – MOB – ” . (patterns of character)

We could do this like so with a “|” – this implements an ” OR ” logic. (operators)

=REGEXEXTRACT (<CELL REFERENCE>, “- MOBILE -| – DESKTOP- | – DESK -| – MOB -“)

We would have the following output:

Campaign Name (UPPER)

Campaign Name (UPPER) RAW CLASS
124124 – DESK – CONTEXTUAL – DESK –
124124 – MOBILE – CONTEXTUAL – MOBILE –
124224 – MOB – INTEREST – MOB –
124124 – DESKTOP – CONTEXTUAL – DESKTOP –
124224 – MOBILE – INTEREST – MOBILE –

While regex is a wide topic to cover, there are only a few patterns that are useful on a day to day basis. With these and you’ll pretty much be able to extract most items.

Operators: These are special symbols in regex that represent rules, for example, ^Hello means match from the start. “Hello World “-   Hello would be extracted. James, Hello – Nothing would be  extracted as Hello did not appear at the start of the input

    • Positional Operators:
      • ^ as per above means match if it appears at the start of the input
      • $ Used as Hello$, means match if it is at the end of the input
    • Match Any Operators: 
      • . Means to match any character on the position . takes, ie .ello would match on inputs such as Bello, Hello, Zello…et
    • Repetition Operators:
      • *  [Match zero or more] Match repeats of the last character as many times as needed. For example, ap* means app appp apppp would all match.  The most usual usage is .* which means match anything for example if we did .*pie this would match things like applepie, pineapplepie
      • + [Match one or more] Match repeats of the last character once or more. Useful for some instances of pluralization, ba+r  matches, e.g., `bar’ and `baaaar’, but not `cr’.
      • ? [Match zero or one]`ca?r’ matches both `car’ and `cr’, but nothing else.

    • List Operators: 
      • [] Matches characters between [] – for example [ab] matches a or b  
      • [^] Negates the match list and acts as an exclusion list. ie[^x] match anything but x 
    • Range Operators: 
      • – Matches character in a range, for example a-z means any character from a to z
      • – Can be used for number too – eg, 0-9 
    • Logical Operators: 
      • | Means to chain together in an OR logic (either match would extract) ie, Ate|Bed would match on inputs that contain either, TAte, Bedding would all match
    • Escape Character
      • \  We’ve saved the most important for last – for any of the above characters these have special meaning in regex, for example, $ means “pattern that ends with”. Hence to match against a dollar sign we would need to escape the sequence like so \$

VLOOKUP

 

After compiling the list of extracted regex, we could create a table of lookup values like so:

 

for lookuprange:

RAW CLASS RAW CLASS
– DESK – DESKTOP
– MOBILE – MOBILE
– MOB – MOBILE
– DESKTOP – DESKTOP
– MOBILE MOBILE

 

 = VLOOKLUP(RAWCLASS Cell, lookuprange, 2) 

Campaign Name CLASS
124124 – desk – Contextual DESKTOP
124124 – Mobile – Contextual MOBILE
124224 – Mob – Interest MOBILE
124124 – Desktop – Contextual DESKTOP
124224 – mobile – Interest MOBILE
  •  

And there we have it – classifying in a repeatable and accurate way.

Set this up once for a repetitive task once and use it forever!