Classify records based on keywords in a text field


Have you ever been tasked with assigning customers to a category based on a free-form text field? This happens a lot in B2B marketing, using the Title that a prospect has entered into a form. It can certainly be done manually, but with larger data sets, automation becomes a must-have.

There is no magic wand to perform this classification, but if you have already narrowed down a list of keywords to test with, it can be done pretty easily with Alteryx and a bit of Regex know-how. As an example, I will use the file of reviews from a previous post, which contains a lot of text and many records, to demonstrate the performance of the solution. The scenario is that I need to distinguish reviews of bars from reviews of restaurants, and I want some flags so that I can filter them easily.

As you can see below, the workflow is quite simple and processes 27,290 records in 1.4 seconds, with 5,833 reviews classified as related to bars, of which 4,434 mention a restaurant, and 1,399 don’t.

Step 1

Once I have my list of keywords, I need to split them into two groups: the IN group, whose keywords include a record in the category I am researching, and the OUT group, whose keywords are reasons to exclude a record from that category. Any number of keywords can be used, and each list is used to test the text for the presence of any of its keywords. They are simply typed or pasted as text into a Text Input tool:

Step 2

We use the Summarize Tool to concatenate all those keywords into a single string to be used by the Regex:

Note the separators
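
For readers who want to see the idea outside Alteryx, here is a minimal Python sketch of the same concatenation step. The keyword lists and the `build_pattern` helper are hypothetical, chosen for illustration; the post does not list the actual keywords. The sketch assumes the keywords are joined with `|`, the Regex alternation operator, so that the resulting string matches any one of them.

```python
import re

# Hypothetical keyword lists, for illustration only;
# the actual keywords used in the workflow are not given in the post.
keywords_in = ["bar", "pub", "cocktail"]
keywords_out = ["restaurant", "diner"]


def build_pattern(keywords):
    """Join keywords into one alternation pattern, e.g. 'bar|pub|cocktail'."""
    # re.escape keeps characters such as '.' or '+' inside a keyword
    # from being read as regex syntax; blank entries are skipped.
    return "|".join(re.escape(k.strip()) for k in keywords if k.strip())


pattern_in = build_pattern(keywords_in)
pattern_out = build_pattern(keywords_out)
print(pattern_in)
print(pattern_out)
```

Escaping each keyword is a design choice of this sketch: it treats every entry as literal text, which is safer when keywords are pasted in from elsewhere.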

Step 3

Add a Formula tool using the REGEX_Match(string,pattern,icase) function, simply set to:
REGEX_Match([text],[Keywords IN]) for a new Boolean field.
Note that, configured this way (with the icase argument left out), the function is not case sensitive, which saves us a lot of potential keywords to input.
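
As a rough equivalent of this flagging step, the following Python sketch computes an IN flag and an OUT flag for each record. The patterns, the sample reviews, and the `flag` helper are hypothetical and only illustrate the technique. It uses `re.search`, which looks for the pattern anywhere in the text; a function that tests the whole string would need wildcards such as `.*` around the alternation.

```python
import re

# Hypothetical patterns and reviews, for illustration only.
pattern_in = "bar|pub|cocktail"
pattern_out = "restaurant|diner"

reviews = [
    "Great COCKTAIL list and friendly staff",
    "The restaurant has a small bar near the entrance",
    "Lovely brunch spot",
]


def flag(text, pattern):
    """Return True if any keyword in the pattern appears in the text, ignoring case."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# One (IN, OUT) pair of Boolean flags per review.
flags = [(flag(r, pattern_in), flag(r, pattern_out)) for r in reviews]
for review, (is_in, is_out) in zip(reviews, flags):
    print(is_in, is_out, review)
```

Keep in mind that plain keywords also match inside longer words ("bar" is found in "barbecue"); wrapping the alternation in `\b` word boundaries is one way to tighten this if it causes false positives.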

And that’s it. The packaged workflow can be downloaded from here.
