Whether you work in B2B or B2C, getting a better understanding of customer or account transactions in sequence is core. Customer cohorts will show you how well you are doing acquiring new customers, but also do they conduct repeat business with you. Product sequence will help you identify which products are getting traction with your customers: are the sales for one product or for a category just first time trials or do customers come back for repeat purchases of the same product? Working with many references / SKUs, many orders and many customers will bring Excel down to its knees before you know it… You need a solution that can scale!
In this post, I will use Snowflake as a database and Tableau with its standard demo SuperStore data set, but this logic is applicable to any SQL database and any BI product really… First part will show how to easily distinguish new customers from repeat business; second part will show how to identify products within orders by their sequence, and that is less easy… Also, note that I chose to develop the necessary calculations in SQL, but there are certainly ways to perform those calculations in memory in Tableau with Table Calcs, but that is out of scope for today.
Finding duplicate entities at scale in large databases using only names coming from free text boxes is always a challenge in Marketing, common in B2C, often ignored in B2B. Let’s take a simple scenario: I have a very long list of Contacts/Leads/Customers who self identify, typing their name, organization and email address in open text fields. I need to dedupe those contacts to make sure I don’t email them the same content several times, and I can’t trust the email address as unique identifier as I know some of them use their personal email address, or disposable email addresses and also their professional email address. I will assume here that the combo first + last name is consistent and reliable. Last name alone would not work, as too many last names are common. First+last name is better, but still not unique enough, so I will resort to first+last+org. The challenge is that organizations tend to have multiple spellings, and that is the object of the fuzzy logic to match those. For illustration, I will use a list of 1,826 organizations spelled distinctly downloaded from here, with many good examples from real life:
I will pretend that I have the same single first+last name tied to all those organizations to illustrate how fuzzy logic will perform. Therefore, I will need to compare 1,826*1,825=3,332,450 pairs of organizations and retain those that are close enough in their spelling, to be considered for consolidation. That is a lot of data to crunch, more than what Excel can handle, but as we will see, a walk in the park for Tableau paired with a data base…
If you have dealt with hierarchies from Salesforce (SFDC), or from most normalized transactional systems, you have faced some challenges to use those hierarchies in Tableau. Those systems treat hierarchies as a recursive model, storing Parent to Child relationships in only 2 columns. The recursive model approach to hierarchies is very efficient in terms of storage, and flexible as it allows for a variety of levels and branches of different lengths. Yet, usage relies on User Defined Functions which are not available in Tableau, as it would break its columnar aggregation logic. There is a quick and dirty approach in Tableau: rebuild the hierarchy using Groups, but it is limited to 2 levels and requires manual maintenance in Tableau, not ideal… A real hierarchy in Tableau requires each level to be defined as its own column in the data source. Breaking down the hierarchy in levels is very desirable for Tableau or any other BI tools really, as it enables filtering and aggregating by any node.
Have you ever been tasked to assign customers to a category based on a free form text field? This happens a lot in B2B marketing, using the Title that a prospect has input into a form. It can certainly be done manually, but with larger data sets, automation becomes a must have.
There is no magic wand to perform this classification, but if you already have narrowed down a list of keywords to test with, this can be done pretty easily with Alteryx and a bit of Regex know how. As an example, I will use the file of reviews used in a previous post, which contains a lot of text and many records, to demonstrate the performance of the solution. The scenario is that I need to be able to distinguish reviews of bars vs. reviews of restaurants, and I want some flags so that I can filter them easily.