Data for analytics can be structured in two different ways: Wide or Long. Here is an illustration of the two types of format for the same data set:
When would you use one vs. the other? In general, wide models work better for analytical models and offer a better visual layout for humans, whereas the long model works better for columnar DBs like Snowflake, Redshift or Tableau. Their engine is optimized to crunch data at scale within columns.
Now suppose you use Snowflake as a staging area for your analytics, which I highly recommend. You ought to write your data in the Long format. How do you perform that conversion, sometimes called reshaping a table from wide to long? That operation is also called to transpose (Alteryx), unpivot (Snowflake), pivot Columns to Rows (Tableau Prep) or melt a table (Panda). We will also cover the flip side operation, from Long to Wide, also called to cross tab (Alteryx), pivot (Snowflake, Panda) or pivot Rows to Columns (Tableau Prep).
QuickBooks Online delivers a variety of great reports online and the tools it offers to developers are top notch. The reports themselves could use some more visual zazz… Seriously, I need the accounting data available in a data base to serve to a BI tool and take the data set to the next stage, with scenarios such as time series, forecasts or various predictive models. Extracting that data and normalize it is still quite a challenge. I will go through the steps of importing a GL report from a sandbox all the way to Tableau, using QuickBooks API, Alteryx, Snowflake and Tableau. The logic still applies to alternative tools.
Whether you work in B2B or B2C, getting a better understanding of customer or account transactions in sequence is core. Customer cohorts will show you how well you are doing acquiring new customers, but also do they conduct repeat business with you. Product sequence will help you identify which products are getting traction with your customers: are the sales for one product or for a category just first time trials or do customers come back for repeat purchases of the same product? Working with many references / SKUs, many orders and many customers will bring Excel down to its knees before you know it… You need a solution that can scale!
In this post, I will use Snowflake as a database and Tableau with its standard demo SuperStore data set, but this logic is applicable to any SQL database and any BI product really… First part will show how to easily distinguish new customers from repeat business; second part will show how to identify products within orders by their sequence, and that is less easy… Also, note that I chose to develop the necessary calculations in SQL, but there are certainly ways to perform those calculations in memory in Tableau with Table Calcs, but that is out of scope for today.
Finding duplicate entities at scale in large databases using only names coming from free text boxes is always a challenge in Marketing, common in B2C, often ignored in B2B. Let’s take a simple scenario: I have a very long list of Contacts/Leads/Customers who self identify, typing their name, organization and email address in open text fields. I need to dedupe those contacts to make sure I don’t email them the same content several times, and I can’t trust the email address as unique identifier as I know some of them use their personal email address, or disposable email addresses and also their professional email address. I will assume here that the combo first + last name is consistent and reliable. Last name alone would not work, as too many last names are common. First+last name is better, but still not unique enough, so I will resort to first+last+org. The challenge is that organizations tend to have multiple spellings, and that is the object of the fuzzy logic to match those. For illustration, I will use a list of 1,826 organizations spelled distinctly downloaded from here, with many good examples from real life:
I will pretend that I have the same single first+last name tied to all those organizations to illustrate how fuzzy logic will perform. Therefore, I will need to compare 1,826*1,825=3,332,450 pairs of organizations and retain those that are close enough in their spelling, to be considered for consolidation. That is a lot data to crunch, more than what Excel can handle, but as we will see, a walk in the park for Tableau paired with a data base…
If you have dealt with hierarchies from Salesforce (SFDC), or from most normalized transactional systems, you have faced some challenges to use those hierarchies in Tableau. Those systems treat hierarchies as a recursive model, storing Parent to Child relationships in only 2 columns. The recursive model approach to hierarchies is very efficient in terms of storage, and flexible as it allows for a variety of levels and branches of different lengths. Yet, usage relies on User Defined Functions which are not available in Tableau, as it would break its columnar aggregation logic. There is a quick and dirty approach in Tableau: rebuild the hierarchy using Groups, but it is limited to 2 levels and requires manual maintenance in Tableau, not ideal… A real hierarchy in Tableau requires each level to be defined as its own column in the data source. Breaking down the hierarchy in levels is very desirable for Tableau or any other BI tools really, as it enables filtering and aggregating by any node.