One of the many appeals of using Snowflake for me, is the user interface in the browser, which allows not only to perform many admin tasks, but also to poke around data. Yet, that legacy environment, while the envy of many competitors (Hi Redshift and BigQuery!), is still lacking many features necessary to perform analytical tasks, most notably the visualization aspects: no graph possible. Nevertheless, pursuing its vision towards becoming a one stop shop data cloud, Snowflake introduced in June 2020 a free preview for an exciting feature: Snowsight.
Snowsight originates from Snowflake’s first acquisition in March 2019, Numeracy. the goal is to enhance the experience by easing data exploration, SQL coding and provide visualization, still within the browser. I have to say, while this is still a preview, it is already very compelling… Let’s take a quick tour after selecting the preview from your regular Snowflake browser UI:
Data for analytics can be structured in two different ways: Wide or Long. Here is an illustration of the two types of format for the same data set:
When would you use one vs. the other? In general, wide models work better for analytical models and offer a better visual layout for humans, whereas the long model works better for columnar DBs like Snowflake, Redshift or Tableau. Their engine is optimized to crunch data at scale within columns.
Now suppose you use Snowflake as a staging area for your analytics, which I highly recommend. You ought to write your data in the Long format. How do you perform that conversion, sometimes called reshaping a table from wide to long? That operation is also called to transpose (Alteryx), unpivot (Snowflake), pivot Columns to Rows (Tableau Prep) or melt a table (Panda). We will also cover the flip side operation, from Long to Wide, also called to cross tab (Alteryx), pivot (Snowflake, Panda) or pivot Rows to Columns (Tableau Prep).
QuickBooks Online delivers a variety of great reports online and the tools it offers to developers are top notch. The reports themselves could use some more visual zazz… Seriously, I need the accounting data available in a data base to serve to a BI tool and take the data set to the next stage, with scenarios such as time series, forecasts or various predictive models. Extracting that data and normalize it is still quite a challenge. I will go through the steps of importing a GL report from a sandbox all the way to Tableau, using QuickBooks API, Alteryx, Snowflake and Tableau. The logic still applies to alternative tools.
Whether you work in B2B or B2C, getting a better understanding of customer or account transactions in sequence is core. Customer cohorts will show you how well you are doing acquiring new customers, but also do they conduct repeat business with you. Product sequence will help you identify which products are getting traction with your customers: are the sales for one product or for a category just first time trials or do customers come back for repeat purchases of the same product? Working with many references / SKUs, many orders and many customers will bring Excel down to its knees before you know it… You need a solution that can scale!
In this post, I will use Snowflake as a database and Tableau with its standard demo SuperStore data set, but this logic is applicable to any SQL database and any BI product really… First part will show how to easily distinguish new customers from repeat business; second part will show how to identify products within orders by their sequence, and that is less easy… Also, note that I chose to develop the necessary calculations in SQL, but there are certainly ways to perform those calculations in memory in Tableau with Table Calcs, but that is out of scope for today.
Finding duplicate entities at scale in large databases using only names coming from free text boxes is always a challenge in Marketing, common in B2C, often ignored in B2B. Let’s take a simple scenario: I have a very long list of Contacts/Leads/Customers who self identify, typing their name, organization and email address in open text fields. I need to dedupe those contacts to make sure I don’t email them the same content several times, and I can’t trust the email address as unique identifier as I know some of them use their personal email address, or disposable email addresses and also their professional email address. I will assume here that the combo first + last name is consistent and reliable. Last name alone would not work, as too many last names are common. First+last name is better, but still not unique enough, so I will resort to first+last+org. The challenge is that organizations tend to have multiple spellings, and that is the object of the fuzzy logic to match those. For illustration, I will use a list of 1,826 organizations spelled distinctly downloaded from here, with many good examples from real life:
I will pretend that I have the same single first+last name tied to all those organizations to illustrate how fuzzy logic will perform. Therefore, I will need to compare 1,826*1,825=3,332,450 pairs of organizations and retain those that are close enough in their spelling, to be considered for consolidation. That is a lot data to crunch, more than what Excel can handle, but as we will see, a walk in the park for Tableau paired with a data base…