Suppose you have a set of transactions, such as Superstore orders for instance, each with one to many line items, with each line item belonging to a product category:
Note that this is a single order with actually 14 line items, but my data set has 10,000 records with 5,009 distinct orders. My goal is to obtain one row per order, a count of line items, and most important, a single field with all categories found in that order to enable filter by keyword:
What I mean by history table is what is commonly called a Slowly Changing Dimension, or SCD table of type 2, which retains efficiently all the previous values I decide to track, for accurate historical reports. Why do we care for historical values? In a SFDC CRM context, suppose you are reporting on the number of opportunities processed by each rep over a year, based on the opportunity creation date. If the opportunity owner changes over time before close, you will assign the opportunity to the bucket of the most recent owner, instead of the legitimate first owner, who won’t get the credit. A SCD type 2 setup will archive all the owner values each opportunity took over time, and will store those records only if there was a change. That represents much less storage and processing resource than a daily snapshot table. I need to decide early which columns I want to track if they are likely to change, here in blue for that example:
Now suppose you have a simple setup of type: Fivetran copies SFDC table Opportunity to Snowflake, and overwrites all changes originating from SFDC. How do you build that history table?
One of the many appeals of using Snowflake for me, is the user interface in the browser, which allows not only to perform many admin tasks, but also to poke around data. Yet, that legacy environment, while the envy of many competitors (Hi Redshift and BigQuery!), is still lacking many features necessary to perform analytical tasks, most notably the visualization aspects: no graph possible. Nevertheless, pursuing its vision towards becoming a one stop shop data cloud, Snowflake introduced in June 2020 a free preview for an exciting feature: Snowsight.
Snowsight originates from Snowflake’s first acquisition in March 2019, Numeracy. the goal is to enhance the experience by easing data exploration, SQL coding and provide visualization, still within the browser. I have to say, while this is still a preview, it is already very compelling… Let’s take a quick tour after selecting the preview from your regular Snowflake browser UI:
Data for analytics can be structured in two different ways: Wide or Long. Here is an illustration of the two types of format for the same data set:
When would you use one vs. the other? In general, wide models work better for analytical models and offer a better visual layout for humans, whereas the long model works better for columnar DBs like Snowflake, Redshift or Tableau. Their engine is optimized to crunch data at scale within columns.
Now suppose you use Snowflake as a staging area for your analytics, which I highly recommend. You ought to write your data in the Long format. How do you perform that conversion, sometimes called reshaping a table from wide to long? That operation is also called to transpose (Alteryx), unpivot (Snowflake), pivot Columns to Rows (Tableau Prep) or melt a table (Panda). We will also cover the flip side operation, from Long to Wide, also called to cross tab (Alteryx), pivot (Snowflake, Panda) or pivot Rows to Columns (Tableau Prep).
QuickBooks Online delivers a variety of great reports online and the tools it offers to developers are top notch. The reports themselves could use some more visual zazz… Seriously, I need the accounting data available in a data base to serve to a BI tool and take the data set to the next stage, with scenarios such as time series, forecasts or various predictive models. Extracting that data and normalize it is still quite a challenge. I will go through the steps of importing a GL report from a sandbox all the way to Tableau, using QuickBooks API, Alteryx, Snowflake and Tableau. The logic still applies to alternative tools.