How to filter transactions by a calculated category using SQL Window Functions

Have you ever found yourself in a situation where you need to filter a set of transactions based on categories which need to be calculated from that same set?

Let’s say you have a data set containing all the customer responses triggered by your Marketing activity, and you need to rank Marketing campaigns’ performance based on the number of first-touch responses they originated.
Another example, more familiar to Tableau users: you have a full list of orders with line items from the Superstore data set, and you need to filter customers based on the product family of the largest item they bought on their most recent order… Or how about showing on a map the revenue of customers who have been inactive for a year or more, based on the shipping address of their last order?

Sounds like an awful lot of SQL steps, doesn’t it?
All those scenarios revolve around the same meta-problem: the need to switch between different levels of aggregation within the same process. In other words, the filter is computed on a subset of the transactions, and we need to apply that filter without destroying any information. Storing the same information multiple times could work, but it would be resource intensive and frankly inelegant.
The better approach requires a database engine (sorry, Excel cave folks!): by combining a series of in-database (In-DB) tricks and SQL window functions, this can be done in a couple of easy clicks with great performance, at no additional storage cost.
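The pattern can be sketched with a window function. Here is a minimal, self-contained illustration using Python’s built-in sqlite3 module (SQLite 3.25+ supports window functions), with a made-up orders table standing in for the Superstore data — the category (order recency) is calculated from the same set it then filters:

```python
import sqlite3

# In-memory demo: orders with line items; we keep only the rows belonging
# to each customer's most recent order, in a single statement.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders(customer TEXT, order_id TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  ('Alice','A1','2017-01-05',100),
  ('Alice','A2','2017-06-20', 40),
  ('Alice','A2','2017-06-20', 60),
  ('Bob'  ,'B1','2017-03-15', 80);
""")

# RANK() is computed per customer over order dates; the outer query then
# filters on that calculated value -- the switch between aggregation
# levels described above, with no intermediate storage.
rows = con.execute("""
WITH ranked AS (
  SELECT *,
         RANK() OVER (PARTITION BY customer
                      ORDER BY order_date DESC) AS recency
  FROM orders
)
SELECT customer, order_id, amount FROM ranked WHERE recency = 1
ORDER BY customer, amount
""").fetchall()
print(rows)  # → [('Alice', 'A2', 40.0), ('Alice', 'A2', 60.0), ('Bob', 'B1', 80.0)]
```

Both line items of Alice’s latest order survive the filter: the window function ranks rows without collapsing them, which is exactly why no information is destroyed.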

Posted in Alteryx, Marketing, Performance, Redshift, Snowflake, SQL

How to Load Data in Bulk to Snowflake with Alteryx

Why a post about Snowflake? What do Alteryx and Tableau have to do with Snowflake?

Here is some context: as I was planning to launch my Data Lake consulting business (now launched!), I was looking for a place to store and share prototypes that need to be queried from Tableau with high performance. Cloud databases offer the flexibility in capacity, performance, pricing and portability that I was looking for.

The leaders in that field, AWS and Azure, offer a lot of solutions, but I decided to take a closer look at Snowflake, not only because it is a hot startup, but also because it promises “uncompromising simplicity”, read: no need for a DB admin! I consider that feature a huge advantage over Redshift, which takes quite some technical skill even to get started…

My experience testing the platform proved that this specific promise of easy administration is met, even though the simplicity does not extend to the overall experience. To put it another way, Snowflake is not Citizen Analyst ready yet, but with a bit of tweaking, and the research I will share below, it comes pretty close!

To illustrate the step-by-step process, I will use the business scenario of a previous post: storing 27k rows of review data for a word cloud analysis.
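At its core, a Snowflake bulk load boils down to two statements: a PUT of the local file into a stage, then a COPY INTO the target table. A rough sketch of those statements, built as strings you would run through the Snowflake connector — the file path, table name and CSV options below are made-up placeholders, not the exact setup from the post:

```python
# Build the two SQL statements behind a basic Snowflake bulk load.
# "@%TABLE" is the table's built-in internal stage, so no separate
# CREATE STAGE is needed for a quick prototype.
def bulk_load_statements(local_path, table):
    """Return the SQL statements to execute, in order."""
    stage = f"@%{table}"
    return [
        # Stage the local file; Snowflake gzips it on the way up.
        f"PUT file://{local_path} {stage} AUTO_COMPRESS=TRUE",
        # Load the staged file into the table, skipping the CSV header.
        f"COPY INTO {table} FROM {stage} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
    ]

for stmt in bulk_load_statements("/tmp/reviews.csv", "REVIEWS"):
    print(stmt)
```

Running these through a connected cursor (one `execute` per statement) is what turns a 27k-row CSV into a queryable table; the post walks through the surrounding setup in detail.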

Posted in Alteryx, Performance, Snowflake, SQL, Tableau

Access and Process Leads Activity History with Marketo Bulk Extract APIs

Marketo’s core expertise is Marketing Automation, a software category it pioneered along with Eloqua, Pardot, Hubspot, Adobe, and now 208 additional vendors and counting. Marketo still holds a solid market share, especially on the West Coast and in Tech companies.

For analysts and executives who love data, Marketing Automation platforms offer an attractive perspective beyond operations: they capture large amounts of information on the individuals who make up customer and prospect organizations, from the details of the websites and other assets they consume, to the forms they fill out, the emails they receive, open, forward, and many more indicators. Marketing Analytics is a fast-growing field within the sprawling Martech industry, necessary to allocate Marketing resources and improve performance.

Alas, Analytics with Marketo was mostly a tantalizing promise until June 2017. While all this data is indeed stored on the Marketo instance, users who stay within the Marketo interface can query the Lead database only, and NOT the Activities DB, that is, the details of the transactions of identified individuals, in time-stamped sequences. The analyst willing to go beyond those restrictions was left with two options: either invest in Marketo’s costly analytics upsell solution (no blending of external data allowed!), or leverage the APIs to get the data their organization owns out to a serious analytics platform of choice.

Marketo has been offering access to the data generated in those databases through a SOAP API, then a REST API better suited to extracting volumes of transactions. The current versions of the Marketo connectors supplied out of the box by Alteryx (as of V11), and I believe by Tableau as well, still use those transactional REST APIs, which Marketo really means for syncing with a CRM, not for analytics. As a result, as I mentioned in the comments of my previous post on Marketo APIs, the traditional REST APIs were gradually stripped over 2016 of the capability to extract any decent volume of data at once. So far, most users of those APIs and connectors who want to extract more than a handful of records per session have been facing a dreaded series of errors:

1. Marketo REST API error code 606 (rate limit): throughput is capped at 100 records extracted per 20 seconds, and Marketo kills the connection above that rate.
2. Marketo REST API error code 615 (concurrency limit): Marketo allows at most 10 concurrent requests to its API, limiting parallel processing.
3. Marketo REST API error code 6XX (daily quota): 10,000 API calls maximum per day.

Furthermore, to definitively discourage the use of analytics outside of the Marketo moat, authentication tokens are only valid for 60 minutes, killing any extraction job running longer than that. As an indication of scale, in my own experience in my org, one hour is what it takes to extract 6 days of website visits, and that is just one Activity type out of the 18 I had planned to work on. Last but not least, Marketo does not offer a DATE TO filter for transactions in that API, effectively preventing incremental updates of data sets. For instance, for a daily refresh of 2016 transactions, a full update of all 2016 data was required… I wrote WAS because, since June 2017, Marketo has finally released Bulk Extract APIs for People and Activities, with surprisingly little fanfare for a Marketing company…

This is great news, but Marketo has not suddenly turned into a land of milk and honey. The new bulk approach is not as easy as one could expect, and is still riddled with hoops to jump through. However, if you have read this far, you must already get why it’s a much bigger deal than Marketo makes it sound (Campaigns Golden Path analysis, anyone?), and by the end of this post you will be able to take immediate advantage of the new API. You will finally get your hands on stacks of your data, instead of just trickles.
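To give a feel for one of those hoops: a bulk extract runs as an asynchronous job, and to my knowledge the createdAt filter is capped at 31 days per job, so a year of history must first be split into date windows, one job per window. A sketch of the chunking, with the job lifecycle endpoints listed as comments:

```python
from datetime import date, timedelta

# Each window then drives one asynchronous export job:
#   POST /bulk/v1/activities/export/create.json        (filter + fields)
#   POST /bulk/v1/activities/export/{id}/enqueue.json  (start the job)
#   GET  /bulk/v1/activities/export/{id}/status.json   (poll until done)
#   GET  /bulk/v1/activities/export/{id}/file.json     (download the CSV)
def date_windows(start, end, days=31):
    """Yield (window_start, window_end) date pairs covering [start, end]."""
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=days - 1), end)
        yield cur, stop
        cur = stop + timedelta(days=1)

windows = list(date_windows(date(2016, 1, 1), date(2016, 12, 31)))
print(len(windows))  # → 12 windows cover the 366 days of 2016
```

Looping jobs over these windows is what finally makes incremental refreshes of a full year practical, instead of one giant extraction fighting the 60-minute token.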


Posted in Alteryx, Marketing, Tableau

Word clouds with Alteryx and R

In a previous post, I explained how to build word clouds easily with a combination of Alteryx and Tableau. But what happens without Tableau? It turns out that Alteryx can leverage the power of existing R libraries to easily generate pretty nice word clouds too. I realized that potential after completing my previous post, thanks to the great work of Dan Magnus, who posted an Alteryx app you can use for free here, even without Alteryx.

Nevertheless, if you want more control, you should learn how to build your own workflow and benefit from extensive control over your output.
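As an aside, the heart of any word cloud, whether drawn by R or Tableau, is just a term-frequency table: the rendering library sizes each word by its count. A minimal pure-Python sketch of that core computation, with a tiny made-up stop-word list and sample review text:

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; real workflows use a proper one.
STOP_WORDS = {"the", "a", "and", "is", "it", "this", "to", "of"}

def word_frequencies(text, top_n=5):
    """Return the top_n (word, count) pairs after dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

reviews = "Great product. Great price, fast shipping. The product works."
print(word_frequencies(reviews))
# → [('great', 2), ('product', 2), ('price', 1), ('fast', 1), ('shipping', 1)]
```

Everything past this table (layout, rotation, color) is what the R library handles; the Alteryx workflow’s job is mostly to feed it clean counts like these.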

Posted in Alteryx, Automation

Navigating the Alteryx Maze of Tools

Alteryx is objectively an amazing tool to manipulate data and models without (much) coding. As I mentioned in earlier posts, that simplicity involves trade-offs. One of those is the initial difficulty of apprehending the potential and capabilities of the solution. One can feel lost when facing the sheer number of tools available, 170 and counting, without even mentioning the 3rd-party tools. Of course, a new user will start with a set of 10-20 tools and expand gradually, as she discovers new tools to address new use cases.

Well, the great news is that there is a way to accelerate that process, and therefore the overall ramp-up to proficiency. During Inspire 2017, Alteryx’s annual customer event, a map was released which comes in really handy:

It is designed as a periodic table and helps you discover tools based on their popularity (the wisdom of crowds…), their use case, or their type.

Without further ado, here is the link to the post; the full-size PDF is available for download at the bottom:
https://community.alteryx.com/t5/Engine-Works-Blog/The-Periodic-Table-of-Alteryx-tools/ba-p/64120

The whole post is well worth a read; the talented creator gives the full story behind the inception and the execution…


Posted in Alteryx