How to Load Data in Bulk to Snowflake with Alteryx

Why a post about Snowflake? What does Alteryx and Tableau have to do with Snowflake?

Here is some context: as I was planning to launch my Data Lake consulting business, which is now launched, I was looking for a place to store and share prototypes, which need to be queried from Tableau with high performance. Cloud databases offer the flexibility in terms of capacity, performance, pricing and portability that I was looking for.

The leaders in that field, AWS and Azure, offer a lot of solutions, but I decided to take a closer look at Snowflake, not only because it is a hot startup, but also because it promises Uncompromising simplicity, read: no need for a DB Admin! I consider that feature a huge advantage over Redshift, which takes quite some technical skills to get even started…

My experience testing the platform proved that that specific promise of easy administration is met, even though that simplicity does not extend to the overall experience. To put it another way, Snowflake is not Citizen Analyst ready yet, but with a bit of tweaking, and the research I will share below, it comes pretty close!

To illustrate the step by step process, I will use the business scenario of a previous post: storing 27k rows of review data for a Word Cloud analysis.
Continue reading

Posted in Alteryx, Performance, Snowflake, SQL, Tableau | Tagged , , , , | 4 Comments

Access and Process Leads Activity History with Marketo Bulk Extract APIs

Marketo’s core expertise is Marketing Automation, a software category it pioneered along with Eloqua, Pardot, Hubspot, Adobe and now 208 additional vendors and counting. Marketo still holds a solid market share, especially on the West Coast, especially in Tech companies. For analysts and executives who love data, Marketing Automation platforms offer an attractive perspective besides operations: they capture large amounts of information on the individuals who comprise customer and prospect organizations, from the details of the websites and other assets they consume, to the forms they populate, the emails they receive, open, forward and many more indicators. Marketing Analytics are a fast growing field part of the sprawling Martech industry, necessary to allocate Marketing resource and improve performance. Alas, Analytics with Marketo have been mostly a tantalizing promise until June 2017. If all this data is indeed stored on the Marketo instance, users who stay within the Marketo interface can query the Lead database only, and NOT their Activities DB, that is the details of the transactions in time stamped sequences of identified individuals. The analyst willing to go beyond those restrictions was left with two options: either invest in the Marketo costly analytics upsell solution (no blending of external data allowed!), or leverage the APIs to get the data his organization owns, out to a serious analytics platform of choice.

Marketo has been offering access to the data generated in those databases through a SOAP API, then a REST API, better suited to extract volumes of transactions. The current version of the Marketo connectors supplied out of the box by Alteryx, as of V11, and I believe by Tableau as well, are still using those transactional REST APIs, which are really meant by Marketo to be used to sync with a CRM, and not for analytics. As a result, as I mentioned in the comments for my previous post on Marketo APIs, the traditional REST APIs  have been gradually stripped over 2016 of the capabilities to extract any decent volume of data at once. So far, most users of those APIs and connectors willing to extract more than a handful of records per session have been facing a dreaded series of errors:

1. Marketo REST API Error code 606 (Rate limit): Marketo limited throughput of records extracted within 20 seconds to 100, and kills the connection if higher
2. Marketo REST API Error code 615 (Concurrency limit): Marketo limits the number of concurrent requests to their API to 10, limiting parallel processing
3. Marketo REST API Error code 6XX (Daily Quota): 10,000 API calls maximum per day.

Furthermore, to definitely discourage the use of Analytics outside of the Marketo moat,  authentication tokens are only valid for 60 minutes, killing any extraction job running >60 minutes. As an indication of scale, in my own experience in my org, one hour is what it takes to extract 6 days of website visits, that is just one Activity type, out of the 18, I had planned to work on. And last but not least, Marketo does not offer a DATE TO filter for transactions in that API, effectively preventing from running updates on data sets. For instance, for a daily refresh of 2016 transactions, a full update for all of 2016 data was required… I wrote WAS because, since June 2017, Marketo has finally released Bulk Extract APIs for People and Activities, with surprisingly little fanfare for a Marketing company…

This is great news, but Marketo has not suddenly turned into a land of milk and honey.  The new bulk approach is not as easy as one could expect and is still riddled with hoops to jump through. However, if you have read that far, you must already get why it’s a much bigger deal than Marketo makes it sound, (Campaigns Golden Path analysis anyone?) and you will be able by the end of this post to take immediate advantage of the new API. You will finally get your hands on stacks of your data , instead of just trickles.

Continue reading

Posted in Alteryx, Marketing, Tableau | Tagged , | 2 Comments

Word clouds with Alteryx and R

In a previous post, I explained how to build word clouds easily with a combination of Alteryx and Tableau. But what happens without Tableau? It turns out that Alteryx can leverage the power of existing R libraries to easily generate pretty nice word clouds too. I realized that potential after completing my previous post, thanks to the great work of Dan Magnus, who posted an Alteryx app you can use for free here, even without Alteryx.

Nevertheless, if you want more control, you should learn how to build your own workflow, and benefit of extensive controls on your output. Continue reading

Posted in Alteryx, Automation | Tagged , , | Leave a comment

Navigating Alteryx Maze of Tools

Alteryx is objectively an amazing tool to manipulate data and models, without coding (much). As I mentioned in earlier posts, that simplicity involves trade offs. One of those is the initial difficulty to apprehend the potential and capabilities of the solution. One can feel lost when facing the sheer number of tools available, 170 and counting, without even mentioning the 3rd party tools. Of course, a new user will start with a set of 10-20 tools, and will expand gradually, as she discovers new tools to address new use cases.

Well the great news is that there is a way to accelerate that process, and therefore the overall ramp up to proficiency. During Inspire 2017, Alteryx annual customer event, a map was released which comes really handy:

It is designed as a periodic table and helps discover tools, based on their popularity (the wisdom of crowds…), their use case or type.

Without further ado, here is the link where to download the full size PDF, at the bottom of the post:

The whole post is well worth a read; the talented creator gives the full story behind the inception and the execution…


Posted in Alteryx | Leave a comment

Turn text column reviews into a Tableau word cloud

After getting my hands on a data set containing a large number of user reviews in a column, I was trying to get an overall sense of those reviews with a quick glance. Visually, there are several options to get that perspective. We can start by words used ranked by use count:

But it takes way too much of the screen real estate. We could also use a nice tree map, which I am found of:

Nice and thorough, with each box sized in proportion of the use count of the word. Unfortunately, the average consumer of such analysis is struggling to interpret it, as this style of viz is probably not mainstream enough yet. It’s still way better than the dreaded pie chart, which is impossible to consume with more than 3 slices.
I figured this analysis could also be a great opportunity to use a Word Cloud, a fairly recent functionality which came with Tableau 8, which I find more accessible to users:

With one quick glance, one can clearly visualize that those reviews cover a German restaurant in Charlotte, NC, with a fairly positive tone.

Soon after I got started putting together the Word Cloud, I realized that Tableau offered little guidance on that topic. Word Cloud is not found in the Show Me tab for instance. I searched what fellow bloggers had to offer, and easily found some guidance, like here or there. These are all fine posts, with nice visual steps to follow, but they overlook a crucial point: they assume that the data set has already 1 word per column, which is highly unlikely to happen. It turns out that using a data source with sentences instead of single word, is a major road block to the use of Word Clouds in Tableau, as reflected in some help requests in the community. In order to feed Tableau with something workable, it requires to split the words of the reviews column. This operation implies breaking the granularity of the data source, which will expand the number of rows by the number of words found in each review.

There must be ways to code that conversion in Python, R, SQL or even PowerBI DAX. But as often for me, for a clean repeatable solution without dreaded copy + paste, whipped together in 10 minutes, I will use Alteryx!

Continue reading

Posted in Alteryx, Tableau | Tagged , , | 2 Comments