10 Tricks to adopt Redshift In-DB

If you are still manipulating data in Excel, why should you care about In-DB and Redshift? Whether you are accessing your Redshift data through Alteryx or through Tableau, knowing how to prepare your data within Redshift, with In-DB tool in Alteryx or with a prep query in the Tableau connector to Redshift, will give you access to another level of performance and flexibility. Even though it might sound to you like science fiction, it is not… These are just Redshift functions you can run through ODBC

Having a Database to read from and write to, is really nice when using Alteryx, even though you can make do without, with local .YXDB files.  The key benefit of a dedicated database is the ability to use In-DB tools. Working In-DB saves the time it takes to import and export data to and from the Alteryx engine memory, and leverages the power of the DB engine. Shifting an analyst’s data Field of Play to a Massively Parallel Processing (MPP) Database such as Redshift, brings additional performance improvements. I would deem those gains massive: I have seen 6 minutes Alteryx workflows run in 15 seconds In-DB.  Note that besides Redshift, the following databases are supported In-DB by Alteryx:

  • Cloudera Impala
  • Databricks
  • Hive
  • Microsoft Azure SQL Data Warehouse
  • Microsoft SQL Server
  • Oracle
  • Spark
  • Teradata

For all the wonders of the Alteryx data engine, which never seems to choke on anything sent its way for processing, it cannot really compete with the parallel processing of Redshift, when used In-DB. From an analyst perspective, there are some trade-offs for the additional performance, scale and efficiency. Main one is the steep initial learning curve, since it requires the adoption of a wholly different set of tools. Furthermore, you will quickly find that those tools have limitations, inherent to the underlying SQL operations generated through ODBC. The 10 tricks of this post will hopefully alleviate some of those difficulties.
Continue reading

Posted in Alteryx, Performance, Redshift, SQL, Tableau | Tagged , , , | 1 Comment

TC16 Tableau Conference Highlights

This year’s conference took place in Austin, TX. Yet, our Tableau friends must have felt right at home, as if in Seattle. The constant drizzle and low ceiling were certainly more reminiscent of a Pacific Northwest scenery. It took until Thursday at noon, tail end of the event, to catch a glimpse of sun:

austinrain
Continue reading

Posted in Tableau | Tagged | 3 Comments

Automate custom Tableau email updates

Like it or not, looking at you Slack, email remains the preferred way to disseminate information in the workplace. Volume is still growing, especially business emails volume growing at a cool steady annual rate of 7%. Is usage growing at the same pace? Meaning: are recipients actually reading those emails? That’s another issue…
Among your users, chances are that some are still heavily reliant, not to say tethered to email as they primary vehicle for sharing data. Even if you send a nice link to a perfectly crafted Tableau viz, they won’t click it, complaining it is too difficult to have to authenticate first, and wait for the browser to display, especially on a mobile device.

flying-emailsTherefore, there is still a case to be made for broadcasting data by email, as admitted very recently by Tableau. Starting with Tableau V10, you can finally subscribe on behalf of other users, whereas previous versions only let users subscribe for themselves. Tableau’s implementation is elegant and simple, as usual, but still comes with limitations:

  1. Each recipient must be licensed on the Tableau Server, which can get quite costly in a situation where you only need to send a sales updates to a large number of sales reps
  2. You are limited in terms of formatting options
  3. You cannot leverage existing Distribution Lists (DLs) maintained on your Exchange or other mail server

There is a better way, described in that post, that offers a much wider set of options, with a bit of R&D and very minimal coding. We will use a simple scenario: a large number of sales reps must receive a sales report by email at the start of their business day. This operation entails the following steps:

  1. Gather timely images of the reports out of your Tableau server, to be inserted in the email
  2.  Put together a script to assemble the email and send it to a Distribution List
  3. Automate and schedule the task (optional)

Continue reading

Posted in Automation, Marketing, Sales, Tableau | Tagged , | Leave a comment

How to Calculate Headcount dynamically from Employees Transaction Records

Many business dashboards track headcount figures, since it is quite a vital indicator for most organizations. Conceptually, headcount is a straight forward concept. Technically, it is actually very complex to render for analysis, as it should not be stored, but calculated for a set of circumstances: Time, employee role, location, gender, …

Headcount is indeed very close technically and conceptually to inventory management, with flows of employees coming in and out the organization, and the required snapshot of where it stands. Headcount computing can be also compared to the approach needed for ROI calculation, as addressed in one of my previous posts. At a higher level of abstraction, a very good read by Keith Helfrich is diving into the similarity of those data sets. However, whereas Keith is using Custom SQL and Table Calcs in Tableau to reshape the data, I will take the Alteryx route, which I find cleaner, easier to explain and document. Continue reading

Posted in Alteryx, Tableau | Tagged , , , | 15 Comments

How to address INDB Union Error With Alteryx and Redshift

When playing with Alteryx on a Amazon Redshift database with INDB tools, have you ever come up with the impenetrable error:

Error: Browse In-DB (36): Error SQLPrepare: [Amazon][RedShift ODBC] (30) Error occurred while trying to execute a query: ERROR: UNION types character varying and boolean cannot be matched

Workflow 1

Redshift is kindly trying to tell you that some fields of the same name are formatted as Boolean on one side, and as VARCHAR / V_WSTRING on the other side. Now you are left with your own devices to identify which fields, which can be fun when dealing with tables of 100s of fields, the specialty of Redshift… Add to that challenge that the INDB Select tool cannot present the Data Type of the fields. On top of it, a weird bug, most likely in the ODBC layer, is presenting Boolean fields in Redshift as V_String in Alteryx… Are you about to toss the computer out the window yet? Wait, there is a way out of that misery…
Continue reading

Posted in Alteryx, Redshift, SQL | Tagged , , | Leave a comment