Various sources estimate that Excel has between 750 million and 1 billion users worldwide. Odds are that the vast majority of analysts start manipulating data with Excel. Beyond its ubiquity, and from my days of analytics at Microsoft, I have come to appreciate Excel for its versatility, for its expansion capabilities through plugins, and for a robustness that many software products could learn from. Let’s be clear: Excel is here to stay for many more years, backed by a deeply involved community.
I have been using Excel a lot, and I even keep writing on the topic, including on this blog. Yet I find myself using Excel + PowerPivot less and less. The first strike came when I started using Tableau and realized that a stacked bar chart did not have to be as painful and time-consuming to produce as it still is in Excel. The second, near-fatal strike came when I started using Alteryx, looking for a way to automate repetitive tasks without tedious Visual Basic and, most importantly, without Copy & Paste.
If you have a process that requires Copy & Paste, you should know you have outgrown Excel.
As I moved along the experience curve with Alteryx, I discovered 10 additional, less obvious benefits, which I have gathered and will detail in this post:
- Build Once, repeat automatically and reliably
- Modular design for sharing
- Change data type
- Obfuscate easily
- Identify what did not match in a Vlookup or an Index/Match
- Import many sheets
- Union of tables
- Reshape Data
- Increase Data Granularity
- Decrease Data Granularity
1. Build once, repeat automatically
This first benefit of switching to Alteryx is pretty self-explanatory: develop a workflow once, repeat at will. With the proper infrastructure, either a Scheduler license or an Alteryx Server, you can even schedule runs, and you will take care of the documentation along the way. Alteryx shows connection progress and the count of records processed at every stage, for reliable monitoring.
2. Modular design
The benefit of Alteryx workflows (formerly known as modules prior to V10) is that you can easily reuse and combine them, in an almost object-oriented way. You can also share datasets by writing one process’ output to a local database of the .yxdb type; that dataset can then be used by one or many subsequent workflows. Finally, you can use Alteryx macros, as I demonstrated in an earlier post, to maintain a process centrally in a macro and share it across many workflows.
3. Change data type
First things first: by default, Excel cells don’t have a data type! Excel tries to assign one smartly. Ever had an issue converting an Excel cell from Text to Number, or vice versa? A simple Google search returns over 13 million pages… Not to mention when cells in the same column end up with different formats for mysterious reasons. With the Select tool in Alteryx, just set what the column type should be, including a variable type, and you are done. You can even automatically optimize your fields’ data types as described in this post.
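The Select tool is configured visually, but the underlying idea is easy to sketch in code. Here is a minimal pure-Python illustration (the field names and sample rows are hypothetical, not from Alteryx): cast a column to an explicit type and collect the rows that fail, instead of letting the tool guess.

```python
# Hypothetical sketch: enforce an explicit type per column instead of
# letting the tool guess (as Excel does), and keep the failures visible.
def cast_column(rows, column, caster):
    """Cast one column in a list of dict rows; collect failures instead of guessing."""
    good, rejects = [], []
    for row in rows:
        try:
            good.append({**row, column: caster(row[column])})
        except (ValueError, TypeError):
            rejects.append(row)  # surfaced for inspection, not silently coerced
    return good, rejects

# Mixed text/number column, Excel-style
rows = [{"account": "1", "amount": "19.99"},
        {"account": "2", "amount": "n/a"}]
good, rejects = cast_column(rows, "amount", float)
```

The point is the explicit contract: the column is declared to be a float, and anything that cannot be cast is reported rather than left as a mystery format.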
4. Obfuscate your data easily
Should you need to present your models outside your organization, confidentiality requirements will most likely drive the need to mask some business indicators. This can be done in Excel with formulas, most likely RANDBETWEEN(), but the real data has to coexist in the spreadsheet, at least during the process, and you might miss scrubbing some parts. You are much better off masking the data in Alteryx and loading only the scrambled data into your model. A previous post describes how easily this can be done, especially when it comes to repeating the process…
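As a rough sketch of the masking idea, here is a hypothetical pure-Python analogue of the RANDBETWEEN() approach: each figure is scaled by a random factor so trends survive but the real numbers don’t. The 0.8–1.2 range and the seed are assumptions for illustration only.

```python
import random

def obfuscate(values, low=0.8, high=1.2, seed=42):
    """Scale each figure by a random factor so trends survive but
    the real numbers don't leave the building.
    (Range and seed are illustrative assumptions.)"""
    rng = random.Random(seed)  # fixed seed: repeatable masking run
    return [round(v * rng.uniform(low, high), 2) for v in values]

masked = obfuscate([1200.0, 950.0, 1780.0])
```

The key difference from the Excel route: the real values never coexist with the masked output in the deliverable, because the scrubbing happens upstream.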
5. Identify what did not match in a Vlookup or an Index/Match
It is one thing to map 2 tables properly in Excel, hopefully using Index/Match, but another to identify what did not match: the “leftovers” from one side or the other. Why should you bother checking those rejects? At the design stage, and even in production, you should monitor the rejects to ensure they match what you would expect from what you know of your dataset. This is where you will catch undocumented or unknown cases. Excel conceals those, as it silently ignores the rejects. Quite unforgiving…
For illustration, suppose you do a lookup of your invoices against the Account Master Table to assign an Account Type. You should expect every account to have an Account Type:
But unbeknownst to you, Account 7 is not in the Account Table. This is easy to spot in Alteryx when you run the workflow using the Join tool:
Whereas an Excel VLOOKUP will get you only 4 matches, circled in blue, Alteryx will draw your attention to the red figures. I would expect the 5 red unmatched accounts, as not every account has an invoice, but the one red left-join reject, the Account 7 issue, is unexpected…
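The Join tool’s three outputs (joined, left-only, right-only) are what make the rejects visible. For readers who prefer a scriptable view, here is a minimal pure-Python sketch of the same idea, using made-up rows in the spirit of the invoice example above:

```python
def join_with_rejects(left, right, key):
    """Inner-join two lists of dict rows, and also return the unmatched
    rows from each side -- mirroring the J / L / R outputs of a join."""
    right_index = {r[key]: r for r in right}
    left_keys = {l[key] for l in left}
    joined = [{**l, **right_index[l[key]]} for l in left if l[key] in right_index]
    left_only = [l for l in left if l[key] not in right_index]    # e.g. unknown accounts
    right_only = [r for r in right if r[key] not in left_keys]    # e.g. accounts with no invoice
    return joined, left_only, right_only

invoices = [{"account": 1, "amount": 100}, {"account": 7, "amount": 50}]
accounts = [{"account": 1, "type": "Direct"}, {"account": 2, "type": "Partner"}]
joined, left_only, right_only = join_with_rejects(invoices, accounts, "account")
# left_only surfaces Account 7: invoiced, but missing from the Account Master Table
```

A VLOOKUP gives you only the equivalent of `joined` (plus #N/A noise); the `left_only` output is precisely the undocumented case you want flagged.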
6. Import many sheets
If you have ever had to bring in data from many sources, such as a series of CSVs split for size reasons or by region, you will recall the task as painful and devoid of any interest… And if you need to do it over every week with a fresh dataset, you will promptly start cursing at Excel and writing some fancy Visual Basic script, which will keep breaking with every evolution of the source files. There too, Alteryx has you covered, with many different ways to automate the process, depending on your needs:
- http://community.alteryx.com/t5/Engine-Works-Blog/The-Ultimate-Alteryx-Holiday-gift-of-2015-Read-ALL-Excel-Macro/ba-p/9988

This brings us to the next step: once you have smoothly imported those sheets, how do you collapse them into one?
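The macro linked above is the Alteryx-native route. To give a feel for what the automation replaces, here is a hedged pure-Python sketch of the same pattern (the file layout and column names are assumptions): read every file matching a pattern and stack the rows, tagging each with its source.

```python
import csv
import glob
import os

def read_all_csvs(pattern):
    """Read every CSV matching a glob pattern and stack the rows,
    tagging each row with its source file (hypothetical layout)."""
    rows = []
    for path in sorted(glob.glob(pattern)):  # sorted: deterministic order
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                row["source_file"] = os.path.basename(path)
                rows.append(row)
    return rows
```

Run it weekly against a fresh drop of files and nothing needs to change; a new region file simply matches the pattern and flows through.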
7. Union of tables
Correct me in the comments if I’m wrong, but the only way I know to stack (Union) 2 tables in Excel is to copy the second and paste it at the bottom of the first… Not exactly a desirable approach. Even Tableau spotted the gap and introduced, with version 9.3, a way to perform the Union in Tableau:
Alteryx has offered a no-brainer Union tool since its very early versions, with a variety of configuration options:
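For the curious, the core behavior of a Union, aligning columns by name and filling the gaps, is simple to sketch in pure Python. The sample tables below are hypothetical; this is an illustration of the concept, not of the Alteryx internals:

```python
def union_tables(*tables):
    """Stack tables (lists of dict rows), aligning columns by name and
    filling missing columns with None -- roughly the 'auto config by
    name' behavior of a Union."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)  # preserve first-seen column order
    return [{c: row.get(c) for c in columns} for table in tables for row in table]

q1 = [{"region": "East", "sales": 100}]
q2 = [{"region": "West", "sales": 200, "returns": 5}]
stacked = union_tables(q1, q2)
```

Note that a column present in only one input (here `returns`) is carried through for every row, which is exactly what the copy-and-paste routine in Excel silently gets wrong when the layouts drift apart.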
8. Reshape Data
Let’s face it: most data you receive in Excel format is not properly shaped for analysis, and in most cases not even ready for an Excel Pivot. For instance, it has subtotals, headers and other data irrelevant to most BI tools, all of which need to be stripped out. Furthermore, the table is usually wide rather than tall; a tall table has fewer columns but more rows, a format that offers much better performance for aggregation engines, which can compute totals and subtotals on the fly. The whole process of turning typically shaped Excel data into a BI-tool-friendly format is called reshaping. Here is a good field guide published by Tableau, which is well aware of the gap most Excel users face. Tableau even went as far as offering an excellent free add-on for Excel to automate some of those tasks:
If you already use Alteryx, the process is even simpler using the Transpose tool:
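To make the wide-to-tall idea concrete, here is a small pure-Python sketch of the reshape, in the spirit of what the Transpose tool does (the `Name`/`Value` output columns follow Alteryx’s convention; the sample table is made up):

```python
def melt(rows, id_column):
    """Reshape a wide table into a tall one: one output row per
    (id, Name, Value) triple -- the spirit of a Transpose."""
    tall = []
    for row in rows:
        for col, val in row.items():
            if col != id_column:  # every non-key column becomes a row
                tall.append({id_column: row[id_column], "Name": col, "Value": val})
    return tall

# One wide row with a column per month...
wide = [{"product": "A", "Jan": 10, "Feb": 12}]
# ...becomes one tall row per (product, month) pair.
tall = melt(wide, "product")
```

Adding a new month to the source then adds rows, not columns, which is exactly what aggregation engines want.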
9. Increase Data Granularity
Sometimes your dataset is too aggregated for your purpose. For illustration: suppose I need to blend 2 data sources, one with daily transactions and the other with monthly targets, and I need to present attainment against target by week. The only way is to break the target dataset down into daily values, that is, to increase the granularity from month to day, effectively multiplying the number of rows by a factor of roughly 30.
That scenario can be tackled, and repeated, fairly easily using the Generate Rows tool, as demonstrated in this post. There are many such padding scenarios that are very difficult to approach in Excel and quite a breeze in Alteryx. On the flip side, a much more common scenario is to get rid of irrelevant details and decrease data granularity:
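Before moving on, here is a hedged pure-Python sketch of the month-to-day padding described above. The even split across days is an assumption for illustration; a real workflow might weight by business days instead:

```python
import calendar
from datetime import date

def monthly_target_to_daily(year, month, target):
    """Spread a monthly target evenly across that month's days --
    the 'increase granularity' padding a Generate Rows step achieves.
    (Even split is an illustrative assumption.)"""
    days_in_month = calendar.monthrange(year, month)[1]
    daily = round(target / days_in_month, 2)
    return [{"date": date(year, month, d), "target": daily}
            for d in range(1, days_in_month + 1)]

# April has 30 days, so a 3000 monthly target becomes 30 daily rows of 100
daily_rows = monthly_target_to_daily(2016, 4, 3000.0)
```

Once the targets exist at the daily grain, blending them with daily transactions and rolling both up by week becomes a straightforward aggregation.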
10. Decrease Data Granularity
Last but not least, there are many reasons to discard irrelevant levels of detail: easier processing by users, better system performance once the girth is gone, or simply storage optimization.
Why not let the front-end BI tool take care of it, with its optimized engine, fancy columnar database and other creative features? Because it is better to aggregate once, upfront, in Alteryx than to task the front-end tool and tax its resources at every navigation step of every end user.
In Excel, the way to proceed is to build a pivot, after reshaping the data to a pivot-friendly format, and to copy & paste (sigh) the resulting aggregated pivot table into a separate tab. The Tableau Excel add-in also assists with that last stage, with this function:
Not elegant, not reliable, not repeatable. A much better approach is to use Alteryx with the Summarize tool:
The tool offers every aggregating function an Excel pivot does and more, such as Count Distinct:
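To illustrate what a Summarize step computes, here is a minimal pure-Python sketch of a group-by with both Sum and CountDistinct (column names and sample rows are hypothetical):

```python
def summarize(rows, group_by, value):
    """Group rows and compute Sum and CountDistinct of a value column
    per group -- two of the aggregations a Summarize step offers."""
    groups = {}
    for row in rows:
        g = groups.setdefault(row[group_by], {"sum": 0, "distinct": set()})
        g["sum"] += row[value]
        g["distinct"].add(row[value])
    return {k: {"Sum": v["sum"], "CountDistinct": len(v["distinct"])}
            for k, v in groups.items()}

transactions = [{"region": "East", "amount": 10},
                {"region": "East", "amount": 10},
                {"region": "West", "amount": 5}]
summary = summarize(transactions, "region", "amount")
```

CountDistinct is the one worth noting: an Excel pivot gives you Sum and Count out of the box, but distinct counts have historically required workarounds.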
There are situations where calculations are not needed, such as picking a Min/Max/First/Last from a series of transactions. There, the Summarize tool can be a bit tedious to configure, and it becomes much more efficient to deploy my favorite tool combo: Sort + Unique:
If you sort your dataset properly, the Unique tool will easily pick the first transaction from each series you configure, effectively aggregating your dataset by discarding the irrelevant rows. The Alteryx engine will also process the operation much faster with Sort + Unique than with the Summarize tool.
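The Sort + Unique pattern is easy to express in code too. Here is a hypothetical pure-Python sketch: sort, then keep the first row per key, so that with a descending sort “first” means “most recent” (field names are made up):

```python
def first_per_group(rows, sort_key, group_field, reverse=False):
    """Sort, then keep the first row per group -- the Sort + Unique combo.
    With reverse=True on a date key, 'first' means 'most recent'."""
    ordered = sorted(rows, key=sort_key, reverse=reverse)
    seen, firsts = set(), []
    for row in ordered:
        if row[group_field] not in seen:  # Unique: keep only the first occurrence
            seen.add(row[group_field])
            firsts.append(row)
    return firsts

txns = [{"account": 1, "date": "2016-01-05", "amount": 50},
        {"account": 1, "date": "2016-03-01", "amount": 70},
        {"account": 2, "date": "2016-02-10", "amount": 30}]
# Latest transaction per account: sort by date descending, dedupe on account
latest = first_per_group(txns, lambda r: r["date"], "account", reverse=True)
```

No aggregation formula is evaluated at all; the sort order does the work, which is also why the engine can run this faster than a Summarize.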
As a last note, if you remain stuck with Excel, either because you are not yet convinced you could boost your productivity and enrich your tasks, or because you can’t invest in an Alteryx license yet, do yourself a favor and make sure you are using the 64-bit (x64) version of Excel. The 32-bit version restricts Excel to roughly 3 GB of your RAM, which is a pity when you manipulate data. The x64 version has no such restriction, and no downside besides the occasional lack of compatibility with a very outdated plug-in; that is not the case for the Tableau plug-in, which works fine on 64-bit Excel. The 32-bit Excel license includes the x64 version at no extra cost. If you are unsure which version you are running, go to File / Account and press the About Excel button: the top line will state whether it is the 32- or 64-bit version.
If this post convinced you that your next step is to get started with Alteryx, and you are not sure where to start, here is some guidance:
Addendum April 2016: A week after publishing, I stumbled upon a very good post by the talented folks over at the Information Lab, which covers almost the same topic. Worth reading for an additional take: