Field width is not a very sexy topic compared to Visualization or Spark R. That’s a very good reason to get that basic topic out of the way with an approach that will save you time to spend on much more exciting topics. This post is all about efficiency!
It’s accepted that searching a DB table on an integer column is faster than on a string column, especially a wide one. As Interworks puts it succinctly in their Tableau Performance Checklist:
“Number and Boolean > date > string calculations when it comes to performance.”
What are the common situations where the problem is likely to happen?
- you might have got your hands on a CSV flat file which has lost all of its formatting, defaulting to Strings
- You might have to extract a table from a source system that is less than optimized, with fields given way too much real estate.
I will illustrate with the second situation example, extracting a standard Campaign table from SFDC, which has been appended with custom fields:
In that example, storing, transforming and querying the 4 digits of the Fiscal Year through a 255 Char String is highly inefficient. We also see that Vendor Sponsored is most likely a question obtaining a True/False response, where a Boolean format would be more efficient.
The ideal solution would be to obtain an adjustment in the source system, but it is not always practical, and most certainly will not be timely. You could also probably ignore this issue and press on, since modern solutions like Alteryx and Tableau are quite forgiving. But why pay a performance tax when this can be addressed on the spot in a couple of clicks?
- Add the two tools to your data import in this specific order:
- Here is the Before picture displayed by the Select tool:
- You should run the full extract in your workflow or just a sample large enough if you are in a hurry by using the Record Limit feature of the Connector:
- Once the workflow has run, the Auto Field has run through all the submitted records and picked the best field format, as reflected in the Select view for the After picture:
- Now you might be tempted to remove the Auto Field, as it is unnecessary for it to run each time the workflow runs, taxing performance:
If you were to do that right away, the fields would reset themselves at the next run, not a desirable outcome… Before you delete the Auto Field tool, you need to take the following extra steps
- Go to Select and tweak the results to an optimal, as you might have information Auto Field does not have:
- Then Save the optimized field configuration:
- You can now proceed to delete the Auto Field and reinstate the Field Configuration in Select:
That’s all there is to it!