Over the years, I have been monitoring the analytics software market with a quite simple requirement: I am (still) looking for a tool to let me generate full data pipelines quickly from a browser with the least amount of code. As an analyst, I don’t have the luxury of time that developers and data scientists have, I need to crank out pipelines quickly and can’t waste my clients’ time futzing with a Command Line Interface. A nice UI makes the documentation of a complex process much easier. I also want the ability to look at the actual data at each stage while I develop a long process, without having to devise each time a fancy query just for testing. Last but not least, I need frictionless provisioning and deployment, à la Snowflake: just sign up on a webpage and start using without having to summon any AWS engineers nor other IT / Dev Ops rare resource. Believe it or not, that market is still up for grab, as I haven’t found a decent solution yet. All the products I tried have significant flaws. Here are a few, I have evaluated:
- Alteryx Designer Cloud
- Fivetran
- SnapLogic
- Dell Boomi
- DBT Cloud
- Matillion
- Denodo
- Paxata
- Pentaho
- Rivery
- Unifi
- Xplenty
- Lyftron
- Prophecy
- Getlore
- Domo Magic ETL
- Dataiku
- Tableau Prep
- Databricks Delta Lake
I would say the ones coming closer to that vision are Matillion, Dataiku and Domo in its recent iterations with Magic ETL. All the others fail, in their current state, by being too expensive, too limited or way too complex (looking at you Delta Lake). There is still room for new entrants as this market segment is far from being satisfied. Analysts are usually happy to pay within reason for solutions that boost their productivity.
I have been paying close attention to the solutions available, monitoring fairly easily the relevant ones going to market with Snowflake, through their Partner Connect awesome feature. Yet, I recently came across one of those new entrants, a solution that is Snowflake centric, but not yet present on Snowflake Partner Connect, called Datameer. It offers what I deem a compelling positioning statement:
Working from a browser without having to install any application, nor patches, that is refreshing. Now, is Datameer delivering on the promise? Read on for my test of an early release…
After inputting Snowflake credentials, Datameer offers a quite polished UI that lets you explore the Snowflake content you have access to. A nice feature lets you easily upload CSV files into tables, without having to create the DML code for table definition:
Nonetheless, I have not been able to complete the load of the Superstore 2mb CSV… No error message… I suspect the job does not make use of the Internal Stage, despite Snowflake’s guidance, and therefore is very slow… To give you an idea of the UI, I built a very simple workflow:
Step 1: Select a transaction table, Orders from Superstore and select relevant columns:
Step 2: Perform a join with a master data table with Sales Reps tied to the sale Region. Note the nice join suggestion feature, which nailed it in this case:
Step 3; Add a formula to calculate line item price, with instant result view is pretty slick:
Step 4: Publish the workflow to Snowflake, which actually means generate SQL code to create a View in Snowflake:
Now checking in Snowflake what happened:
As you can see above, the generated SQL with its CTEs is pretty clean to read, contrary to what Alteryx INDB outputs. How about a button to display that SQL directly in the Datameer UI? At that point, you are limited to generate only views, but Tables should come soon, which will require some type of scheduler, even though this could also be done using Snowflake Tasks. Hopefully, the scheduling functions will make it faster to manage dependencies than Tasks do.
Here is a comparison with Alteryx INDB, which can only be done using Alteryx Designer Windows client at that point:
Then you can take that code and create your own view in Snowflake:
I really like that contrary to Alteryx INDB, you can insert SQL code at any step of the workflow, useful to repurpose existing code. As it is to be expected at such early stage of the product, I noticed some missing features. I hope they will get implemented soon:
- Formula does not support the LIKE SQL command, only the REGEXP_LIKE. Why is that? Can’t this be passed straight to Snowflake ? LIKE is more analyst friendly than the powerful, but often confounding Regex commands.
- No UNION function that I can tell.. This is a real bummer as those Unions take time to input in SQL… Alteryx and Domo have slick solutions to automate those Union operations…
- I wish also there were simplified Pivot/Unpivot functions, which are so common and time consuming in SQL for data prep.
- Window functions deserve their tool as well to facilitate their manipulation, as they come handy so often in data prep.