Marketo’s core expertise is Marketing Automation, a software category it pioneered along with Eloqua, Pardot, Hubspot, Adobe and now 208 additional vendors and counting. Marketo still holds a solid market share, especially on the West Coast, especially in Tech companies. For analysts and executives who love data, Marketing Automation platforms offer an attractive perspective besides operations: they capture large amounts of information on the individuals who comprise customer and prospect organizations, from the details of the websites and other assets they consume, to the forms they populate, the emails they receive, open, forward and many more indicators. Marketing Analytics are a fast growing field part of the sprawling Martech industry, necessary to allocate Marketing resource and improve performance. Alas, Analytics with Marketo have been mostly a tantalizing promise until June 2017. If all this data is indeed stored on the Marketo instance, users who stay within the Marketo interface can query the Lead database only, and NOT their Activities DB, that is the details of the transactions in time stamped sequences of identified individuals. The analyst willing to go beyond those restrictions was left with two options: either invest in the Marketo costly analytics upsell solution (no blending of external data allowed!), or leverage the APIs to get the data his organization owns, out to a serious analytics platform of choice.
Marketo has been offering access to the data generated in those databases through a SOAP API, then a REST API, better suited to extract volumes of transactions. The current version of the Marketo connectors supplied out of the box by Alteryx, as of V11, and I believe by Tableau as well, are still using those transactional REST APIs, which are really meant by Marketo to be used to sync with a CRM, and not for analytics. As a result, as I mentioned in the comments for my previous post on Marketo APIs, the traditional REST APIs have been gradually stripped over 2016 of the capabilities to extract any decent volume of data at once. So far, most users of those APIs and connectors willing to extract more than a handful of records per session have been facing a dreaded series of errors:
1. Marketo REST API Error code 606 (Rate limit): Marketo limited throughput of records extracted within 20 seconds to 100, and kills the connection if higher
2. Marketo REST API Error code 615 (Concurrency limit): Marketo limits the number of concurrent requests to their API to 10, limiting parallel processing
3. Marketo REST API Error code 6XX (Daily Quota): 10,000 API calls maximum per day.
Furthermore, to definitely discourage the use of Analytics outside of the Marketo moat, authentication tokens are only valid for 60 minutes, killing any extraction job running >60 minutes. As an indication of scale, in my own experience in my org, one hour is what it takes to extract 6 days of website visits, that is just one Activity type, out of the 18, I had planned to work on. And last but not least, Marketo does not offer a DATE TO filter for transactions in that API, effectively preventing from running updates on data sets. For instance, for a daily refresh of 2016 transactions, a full update for all of 2016 data was required… I wrote WAS because, since June 2017, Marketo has finally released Bulk Extract APIs for People and Activities, with surprisingly little fanfare for a Marketing company…
This is great news, but Marketo has not suddenly turned into a land of milk and honey. The new bulk approach is not as easy as one could expect and is still riddled with hoops to jump through. However, if you have read that far, you must already get why it’s a much bigger deal than Marketo makes it sound, (Campaigns Golden Path analysis anyone?) and you will be able by the end of this post to take immediate advantage of the new API. You will finally get your hands on stacks of your data , instead of just trickles.
What I share here today, is a set of macro tools to operate immediately that Bulk API, before vendors catch up and release more polished connectors, for a more streamlined experience.
That Alteryx workflow I am publishing is actually a set of building blocks you can combine and re-purpose to build the perfect logic and serve your custom needs. They offer the additional benefit of being educational about the way those APIs operate. I remain open to technical suggestions to combine them into a single workflow without interruptions, as I am stumbling upon the detection of the Bulk job completion, between Steps 2 and 3.
Let’s start with the restrictions left with the Bulk API, that Marketo now documents in a clear way:
1. Max Concurrent Export Jobs: 2
2. Max Queued Export Jobs (inclusive of currently exporting jobs): 10
3. File Retention Period: 10 days
4. Max Size of Daily Export: 500MB
5. and that one hurts the most: Max Time Span for Date Range Filter (createdAt or updatedAt): 30 days
And of course, the authentication token is still restricted to 60 minutes, but since scheduling a bulk job takes no time, this restriction becomes moot.
First, let’s start adjusting the tools to operate with your instance of Marketo. if you haven’t done so already for other tools, create a custom API service by following these instructions. You should then obtain a Client Id as well as a Client Secret, looking like this:
We will then need the Endpoint URL, which drives the API to the proper instance, following those instructions, you will get a URL looking like that:
Here are the instructions provided by Alteryx for the same gathering of connection settings:
URL: The Marketo REST Instance to retrieve records from. This information is found in Marketo (Admin > Integration > Web Services > REST API).
Client ID: Client ID for the API Role user. This information can be found in Marketo (Admin > Integration > LaunchPoint).
Client Secret: Client Secret for the API Role user. This information can be found in Marketo (Admin > Integration > LaunchPoint).
Equipped with the information, open the Refresh Authorizations Macro, located on the top of the Step 0, this way with a right click:
and go edit the Marketo credentials, by inputting your own URL and URL Auth which will comprise of the 3 identifiers. Then save and close the macro.
Generate List of Leads Attributes and Activities
Now that you have the authorizations covered, which was the hardest part, really, you can Disable the Containers for Steps 1,2,3 and launch the workflow. You will obtain a list of the Leads attributes as configured in your Marketo instance, looking like this:
and a list of Activities looking like this:
Think of those as the menus du jour of what’s available to extract, with the IDs and the corresponding REST_Name you will need to direct the APIs. Feel free to copy them to Excel or anywhere you see fit, and you can disable the Step 0 container, as you are not likely to need to run those workflows again, they are quite static.
As explained by Marketo, getting your data in Bulk is then a 3 steps process:
1. Create the Job
2. Queue the Job
3. Download data when job has completed
Step 1: Start Jobs
Note that all the dates you input in those macros assume a time of Midnight in the Pacific time Zone (PST). Should you need a different setting, open the macro and edit the Dates Formatting formula to add or remove hours.
Input DateFrom and DateTo as well with no more than 30 days in between, and the Activities you need using their id as per the list generated above, not quoted, separated by a comma.:
1,2 in the illustration above gets all the webpages visited, as well as all the forms filled out. Overall, I advise to think big, but start small, with a test of 1 activity and 1 day, to see if the data looks like what you expect, before you blow through your daily quota… Note that those macros will also enqueue the jobs to run directly. You can then disable container Step 1 to run Step 2
Step 2: Monitor Jobs completion
This macro will populate a list of all the bulk jobs that have run in your Marketo entity sorted by most recent Creation Date:
If that list grows too long over time, it is very easy to edit that macro and insert filters that will get you the relevant information. You might need to run it a few times before you obtain the Status: Completed you need for the next step: downloading the data locally. Before you do, click on the exportId value of the job(s) you want to download in the job list, and Copy Selected Cells without Headers:
Step 3: Collect CSV Files
You will paste the job ExportId you just copied from the list into the settings of either the Leads or the Activities download.
Besides the ExportId the API requires, you can input a specific File Name and a DateFrom + DateTo, which will be used to name the export file, which is written by default in the same folder the macros are saved:
If needed, you can of course edit this approach to file naming to fit your needs, by opening the macro and find the code in the Create Processing Request formula. Note also that the macro is using the UpdatedAt time filter rather than the CreatedAt, as this seems much more relevant to match a specific time frame of activities, but you can also edit that pretty easily, should you need to.
Upon running those macros, once the jobs attained the Completed Status, you will get swiftly a CSV file on your hard drive. Each of those files will comprise a id column, which represents the Lead id and that you will use to Join the files. If you need to use over 30 days of data, several files over different date selections will be really easy to Union, if you keep the other settings consistent. Notes that the last column of the CSV flat file, called attributes, is formatted in JSON, since the number of attributes vary with each Activity Type. Whenever you need to use the data contained in those flat files, just insert a JSON Parse tool, a very easy task.
Last but not least, all those macros are built to request an authorization token each time they run, so you never have to worry about those pesky 3600 seconds time outs, as experienced in the current versions of the connectors which all leverage the standard Marketo REST API…