Many business dashboards track headcount figures, since it is quite a vital indicator for most organizations. Conceptually, headcount is a straight forward concept. Technically, it is actually very complex to render for analysis, as it should not be stored, but calculated for a set of circumstances: Time, employee role, location, gender,etc. The question is, how do we present this data? Many businesses will flock to a data visualization site, like this online histogram maker, to present data like this.
Headcount is indeed very close technically and conceptually to inventory management, with flows of employees coming in and out the organization, and the required snapshot of where it stands It can also be used to see if improvements can be brought in to help motivate employees (such as the https://www.blueboard.com/blog/top-16-experiential-rewards-to-promote-employee-wellness systems that have started to appear across many industries). Headcount computing can be also compared to the approach needed for ROI calculation, as addressed in one of my previous posts. At a higher level of abstraction, a very good read by Keith Helfrich is diving into the similarity of those data sets. However, whereas Keith is using Custom SQL and Table Calcs in Tableau to reshape the data, I will take the Alteryx route, which I find cleaner, easier to explain and document.
I can get to this very dynamic way to report on headcount:
Now, how to get there? Let’s start with the source data: a set of transactions records for each employee, that you can download from here, looks like this:
We need to reshape that data before we load it into Tableau, so that we can count at any point of time how many employees are active in the headcount:
The approach consists in processing the hires as positive additions to the headcount using these formulae for the inflow (=hires):
and creating additional new entries for the outflow (=terminations) of employees which will decrease the headcount only after the terminations occur:
After running the workflows, the data will look like this:
Note how the business date is using hire date for inflow, and termination date for outflow. Furthermore, the granularity of the transactions increases:
I start with 110 employee records, and add 22 terminations to end at 132 rows loaded into Tableau.
In Tableau, I need a couple of calculated fields to get the dynamic headcount and various additional relevant indicators:
Headcount = Running_Sum(sum([Flow]))
Hires/Arrivals = if [Flow Type]=’In’ then [Flow] END
Terminations/Departures = if [Flow Type]=’Out’ then -[Flow] END
Departures Voluntary = if [Flow Type]=’Out’ and [Termination Type]=’Voluntary’ then -[Flow] END
Turnover = sum([Departures])/[Headcount]
Tenure in Days = DATEDIFF(‘day’,[Hire Date],iif(isnull([Termination Date]),today(),[Termination Date]))/365
Tenure in Years = DATEDIFF(‘year’,[Hire Date],today())
Age = DATEDIFF(‘year’,[Birth Date],today())
Now there is an important additional modeling challenge to address: headcount being a Running Sum table calculation, it needs the full history of employee records since the beginning of the organization, or at least a starting point, which is most likely further in the past than your visualization would require. In other words, if I am reporting on Microsoft headcount, I do not necessarily want to display the headcount all the way back to 1975, the last 2 or 3 years are usually what is relevant to display. The challenge is that if I add a filter on Business Date for the last 3 years, my headcount will be wrong, counting only the flows happening over the last 3 years. Hence the need to create in Tableau an extra Reporting Date Filter filtering the Table Calcs without filtering Business Date:
Now if you have been following my posts, we are entering with this filter what I have been calling Tableau post wall territory in terms of complexity, so buckle up!
First things first, it is important to understand in which order the Tableau engine performs calculations. As Bora Beran explained in his blog, Table Calcs like Running_Sum are performed last:
That works well for our purpose, now which formula to use? That good old Lookup() Jedi Trick:
Reporting Date Filter = lookup(max(([Business Date])),0)
This is simple to implement, and will work really well for most purpose, like on the Headcount or the Headcount progress tabs above, because the data underlying is dense enough, that is you have at least one entry per business date month. However, should you need to display a data set not dense enough, such as headcount per location per quarter, as illustrated by the Locations Wrong tab above, all hell breaks lose, and instead of getting this:
You are instead getting this:
It took none other than Joe Mako himself to help me find the proper formula that does address this data densification issue:
Reporting Quarter Filter = DATE(DATEADD(‘quarter’,INDEX()-1,DATETRUNC(‘quarter’,#First business date in your dataset#)))
Make sure you check Show Missing Values from the Quarter drop down options. If you want to push further your understanding of data densification issues, beyond the purpose of this blog post and at the frontier of my abilities, I highly recommend this material put together by Joe, in Jedi Master territory. Thanks again Joe!