ACCIDENTAL BIG DATA (FOR REALS)

Bill Skelly

Data, DataWarehouse, USQL

February 8, 2017

If only we saw this data problem coming…

In December of 2011, I remember reaching out to O’Reilly Publishing with an idea I had for a book. Essentially, I was staring down the barrel of an enormous RNC Voter File and the 2012 Presidential Election and asking myself, “How the heck did we get here?” What I quickly realized was that we weren’t alone in our accidental encroachment into the world of Big Data. My idea for the book (which I had pitched as “Accidental Big Data”) was picked up, but I never had the time in 2012 to actually write it (go figure).

Fast forward four years and, despite having won the White House and with Causeway Solutions as the leading implementer of a Big Data-driven approach to voter targeting, I find myself in the exact same position I was in back in 2011.

By the end of this past year we were moving 9.5 BILLION records of predictive analytics into and out of a series of data warehouses and reporting infrastructures – all of which we were rapidly outgrowing. I sense that this is a common cycle.

Starting in 2017, Causeway Solutions is pivoting from our traditional RDBMS approach to voter file warehousing to a Data Lake solution. We’re not saying goodbye to SQL Server altogether – in fact, it will still be a major component of our solution. We are, however, moving the bulk of the underlying data into Azure Data Lake Analytics. We’re not alone in this migration – the tool is being adopted at a rapid pace by many warehousers – but we are on a trajectory that slightly outpaces existing documentation and analyses.

So… we’re going to create our own. And… we’re going to document it here.

At a minimum this will be a roadmap for the Causeway Solutions team as we migrate into a new technology in a bit of a vacuum. Hopefully it evolves into something that answers questions for others in the space.

And, who knows, maybe I’ll get to write my book after all.