The Data Science Workflow

Florentino Bexiga

“When dealing with data, it helps to have a well-defined workflow. Whether we want to perform an analysis with the sole intent of “telling the story” (Data Visualisation/Journalism) or build a system that relies on data to model a certain task (Data Mining), process matters. By defining a methodology in advance, teams stay in sync and avoid losing time figuring out what the next step should be. This enables faster production of results and publication of materials.

With that in mind, and following the previous blog post about the Ashley Madison leak data analysis, we saw an opportunity to show the workflow that we are currently using. This workflow is used not only to analyse data leaks (such as the case of AshMad), but also to analyse our own internal data. It is important to mention, however, that this workflow is a work in progress, in the sense that it can be subject to change over time in order to obtain results more effectively.”

