Build a free and fussy free data pipeline

Are you someone that process the same data frequently, like weekly or monthly but lack the resources to purchases fancy software to do so. Read on if you’re on the same boat

Image for post
Image for post

How many of us find ourselves in this situation whereby:


If you tick most of the boxes, welcome to the club of “DIY NON IT Business Analyst” and let me guide you through the following: Prepare a FOC data pipeline, Easy to maintain and Understandable by most (hopefully the new generation are good with PowerQuery)

If these are your current workflow and data volume, the effort is worthwhile

Image for post
Image for post
Rough Plan of a data pipeline from Source data to Visualization

Source : Identifying the source of the main data you are using and standard the download format. In terms of the fields and file type, excel,csv or text as these might derail your future steps once you start automating the process.

Image for post
Image for post
Folders, Files[Data Pipeline] [Mini Database]
Image for post
Image for post
Monthly download files within the folder

Pipe it into folder — In this case, I have created a folder “3.PeopleTrackingMovement” and dump all the monthly data into the folder. For the next step, i created a new excel workbook / “Pipe”, “3.PeopleTrackingMovment”, and created a Query to read Folder 3.PeopleTrackingMovement”. Since the file format have been standardize, the query was created once and will clean the data and append it together on a monthly basis.

Image for post
Image for post
Data Models & Dependencies

Bring It together — Creating “Mini Database”, after creating the various “Pipe” from 1–4. In this example, “Consolidation” was created to handle more data wangling from [1] “1.AllTraining” & [3]“3.PeopleTrackingMovment”.
Within [3] “3.PeopleTrackingMovment” is a huge data volume, more than 1mil rows of data so is better to create another file to handle the load.
Some planning is needed here, as you need to know how this file is going to be used for a bigger visualization so at a later stage you don’t do much data waggling.

For this use case, we need to know throughout the years, who [3]have done what [1]training. After much review, these 2 data set isn’t used in other “pipe”

Big data models — At this stage, is either PowerBI or Excel Report, whereby you pull 1 or 2 “Mini Database” and a few “Pipe” to create your final report.

Bringing all the smaller data models to build your reporting or visualization is one of the final steps. As we tend to build many types of reports or visualization, it is easier to prepared smaller data models, read the data models, and build it from there. This way, you will not overload or performance issue on the visualization or reporting.

This is a free and fuss free method to prepare your data for constant uses. Especially if your department doesn’t have a dedicated IT personal or money to fund for specialise ETL software.

If you’re experiencing similar issue preparing data, feel free to drop me a message.

Because less is more, I love using free, simple and powerful tools (Excel, PowerQuery, PowerBI & Python). I do more with less and bring context to analysis :)

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store