Lesson 01

Ingesting Data

Normally we start by sourcing data. It can come from a variety of places. Sometimes it is actively generated: a user taking actions in an application, or someone manually adding records to a spreadsheet. Sometimes it arrives passively: sensors publishing a constant stream of events, or listeners tracking every user interaction. Regardless of where it comes from, it usually comes in messy.

And it’s our job to wrangle it into a database.

So how will we do this?

First we look at the data. Where do you see the logical separators? Where is the data duplicated? Which fields or columns make sense grouped together? Are any values missing? Any nulls?

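The fancy words for this are normalization (organizing the data into tables so nothing is stored twice) and data cleaning (dealing with missing or malformed values).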
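To make those questions concrete, here is a minimal sketch of a first look at some raw data, using only Python's standard library. The sample rows, the column names, and the `profile` helper are all made up for illustration, and treating the text `NULL` as a missing value is an assumption about how this particular export marks gaps.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export; the columns and values are invented for illustration.
RAW = """name,email,city
Ada,ada@example.com,London
Ada,ada@example.com,London
Grace,,New York
Linus,linus@example.com,NULL
"""


def profile(text):
    """Count missing values per column and exact duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(text)))

    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # Assumption: empty strings and the literal text "NULL" mean "missing".
            if value is None or value.strip() in ("", "NULL"):
                missing[column] += 1

    # A row counts as a duplicate when every value matches an earlier row.
    seen = Counter(tuple(row.values()) for row in rows)
    duplicates = sum(count - 1 for count in seen.values() if count > 1)

    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


report = profile(RAW)
print(report)
```

A report like this won't fix anything by itself, but it tells you what the cleanup code has to handle before you design any tables.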

The adage is “garbage in, garbage out”. If our data is trash, any outputs we derive from it are trash.

So how will we actually do this?

Well, we’re gonna roll up our sleeves and write some code.

We’ll first read the raw data.

Then we’ll create the structure of our database.

Then comes the real magic: we use our programming chops to parse the data into the tables we created.
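The three steps above can be sketched roughly like this. It is one possible shape, not the way the course will necessarily do it: the sample data, the `people` and `cities` tables, and the `ingest` function are hypothetical, and SQLite (through Python's built-in `sqlite3` module) is assumed only because it needs no setup.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would be read from a file or an API.
RAW = """name,email,city
Ada,ada@example.com,London
Ada,ada@example.com,London
Grace,,New York
"""


def ingest(text):
    """Read raw CSV text and load it into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")

    # Create the structure. Cities get their own table so each name is stored once.
    conn.executescript(
        """
        CREATE TABLE cities (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE people (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            email   TEXT,
            city_id INTEGER REFERENCES cities(id),
            UNIQUE (name, email)
        );
        """
    )

    # Read the raw data and parse each row into the tables.
    for row in csv.DictReader(io.StringIO(text)):
        # Store an empty email as NULL instead of an empty string.
        email = row["email"].strip() or None

        # INSERT OR IGNORE skips rows that would violate a UNIQUE constraint.
        conn.execute("INSERT OR IGNORE INTO cities (name) VALUES (?)", (row["city"],))
        city_id = conn.execute(
            "SELECT id FROM cities WHERE name = ?", (row["city"],)
        ).fetchone()[0]

        # Repeated (name, email) pairs are dropped here. SQLite treats NULLs as
        # distinct, so repeats with a missing email would need separate handling.
        conn.execute(
            "INSERT OR IGNORE INTO people (name, email, city_id) VALUES (?, ?, ?)",
            (row["name"], email, city_id),
        )

    conn.commit()
    return conn


conn = ingest(RAW)
print(conn.execute(
    "SELECT p.name, p.email, c.name FROM people p "
    "JOIN cities c ON c.id = p.city_id ORDER BY p.id"
).fetchall())
```

Notice that the duplicate row disappears and the missing email becomes a real NULL, which is the kind of wrangling the earlier questions were pointing at.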

“But my boss said that AI will replace programmers”

Okay?

I’m getting on a soapbox.

This is a free online course. No one is telling you to take it.

I’ve seen a deterioration of the fundamentals in the industry and it brings me rage.

If you want to give this a go and try to build a generation of software developers better than the ones that came before them, I’ll see you in the next lesson. If you don’t, then please save your time.