Introducing Decide
Put simply, Decide automates data cleaning. If you would like to join our beta, sign up here!
If you have ever cleaned data, you know how painful it can be. A few months ago I was cleaning data for a personal project. Long story short, it took me 3 weeks. After processing the nightmare I went through, I decided (pun intended) that this was something I never wanted to experience again.
I searched for solutions, but none of them really eased the burden of cleaning data. I still found myself cleaning the data manually; the only difference was that instead of coding I was clicking a whole bunch of buttons.
So, I created Decide.
How does it work?
We clean your data in 3 easy steps. Below, I illustrate how Decide works using a very dirty but realistic dataset, which I created from issues I have run into while cleaning datasets in the past.
Issues include:
- Null values, null values and more null values.
- Inaccurate data
- Inconsistent formatting
- Multiple variations of the same thing
- Duplicates
- Anomalous data
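To make a few of these issues concrete, here is a minimal Python sketch that spots them in a tiny made-up dataset. It is not how Decide works internally: the rows, the `normalise` helper and the 0 to 120 age range are all assumptions chosen purely for illustration.

```python
# A tiny, made-up dataset showing several of the issues listed above.
rows = [
    {"name": "Ann Lee", "country": "USA", "age": 34},
    {"name": "ann lee ", "country": "U.S.A.", "age": 34},  # same person, different formatting
    {"name": "Bo Chan", "country": None, "age": 230},      # null value and an anomalous age
]


def normalise(text):
    """Lower-case, trim, and drop punctuation so variants compare equal."""
    return "".join(ch for ch in text.lower().strip() if ch.isalnum() or ch == " ")


# Null values: count every missing cell.
null_count = sum(value is None for row in rows for value in row.values())

# Multiple variations of the same thing: "USA" and "U.S.A." collapse to one value.
country_variants = {normalise(r["country"]) for r in rows if r["country"] is not None}

# Duplicates: rows whose normalised name has already been seen.
name_keys = [normalise(r["name"]) for r in rows]
likely_duplicates = len(name_keys) - len(set(name_keys))

# Anomalous data: ages outside an assumed plausible range of 0 to 120.
implausible_ages = [r["age"] for r in rows if r["age"] is not None and not 0 <= r["age"] <= 120]

print(null_count, country_variants, likely_duplicates, implausible_ages)
```

Even on three rows, each check needs its own hand-written rule, which is the kind of manual effort described above.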
Step 1 Upload your data
Step 2 Label your data
Let Decide know what type of data each column should hold. For example, “Ethnicity” should be a “Text” column whose values do not need to be unique, whereas “Email” should be a “Text” column that contains unique values.
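One way to picture these labels is as a small schema: a type plus a uniqueness flag per column. The Python sketch below is only an illustration under that assumption. `ColumnLabel` and `uniqueness_violations` are hypothetical names, not part of Decide, and the sample rows are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnLabel:
    """Hypothetical label for one column: its name, data type, and uniqueness rule."""
    name: str
    kind: str      # e.g. "Text"
    unique: bool   # True if no two rows may share a value


def uniqueness_violations(rows, label):
    """Return the values that repeat in a column labelled as unique."""
    if not label.unique:
        return []
    counts = Counter(
        row[label.name] for row in rows if row.get(label.name) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


labels = [
    ColumnLabel("Ethnicity", "Text", unique=False),
    ColumnLabel("Email", "Text", unique=True),
]

# Invented sample rows: the first and third share an email address.
rows = [
    {"Ethnicity": "Group A", "Email": "a@example.com"},
    {"Ethnicity": "Group A", "Email": "b@example.com"},
    {"Ethnicity": "Group B", "Email": "a@example.com"},
]

violations = {label.name: uniqueness_violations(rows, label) for label in labels}
print(violations)
```

Repeated values in “Ethnicity” are fine because the column is not labelled unique, while the repeated email address is reported.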
Step 3 Label your relationships
Here is where the magic happens.
Step 4 Magic!
Ta-da! Your data is cleaned.
Step 5 Understand your data
Don’t fret. We recognise that the purpose of cleaning data isn’t solely to clean data, so we always show what happens behind the scenes.
Step 6 Just in case, you decide
If we are unable to clean all of your data with 100% confidence, we flag the items in question for you to inspect.
In the case of the duplicate entry, there is no other relevant data to help us determine which entry is correct, so it is near impossible for anyone to figure out the right row. So we let you decide (haha, get it?).
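The idea of flagging instead of guessing can be sketched in a few lines of Python. This is a simplified illustration, not Decide's actual logic: `split_duplicates` is a hypothetical helper, the rows are invented, and it assumes a single key column (here “Email”) identifies an entry.

```python
from collections import defaultdict


def split_duplicates(rows, key):
    """Collapse exact duplicates; flag groups whose rows conflict for manual review."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    cleaned, flagged = [], []
    for group in groups.values():
        # Keep one copy of each distinct row, preserving first-seen order.
        distinct = []
        for row in group:
            if row not in distinct:
                distinct.append(row)
        if len(distinct) == 1:
            cleaned.append(distinct[0])   # safe to resolve automatically
        else:
            flagged.append(distinct)      # conflicting rows: a person must choose
    return cleaned, flagged


# Invented sample rows.
rows = [
    {"Email": "a@example.com", "Age": 31},
    {"Email": "a@example.com", "Age": 31},  # exact duplicate, safe to drop
    {"Email": "b@example.com", "Age": 40},
    {"Email": "b@example.com", "Age": 44},  # same email, conflicting age
]

cleaned, flagged = split_duplicates(rows, "Email")
print(cleaned)
print(flagged)
```

Exact copies are merged without asking, while the two conflicting rows for the same email are returned together so that a person can pick the correct one.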
And that’s it. A really easy, low-effort way to clean data.