Updated: Sep 24
Quality data is the foundation of data analysis. As an analyst, I always had the urge to start analyzing once I have an idea or a question pop up in my head. It is fairly easy to obtain raw data these days but a lot of data cannot be used as it exists. There are three main problems from my own experience. One is that data entry is rarely consistent within an organization. It doesn’t matter how sound your analysis is, your result is only as good as your input (Garbage In, Garbage Out). These inconsistencies can be reduced by limiting free text for data entry or requiring specific formats before an entry is accepted by the system.
Here are 7 MUST KNOW Excel Functions for data preparation:
LEFT – this function returns the specified number of characters from the left side of the string. For example =LEFT(“Apple”, 2) returns “Ap”.
RIGHT – this function returns the specified number of characters from the right side of the string. For example =RIGHT(“Apple”, 3) returns “le”
UPPER – this function returns a string in all uppercase letters
LOWER – this function returns a string in all lowercase letters
MID – this function returns the characters from the middle of a string, given a starting position and length. For example =MID(“Apple”, 2, 3) returns “ppl”
LEN – this function returns the number of characters in a string
CONCATENATE – this function combines two or more strings into one. For example =CONCATENATE(“Ap”, ”p”, ”le”) returns Apple. The symbol & can also combine strings. For example = “Ap”&“ple” will also return Apple
A second problem is that outliers are often found in data and they skew the results if you are looking for patterns. While you would take out outliers in the analytical models, the outliers themselves have valuable information. Outliers are often clues to the risks and opportunities of the business. Another tier of research and analysis may unveil useful information. When looking at data, consider both the accuracy and precision of the data. Taking out outliers will make your dataset more precise but not necessarily more accurate.
This leads us to a third concern when looking at data. Sometimes data itself doesn’t necessarily mean what you think it means. It is important to interview Subject Matter Experts (SMEs) and field personnel to confirm how the raw data came about and that it accurately reflects what is happening on the ground.
Even though you may be excited about doing analysis. I have often found data cleaning taking up a bulk of my efforts. You may be able to impress managers and executives with fancy analysis and graphs without doing due diligence on data but it is critical if you actually want to provide valuable insights into the business. I was once told that the person who knows how to do something always works for the person who knows why they are doing something. Don’t just go through the motions of analyzing data but ask deep questions about the business before and throughout the process.