If you want to start analyzing data in a more structured and repeatable way here are the basics I think you need to understand and practice.
I’ve added links where I can, and will add more over time.
- Understanding Data in Context
- Dats, Information, Knowledge
- Data is any content you can store – numbers, text, pictures, etc
- Data becomes Information when you have context
- A list of numbers is useless if you don’t know what is being described
- When most people say “data” they are (hopefully) talking about information
- Information becomes knowledge through analysis
- What does a row mean?
- What does it mean in the real world?
- Is there variation over time, or is this at one point in time?
- Maybe it is an employee
- Maybe it is an employee on a monthly basis
- What does a column mean?
- Is that salary planned, or salary actually paid? Did it included bonus?
- How well organized is the data?
- Are all the salary amounts in dollars or are some in euros?
- Are the employee names capitalized the same?
- Are department names sometimes abbreviated?
- Dats, Information, Knowledge
- Understanding Parts of a Data Set
- Dimensions
- Also called Columns, Characteristics, Etc
- These are data elements you can group by, sort, or filter on
- You (usually) can’t do math on these
- Examples:
- Color
- Name
- Country
- Zip Code (It looks like a number sometimes, but it doesn’t make sense to add them!)
- Measures
- Also called Metrics, Facts, Key Figures, Etc
- These are numeric things you should do math on, such as sum, average, maximum, etc
- You aggregate them based upon the dimensions you have chosen to display
- You can sort and filter on these, but you still call them measures
- There is always an implicit measure – Count of rows.
- Examples
- Salary
- Price
- Number of Employees
- Aggregations
- This is when you take a group of measures and calculate based upon them.
- Sum, average, maximum, minimum are common examples
- Dimensions
- Understand Data Analysis Workflow
- Get some data
- Record where you got it!
- Data lineage is critical.
- Prepare Data
- Have a good idea what you are looking at.
- Remember – What does a row mean?
- Make sure the data you need is standardized – aka cleaned
- The same things are spelled the same
- Things like phone numbers are formatted the same
- Hint: Don’t say “Normalized” – that means something different when talking about data.
- Have a good idea what you are looking at.
- Keep Data and Analysis separate!!!
- Power BI and Tableau both *force* you to keep these separate
- Excel lets you mix these. This is one reason analysis on Excel can be problematic.
- Start with a simple question
- What is the average salary per department?
- Who makes the most money?
- Analyze Prepared Data
- Use one of the tools mentioned below, or some other tool
- Ask someone to review your work!
- Always get another set of eyes on your work before you publish it widely. Maybe you made a mistake. Maybe you labeled something in an unclear way.
- Repeat as needed.
- Get more data!
- Ask more questions!
- Get some data
- Understand Basic Data Manipulation
- Select
- What columns do you want to see for each row?
- Filter
- What values do you want to look at, or what values do you want to exclude?
- This will affect what rows you see.
- Choose your chart
- Table
- Bar chart
- Etc
- Aggregate
- How should you handle measures?
- Average salary per department?
- Total salary per department?
- Maximum salary per department?
- Sort
- What order do you want your results to display?
- What should you sort by? Country name? Population?
- How should you sort? Largest to smallest?
- Select
- Learn an analysis tool
- Excel
- Most corporations have this
- Power Pivot and Power Query are great!
- YOU have to keep your data and your analysis separate.
- Separate workbooks
- Separate sheets
- Put field names in the top row!
- Links
- Tableau
- Free Version – Tableau Public
- If you post your results publicly, this is a free tool
- Unable to save to desktop
- Paid Version – Tableau Desktop
- $70 a month at time of writing
- Links
- Free Version – Tableau Public
- Power BI
- Free Version – Power BI Desktop
- Free to use and save on your desktop
- Free to save files and send to friends
- Paid Version – Power BI Pro
- Allows you to share and collaborate on the web
- $10 a month at time of writing
- Links
- Free Version – Power BI Desktop
- Google Data Studio
- Free to use
- Links
- Excel
- Start working!
- You can only really understand these concepts when you use them.
- Get data you care about and then re-read this outline.
- Ask questions!
- Get answers!
- Ask more questions!
Remember: Make sure you know what a row means! Is it a person? Ship? Doctor visit? Appointment?
