Data Learning 1 – Data Analysis Basics

If you want to start analyzing data in a more structured and repeatable way here are the basics I think you need to understand and practice.  

I’ve added links where I can, and will add more over time.

  1. Understanding Data in Context
    1. Dats, Information, Knowledge
      1. Data is any content you can store – numbers, text, pictures, etc
      2. Data becomes Information when you have context
        1. A list of numbers is useless if you don’t know what is being described
      3. When most people say “data” they are (hopefully) talking about information
      4. Information becomes knowledge through analysis
    2. What does a row mean?
      1. What does it mean in the real world?
      2. Is there variation over time, or is this at one point in time?
        1. Maybe it is an employee
        2. Maybe it is an employee on a monthly basis
    3. What does a column mean?
      1. Is that salary planned, or salary actually paid?  Did it included bonus?
    4. How well organized is the data?
      1. Are all the salary amounts in dollars or are some in euros?
      2. Are the employee names capitalized the same?
      3. Are department names sometimes abbreviated?
  2. Understanding Parts of a Data Set
    1. Dimensions
      1. Also called Columns, Characteristics, Etc
      2. These are data elements you can group by, sort, or filter on
      3. You (usually) can’t do math on these
      4. Examples:
        1. Color
        2. Name
        3. Country
        4. Zip Code (It looks like a number sometimes, but it doesn’t make sense to add them!)
    2. Measures
      1. Also called Metrics, Facts, Key Figures, Etc
      2. These are numeric things you should do math on, such as sum, average, maximum, etc
      3. You aggregate them based upon the dimensions you have chosen to display
      4. You can sort and filter on these, but you still call them measures
      5. There is always an implicit measure – Count of rows.
      6. Examples
        1. Salary
        2. Price
        3. Number of Employees
    3. Aggregations
      1. This is when you take a group of measures and calculate based upon them.
      2. Sum, average, maximum, minimum are common examples
  3. Understand Data Analysis Workflow
    1. Get some data
      1. Record where you got it!   
      2. Data lineage is critical.
    2. Prepare Data
      1. Have a good idea what you are looking at.  
        1. Remember – What does a row mean?
      2. Make sure the data you need is standardized – aka cleaned
        1. The same things are spelled the same
        2. Things like phone numbers are formatted the same
      3. Hint:  Don’t say “Normalized” – that means something different when talking about data.
    3. Keep Data and Analysis separate!!!
      1. Power BI and Tableau both *force* you to keep these separate
      2. Excel lets you mix these.  This is one reason analysis on Excel can be problematic.
    4. Start with a simple question
      1. What is the average salary per department?
      2. Who makes the most money?
    5. Analyze Prepared Data
      1. Use one of the tools mentioned below, or some other tool
    6. Ask someone to review your work!
      1. Always get another set of eyes on your work before you publish it widely. Maybe you made a mistake.  Maybe you labeled something in an unclear way.
    7. Repeat as needed.
      1. Get more data!
      2. Ask more questions!
  4. Understand Basic Data Manipulation
    1. Select
      1. What columns do you want to see for each row?
    2. Filter
      1. What values do you want to look at, or what values do you want to exclude?
      2. This will affect what rows you see.
    3. Choose your chart
      1. Table
      2. Bar chart
      3. Etc
    4. Aggregate
      1. How should you handle measures?
      2. Average salary per department?
      3. Total salary per department?
      4. Maximum salary per department?
    5. Sort
      1. What order do you want your results to display?
      2. What should you sort by?  Country name? Population?
      3. How should you sort? Largest to smallest?
  5. Learn an analysis tool
    1. Excel
      1. Most corporations have this
      2. Power Pivot and Power Query are great!
      3. YOU have to keep your data and your analysis separate.
        1. Separate workbooks
        2. Separate sheets
      4. Put field names in the top row!
      5. Links
        1. https://www.excel-easy.com/data-analysis.html
        2. https://www.coursera.org/learn/excel-data-analysis
    2. Tableau
      1. Free Version – Tableau Public
        1. If you post your results publicly, this is a free tool
        2. Unable to save to desktop
      2. Paid Version – Tableau Desktop
        1. $70 a month at time of writing
      3. Links
        1. https://www.youtube.com/watch?v=gWZtNdMko1k&list=PLWPirh4EWFpGXTBu8ldLZGJCUeTMBpJFK
        2. https://www.tableau.com/learn
    3. Power BI
      1. Free Version – Power BI Desktop
        1. Free to use and save on your desktop
        2. Free to save files and send to friends
      2. Paid Version – Power BI Pro
        1. Allows you to share and collaborate on the web
        2. $10 a month at time of writing
      3. Links
        1. https://powerbi.microsoft.com/en-us/desktop/
    4. Google Data Studio
      1. Free to use
      2. Links
        1. https://datastudio.google.com
  6. Start working!
    1. You can only really understand these concepts when you use them.
    2. Get data you care about and then re-read this outline.
    3. Ask questions!  
    4. Get answers!
    5. Ask more questions!

Remember: Make sure you know what a row means!  Is it a person? Ship? Doctor visit? Appointment?  

Leave a comment