Data Learning 1 – Data Analysis Basics

If you want to start analyzing data in a more structured and repeatable way here are the basics I think you need to understand and practice.  

I’ve added links where I can, and will add more over time.

  1. Understanding Data in Context
    1. Dats, Information, Knowledge
      1. Data is any content you can store – numbers, text, pictures, etc
      2. Data becomes Information when you have context
        1. A list of numbers is useless if you don’t know what is being described
      3. When most people say “data” they are (hopefully) talking about information
      4. Information becomes knowledge through analysis
    2. What does a row mean?
      1. What does it mean in the real world?
      2. Is there variation over time, or is this at one point in time?
        1. Maybe it is an employee
        2. Maybe it is an employee on a monthly basis
    3. What does a column mean?
      1. Is that salary planned, or salary actually paid?  Did it included bonus?
    4. How well organized is the data?
      1. Are all the salary amounts in dollars or are some in euros?
      2. Are the employee names capitalized the same?
      3. Are department names sometimes abbreviated?
  2. Understanding Parts of a Data Set
    1. Dimensions
      1. Also called Columns, Characteristics, Etc
      2. These are data elements you can group by, sort, or filter on
      3. You (usually) can’t do math on these
      4. Examples:
        1. Color
        2. Name
        3. Country
        4. Zip Code (It looks like a number sometimes, but it doesn’t make sense to add them!)
    2. Measures
      1. Also called Metrics, Facts, Key Figures, Etc
      2. These are numeric things you should do math on, such as sum, average, maximum, etc
      3. You aggregate them based upon the dimensions you have chosen to display
      4. You can sort and filter on these, but you still call them measures
      5. There is always an implicit measure – Count of rows.
      6. Examples
        1. Salary
        2. Price
        3. Number of Employees
    3. Aggregations
      1. This is when you take a group of measures and calculate based upon them.
      2. Sum, average, maximum, minimum are common examples
  3. Understand Data Analysis Workflow
    1. Get some data
      1. Record where you got it!   
      2. Data lineage is critical.
    2. Prepare Data
      1. Have a good idea what you are looking at.  
        1. Remember – What does a row mean?
      2. Make sure the data you need is standardized – aka cleaned
        1. The same things are spelled the same
        2. Things like phone numbers are formatted the same
      3. Hint:  Don’t say “Normalized” – that means something different when talking about data.
    3. Keep Data and Analysis separate!!!
      1. Power BI and Tableau both *force* you to keep these separate
      2. Excel lets you mix these.  This is one reason analysis on Excel can be problematic.
    4. Start with a simple question
      1. What is the average salary per department?
      2. Who makes the most money?
    5. Analyze Prepared Data
      1. Use one of the tools mentioned below, or some other tool
    6. Ask someone to review your work!
      1. Always get another set of eyes on your work before you publish it widely. Maybe you made a mistake.  Maybe you labeled something in an unclear way.
    7. Repeat as needed.
      1. Get more data!
      2. Ask more questions!
  4. Understand Basic Data Manipulation
    1. Select
      1. What columns do you want to see for each row?
    2. Filter
      1. What values do you want to look at, or what values do you want to exclude?
      2. This will affect what rows you see.
    3. Choose your chart
      1. Table
      2. Bar chart
      3. Etc
    4. Aggregate
      1. How should you handle measures?
      2. Average salary per department?
      3. Total salary per department?
      4. Maximum salary per department?
    5. Sort
      1. What order do you want your results to display?
      2. What should you sort by?  Country name? Population?
      3. How should you sort? Largest to smallest?
  5. Learn an analysis tool
    1. Excel
      1. Most corporations have this
      2. Power Pivot and Power Query are great!
      3. YOU have to keep your data and your analysis separate.
        1. Separate workbooks
        2. Separate sheets
      4. Put field names in the top row!
      5. Links
        1. https://www.excel-easy.com/data-analysis.html
        2. https://www.coursera.org/learn/excel-data-analysis
    2. Tableau
      1. Free Version – Tableau Public
        1. If you post your results publicly, this is a free tool
        2. Unable to save to desktop
      2. Paid Version – Tableau Desktop
        1. $70 a month at time of writing
      3. Links
        1. https://www.youtube.com/watch?v=gWZtNdMko1k&list=PLWPirh4EWFpGXTBu8ldLZGJCUeTMBpJFK
        2. https://www.tableau.com/learn
    3. Power BI
      1. Free Version – Power BI Desktop
        1. Free to use and save on your desktop
        2. Free to save files and send to friends
      2. Paid Version – Power BI Pro
        1. Allows you to share and collaborate on the web
        2. $10 a month at time of writing
      3. Links
        1. https://powerbi.microsoft.com/en-us/desktop/
    4. Google Data Studio
      1. Free to use
      2. Links
        1. https://datastudio.google.com
  6. Start working!
    1. You can only really understand these concepts when you use them.
    2. Get data you care about and then re-read this outline.
    3. Ask questions!  
    4. Get answers!
    5. Ask more questions!

Remember: Make sure you know what a row means!  Is it a person? Ship? Doctor visit? Appointment?  

Tableau Specialist Study Guide

I want to get my certification as a Tableau Specialist. I am working on this with some co-workers so I have built a guide. I copied and pasted the subjects covered from the official guide to the exam, but added links for many of the sections and made it editable so I can track my progress.

I’ve posted it in Google Drive. Please copy it into your own drive, or pull it down as Excel if you want to use it to track your progress.

https://drive.google.com/open?id=1Hri3njMmkyJgk6pL9OIgYwrd94xxEPyp

Organizing Your Power Query Steps

Power Query is a great part of Power BI. Recently I found myself building long power query scripts in Power BI.   I wasn’t as methodical as I should have been and so the scripts got fairly messy.  I’ve developed the following workflow order to improve readability and maintainability in my data preparation.

All steps are done left to right…

1) Split any fields that need to be split.

2) Delete any fields that are not needed.

3) Rename fields as needed.

4) Replace any values as needed.

5) Convert any field types as needed.

6) Add any new columns at the right.  

Remember that fewer columns mean faster results.  Consider using a calculated measure instead.

Cat Herding Without a License

At Enterprise Data World 2019 in Boston I first presented “Cat Herding Without a License – Better Data and Report Development Habits Without Formal Authority” and since then I’ve presented it at Boston BI.

I cover things I wish I had known when I started working in business intelligence:

  1. How to build relationships to get more cooperation and collaboration.
  2. Why you need to learn data modeling and metadata basics – and what key concepts you need to know.
  3. How to use examples and screenshots to guide conversations, get action from developers and decision makers, and improve requirements.
  4. How best to use ticketing systems to capture all the work, what stage it is in, and build good team handoffs.

All of this will help you improve the software development lifecycle of data and reporting systems in your organization even if you don’t have “manager” in your title.

While I change the presentation as I find new material or find a new way to present it, I keep the most recent copy in Google Docs.

https://drive.google.com/open?id=1kf1lBqxpgEkIFw2F90Wgvz-aYCR_fKub6_GBbWSqebo