Steps and Best Practices for Data Analysis
Data analysis is the process of exploring, cleaning, transforming, and modelling data with the ultimate goal of discovering useful information, drawing conclusions, and facilitating decision-making.
How To Approach Data Analysis Projects
Step #1: Identify the Problem to Solve
- What are the questions/objectives?
- What are the criteria for success?
Step #2: Obtain the Data
- Identify the correct data you need to solve your problem
- Obtain a copy of the data
- Store and use the data correctly
- Determine the appropriate tools to use to analyze the data
Step #3: Understand the Data
- Read any documentation that came with the dataset
- Verify the quality and completeness of the data
- Tell the story of one row to understand the data
- Perform exploratory analysis
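The checks in Step #3 can be sketched in a few lines of Python. This is a minimal illustration only, assuming a small CSV export; the column names and sample values are hypothetical, not taken from any real dataset.

```python
import csv
import io

# Hypothetical sample data standing in for a real export.
SAMPLE_CSV = """order_id,customer,amount,order_date
1001,Alice,25.50,2023-01-05
1002,Bob,,2023-01-06
1003,,14.00,2023-01-06
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows):
    """Count empty cells per column to gauge completeness."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts

def describe_row(row):
    """Spell out one row field by field: the 'story of one row'."""
    return "; ".join(f"{column} = {value!r}" for column, value in row.items())

rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows)
story = describe_row(rows[0])
print(missing)
print(story)
```

Reading a single row aloud like this often exposes unclear column names or unexpected units before any analysis starts.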
Step #4: Prepare the Data for Analysis
- Clean and format the data
- Determine the approach to be used
- Determine what tests or analysis will be done
- Create any new columns that are required for your analysis
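As a rough sketch of Step #4, the Python below cleans a few hypothetical raw rows and adds a derived column. The field names, the currency format, and the choice of an "order_month" column are assumptions made for illustration.

```python
from datetime import date

# Hypothetical raw rows, as they might look straight after import.
raw_rows = [
    {"customer": "  alice ", "amount": "$1,250.00", "order_date": "2023-01-05"},
    {"customer": "BOB", "amount": "$80.50", "order_date": "2023-02-17"},
]

def clean_row(row):
    """Return a cleaned copy of one row; the raw row is left untouched."""
    # Strip the currency symbol and thousands separator before converting.
    amount = float(row["amount"].replace("$", "").replace(",", ""))
    order_date = date.fromisoformat(row["order_date"])
    return {
        "customer": row["customer"].strip().title(),
        "amount": amount,
        "order_date": order_date,
        # New column derived for the analysis.
        "order_month": order_date.strftime("%Y-%m"),
    }

clean_rows = [clean_row(row) for row in raw_rows]
```

Building a new list instead of editing the raw rows in place keeps the raw and processed data separate.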
Step #5: Analyze the Data
- Perform your tests and calculations to extract answers from your data
- Make notes of what you're doing (keep an audit trail)
- Repeat steps if necessary to answer your questions
- If needed, find other data to complete your analysis
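One possible way to keep an audit trail while analyzing is to log each step as it happens. The sketch below is an assumption about how that might look in Python; the helper name log_step and the sample rows are hypothetical.

```python
from datetime import datetime, timezone

audit_trail = []

def log_step(description, rows_before, rows_after):
    """Record what was done and how it changed the row count."""
    audit_trail.append({
        "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": description,
        "rows_before": rows_before,
        "rows_after": rows_after,
    })

# Hypothetical analysis: drop rows with a missing amount, then total the rest.
rows = [{"amount": 10.0}, {"amount": None}, {"amount": 5.5}]
kept = [row for row in rows if row["amount"] is not None]
log_step("Dropped rows with missing amount", len(rows), len(kept))

total = sum(row["amount"] for row in kept)
log_step(f"Summed amount column: total = {total}", len(kept), len(kept))
```

Each entry notes when a step ran and how many rows it affected, which makes it easier to retrace or repeat the analysis later.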
Step #6: Repeat Steps 2-5 as Required to Answer Questions
Step #7: Present the Findings
- Summarize the conclusions and key findings from your data
- Choose the appropriate medium and level of detail for the presentation (e.g. slide deck, email summary, or technical report)
- Decide if there are any follow-up questions or changes required
- Make a plan of action to act on the data
Best Practices for Working with Spreadsheets
A. Keep a Copy of the Data
B. Organize Data
C. Add Index Columns For Sorting/Filtering
D. Distinguish Raw vs. Processed Data
E. Format & Freeze Headers and Top-Rows
F. Turn Formulas Back to Values
G. Create Named Ranges for the Dataset
H. Document the Steps You Take (Audit Trail)
The Onion Method For Complex Formulas in Spreadsheets ((..))
The Onion Method is a framework for approaching hard formulas. It consists of these three elements:
- Put each new step of the formula in a new cell
- Label each step with a simple “Step 1”, “Step 2”, etc. in adjacent cells
- Change the background colour of each formula cell, so they can be easily found
Similarly, if you’re trying to understand complex formulas, peel the layers back until you reach the core (which is hopefully a function you understand!). Then, build it back up in steps to get back to the full formula.
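The same idea carries over to code. The Python sketch below is an analogy rather than a spreadsheet formula: it writes one nested expression, then peels it into labelled steps, with each variable playing the role of a cell. The email value is a hypothetical input.

```python
email = "jane.doe@example.com"  # hypothetical input value

# The full "onion": every layer nested in one expression.
nested = email[email.find("@") + 1:].split(".")[0].upper()

# The same logic peeled into labelled steps, one per "cell".
step_1 = email.find("@")        # Step 1: position of the "@"
step_2 = email[step_1 + 1:]     # Step 2: text after the "@"
step_3 = step_2.split(".")[0]   # Step 3: text before the first "."
step_4 = step_3.upper()         # Step 4: upper-case the result

# The final step matches the nested version.
print(step_4 == nested)
```

If an intermediate step gives an unexpected value, the problem is isolated to that one layer.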
Tall vs. Wide Data
A tall format is usually the optimal shape because it makes data easier to store, analyze, and report on in spreadsheet software. In a "tall" dataset, each variable has its own column, whereas in a "wide" dataset, a single variable is spread across multiple columns.
Databases typically store data in a "tall" layout by default. This is helpful because PivotTables, for example, expect "tall" data as input before fanning it out. Note that chart tools and data visualizations typically prefer "wide" formats, which a PivotTable can provide. In summary, consider importing "tall" datasets into spreadsheet software to streamline steps #2, #3, and #4 of the data analysis process.
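To make the two shapes concrete, here is a minimal Python sketch that fans a tall dataset out to a wide one, roughly what a PivotTable does. The region/quarter/sales data and the pivot helper are hypothetical, and the helper does no aggregation: a repeated (region, quarter) pair would simply overwrite the earlier value.

```python
# Hypothetical tall dataset: one row per (region, quarter) observation.
tall = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q2", "sales": 120},
    {"region": "West", "quarter": "Q1", "sales": 90},
    {"region": "West", "quarter": "Q2", "sales": 95},
]

def pivot(rows, index, columns, values):
    """Fan a tall dataset out to wide: one row per index value."""
    wide = {}
    for row in rows:
        # Start a new wide row the first time an index value is seen.
        wide_row = wide.setdefault(row[index], {index: row[index]})
        # The value of the `columns` field becomes a column header.
        wide_row[row[columns]] = row[values]
    return list(wide.values())

wide = pivot(tall, index="region", columns="quarter", values="sales")
for row in wide:
    print(row)
```

In the tall version "sales" is one column; in the wide version the same variable is spread over the "Q1" and "Q2" columns.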
Unpivoting Wide Data
Stuck with a "wide" format? Wide data can be unpivoted ("flattened") into a tall format manually and/or with code. Below are two helpful resources to accomplish this task:
Google Sheets
https://www.benlcollins.com/spreadsheets/unpivot-in-google-sheets/
MS Excel
https://chloe-po-wong.medium.com/convert-data-from-wide-to-long-format-in-excel-a94930ef088c
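Outside of a spreadsheet, unpivoting can also be done with a short script. The Python below is a minimal sketch, not the method used in either resource above; the sample data and the unpivot helper with its parameter names are hypothetical.

```python
# Hypothetical wide dataset: one column per quarter.
wide_rows = [
    {"region": "East", "Q1": 100, "Q2": 120},
    {"region": "West", "Q1": 90, "Q2": 95},
]

def unpivot(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    tall = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on each output row
            tall.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tall

tall_rows = unpivot(wide_rows, "region", "quarter", "sales")
for row in tall_rows:
    print(row)
```

Two wide rows with two quarter columns each become four tall rows, one per region and quarter.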
Make Spreadsheet Formulas More Readable
Line Breaks and Indentations
Take a QUERY() formula in Google Sheets as an example. Writing the query on multiple lines within the cell makes it easier to read. Line breaks and extra spaces do not alter the syntax, so the output is identical whether the formula is written as one long continuous line or split across several lines.
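As a rough illustration, the Python sketch below holds a hypothetical QUERY() formula in two layouts and checks that they differ only in whitespace. The range A1:D100 and the query text are made up for this example, and the script only compares text; it does not evaluate the formula. Collapsing whitespace this way assumes the query contains no quoted values with meaningful spacing.

```python
# Hypothetical formula written as one long continuous line.
one_line = '=QUERY(A1:D100, "select A, sum(D) where C > 100 group by A order by sum(D) desc", 1)'

# The same formula with line breaks and indentation for readability.
multi_line = """=QUERY(A1:D100,
  "select A, sum(D)
   where C > 100
   group by A
   order by sum(D) desc",
  1)"""

def normalize(formula):
    """Collapse every run of spaces and line breaks into a single space."""
    return " ".join(formula.split())

# Both layouts reduce to the same text once whitespace is collapsed.
print(normalize(multi_line) == normalize(one_line))
```

In the multi-line layout each clause of the query (select, where, group by, order by) sits on its own line, which makes it easier to scan and edit.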

