Human-Readable ◉‿◉

Steps and Best Practices for Data Analysis

Data analysis is the process of exploring, cleaning, transforming, and modelling data with the ultimate goal of discovering useful information, drawing conclusions, and facilitating decision-making.

How To Approach Data Analysis Projects

Step #1: Identify the Problem to Solve

Step #2: Obtain the Data

Step #3: Understand the Data

Step #4: Prepare the Data for Analysis

Step #5: Analyze the Data

Step #6: Repeat Steps 2-5 as Required to Answer Questions

Step #7: Present the Findings


Best Practices for Working with Spreadsheets

A. Keep a Copy of the Data

B. Organize Data

C. Add Index Columns For Sorting/Filtering

D. Distinguish Raw vs. Processed Data

E. Format & Freeze Headers and Top-Rows

F. Turn Formulas Back to Values

G. Create Named Ranges for the Dataset

H. Document the Steps You Take (Audit Trail)

The Onion Method For Complex Formulas in Spreadsheets ((..))

The Onion Method is a framework for approaching hard formulas, and consists of these three elements:

Similarly, if you’re trying to understand complex formulas, peel the layers back until you reach the core (which is hopefully a function you understand!). Then, build it back up in steps to get back to the full formula.
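The layering idea can also be sketched outside a spreadsheet. The snippet below is a minimal illustration in Python, not part of the Onion Method itself: the sample text and the nested formula PROPER(LOWER(TRIM(A1))) are hypothetical, and the Python string methods only roughly mirror the spreadsheet functions named in the comments. Each layer is checked on its own before the next one is wrapped around it.

```python
# Hypothetical cell value, standing in for A1.
raw = "  aLiCe SMITH  "

# Layer 1 (the core): roughly TRIM(A1). Check this result before going further.
core = raw.strip()

# Layer 2: wrap the core, roughly LOWER(TRIM(A1)).
lowered = core.lower()

# Layer 3: wrap again, roughly PROPER(LOWER(TRIM(A1))).
proper = lowered.title()

# The full "formula" written in one go gives the same result as the layered build.
full = raw.strip().lower().title()

print(core, lowered, proper, sep=" | ")
assert full == proper
```

Reading the comments from bottom to top is the "peeling" direction; reading them from top to bottom is the "building" direction.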

Tall vs. Wide Data

A tall format is the optimal shape because it makes data easier to store, analyze, and report on in spreadsheet software. In a "tall" dataset each variable has its own column, whereas in a "wide" dataset a single variable is spread across multiple columns.

By default, databases store data in a "tall" format. This is helpful because PivotTables, for example, expect "tall" data as input before fanning it out. Note that chart tools and data visualizations typically prefer "wide" formats, which a PivotTable can provide. In summary, consider importing "tall" datasets into spreadsheet software to streamline steps #2, #3, and #4 of the data analysis process.
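To make the two shapes concrete, here is a minimal sketch in plain Python of fanning a tall table out into a wide one, which is loosely what a PivotTable does. The dataset, the column names (region, quarter, sales), and the helper pivot_to_wide are all hypothetical. Unlike a real PivotTable, this sketch does not aggregate: if two rows share the same region and quarter, the later one simply overwrites the earlier one.

```python
from collections import defaultdict

# Hypothetical "tall" dataset: one row per (region, quarter) observation.
tall_rows = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q2", "sales": 120},
    {"region": "West", "quarter": "Q1", "sales": 90},
    {"region": "West", "quarter": "Q2", "sales": 95},
]


def pivot_to_wide(rows, index, column, value):
    """Fan a tall table out so each distinct `column` value becomes its own column."""
    wide = defaultdict(dict)
    for row in rows:
        # No aggregation: a repeated (index, column) pair overwrites the earlier value.
        wide[row[index]][row[column]] = row[value]
    return [{index: key, **cells} for key, cells in wide.items()]


wide_rows = pivot_to_wide(tall_rows, "region", "quarter", "sales")
for row in wide_rows:
    print(row)
```

The tall version has one "sales" column and four rows; the wide version has two rows and one column per quarter.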

Unpivoting Wide Data

Stuck with a "wide" format? Wide data can be unpivoted ("flattened") into tall columns manually and/or with code. Below are two helpful resources for this task:

Google Sheets

https://www.benlcollins.com/spreadsheets/unpivot-in-google-sheets/

MS Excel

https://chloe-po-wong.medium.com/convert-data-from-wide-to-long-format-in-excel-a94930ef088c
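For readers who would rather use code than the spreadsheet approaches linked above, the following is a minimal sketch of unpivoting in plain Python. It is not taken from either resource. The dataset, the column names, and the helper unpivot are hypothetical, and it assumes that every column other than the id column holds a value to be unpivoted.

```python
# Hypothetical "wide" dataset: one column per quarter.
wide_rows = [
    {"region": "East", "Q1": 100, "Q2": 120},
    {"region": "West", "Q1": 90, "Q2": 95},
]


def unpivot(rows, id_column, var_name, value_name):
    """Turn every non-id column of each row into its own (variable, value) row."""
    tall = []
    for row in rows:
        for column, cell in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row instead
            tall.append({id_column: row[id_column], var_name: column, value_name: cell})
    return tall


tall_rows = unpivot(wide_rows, "region", "quarter", "sales")
for row in tall_rows:
    print(row)
```

Two wide rows with two quarter columns each become four tall rows, each with a region, a quarter, and a sales value.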

Make Spreadsheet Formulas More Readable

Line Breaks and Indentations

This is an example of a QUERY() formula in Google Sheets. Note how writing the query on multiple lines within the cell makes it easier to read. Line breaks and extra spaces do not alter the syntax, so the output is identical whether the formula is written as one long continuous line or with separate line breaks as shown below.
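As a rough illustration, the Python sketch below holds the same QUERY() formula in two layouts and checks that they differ only in whitespace. The formula itself is hypothetical: the range A1:D100, the column letters, and the 'East' filter are made up for the example. The check is deliberately crude, since it strips all whitespace, and it says nothing about how a spreadsheet parses the formula. Whitespace inside a quoted value (for example 'East Coast') is meaningful and should not be changed.

```python
# Hypothetical formula; the range and column letters are illustrative only.
one_line = "=QUERY(A1:D100,\"select A, sum(D) where B = 'East' group by A\",1)"

# The same formula laid out with line breaks and indentation for readability.
multi_line = """=QUERY(
  A1:D100,
  "select A, sum(D)
   where B = 'East'
   group by A",
  1
)"""


def strip_whitespace(formula):
    """Remove all spaces, tabs, and line breaks so only the other characters remain."""
    return "".join(formula.split())


# True when the two layouts contain the same characters apart from whitespace.
same_apart_from_layout = strip_whitespace(one_line) == strip_whitespace(multi_line)
print(same_apart_from_layout)
```

In the multi-line layout, each argument of QUERY() and each clause of the query string (select, where, group by) sits on its own line, which makes it easier to spot a missing comma or a misplaced clause.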