Project Structuring for Analysts: Modular Blueprint
Purpose
This guidebook provides a clean, repeatable folder + file architecture for professional data analytics projects. It is designed to support workflows in SQL, Python, notebooks, and dashboards with audit-ready organization and documentation.
1. Recommended Folder Structure
project_name/
├── data/            # Raw, cleaned, or synthetic datasets
│   ├── raw/
│   ├── interim/
│   └── final/
├── notebooks/       # EDA, modeling, delivery-ready notebooks
├── scripts/         # Modular Python, SQL, or ETL logic
├── outputs/         # Model predictions, plots, summary exports
├── reports/         # PDFs, slides, executive summaries
├── dashboard/       # Looker Studio links, JSON configs, embed notes
├── docs/            # Markdown docs, guidebooks, config guides
└── README.md        # Project overview and setup instructions
Add `.gitkeep` files so that empty folders are tracked by Git, and `.gitignore` rules to keep large or sensitive files out of version control.
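The layout above can be scaffolded with a short script. The following is a minimal sketch, assuming Python 3 and the folder names from the tree; `scaffold_project` and `FOLDERS` are hypothetical names chosen for illustration, not part of any existing tool.

```python
from pathlib import Path

# Folder names taken from the tree above; adjust to the project at hand.
FOLDERS = [
    "data/raw",
    "data/interim",
    "data/final",
    "notebooks",
    "scripts",
    "outputs",
    "reports",
    "dashboard",
    "docs",
]


def scaffold_project(root: str) -> Path:
    """Create the folder tree under `root` with a .gitkeep in each folder."""
    root_path = Path(root)
    for folder in FOLDERS:
        folder_path = root_path / folder
        folder_path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so add an empty placeholder.
        (folder_path / ".gitkeep").touch()
    readme = root_path / "README.md"
    if not readme.exists():
        # Stub only; an existing README is left untouched.
        readme.write_text(f"# Project: {root_path.name}\n", encoding="utf-8")
    return root_path
```

Calling `scaffold_project("project_name")` creates the tree in the current directory. The function is safe to re-run because existing folders and an existing README are left as they are.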
2. Notebook Workflow Expectations
Cell Group | Purpose |
---|---|
Imports | All packages, function definitions |
Load & Check | Pull in data, show schema, quick null check |
EDA | Visuals, describe(), value_counts() |
Cleaning | Missing-value handling, re-encoding, outlier flags |
Feature Engineering | New columns, transformations |
Modeling | Train/test split, metrics, model explanation |
Reporting | Plots, markdown summary, export/logging |
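As an illustration of the "Load & Check" and "EDA" rows, here is a minimal sketch assuming pandas is installed. The helper `load_and_check` and the synthetic `events` frame are hypothetical; a real notebook would load data with something like `pd.read_csv` instead.

```python
import pandas as pd


def load_and_check(df: pd.DataFrame) -> pd.DataFrame:
    """Return a per-column summary: dtype, null count, and null share."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_null": df.isna().sum(),
            "pct_null": df.isna().mean().round(3),
        }
    )


# Tiny synthetic frame standing in for a real data load (made-up values).
events = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "plan": ["free", "pro", None, "free"],
        "sessions": [5, 12, 7, None],
    }
)

schema_summary = load_and_check(events)      # Load & Check: schema and nulls
plan_counts = events["plan"].value_counts()  # EDA: category frequencies
numeric_stats = events.describe()            # EDA: numeric summary
```

Keeping the check in one function makes it easy to move from the notebook into `/scripts/` once it stabilizes.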
3. SQL Integration Tips
- Store production queries in `/scripts/sql/`
- Use views or saved queries in BigQuery, referenced from Python
- Log assumptions in markdown or `.sql` doc headers
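One way to reference stored queries from Python is to read the `.sql` file and keep its comment header as the record of assumptions. The sketch below is illustrative only: `load_query` and `run_query` are hypothetical helpers, and the standard-library `sqlite3` module stands in for a BigQuery client so that the example has no external dependencies.

```python
import sqlite3
from pathlib import Path


def load_query(path: str) -> tuple[list[str], str]:
    """Split a .sql file into its leading `--` comment header and query body."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    header = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("--"):
            # Drop the comment marker, keep the documented assumption.
            header.append(stripped.lstrip("-").strip())
        else:
            return header, "\n".join(lines[i:]).strip()
    return header, ""


def run_query(conn: sqlite3.Connection, path: str) -> list[tuple]:
    """Execute the query stored at `path` and return all rows."""
    _, sql = load_query(path)
    return conn.execute(sql).fetchall()
```

With a file such as `scripts/sql/active_users.sql` (a hypothetical name) that starts with `-- Assumption: ...` lines, `load_query` returns those assumptions alongside the query text, so they can be logged next to the results.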
4. Versioning & Exports
- Save `.csv`, `.pkl`, or `.json` files into `/outputs/`
- Use `joblib` or `pickle` to preserve modeling objects, and `feather` for DataFrames
- Final dashboards or exports should be copied into `/reports/`
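The export conventions above could be wrapped in a single helper. This is a sketch under stated assumptions: `export_run`, the `version` tag, and the file-name pattern are hypothetical, and it uses only the standard library (`csv`, `json`, `pickle`) rather than `joblib` or `feather`.

```python
import csv
import json
import pickle
from pathlib import Path


def export_run(outputs_dir, version, predictions, metrics, model):
    """Write predictions (.csv), metrics (.json), and the model (.pkl) for one run.

    `predictions` is expected to be a non-empty list of dicts with identical keys.
    """
    out = Path(outputs_dir)
    out.mkdir(parents=True, exist_ok=True)

    pred_path = out / f"predictions_{version}.csv"
    with pred_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(predictions[0].keys()))
        writer.writeheader()
        writer.writerows(predictions)

    metrics_path = out / f"metrics_{version}.json"
    metrics_path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")

    model_path = out / f"model_{version}.pkl"
    with model_path.open("wb") as f:
        pickle.dump(model, f)

    return {"predictions": pred_path, "metrics": metrics_path, "model": model_path}
```

Putting the version tag in every file name keeps the artifacts of one run together and avoids silently overwriting an earlier export. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.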
5. README.md Template
# Project: Churn Forecasting v1
## Overview
Predict user churn using event and usage logs from Jan–Mar 2024.
## Folder Guide
- `/data/raw/`: Original CSVs
- `/notebooks/`: One notebook per stage: EDA, modeling, presentation
- `/outputs/`: Cleaned datasets, predictions, plots
## Tools Used
- BigQuery, Pandas, Scikit-learn, Looker Studio
## Author
Garrett Schumacher, 2025
Project Structuring Checklist
- [ ] Folder system initialized and committed
- [ ] README created with project summary
- [ ] Scripts modularized and stored in `/scripts/`
- [ ] Notebook(s) follow standard cell progression
- [ ] Dashboard and reports saved to `/reports/`
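The structural items on this checklist can be verified automatically. The sketch below is one possible check, not a complete audit: `missing_items`, `REQUIRED_DIRS`, and `REQUIRED_FILES` are hypothetical names, and it only confirms that folders and the README exist, not that they are committed or well organized.

```python
from pathlib import Path

# Minimum layout assumed from the checklist; extend as needed.
REQUIRED_DIRS = ["data/raw", "notebooks", "scripts", "outputs", "reports"]
REQUIRED_FILES = ["README.md"]


def missing_items(root: str) -> list[str]:
    """Return the required folders and files that are absent under `root`."""
    root_path = Path(root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root_path / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root_path / f).is_file()]
    return missing
```

An empty list from `missing_items("project_name")` means the basic structure is in place; anything returned is a checklist item still to do.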
Tip
“A structured project isn’t just easier to share; it’s easier to revisit, debug, and scale.”