---
name: "SaaS Analytics & Reporting"
description: "Build analytics for SaaS: dashboard design, chart types (when to use each), data architecture (pre-aggregation, ClickHouse, materialized views), CSV/PDF export, embedded analytics (Metabase, Cube.js), cohort analysis, real-time dashboards, multi-tenant data isolation."
version: "1.0.0"
author: "skynet"
category: "dev"
agents: ["claude-code", "codex", "gemini"]
tags: ["saas", "analytics", "dashboards", "reporting", "metrics", "charts"]
---

# SaaS Analytics & Reporting

---
name: "SaaS Analytics & Reporting"
description: "Build analytics for SaaS: dashboard design, chart types (when to use each), data architecture (pre-aggregation, ClickHouse, materialized views), CSV/PDF export, embedded analytics (Metabase, Cube.js),"
version: "1.0.0"
author: "skynet"
category: "dev"
tags: ["saas", "analytics", "dashboards", "reporting", "metrics", "charts"]
---

# The SaaS Analytics & Reporting Playbook

Building analytics into a SaaS product is the transition from "providing a tool" to "providing a solution." This guide covers the architectural, design, and implementation patterns required to build a world-class analytics engine.

---

## 1. Dashboard Design & UX
A SaaS dashboard should move from **Summary** to **Insight** to **Action**.

*   **Metric Cards:** The "Hero" section. Top-level KPIs (e.g., MRR, Active Users) with comparison indicators (`+12% vs last month`).
*   **Time-Series Charts:** Primary area for trend analysis. Always include a date-range picker.
*   **Filter Bars:** Global filters (Date, Segment, Team, Region) should update all charts simultaneously. Use URL state for filter persistence.
*   **Progressive Loading:** Don't wait for all data. Load the dashboard skeleton, then fetch metric cards, then heavy charts.
*   **Drill-downs:** Clicking a chart element (e.g., a bar in a "Churn by Reason" chart) should take the user to a filtered table of those specific users.

---

## 2. Chart Types & Libraries
Choose the chart that minimizes "time to insight."

| Chart Type | Best Use Case | Avoid When... |
| :--- | :--- | :--- |
| **Line** | Trends over time (Revenue, DAU) | Comparing non-temporal categories |
| **Bar** | Comparisons (Revenue by Region) | You have >15 categories (use horizontal) |
| **Area** | Volume/Total buildup over time | Comparing many small fluctuations |
| **Donut** | Composition (Market share) | You have >4 segments (hard to read) |
| **Funnel** | Conversion steps (Onboarding flow) | Steps aren't sequential |
| **Heatmap** | Activity density (Feature usage by hour) | Data is sparse |

### Recommended Stack:
*   **UI Components:** [Tremor](https://www.tremor.so/) (React-based, built for dashboards), [shadcn/ui Charts](https://ui.shadcn.com/charts).
*   **Engines:** [Recharts](https://recharts.org/) (Composable), [Chart.js](https://www.chartjs.org/) (Simple/Fast), [D3.js](https://d3js.org/) (Custom/Complex).

---

## 3. Data Architecture: OLTP vs. OLAP
Never run complex analytics directly against your production transactional database (OLTP) in a high-traffic app.

*   **OLTP (Postgres/MySQL):** Optimized for row-level writes/updates. Bad at `SUM()` or `COUNT()` over millions of rows.
*   **OLAP (ClickHouse/StarRocks):** Optimized for columnar reads. Can aggregate billions of rows in milliseconds.
*   **Materialized Views:** Use for "Daily Aggregates." A background job calculates totals once per day and stores them in a separate table.
*   **Lambda Architecture:** 
    *   **Batch Layer:** Historical data (yesterday and older) in an OLAP/Aggregated table.
    *   **Speed Layer:** Today's data queried from the live OLTP table.
    *   **Merge:** Application logic combines both for the "Real-time" view.

---

## 4. Key SaaS Metrics (SQL Patterns)

### Net Revenue Retention (NRR)
The most important metric for growth.
```sql
-- Simplified NRR: (Starting MRR + Expansion - Contraction - Churn) / Starting MRR
SELECT 
  tenant_id,
  (SUM(mrr_start) + SUM(mrr_expansion) - SUM(mrr_churn)) / NULLIF(SUM(mrr_start), 0) as nrr
FROM revenue_log
WHERE report_month = '2024-03-01';
```

### Churn Rate (Logo Churn)
```sql
SELECT 
  (COUNT(DISTINCT CASE WHEN status = 'cancelled' AND month = '2024-03' THEN user_id END)::float / 
   COUNT(DISTINCT CASE WHEN month = '2024-02' THEN user_id END)) * 100 as churn_rate
FROM subscriptions;
```

---

## 5. Cohort Analysis
Understanding retention by "Signup Month."

**Implementation Pattern:**
1.  **Define the Cohort:** Group users by their `created_at` month.
2.  **Define the Activity:** Check if they had an active session in `Month + N`.
3.  **The Query:**
```sql
WITH user_cohorts AS (
    SELECT id, date_trunc('month', created_at) as cohort_month
    FROM users
),
active_months AS (
    SELECT DISTINCT user_id, date_trunc('month', activity_at) as activity_month
    FROM user_activity
)
SELECT 
    c.cohort_month,
    EXTRACT(MONTH FROM age(a.activity_month, c.cohort_month)) as month_number,
    COUNT(DISTINCT c.id) as active_users
FROM user_cohorts c
JOIN active_months a ON c.id = a.user_id
GROUP BY 1, 2;
```

---

## 6. Multi-tenancy & Security
Analytics must be "Tenant-Isolated."

*   **The Golden Rule:** Every analytics query *must* include a `tenant_id` filter in the `WHERE` clause.
*   **Row-Level Security (RLS):** In Postgres, use RLS to ensure a database user (or session variable) can only see data for their specific `tenant_id`.
*   **Shared vs. Isolated:**
    *   **Shared:** All tenants in one table (cheaper, easier to aggregate "global" benchmarks).
    *   **Isolated:** Database-per-tenant (highly secure, expensive, hard to update schema).

---

## 7. Export & Reporting
*   **CSV Streaming:** For large datasets, use Node.js streams or Python generators to pipe DB results directly to the HTTP response. Do not load 100k rows into memory.
*   **PDF Generation:** Use **Puppeteer** to render the actual dashboard UI to a PDF. It ensures the charts in the PDF look exactly like the ones on screen.
*   **Scheduled Emails:** Use a CRON job to trigger a worker that generates a summary and sends it via Postmark/SendGrid.

---

## 8. Embedded Analytics: Build vs. Buy
How much control do you need?

| Option | Pros | Cons |
| :--- | :--- | :--- |
| **Custom (D3/Recharts)** | Infinite control, perfect UI match. | High dev cost, long time-to-market. |
| **Headless BI (Cube.js)** | Handles caching, modeling, and security. | You still have to build the UI. |
| **Embedded (Metabase/Preset)** | Rapid setup, drag-and-drop for users. | Iframe styling is hard; "Look & Feel" is obvious. |
| **Product Analytics (PostHog)** | Autocapture, heatmaps, session recording. | Less focused on "Revenue/Billing" data. |

### The "Smarter" Build: Headless BI
Use **Cube.js**. It sits between your DB and your UI. It handles the SQL generation, caching (Redis), and multi-tenancy logic, while you build the charts using your own UI components (shadcn/Tremor).

---

## 9. Permissions (RBAC)
*   **Owner/Admin:** Full access to financial data (MRR, Churn, Billing).
*   **Manager:** Access to team-level performance and usage metrics.
*   **Standard User:** Access only to their own "Usage" stats or "Personal Productivity" dashboard.
*   **Reporting API:** Provide a Bearer Token for Enterprise customers to pull their data into their own BI tools (Tableau/PowerBI).

---

## 10. Summary Checklist for V1
1.  [ ] **Data Pipeline:** Move events to a separate table/DB.
2.  [ ] **Aggregation:** Set up a daily cron to calculate "Daily Totals."
3.  [ ] **API:** Build endpoints for the 4-5 core KPIs.
4.  [ ] **UI:** Implement a date-range picker and 3-4 primary charts.
5.  [ ] **Export:** Add a "Download CSV" button for raw data tables.
6.  [ ] **Security:** Verify that `tenant_id` is enforced at the query level.

