Scalability Audit

Last updated: 2026-03-05


| Priority | Category | Issues | Status |
|---|---|---|---|
| P0 | Critical bottlenecks | 4 | ✅ All fixed |
| P1 | Performance at 100+ teachers | 6 | ✅ All fixed |
| P2 | Future optimization | 5 | Documented |

Current safe scale: ~50 teachers / 2,500 students

Target after fixes: 500+ teachers / 25,000+ students


  • Multi-tenancy with teacherId scoping on all tables
  • Circuit breakers + Redis-backed rate limiting for Google/Stripe APIs
  • Credit ledger idempotency via INSERT ON CONFLICT
  • Webhook idempotency (atomic dedup)
  • Soft-delete pattern with notDeleted() helper (7 tables)
  • Route lazy loading (all frontend routes)
  • Optimistic mutations for low-risk actions
  • BullMQ job queues with retry/backoff

P0-1: Missing Database Indexes — ✅ FIXED

| Table | Index Added | Query Pattern |
|---|---|---|
| contact_log | (teacher_id, student_id, sent_at DESC) | Student list last-contact aggregation |
| contact_log | (teacher_id, sent_at DESC) | Teacher contact history |
| availability_rules | (teacher_id, is_active) WHERE is_active = true | Slot generation on every booking |
| sessions | (teacher_id, status, starts_at, ends_at) | Conflict detection for bookings |
| 7 soft-delete tables | (teacher_id) WHERE deleted_at IS NULL | Partial indexes for active records |

Migration: 0074_add_performance_indexes.sql — 11 indexes total. Schema files updated for contact_log, availability_rules, class_sessions.


P0-2: Double-Booking Race Condition — ✅ FIXED

Files: slot-engine.ts, session-service.ts

Fix: isSlotAvailable() uses SELECT ... FOR UPDATE SKIP LOCKED when called inside a transaction. Six session-creation methods now wrap the availability check and the insert in db.transaction(): bookSession, bookMultipleSessions, createTrialSession, scheduleManualSession, rescheduleSession, createRecurringSessions.
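The reason the check and the insert must be one atomic unit can be sketched without a database. The example below is a minimal illustration, not the code in slot-engine.ts: `Slot`, `overlaps`, and `bookIfFree` are hypothetical names, an in-memory array stands in for the sessions table, and intervals are assumed to be half-open so that back-to-back sessions do not conflict.

```typescript
// Hypothetical shape; the real sessions schema is not shown in this audit.
interface Slot {
  startsAt: Date;
  endsAt: Date;
}

// Half-open intervals [startsAt, endsAt): two slots conflict only if each
// one starts before the other ends.
function overlaps(a: Slot, b: Slot): boolean {
  return (
    a.startsAt.getTime() < b.endsAt.getTime() &&
    b.startsAt.getTime() < a.endsAt.getTime()
  );
}

// Stand-in for "availability check + insert inside one db.transaction()".
// The function is synchronous, so nothing can interleave between the check
// and the push; the transaction provides the same guarantee in the database.
function bookIfFree(booked: Slot[], candidate: Slot): boolean {
  if (booked.some((existing) => overlaps(existing, candidate))) {
    return false; // conflict: reject instead of double-booking
  }
  booked.push(candidate);
  return true;
}
```

If the check and the insert ran as two separate steps, two concurrent requests could both pass the check before either one inserted, which is the race this fix closes.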

P0-3: Slot Generation Unbounded Memory — ✅ FIXED

File: slot-engine.ts

Fix: MAX_SLOT_RANGE_DAYS = 14. The date range is capped at the top of getAvailableSlots(), which prevents 60K+ array elements from being held in memory.
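A cap of this kind might look like the sketch below. Only the constant name and value come from this audit; `clampRangeEnd` is a hypothetical helper, and the sketch assumes the cap is applied by pulling the end date in rather than by rejecting the request.

```typescript
const MAX_SLOT_RANGE_DAYS = 14;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns the end of the range, pulled in so that the
// span starting at `from` never exceeds MAX_SLOT_RANGE_DAYS.
function clampRangeEnd(from: Date, to: Date): Date {
  const maxEnd = new Date(from.getTime() + MAX_SLOT_RANGE_DAYS * MS_PER_DAY);
  return to.getTime() > maxEnd.getTime() ? maxEnd : to;
}
```

Applying the clamp before any slot is generated keeps the output size bounded no matter what range the caller asks for.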

P0-4: Connection Pool Not Configured — ✅ FIXED

File: packages/db/src/index.ts

Fix: postgres(connectionString, { max: 20, idle_timeout: 30, connect_timeout: 5 })

P1 — High (Performance at 100+ Teachers)


P1-1: Unbounded Queries Load All Rows — ✅ FIXED

| File | Method | Fix |
|---|---|---|
| review-service.ts | getStats() | Replaced in-memory aggregation with COUNT(*), AVG(rating), GROUP BY rating SQL |
| legal-document-service.ts | create() | Replaced findMany + reduce with SELECT MAX(sort_order) SQL aggregate |

P1-2: Batch Jobs Iterate All Teachers Sequentially — ✅ FIXED

| Worker | Previous pattern | Fix |
|---|---|---|
| lifecycle-detection-worker | Sequential loop, concurrency: 1 | Dispatcher + per-teacher jobs, concurrency: 5 |
| content-analytics-worker | Sequential loop, concurrency: 1 | Dispatcher + per-teacher jobs, concurrency: 5 |
| metric-alerts-worker | Sequential loop, concurrency: 1 | Dispatcher + per-teacher jobs, concurrency: 5 |

Pattern: Cron triggers a dispatcher job (no teacherId) that queries all teachers and enqueues one job per teacher on the same queue. Per-teacher jobs run in parallel with concurrency: 5. Date-based jobId (worker-{teacherId}-{YYYY-MM-DD}) provides daily idempotency. Individual teacher errors throw (proper BullMQ failure tracking) instead of silently continuing.
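The dispatcher fan-out and the date-based jobId can be sketched as follows. This is an illustration under assumptions, not the worker code: `JobQueue`, `InMemoryQueue`, `buildJobId`, and `dispatch` are hypothetical names, the `worker` prefix is treated as a parameter, and the date is formatted in UTC. The in-memory queue ignores a job whose jobId already exists, which is the behavior the daily idempotency relies on.

```typescript
interface JobData {
  teacherId?: string; // absent on the dispatcher job, set on per-teacher jobs
}

// Hypothetical minimal queue interface, loosely shaped like add(name, data, opts).
interface JobQueue {
  add(name: string, data: JobData, opts: { jobId: string }): void;
}

// Builds the worker-{teacherId}-{YYYY-MM-DD} id described above (UTC date assumed).
function buildJobId(worker: string, teacherId: string, day: Date): string {
  const ymd = day.toISOString().slice(0, 10);
  return `${worker}-${teacherId}-${ymd}`;
}

// Dispatcher step: enqueue one job per teacher on the same queue.
function dispatch(queue: JobQueue, worker: string, teacherIds: string[], day: Date): void {
  for (const teacherId of teacherIds) {
    queue.add(worker, { teacherId }, { jobId: buildJobId(worker, teacherId, day) });
  }
}

// In-memory stand-in for the real queue: a repeated jobId is a no-op.
class InMemoryQueue implements JobQueue {
  readonly jobs = new Map<string, JobData>();

  add(_name: string, data: JobData, opts: { jobId: string }): void {
    if (!this.jobs.has(opts.jobId)) {
      this.jobs.set(opts.jobId, data);
    }
  }
}
```

Running the dispatcher twice on the same day leaves one job per teacher, while the next day's run produces new ids and therefore new jobs.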


P1-3: Dashboard Queries Without Cache — ✅ FIXED (frontend)

Fix: Added a staleTime of 2 min to the dashboard summary, smart actions, pending sessions, and calendar queries. The settings query is cached for 5 min.

Backend Redis cache remains a future optimization.


P1-4: Frontend Queries Missing staleTime — ✅ FIXED

| Route | Query | staleTime Added |
|---|---|---|
| Students | teacher-students | 2 min |
| Messages | teacher-students-messages | 2 min |
| Payments | teacher-payments, teacher-payments-kpis | 1 min |
| Dashboard | teacher-dashboard-summary, teacher-dashboard-smart-actions, teacher-sessions-pending, teacher-calendar | 2 min |
| Dashboard | teacher-settings | 5 min |
| Payments | teacher-settings | 5 min |
| Organization | tags | 5 min |

Total: 11 useQuery hooks updated across 5 route files.
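One way to keep these values consistent across route files is a single lookup, sketched below. The query keys and durations are taken from the table above; the `STALE_TIME_MS` map and `staleTimeFor` helper are hypothetical and are not claimed to exist in the codebase. Values are in milliseconds, the unit the `staleTime` option expects.

```typescript
const MINUTE_MS = 60 * 1000;

// Hypothetical central map of query key to staleTime in milliseconds.
const STALE_TIME_MS: Record<string, number> = {
  "teacher-students": 2 * MINUTE_MS,
  "teacher-students-messages": 2 * MINUTE_MS,
  "teacher-payments": 1 * MINUTE_MS,
  "teacher-payments-kpis": 1 * MINUTE_MS,
  "teacher-dashboard-summary": 2 * MINUTE_MS,
  "teacher-dashboard-smart-actions": 2 * MINUTE_MS,
  "teacher-sessions-pending": 2 * MINUTE_MS,
  "teacher-calendar": 2 * MINUTE_MS,
  "teacher-settings": 5 * MINUTE_MS,
  "tags": 5 * MINUTE_MS,
};

// Unknown keys fall back to 0, meaning the data is treated as stale immediately.
function staleTimeFor(queryKey: string): number {
  return STALE_TIME_MS[queryKey] ?? 0;
}
```

A hook would then pass the result through, for example `useQuery({ queryKey: ["teacher-students"], queryFn, staleTime: staleTimeFor("teacher-students") })`.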


P1-5: Calendar Sync in Request Path — ✅ FIXED

Files: session-service.ts, calendar-sync-worker.ts

Fix: queueCalendarSync() marks syncStatus='pending' in session_calendar_sync and enqueues a BullMQ job. The calendar-sync worker (5-min cron) processes pending records in batches of 20, so session creation never blocks on the Google Calendar API. The circuit breaker skips the batch when the google-calendar circuit is open.

10 call sites in session-service.ts use queueCalendarSync() for create/update/delete operations.
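A single worker tick can be sketched as below. This is a simplified, synchronous illustration rather than the code in calendar-sync-worker.ts: `SyncRecord`, `processPendingBatch`, and the `"synced"` status value are assumptions (the audit only names the `pending` status), and an array stands in for the session_calendar_sync table. The real Google Calendar call would be asynchronous.

```typescript
// "pending" comes from the audit; "synced" is an assumed terminal status.
type SyncStatus = "pending" | "synced";

interface SyncRecord {
  sessionId: string;
  syncStatus: SyncStatus;
}

const BATCH_SIZE = 20;

// One cron tick: sync up to BATCH_SIZE pending records and return how many
// were processed. While the circuit is open the whole batch is skipped and
// the records stay pending for a later tick.
function processPendingBatch(
  records: SyncRecord[],
  circuitOpen: boolean,
  pushToCalendar: (record: SyncRecord) => void,
): number {
  if (circuitOpen) {
    return 0;
  }
  const batch = records
    .filter((record) => record.syncStatus === "pending")
    .slice(0, BATCH_SIZE);
  for (const record of batch) {
    pushToCalendar(record); // stand-in for the Google Calendar API call
    record.syncStatus = "synced";
  }
  return batch.length;
}
```

Because the request path only writes the pending marker, a slow or failing calendar API delays the sync but never the booking itself.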


P1-6: Session Auth DB Lookup on Every Request — ✅ FIXED

Files: session-service.ts, lib/redis.ts

Fix: Cache-aside pattern. getSession() checks Redis (session:{id} key, 30s TTL), falls back to PostgreSQL on a miss, and populates the cache with the row it finds. deleteSession() invalidates the cache before the DB delete. All Redis operations are wrapped in try/catch, so auth falls back to direct DB queries if Redis is unavailable. A shared ioredis singleton (lib/redis.ts) uses lazy connect and exponential backoff.
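The read path described here can be sketched as follows. The key format and the 30-second TTL come from this audit; the `Session` shape and the `KeyValueCache` and `SessionStore` interfaces are hypothetical stand-ins for the Redis client and the database layer, so this is an outline of the pattern rather than the actual getSession().

```typescript
// Hypothetical session shape; the real columns are not listed in this audit.
interface Session {
  id: string;
  userId: string;
}

// Hypothetical minimal cache interface standing in for the Redis client.
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Hypothetical database accessor standing in for the PostgreSQL query.
interface SessionStore {
  findById(id: string): Promise<Session | null>;
}

const SESSION_TTL_SECONDS = 30;

async function getSession(
  id: string,
  cache: KeyValueCache,
  db: SessionStore,
): Promise<Session | null> {
  const key = `session:${id}`;
  try {
    const cached = await cache.get(key);
    if (cached !== null) {
      return JSON.parse(cached) as Session; // cache hit: no DB query
    }
  } catch {
    // Cache unavailable: fall through to the database.
  }
  const session = await db.findById(id);
  if (session !== null) {
    try {
      await cache.set(key, JSON.stringify(session), SESSION_TTL_SECONDS);
    } catch {
      // A failed cache write must not fail the auth check.
    }
  }
  return session;
}
```

The short TTL bounds how long a stale entry can outlive a change made outside deleteSession(), at the cost of one DB lookup per session every 30 seconds.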

| # | Issue | Notes |
|---|---|---|
| P2-1 | No list virtualization (students, contacts, payments) | Add @tanstack/react-virtual when lists exceed 100 items |
| P2-2 | i18n: 5,700 lines in single file, both languages loaded | Split into namespace files, lazy-load secondary language |
| P2-3 | Analytics queries without materialized views | Create retention_cohorts materialized view, refresh hourly |
| P2-4 | No BullMQ queue depth monitoring | Add Prometheus metrics + alerts on depth > threshold |
| P2-5 | Teachers table: 38KB avg JSONB per row | Ensure all queries use column selection, not SELECT * |

| Metric | Current (~10 teachers) | 100 teachers | 1,000 teachers |
|---|---|---|---|
| Students | ~500 | 5K | 50K |
| Sessions | ~5K | 50K | 500K |
| Contact logs | ~2K | 20K | 200K |
| Auth queries/sec | ~10 | ~100 | ~1,000 → Redis cached (P1-6 ✅) |
| Slot generation | 10 ms | 30 ms | 100 ms (was 500 ms) |
| Lifecycle job | 10 s | 100 s | ~3-4 min with concurrency: 5 (P1-2 ✅) |
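The lifecycle job figures can be checked with a rough estimate. The sketch below assumes a uniform cost of about 1 s per teacher, which is what the 10 s and 100 s figures for ~10 and 100 teachers imply, and it ignores queue overhead; `estimateJobSeconds` is a hypothetical helper written for this calculation.

```typescript
// Rough wall-clock estimate: teachers are processed in waves of `concurrency`,
// and each wave takes `secondsPerTeacher`.
function estimateJobSeconds(
  teachers: number,
  secondsPerTeacher: number,
  concurrency: number,
): number {
  return Math.ceil(teachers / concurrency) * secondsPerTeacher;
}

const sequentialAt1000 = estimateJobSeconds(1000, 1, 1); // 1000 s, about 16.7 min
const parallelAt1000 = estimateJobSeconds(1000, 1, 5); // 200 s, about 3.3 min
```

Under these assumptions, concurrency 5 brings the 1,000-teacher run from roughly 17 minutes down to a little over 3 minutes.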