# Getting Started

## Data Flow Overview
The platform follows an event-driven architecture where data flows through multiple stages:
```
┌─────────────┐     ┌──────────┐     ┌──────────────┐     ┌─────────────┐
│  Producer   │────▶│  Kafka   │────▶│   Consumer   │────▶│  Sentiment  │
│  Service    │     │ (Topics) │     │   Service    │     │  Service    │
└─────────────┘     └──────────┘     └──────────────┘     └─────────────┘
                                            │                    │
                                            ▼                    │
                                     ┌─────────────┐             │
                                     │ PostgreSQL  │             │
                                     │  + Redis    │             │
                                     └─────────────┘             │
                                            │                    │
                                            ▼                    │
                    ┌──────────┐     ┌──────────────┐            │
                    │   API    │◀────│ Kafka Topic  │◀───────────┘
                    │ Service  │     │(processed...)│
                    └──────────┘     └──────────────┘
                         │
                         ▼ (SSE)
                    ┌──────────┐
                    │Dashboard │
                    └──────────┘
```

## Stage-by-Stage Description
### 1. Producer → Kafka (raw-comments)

What happens:
- Producer generates mock comments every 100ms–10s
- Comments include text, source (Twitter/Instagram/etc.), and a timestamp
- 5% are intentional duplicates to test deduplication
- Published to the `raw-comments` Kafka topic

Why it matters:
- Kafka acts as a buffer between generation and processing
- Ensures no message loss even if the consumer is down
- Allows multiple consumers to process the same data
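The generation step above can be sketched as follows. This is an illustrative stand-in, not the project's actual producer code; the type names and sample data are hypothetical, and a real loop would publish each comment to `raw-comments` and sleep a random 100ms–10s between sends.

```typescript
import { randomUUID } from "node:crypto";

type Source = "twitter" | "instagram" | "facebook" | "tiktok";

interface RawComment {
  commentId: string;
  text: string;
  source: Source;
  timestamp: string; // ISO 8601
}

const SOURCES: Source[] = ["twitter", "instagram", "facebook", "tiktok"];
const SAMPLE_TEXTS = [
  "This product is amazing!",
  "Terrible experience, would not recommend.",
  "It arrived on time.",
];

let lastComment: RawComment | null = null;

function nextComment(): RawComment {
  // ~5% of the time, re-emit the previous comment to exercise deduplication.
  if (lastComment && Math.random() < 0.05) {
    return lastComment;
  }
  const comment: RawComment = {
    commentId: randomUUID(),
    text: SAMPLE_TEXTS[Math.floor(Math.random() * SAMPLE_TEXTS.length)],
    source: SOURCES[Math.floor(Math.random() * SOURCES.length)],
    timestamp: new Date().toISOString(),
  };
  lastComment = comment;
  return comment;
}
```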
### 2. Consumer Receives Raw Comment

What happens:
- Consumer subscribes to the `raw-comments` topic
- Receives each comment message
- Begins the processing pipeline

Why it matters:
- Kafka consumer groups enable horizontal scaling
- Each message is processed exactly once per group
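A simplified model of why consumer groups scale: Kafka assigns each partition of a topic to exactly one consumer within a group, so every message is handled once per group. The sketch below is a toy round-robin assignment to illustrate the idea; it is not the kafkajs API or Kafka's real assignor.

```typescript
// Toy model: distribute partitions round-robin across group members.
function assignPartitions(
  partitions: number[],
  consumers: string[],
): Map<string, number[]> {
  const assignment = new Map<string, number[]>();
  for (const c of consumers) assignment.set(c, []);
  partitions.forEach((p, i) => {
    const owner = consumers[i % consumers.length];
    assignment.get(owner)!.push(p);
  });
  return assignment;
}
```

Adding a second consumer to the group halves each instance's share of partitions, which is what makes horizontal scaling work.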
### 3. Deduplication Check

What happens:
- Check Redis using the `commentId` as key (`processed:{commentId}`)
- If found in Redis → comment already processed; discard and log
- If not found → mark as new and continue processing

Why it matters:
- Prevents processing the same comment multiple times
- Redis persists across service restarts (3-hour TTL)
- Uses `commentId` to detect duplicate submissions
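The check above can be sketched like this, with an in-memory Map standing in for Redis. The key format and 3-hour TTL come from the description; the helper names are hypothetical, and real Redis would handle expiry natively (e.g. `SET ... EX`).

```typescript
const THREE_HOURS_MS = 3 * 60 * 60 * 1000;

// Stand-in for Redis: key -> expiry timestamp.
const fakeRedis = new Map<string, number>();

function isDuplicate(commentId: string, now = Date.now()): boolean {
  const expiry = fakeRedis.get(`processed:${commentId}`);
  return expiry !== undefined && expiry > now; // expired keys count as new
}

function markProcessed(commentId: string, now = Date.now()): void {
  fakeRedis.set(`processed:${commentId}`, now + THREE_HOURS_MS);
}
```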
### 4. Sentiment Analysis (gRPC)

What happens:
- Hash the comment text (SHA-256 of the lowercased, trimmed text)
- Check the LRU cache (100 entries) for a cached sentiment result
- If cache hit → use the cached tag and skip the gRPC call
- If cache miss → call the Sentiment service via gRPC
- Sentiment service classifies: positive/negative/neutral/unrelated
- Uses keyword matching (e.g., "amazing" → positive)
- Cache the result for future identical text
- 1 in 32 requests randomly fail to simulate errors

Why it matters:
- The LRU cache avoids redundant analysis of identical text
- gRPC is faster than REST for service-to-service calls
- Failure simulation exercises the retry mechanism
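A sketch of the cache in front of the sentiment call. The hashing rule (SHA-256 of lowercased, trimmed text) and the 100-entry LRU size come from the description; `classifyStub` is a hypothetical stand-in for the real gRPC call, which the doc says uses keyword matching.

```typescript
import { createHash } from "node:crypto";

type Tag = "positive" | "negative" | "neutral" | "unrelated";

function textHash(text: string): string {
  return createHash("sha256").update(text.trim().toLowerCase()).digest("hex");
}

// A Map preserves insertion order, which is enough for a simple LRU.
const cache = new Map<string, Tag>();
const CACHE_SIZE = 100;

function classifyStub(text: string): Tag {
  // Stand-in for the gRPC call; illustrative keywords only.
  if (/amazing|great|love/i.test(text)) return "positive";
  if (/terrible|awful|hate/i.test(text)) return "negative";
  return "neutral";
}

function analyze(text: string): { tag: Tag; cached: boolean } {
  const key = textHash(text);
  if (cache.has(key)) {
    const tag = cache.get(key)!;
    cache.delete(key); // re-insert to refresh recency
    cache.set(key, tag);
    return { tag, cached: true };
  }
  const tag = classifyStub(text); // real code: gRPC call, with retries
  if (cache.size >= CACHE_SIZE) {
    cache.delete(cache.keys().next().value!); // evict least recently used
  }
  cache.set(key, tag);
  return { tag, cached: false };
}
```

Because the hash normalizes case and whitespace, `"Amazing!"` and `"  amazing!  "` share a cache entry.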
### 5. Save to Database

What happens:
- Consumer stores the comment in PostgreSQL
- Includes: text, textHash, tag, source, timestamps, consumerId, retry count
- Marks the commentId as processed in Redis for deduplication

Why it matters:
- PostgreSQL provides persistent storage
- Enables querying by tag, source, and date range
- Tracks retry attempts and consumer instance for monitoring
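The persistence step can be sketched with in-memory stand-ins for PostgreSQL and Redis. The row fields follow the list above; the helper and field names are hypothetical, and real code would run an `INSERT` through a PG client and set the Redis key with a 3-hour TTL.

```typescript
interface CommentRow {
  commentId: string;
  text: string;
  textHash: string;
  tag: string;
  source: string;
  processedAt: string;
  consumerId: string;
  retryCount: number;
}

const table: CommentRow[] = [];          // stand-in for the comments table
const processedKeys = new Set<string>(); // stand-in for Redis dedup keys

function saveComment(row: CommentRow): void {
  table.push(row); // real code: parameterized INSERT via a PG client
  processedKeys.add(`processed:${row.commentId}`); // real code: Redis SET with EX 10800
}
```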
### 6. Publish to processed-comments

What happens:
- Consumer publishes the enriched comment to Kafka
- Topic: `processed-comments`
- Includes the sentiment tag and processing metadata

Why it matters:
- Decouples the consumer from the API
- Allows other services to consume processed data
- Follows the event sourcing pattern
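The enrichment step can be sketched as a pure function: wrap the raw comment with the sentiment tag and processing metadata before publishing. Field names here are illustrative, not the project's actual message schema.

```typescript
interface RawComment {
  commentId: string;
  text: string;
  source: string;
  timestamp: string;
}

interface ProcessedComment extends RawComment {
  tag: string;
  consumerId: string;
  processedAt: string;
  retryCount: number;
}

function enrich(
  raw: RawComment,
  tag: string,
  consumerId: string,
  retryCount = 0,
): ProcessedComment {
  return { ...raw, tag, consumerId, processedAt: new Date().toISOString(), retryCount };
}
// Real code would then publish the result to the `processed-comments` topic.
```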
### 7. API Streams Processed Comments

What happens:
- API service subscribes to `processed-comments`
- Receives each processed comment from Kafka
- The comment is already in the database (saved by the Consumer)
- Broadcasts the comment via SSE to connected dashboard clients

Why it matters:
- The API doesn't need direct database writes
- Real-time updates reach the dashboard via SSE
- Decoupled from the consumer service
- Multiple API instances can stream the same data
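The fan-out can be sketched as below: each connected dashboard client is modeled as a write function, and every processed comment is framed as an SSE `comment` event. This is a simplification of the real handler, which writes to HTTP responses with `Content-Type: text/event-stream`.

```typescript
type Client = (chunk: string) => void;
const clients = new Set<Client>();

// SSE wire format: "event:" line, "data:" line, blank line terminator.
function formatSse(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

function broadcast(comment: unknown): void {
  const frame = formatSse("comment", comment);
  for (const write of clients) write(frame);
}
```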
### 8. Dashboard Receives Updates

What happens:
- Browser maintains an SSE connection to the API
- Receives `comment` events from the stream
- Validates comment data with a Zod schema
- Inserts into a TanStack DB collection (auto-persists to localStorage)
- Components using `useLiveQuery` automatically re-render
- Statistics are computed on the fly from the collection

Why it matters:
- SSE is simpler than WebSockets for one-way updates
- TanStack DB provides reactive queries without manual state management
- localStorage persistence enables offline support
- Computing statistics from the collection ensures data consistency
- No manual polling needed
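The validate-then-insert step can be sketched as follows. The project uses a Zod schema; a hand-rolled type guard is shown here so the sketch stays dependency-free, and the field names are assumptions rather than the actual schema.

```typescript
interface DashboardComment {
  commentId: string;
  text: string;
  tag: string;
  source: string;
}

function isComment(value: unknown): value is DashboardComment {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.commentId === "string" &&
    typeof v.text === "string" &&
    typeof v.tag === "string" &&
    typeof v.source === "string"
  );
}

// Handler for each SSE `comment` event; `insert` stands in for the
// TanStack DB collection insert.
function onCommentEvent(
  rawData: string,
  insert: (c: DashboardComment) => void,
): boolean {
  const parsed: unknown = JSON.parse(rawData);
  if (!isComment(parsed)) return false; // invalid events are dropped
  insert(parsed);
  return true;
}
```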
## Error Handling Flow

When sentiment analysis fails:

```
Consumer → Sentiment (fails)
          ↓
Retry with delay (1s, 2s, 4s, 8s, 16s)
          ↓
After 5 attempts → publish to dead-letter-queue
          ↓
Manual investigation required
```
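The retry policy above can be sketched as exponential backoff followed by a dead-letter handoff. This reads the flow's "5 attempts" as five retries, one per listed delay; `call` and `toDeadLetterQueue` are stand-ins for the gRPC call and the Kafka publish, and `sleep` is injectable so the schedule can be tested without waiting.

```typescript
const MAX_RETRIES = 5;

function backoffMs(retry: number): number {
  return 1000 * 2 ** (retry - 1); // 1s, 2s, 4s, 8s, 16s
}

async function withRetry<T>(
  call: () => Promise<T>,
  toDeadLetterQueue: (err: unknown) => void,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T | undefined> {
  for (let retry = 0; ; retry++) {
    try {
      return await call();
    } catch (err) {
      if (retry === MAX_RETRIES) {
        toDeadLetterQueue(err); // manual investigation from here
        return undefined;
      }
      await sleep(backoffMs(retry + 1));
    }
  }
}
```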
## Quick Setup

```bash
# Install dependencies
pnpm install

# Start all services
pnpm docker:up

# View logs
pnpm docker:logs

# Access dashboard
open http://localhost:3000
```

## Verify Data Flow
- Check Producer - logs should show "Published comment..."
- Check Consumer - logs should show "Processing comment..." and "Saved to database"
- Check API - visit http://localhost:3001/api/statistics
- Check Dashboard - see real-time updates at http://localhost:3000
## Next Steps
- Setup Guide - Detailed stack explanation
- Producer - How comments are generated
- Consumer - Processing pipeline details
- API & SSE - Real-time streaming