Stream Processing
Definition
Continuously processing data records as they arrive in real time, rather than storing them first and processing in bulk.
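The core idea can be shown with a minimal in-process sketch: a consumer takes records off a queue one at a time and acts on each as soon as it arrives, without first collecting the full dataset. This assumes Python's standard `queue` and `threading` modules. The `to_celsius` transformation and the sentinel that ends the demo are hypothetical stand-ins, since real streams are unbounded and usually come from a broker rather than an in-memory queue.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-demo marker; real streams never "finish"

def consume(events: "queue.Queue", handle, results: list) -> None:
    """Process each record as soon as it is taken from the queue."""
    while True:
        record = events.get()  # blocks until the next record arrives
        if record is SENTINEL:
            break
        results.append(handle(record))  # act on the record immediately

def to_celsius(reading: dict) -> dict:
    # Hypothetical per-record transformation: convert a Fahrenheit sensor reading.
    return {"sensor": reading["sensor"], "celsius": round((reading["f"] - 32) * 5 / 9, 1)}

events: "queue.Queue" = queue.Queue()
results: list = []
worker = threading.Thread(target=consume, args=(events, to_celsius, results))
worker.start()

# The producer emits readings one at a time; the consumer handles each on arrival.
for reading in [{"sensor": "a", "f": 212}, {"sensor": "b", "f": 32}, {"sensor": "a", "f": 98.6}]:
    events.put(reading)
events.put(SENTINEL)
worker.join()
```

Production systems replace the queue with a durable stream (such as a Kafka topic or a Kinesis stream) and the single thread with a distributed processing engine, but the per-record control flow is the same.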
Use Cases
- Uber: Processing live trip events, driver locations, and rider demand signals to support ETAs, matching, and marketplace monitoring. — Uber has publicly described using Apache Kafka as a central event streaming platform to move high-volume operational data between services in near real time, enabling downstream analytics and operational systems to react quickly. (Improved ability to make real-time marketplace decisions, support operational visibility, and power time-sensitive rider and driver experiences.)
- Netflix: Monitoring application events and operational telemetry in real time to detect issues and understand user activity. — Netflix has publicly shared its use of Apache Kafka and stream-oriented data pipelines for moving large volumes of event data across services for near-real-time analytics and observability. (Faster detection of operational problems, better visibility into platform behavior, and quicker response to incidents affecting streaming users.)
- LinkedIn: Handling activity streams, log data, and event-driven analytics across a large distributed platform. — LinkedIn originally developed Apache Kafka to publish and consume streams of records across many systems, allowing applications and analytics platforms to process events continuously. (Reliable large-scale event distribution, lower latency for data availability, and a foundation for real-time analytics and event-driven applications.)
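The publish/consume pattern behind these deployments can be illustrated with a simplified in-memory model: records are appended to ordered partitions, and each consumer group tracks its own read offset so that independent applications can read the same stream. This is a sketch of the concept, not Kafka's API. `PartitionedLog`, `publish`, and `consume` are hypothetical names, `zlib.crc32` stands in for Kafka's real partitioner, and offsets here advance on read instead of being committed separately.

```python
from collections import defaultdict
from zlib import crc32

class PartitionedLog:
    """Toy append-only log: ordered partitions plus per-group read offsets."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        # offsets[group][partition] = index of the next record that group will read
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def publish(self, key: str, value: str) -> int:
        # Records with the same key land in the same partition, preserving per-key order.
        p = crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def consume(self, group: str, partition: int, max_records: int = 10) -> list:
        # Each group reads from its own offset, so groups do not interfere with each other.
        start = self.offsets[group][partition]
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[group][partition] = start + len(batch)
        return batch

log = PartitionedLog(num_partitions=3)
p = log.publish("member-42", "viewed_profile")
log.publish("member-42", "sent_message")
analytics = log.consume("analytics", p)       # both records, in publish order
search = log.consume("search-indexer", p)     # the same records, read independently
```

Because the log is retained rather than deleted on read, a new consumer group can start from offset 0 and replay history, which is one reason a shared log works well as a hub between many systems.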
Provider Equivalents
- AWS: Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, Amazon MSK
- Azure: Azure Stream Analytics, Azure Event Hubs, Apache Kafka on Azure HDInsight
- GCP: Google Cloud Dataflow, Google Cloud Pub/Sub
- OCI: OCI Streaming, OCI Data Flow
Frequently Asked Questions
- What's the difference between Stream Processing and batch processing?
- Stream processing handles data continuously as events arrive, typically within milliseconds to seconds. Batch processing collects data over a period of time and processes it later in larger groups. Use stream processing when you need immediate action, such as fraud alerts, live dashboards, or IoT monitoring. Use batch processing when delay is acceptable, such as nightly reports or monthly billing.
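The contrast can be made concrete with a small sketch that computes the same total two ways. The batch version produces one answer only after all the data for the period is in; the streaming version keeps running state and emits an updated answer after every event. The function names and order amounts are illustrative.

```python
from typing import Iterable, Iterator

def batch_total(amounts: Iterable[float]) -> float:
    # Batch: wait until the whole period's data is collected, then compute once.
    return sum(amounts)

def stream_totals(amounts: Iterable[float]) -> Iterator[float]:
    # Stream: update state per event and emit a fresh result immediately.
    running = 0.0
    for amount in amounts:
        running += amount
        yield running

orders = [20.0, 5.0, 12.5]
print(batch_total(orders))          # 37.5
print(list(stream_totals(orders)))  # [20.0, 25.0, 37.5]
```

Both approaches reach the same final value; the difference is when the first usable answer becomes available.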
- When should I use Stream Processing?
- Use stream processing when your business needs low-latency insights or actions. Common cases include fraud detection, clickstream analytics, sensor monitoring, log analysis, live recommendations, and operational alerting. If your users or systems benefit from reacting to events right away, stream processing is a strong fit. If waiting minutes or hours is acceptable, batch processing may be simpler and cheaper.
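Fraud detection and operational alerting often rely on windowed rules evaluated per event. The sketch below flags a key (for example, a card ID) when it produces more than `max_events` events within a sliding `window_seconds` window. The class name, thresholds, and sample events are hypothetical, and the sketch assumes events arrive in timestamp order; stream processing engines handle out-of-order data with mechanisms such as watermarks.

```python
from collections import defaultdict, deque

class VelocityAlert:
    """Flag a key with too many events inside a sliding time window."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window = window_seconds
        self.recent = defaultdict(deque)  # key -> timestamps still inside the window

    def observe(self, key: str, ts: float) -> bool:
        times = self.recent[key]
        times.append(ts)
        # Evict timestamps that have fallen out of the window (assumes in-order events).
        while times and ts - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_events

detector = VelocityAlert(max_events=3, window_seconds=60)
events = [("card-1", 0), ("card-1", 10), ("card-2", 15),
          ("card-1", 20), ("card-1", 30), ("card-1", 95)]
alerts = [(card, ts) for card, ts in events if detector.observe(card, ts)]
```

Here `card-1` triggers an alert at `t=30` (four events in 30 seconds), while its event at `t=95` does not, because the earlier events have aged out of the window. Keeping only the timestamps inside the window bounds the state held per key.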
- How much does Stream Processing cost?
- Costs depend on data volume, throughput, retention period, processing time, and how much compute runs continuously. Managed services may charge for incoming events, streaming shards or partitions, processing units, storage, and network transfer. Costs can rise quickly if you keep long retention periods, overprovision capacity, or run complex stateful jobs. To control spending, estimate event rates carefully, use autoscaling where available, and track ingestion and processing costs separately.
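Estimating event rates usually starts with capacity: how many shards or partitions the ingestion layer needs at peak. The sketch below takes the larger of the byte-based and record-based requirements. The default limits (about 1 MB/s and 1,000 records/s per partition) are assumptions roughly in line with Amazon Kinesis Data Streams' documented per-shard write limits in provisioned mode, and the 25% `headroom` is a hypothetical safety margin. Confirm both against your provider's current documentation.

```python
import math

def partitions_needed(records_per_sec: float, avg_record_bytes: float,
                      max_bytes_per_partition: float = 1_000_000,
                      max_records_per_partition: float = 1_000,
                      headroom: float = 1.25) -> int:
    """Estimate write shards/partitions from peak event rate (limits are assumptions)."""
    peak_records = records_per_sec * headroom
    # A partition is saturated by whichever limit is hit first: bytes or record count.
    by_bytes = math.ceil(peak_records * avg_record_bytes / max_bytes_per_partition)
    by_records = math.ceil(peak_records / max_records_per_partition)
    return max(1, by_bytes, by_records)

small_events = partitions_needed(records_per_sec=5_000, avg_record_bytes=500)    # record-bound
large_events = partitions_needed(records_per_sec=2_000, avg_record_bytes=2_000)  # byte-bound
```

Many small events tend to hit the record-count limit first, while fewer large events hit the byte limit. Multiplying the result by your provider's per-shard or per-partition hourly rate gives a first-order ingestion cost, which you can then track separately from processing cost.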
Category: analytics
Difficulty: intermediate
Related Terms
See Also