Stream processing

What is stream processing?

Batch processing operates on blocks of data that have already been stored over a period of time. It works well in situations where you don’t need real-time analytics results, and where processing large volumes of data for more detailed insights matters more than getting fast answers.
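
To make this concrete, here is a minimal batch-style sketch in Java: the whole dataset already sits in storage (`readings.csv` is a hypothetical file with one numeric reading per line), and we aggregate it in a single pass, long after the data was produced.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class BatchAverage {
    public static void main(String[] args) throws Exception {
        // Batch: the complete, bounded dataset is already stored on disk;
        // we read it and compute one result over the whole block of data.
        try (Stream<String> lines = Files.lines(Path.of("readings.csv"))) {
            double average = lines
                    .mapToDouble(Double::parseDouble)
                    .average()
                    .orElse(Double.NaN); // NaN if the file is empty
            System.out.println("average reading = " + average);
        }
    }
}
```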

Stream processing, by contrast, allows us to process data in real time, as it arrives.

Stream processing is the processing of data in motion, or in other words, computing on data directly as it is produced or received.

The majority of data is born as continuous streams: sensor events, user activity on a website, financial trades, and so on; all of this data is created as a series of events over time.

Streaming computations can also process multiple data streams jointly, and each computation over the event data stream may produce other event data streams.
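
To illustrate the second point, here is a minimal, framework-free Java sketch (the sensor source is invented for the example): a computation consumes an unbounded stream of temperature readings and produces a new, derived stream of alert events. It runs until interrupted, since a stream has no natural end.

```java
import java.util.Iterator;
import java.util.Random;

public class TemperatureAlerts {

    // Hypothetical unbounded source: an endless stream of sensor readings.
    static Iterator<Double> readings() {
        Random rnd = new Random();
        return new Iterator<Double>() {
            @Override
            public boolean hasNext() { return true; } // a stream never ends
            @Override
            public Double next() { return 15.0 + rnd.nextDouble() * 20.0; }
        };
    }

    public static void main(String[] args) {
        Iterator<Double> source = readings();
        // Compute on each event directly as it is received; readings above
        // the threshold become events in a new, derived alert stream.
        while (source.hasNext()) {
            double celsius = source.next();
            if (celsius > 30.0) {
                System.out.println("ALERT: temperature " + celsius + " C");
            }
        }
    }
}
```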

The systems that receive and send the data streams and execute the application or analytics logic are called stream processors. The basic responsibilities of a stream processor are to ensure that data flows efficiently and that the computation scales and is fault-tolerant.

The stream processing paradigm naturally addresses many challenges that developers of real-time data analytics and event-driven applications face today. 

Stream processing goes by other names as well: real-time analytics, streaming analytics, complex event processing, and event stream processing.

Challenges for a stream processor:

- ensuring that data flows efficiently

- fault tolerance

- low latency

Stream processing frameworks:

Apache Kafka (via its Kafka Streams library), Apache Flink, Apache Storm, Apache Samza, etc.
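
To give a flavour of what these frameworks look like in practice, below is a minimal streaming word count sketch using Apache Flink's DataStream API. It is an illustration, not a production job: it assumes a Flink dependency on the classpath, and the host, port, and job name are placeholders (a matching text source can be opened with `nc -lk 9999`).

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Unbounded source: lines of text arriving on a local socket.
        env.socketTextStream("localhost", 9999)
            // Turn each incoming line into (word, 1) events as it arrives.
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line,
                                    Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\s+")) {
                        if (!word.isEmpty()) {
                            out.collect(Tuple2.of(word, 1));
                        }
                    }
                }
            })
            // Partition the derived stream by word and keep a running count.
            .keyBy(pair -> pair.f0)
            .sum(1)
            .print();

        // The job runs until cancelled: streams have no natural end.
        env.execute("Streaming WordCount");
    }
}
```

Note that the counts are updated continuously, one event at a time, rather than computed once over a stored dataset as in the batch sketch earlier.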
