Rahul Bhartia is an AWS Solutions Architect
Streams of data are becoming ubiquitous today – clickstreams, log streams, event streams, and more. The need for real-time processing of high-volume data streams is pushing the limits of traditional data processing infrastructures. Building a clickstream monitoring system, for example, where data is in the form of a continuous clickstream rather than discrete data sets, requires the use of continuous processing rather than ad-hoc, one-time queries.
Developers can use Apache Storm and Amazon Kinesis to quickly and cost-effectively build an application that continuously processes very high volumes of streaming data. To help developers integrate Apache Storm with Amazon Kinesis, earlier this year we launched the Amazon Kinesis Storm Spout. Last week we released an update to the Spout to support Ack/Fail semantics. With this update, the Spout now re-emits failed messages up to the configured retry limit, making it easier to build reliable data processing applications. The updated Amazon Kinesis Storm Spout is available on GitHub.
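To make the Ack/Fail behavior concrete, here is a minimal Python sketch of re-emitting failed messages up to a retry limit. It is a toy model, not the Amazon Kinesis Storm Spout's code or API: the class name `RetryingEmitter`, its methods, and the `retry_limit` parameter are all hypothetical, and it assumes that "retry limit" means the number of re-emits allowed after the first attempt.

```python
from collections import deque


class RetryingEmitter:
    """Toy model of ack/fail with a retry limit (hypothetical, not the Spout's API)."""

    def __init__(self, records, retry_limit):
        # Queue of (message id, record) pairs waiting to be emitted.
        self.pending = deque(enumerate(records))
        self.retry_limit = retry_limit  # assumed: max re-emits per message
        self.in_flight = {}             # message id -> record awaiting ack/fail
        self.failures = {}              # message id -> failures seen so far
        self.dropped = []               # records that exhausted their retries

    def next_tuple(self):
        """Emit the next pending record, or None if nothing is waiting."""
        if not self.pending:
            return None
        msg_id, record = self.pending.popleft()
        self.in_flight[msg_id] = record
        return msg_id, record

    def ack(self, msg_id):
        """Downstream processing succeeded: forget the message."""
        self.in_flight.pop(msg_id, None)
        self.failures.pop(msg_id, None)

    def fail(self, msg_id):
        """Downstream processing failed: re-queue unless retries are exhausted."""
        record = self.in_flight.pop(msg_id)
        count = self.failures.get(msg_id, 0) + 1
        self.failures[msg_id] = count
        if count <= self.retry_limit:
            self.pending.append((msg_id, record))
        else:
            self.dropped.append(record)
```

With `retry_limit=2`, a record that always fails is emitted three times in total (one initial attempt plus two retries) and then lands in `dropped`, while acknowledged records are emitted once.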
Along with the updated Amazon Kinesis Storm Spout, we published a white paper that outlines a reference architecture for building a real-time, sliding-window visualization over clickstream data using Amazon Kinesis and Apache Storm. The white paper documents a reference system that demonstrates every stage in real time, from ingestion, processing, and storage through visualization. You can launch the entire application shown in the diagram below in one click using the template.
Check out the white paper to learn how the entire stack works, all the way from ingestion to visualization, and see our GitHub repository for further instructions on how to build and deploy it yourself.
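For readers new to sliding windows, the idea behind a sliding-window count over clickstream data can be sketched in a few lines of Python. This is an illustration only and is not taken from the white paper or the repository: the class `SlidingWindowCounter`, its method names, and the example page paths are hypothetical, and the sketch assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` (illustrative sketch)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) pairs in arrival order
        self.counts = Counter()  # key -> number of events still in the window

    def add(self, timestamp, key):
        """Record one event, then drop anything that has left the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # An event is outside the window once it is window_seconds old or older.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self, now):
        """Return the per-key counts for the window ending at `now`."""
        self._expire(now)
        return dict(self.counts)


# Example usage with made-up page paths and timestamps in seconds.
window = SlidingWindowCounter(window_seconds=10)
window.add(0, "/home")
window.add(5, "/cart")
window.add(9, "/home")
print(window.snapshot(9))
```

A visualization layer would poll something like `snapshot` at a fixed interval and redraw, so older clicks fall out of the chart as the window moves forward.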
If you have questions or suggestions, please leave a comment below.
Do more with Amazon Kinesis!