AWS Big Data Blog

Dispatches from re:Invent – Day 3

Matt Yanchyshyn is a Principal Solutions Architect at AWS

During the keynote on Wednesday we announced Amazon RDS for Aurora, a new high-performance and cost-effective relational database.  I heard from multiple AWS re:Invent attendees that they’re really excited about how AWS is innovating in the data storage space, and from a big data perspective it gives us another great option to leverage in our pipelines.

There were a ton of big data breakout sessions going on today.  In “Your First Big Data Application on AWS” I had a great time building a Kinesis-EMR-Redshift solution in real time with the more than 500 people who showed up to the session.  Clearly there is huge interest in learning how to use AWS’s suite of big data services to solve business problems, and I heard about some fascinating use cases.

One session attendee is working on using machine learning to understand human learning patterns, and we spoke about running Mahout and other ML frameworks on Amazon EMR.  Another attendee had outgrown their existing MySQL database and was looking forward to testing out Amazon Redshift for their data warehousing needs.  They were happy to learn how inexpensive it is to run Amazon Redshift clusters. In fact, in their case it would probably be free thanks to the Free Tier.  Several people came up to me after the session to say how easy it was to access data in Amazon Kinesis streams using the Hive connector with Amazon EMR. I think lots of companies are going to use this technique to get quick snapshots of streaming data as it flows into their Amazon Kinesis streams.

Everyone is looking forward to more product and feature announcements during Day Four’s keynote.  And I hope to hear about even more cool big data use cases!