AWS Big Data Blog

Month in Review: June 2016

Lots to see on the Big Data Blog in June! Take a look at the summaries below to find something that catches your interest.

Use Sqoop to Transfer Data from Amazon EMR to Amazon RDS
Customers commonly process and transform vast amounts of data with EMR and then transfer and store summaries or aggregates of that data in relational databases such as MySQL or Oracle. In this post, learn how to transfer data using Apache Sqoop, a tool designed to transfer data between Hadoop and relational databases.
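The transfer described above boils down to a `sqoop export` invocation on the EMR cluster. As a minimal sketch, the helper below assembles such a command; the RDS endpoint, database, table, and directory names are placeholders, not values from the post.

```python
# Build a "sqoop export" command that pushes aggregated data from a
# directory on an EMR cluster into a MySQL table on Amazon RDS.
# All endpoint, table, and path values here are hypothetical examples.
def build_sqoop_export(jdbc_url, username, table, export_dir, num_mappers=4):
    """Return the sqoop export command as a list of arguments."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC URL of the RDS instance
        "--username", username,
        "--table", table,                   # target relational table
        "--export-dir", export_dir,         # directory holding the summaries
        "--num-mappers", str(num_mappers),  # parallel export tasks
    ]

cmd = build_sqoop_export(
    "jdbc:mysql://mydb.example.us-east-1.rds.amazonaws.com:3306/reports",
    "admin",
    "daily_summary",
    "/user/hadoop/output/daily_summary",
)
print(" ".join(cmd))
```

Credentials would normally come from a secured options file rather than the command line; they are inlined here only to keep the sketch short.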

Analyze Realtime Data from Amazon Kinesis Streams Using Zeppelin and Spark Streaming
Streaming data is everywhere. This includes clickstream data, data from sensors, data emitted from billions of IoT devices, and more. Not surprisingly, data scientists want to analyze and explore these data streams in real time. This post shows you how you can use Spark Streaming to process data coming from Amazon Kinesis streams, build some graphs using Zeppelin, and then store the Zeppelin notebook in Amazon S3.
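The core pattern in that post is micro-batching: at a fixed interval, Spark Streaming pulls the latest records from the Kinesis shards, deserializes them, and reduces them into per-batch aggregates. The toy sketch below shows one such batch cycle in plain Python (the JSON record shape is a hypothetical clickstream event, not taken from the post); Spark Streaming performs the same reduction in parallel at scale.

```python
import json
from collections import Counter

def count_pages(batch):
    """Count page hits in one micro-batch of JSON clickstream records."""
    counts = Counter()
    for raw in batch:
        event = json.loads(raw)       # deserialize one Kinesis record
        counts[event["page"]] += 1    # reduce: hits per page in this batch
    return counts

batch = [
    '{"page": "/home", "user": "a"}',
    '{"page": "/cart", "user": "b"}',
    '{"page": "/home", "user": "c"}',
]
print(count_pages(batch))  # Counter({'/home': 2, '/cart': 1})
```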

Processing Amazon DynamoDB Streams Using the Amazon Kinesis Client Library
This post demystifies the KCL by explaining some of its important configurable properties and estimating its resource consumption.
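One useful rule of thumb behind that kind of estimate: the KCL creates one lease per shard and balances leases across workers, so each worker runs roughly shards/workers record processors. The sketch below turns that into a back-of-envelope memory estimate; the per-processor and base-overhead figures are assumed placeholders, not measured KCL numbers.

```python
import math

def estimate_worker_memory_mb(num_shards, num_workers,
                              mb_per_processor=64, base_overhead_mb=256):
    """Rough per-worker memory estimate for a KCL fleet.

    Assumes one lease (and one record processor) per shard, spread evenly
    across workers. The MB figures are illustrative assumptions.
    """
    processors_per_worker = math.ceil(num_shards / num_workers)
    return base_overhead_mb + processors_per_worker * mb_per_processor

# 12 shards over 3 workers -> 4 processors each -> 256 + 4 * 64 = 512 MB
print(estimate_worker_memory_mb(num_shards=12, num_workers=3))  # 512
```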

Apache Tez Now Available with Amazon EMR
Amazon EMR has added Apache Tez version 0.8.3 as a supported application in release 4.7.0. Tez is an extensible framework for building batch and interactive data processing applications on top of Hadoop YARN. This post helps you get started.
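With Tez installed on the cluster, Hive can be switched from MapReduce to Tez through Hive's standard `hive.execution.engine` property. The snippet below expresses that as an EMR-style configuration object (the `hive-site` classification is EMR's standard mapping for Hive settings); treat it as a sketch of the shape, not the post's exact configuration.

```python
import json

# EMR cluster configuration that switches Hive's execution engine to Tez.
# "hive-site" is the EMR classification that maps to Hive's hive-site.xml.
configurations = [
    {
        "Classification": "hive-site",
        "Properties": {
            # Run Hive queries on Tez instead of classic MapReduce.
            "hive.execution.engine": "tez"
        },
    }
]

print(json.dumps(configurations, indent=2))
```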

Use Apache Oozie Workflows to Automate Apache Spark Jobs (and more!) on Amazon EMR
Apache Oozie is a workflow scheduler for Hadoop that lets you chain jobs together and run them on a schedule. In this post, learn how to use Oozie on an Amazon EMR cluster to automate Apache Spark jobs, along with other steps in your data processing pipelines.
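In Oozie, a pipeline is described in a `workflow.xml` file made of action nodes. As a minimal sketch, the workflow below contains a single Spark action; the jar path, class name, and job name are placeholders, and a real workflow would parameterize them through `job.properties`.

```python
import xml.etree.ElementTree as ET

# A minimal Oozie workflow with one Spark action. All application-specific
# values (name, class, jar path) are hypothetical placeholders.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-example">
  <start to="spark-node"/>
  <action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>DailyAggregation</name>
      <class>com.example.DailyAggregation</class>
      <jar>${nameNode}/apps/spark/daily-aggregation.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

# Sanity-check that the workflow definition is well-formed XML.
root = ET.fromstring(WORKFLOW_XML)
print(root.attrib["name"])  # spark-example
```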

JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink
In this post, learn how to use the PostgreSQL dblink extension to connect Amazon RDS PostgreSQL to Amazon Redshift, so that you can query your Redshift data directly from PostgreSQL and join it with your RDS tables.
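Because Amazon Redshift speaks the PostgreSQL wire protocol, dblink on RDS PostgreSQL can open a connection to a Redshift cluster and run a remote query. The SQL below is a minimal sketch of that shape; the cluster endpoint, credentials, table, and column list are all placeholders.

```python
# SQL that queries a Redshift table from RDS PostgreSQL via dblink.
# Endpoint, credentials, and columns are hypothetical examples.
DBLINK_SQL = """
CREATE EXTENSION IF NOT EXISTS dblink;

SELECT *
FROM dblink(
  'host=my-cluster.example.us-east-1.redshift.amazonaws.com
   port=5439 dbname=analytics user=reader password=secret',
  'SELECT order_id, total FROM sales'
) AS remote_sales(order_id int, total numeric);
"""

print(DBLINK_SQL.strip())
```

The `AS remote_sales(...)` column definition list is required because dblink returns a generic record set; PostgreSQL needs to be told the remote result's shape.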

FROM THE ARCHIVE

Running R on AWS (July 2015) Learn how to install and run R, RStudio Server, and Shiny Server on AWS.

---

Want to learn more about big data or streaming data? Check out our Big Data and Streaming Data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.