AWS Big Data Blog

Big Data AWS Training Course Gets Big Update

Michael Stroh is Communications Manager for AWS Training & Certification

AWS offers a number of in-depth technical training courses, which we’re regularly updating in response to student feedback and changes to the AWS platform. Today I want to tell you about some exciting changes to Big Data on AWS, our most comprehensive training course on the AWS big data platform.

The 3-day class is primarily aimed at data scientists, analysts, solutions architects, and anybody else who wants to use AWS to handle their big data workloads. The course teaches you how to use Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis, Amazon DynamoDB, and the rest of the AWS big data platform (as well as several popular third-party tools) to get useful answers from your data at a speed and cost that suit your needs.

What’s new

So what’s different? For starters, the course was completely reorganized to present the AWS big data platform as a story, from data ingestion to storage to visualization, so that it is easier to follow.

Customers also said they really wanted to hear more about Amazon Redshift and understand the differences between Amazon Redshift and Amazon EMR—where these services overlap, and where they’re different. So the new version of the course adds about 150% more Amazon Redshift-related content, including course modules on cluster architecture and optimization, and concepts critical to understanding Amazon Redshift, such as data warehousing and columnar data storage.

Also in response to customer feedback, the AWS Training team beefed up coverage of Hadoop programming frameworks, especially for Hive, Presto, Pig, and Spark. The Spark module, for example, now includes details on MLlib, Spark Streaming, and GraphX. There’s also a new course module on Hue, the popular Hadoop web interface, and a new hands-on lab on running Hue on Amazon EMR.

The course updates also respond to the fast-evolving big data platform. We added coverage of AWS Import/Export Snowball, Amazon Kinesis Firehose, and Amazon QuickSight: the data ingestion, streaming, and visualization services, respectively, announced at re:Invent 2015.

Other notable highlights of the revised course include:

  • More explanation of how Amazon Kinesis and Amazon Kinesis Streams work.
  • More focus on the three types of server-side encryption for data stored in Amazon S3 (SSE-C, SSE-S3, and SSE-KMS).
  • New hands-on lab featuring TIBCO Spotfire, the popular visualization and analytics tool.
  • Additional reference architectures and patterns for creating and hosting big data environments on AWS.

The revised course also includes new or improved case studies of The Weather Channel, Nasdaq, Netflix, AdRoll, and Kaiten Sushiro (shown below), a conveyor belt sushi chain that uses Amazon Kinesis and Amazon Redshift to help decide in real time what plates chefs should be making next:

Taking the class

That’s just a sampling of the changes. To learn more, check out the course description for Big Data on AWS.

If you’re thinking about taking the course, you should already have a basic familiarity with Apache Hadoop, SQL, MapReduce, and other common big data technologies and concepts—plus a working knowledge of core AWS services. Still ramping up? We recommend taking Big Data Technology Fundamentals and AWS Technical Essentials first.

If you have questions or suggestions, please leave a comment below.