AWS Big Data Specialty
Welcome
What to expect in this course
About the instructor
Resources
Module 1: Overview of Big Data
Define Big Data
Identify sources of big data
Big Data use cases
Big Data ecosystem
Big data pipeline and tools available for each phase
Module 2: Big Data Ingestion and Transfer
Options for ingesting data into AWS
AWS solutions for transferring data
Module 3: Real-Time Data Ingestion
Explain the need for stream processing and analytics
List the features of stream processing and analytics
Kinesis Data Streams
Kinesis Firehose
Kinesis Data Analytics
Module 4: Compute - Big Data Storage Solutions
Data Storage Options in AWS
Explain Storage solutions concepts
Factors to consider when choosing a data store
Module 5: Big Data Processing and Analytics
Introduction to big data processing/analytics
EMR
Redshift
Simple querying
Ad hoc analytics
Module 6: Apache Hadoop and Amazon EMR
Apache Hadoop
Apache Hadoop with relational database
Components of Apache Hadoop and the Apache Hadoop ecosystem
On-premises Apache Hadoop with EMR
Advantages of EMR
Improvements made to Hadoop with YARN
Architecture of Amazon EMR environment
Module 7: Using Amazon EMR
Launch Amazon EMR cluster
Long-running vs transient clusters
Quick vs Advanced Consoles
AMI options
Instance Types
Resize a cluster
Bootstrap actions
Methods of sending work to EMR
Module 8: Hadoop Programming Frameworks
How do programming frameworks work?
Hadoop frameworks and use cases
Hive
Presto
Spark
Pig
Module 12: Introduction to Data Warehouse
Introduce key relational database concepts and terminology
Compare and contrast transactional and analytical databases
Purpose of Data Warehouse
Big Data characteristics and how they apply to data warehousing
Data management on AWS
Module 13: Redshift Clusters
Networking Settings
Monitoring and auditing settings
SQL client Connectivity
Encryption
User permissions
Module 14: Designing the Database Schema
Database Schema and Data Types supported in Redshift
Columnar compression types
Available distribution styles for data
Data Storing methods
Module 15: Identifying Data Sources
Data sources for Amazon Redshift
S3
DynamoDB
EMR
Kinesis Data Firehose
Remote Hosts
Lambda Database loader
Legacy Data Warehouse & Schema Conversion tool
Module 16: Loading Data
Data and input file permissions
COPY and its parameters
COPY - syntax and examples
ANALYZE, VACUUM, and deep copy
How concurrent write operations are handled
Provide troubleshooting information and best practices for loading data
Module 17: Writing Queries and Tuning for Performance
Provide an overview of Amazon Redshift SQL
Describe factors that affect query performance and provide tips for mitigating performance issues
Describe the EXPLAIN command and query plans
Explain workload management (WLM) configuration
Module 18: Amazon Redshift Spectrum
"Dark Data" Problem
Provide an overview of Amazon Redshift Spectrum and its benefits
Describe Spectrum architecture and its components
Module 19: Maintaining Cluster
Audit Logging Options
Performance monitoring options
Event subscriptions and notifications
Module 20: Analyzing and Visualizing Data
Explain the purpose of visualizing your data
Introduce Amazon QuickSight
Module 21: Web Interfaces on Amazon EMR
Web interfaces available with EMR
HUE
Hadoop applications that HUE supports
Monitoring EMR
Module 22: Apache Spark on Amazon EMR
Using Apache Spark
Use cases for Spark
Spark programming model
Modules included with Spark
How Spark is deployed on EMR
Advantages of running Spark on EMR
Module 23: Using AWS Glue to Automate ETL
Serverless technology in a big data platform
AWS Glue for serverless ETL
Analyze use cases for using Glue
Module 24: Security
Shared responsibility model
EMR and VPC
EMR and IAM
EMR and Security Groups
EMR and encryption (at rest and in transit)
Security on Kinesis, DynamoDB, Redshift
Module 25: Managing Big Data Costs
EMR and Cost Considerations
Pricing models and cost considerations (EC2, Kinesis, DynamoDB, and Redshift)
Use case and strategies
Managing EC2 costs for EMR
Leverage more than one pricing model
Factors to consider when planning for storage and data transfer
Provide Best Practices
Module 26: Visualizing and Orchestrating Big Data
Purpose of visualizing big data
Describe AWS solutions for visualizing big data
Describe how AWS Data Pipeline can orchestrate big data workflows
Module 27: Big Data Design Patterns
Interactive Query
Batch Processing
Streaming Data Processing
Real-Time Predictions
Batch Predictions
Interactive Query
Long-Running Cluster
Aggregating and Cleansing Logs
Quick Start Architecture for Data Lakes
Architecture Using AWS Glue
EMR
EMR - High Level
EMR - Planning and Configuration
EMR - EMRFS
EMR - Input Data
EMR - Cluster Hardware & Networking
EMR - Instance Fleet and Instance Group
EMR - Submitting a Job to a Cluster
EMR - Hadoop
EMR - Security
EMR - Best Practice
Big Data - Exam Collection From Internet
Big Data Exam
Big data pipeline and tools available for each phase