From a2ec3e9877416e33abaa64efc3bb4b1547627209 Mon Sep 17 00:00:00 2001 From: kananinirav <30398499+kananinirav@users.noreply.github.com> Date: Tue, 16 Aug 2022 10:20:01 +0900 Subject: [PATCH] [Modified] Table Of Contents added --- README.md | 19 +- sections/databases.md | 379 +++++++++++++------------ sections/deploying.md | 344 ++++++++++++----------- sections/ec2_storage.md | 192 +++++++------ sections/other_compute.md | 273 +++++++++--------- sections/s3.md | 567 ++++++++++++++++++++------------------ 6 files changed, 949 insertions(+), 825 deletions(-) diff --git a/README.md b/README.md index 5ea3c27..5e16332 100644 --- a/README.md +++ b/README.md @@ -4,16 +4,15 @@ ### Table of contents -- AWS Fundamentals - - [What is Cloud Computing?](sections/cloud_computing.md) - - [IAM: Identity Access & Management](sections/iam.md) - - [EC2: Virtual Machines](sections/ec2.md) - - [EC2 Instance Storage](sections/ec2_storage.md) - - [Elastic Load Balancing & Auto Scaling Groups](sections/elb_asg.md) - - [Amazon S3](sections/s3.md) - - [Databases & Analytics](sections/databases.md) - - [Other Compute Section](sections/other_compute.md) - - [Deploying and Managing Infrastructure at Scale Section](sections/deploying.md) +- [What is Cloud Computing?](sections/cloud_computing.md) +- [IAM: Identity Access & Management](sections/iam.md) +- [EC2: Virtual Machines](sections/ec2.md) +- [EC2 Instance Storage](sections/ec2_storage.md) +- [Elastic Load Balancing & Auto Scaling Groups](sections/elb_asg.md) +- [Amazon S3](sections/s3.md) +- [Databases & Analytics](sections/databases.md) +- [Other Compute Section](sections/other_compute.md) +- [Deploying and Managing Infrastructure at Scale Section](sections/deploying.md) ### Contributors diff --git a/sections/databases.md b/sections/databases.md index 8b76c83..fc14765 100644 --- a/sections/databases.md +++ b/sections/databases.md @@ -1,37 +1,64 @@ -# Databases +# Databases & Analytics + +- [Databases & Analytics](#databases--analytics) + 
- [Databases Intro](#databases-intro) + - [Relational Databases](#relational-databases) + - [NoSQL Databases](#nosql-databases) + - [NoSQL data example: JSON](#nosql-data-example-json) + - [Databases & Shared Responsibility on AWS](#databases--shared-responsibility-on-aws) + - [AWS RDS Overview](#aws-rds-overview) + - [Advantage over using RDS versus deploying DB on EC2](#advantage-over-using-rds-versus-deploying-db-on-ec2) + - [RDS Deployments: Read Replicas, Multi-AZ](#rds-deployments-read-replicas-multi-az) + - [RDS Deployments: Multi-Region](#rds-deployments-multi-region) + - [Amazon Aurora](#amazon-aurora) + - [Amazon ElastiCache Overview](#amazon-elasticache-overview) + - [DynamoDB](#dynamodb) + - [DynamoDB Accelerator - DAX](#dynamodb-accelerator---dax) + - [DynamoDB - Global Tables](#dynamodb---global-tables) + - [Redshift Overview](#redshift-overview) + - [Amazon EMR](#amazon-emr) + - [Amazon Athena](#amazon-athena) + - [Amazon QuickSight](#amazon-quicksight) + - [DocumentDB](#documentdb) + - [Amazon Neptune](#amazon-neptune) + - [Amazon QLDB](#amazon-qldb) + - [Amazon Managed Blockchain](#amazon-managed-blockchain) + - [AWS Glue](#aws-glue) + - [DMS - Database Migration Service](#dms---database-migration-service) + - [Databases & Analytics Summary](#databases--analytics-summary) ## Databases Intro -* Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits -* Sometimes, you want to store data in a database… -* You can structure the data -* You build indexes to efficiently query / search through the data -* You define relationships between your datasets -* Databases are optimized for a purpose and come with different features, shapes and constraint +- Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits +- Sometimes, you want to store data in a database… +- You can structure the data +- You build indexes to efficiently query / search through the data +- You define relationships between your datasets +- Databases 
are optimized for a purpose and come with different features, shapes and constraints ## Relational Databases -* Looks just like Excel spreadsheets, with links between them! -* Can use the SQL language to perform queries / lookups +- Looks just like Excel spreadsheets, with links between them! +- Can use the SQL language to perform queries / lookups ## NoSQL Databases -* NoSQL = non-SQL = non relational databases -* NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications. -* Benefits: - * Flexibility: easy to evolve data model - * Scalability: designed to scale-out by using distributed clusters - * High-performance: optimized for a specific data model - * Highly functional: types optimized for the data model -* Examples: Key-value, document, graph, in-memory, search databases +- NoSQL = non-SQL = non-relational databases +- NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications. 
+- Benefits: + - Flexibility: easy to evolve data model + - Scalability: designed to scale out by using distributed clusters + - High-performance: optimized for a specific data model + - Highly functional: types optimized for the data model +- Examples: Key-value, document, graph, in-memory, search databases ### NoSQL data example: JSON -* JSON = JavaScript Object Notation -* JSON is a common form of data that fits into a NoSQL model -* Data can be nested -* Fields can change over time -* Support for new types: arrays, etc… +- JSON = JavaScript Object Notation +- JSON is a common form of data that fits into a NoSQL model +- Data can be nested +- Fields can change over time +- Support for new types: arrays, etc… ```json { @@ -52,213 +79,213 @@ ## Databases & Shared Responsibility on AWS -* AWS offers use to manage different databases -* Benefits include: - * Quick Provisioning, High Availability, Vertical and Horizontal Scaling - * Automated Backup & Restore, Operations, Upgrades - * Operating System Patching is handled by AWS - * Monitoring, alerting -* Note: many databases technologies could be run on EC2, but you must handle yourself the resiliency, backup, patching, high availability, fault tolerance, scaling +- AWS offers to manage different databases for you +- Benefits include: + - Quick Provisioning, High Availability, Vertical and Horizontal Scaling + - Automated Backup & Restore, Operations, Upgrades + - Operating System Patching is handled by AWS + - Monitoring, alerting +- Note: many database technologies can be run on EC2, but you must then handle resiliency, backup, patching, high availability, fault tolerance and scaling yourself ## AWS RDS Overview -* RDS stands for Relational Database Service -* It’s a managed DB service for DB use SQL as a query language. 
-* It allows you to create databases in the cloud that are managed by AWS - * Postgres - * MySQL - * MariaDB - * Oracle - * Microsoft SQL Server - * **Aurora (AWS Proprietary database)** +- RDS stands for Relational Database Service +- It’s a managed DB service for databases that use SQL as a query language. +- It allows you to create databases in the cloud that are managed by AWS + - Postgres + - MySQL + - MariaDB + - Oracle + - Microsoft SQL Server + - **Aurora (AWS Proprietary database)** ### Advantage over using RDS versus deploying DB on EC2 -* RDS is a managed service: - * Automated provisioning, OS patching - * Continuous backups and restore to specific timestamp (Point in Time Restore)! - * Monitoring dashboards - * Read replicas for improved read performance - * Multi AZ setup for DR (Disaster Recovery) - * Maintenance windows for upgrades - * Scaling capability (vertical and horizontal) - * Storage backed by EBS (gp2 or io1) -* BUT you can’t SSH into your instances +- RDS is a managed service: + - Automated provisioning, OS patching + - Continuous backups and restore to a specific timestamp (Point in Time Restore)! + - Monitoring dashboards + - Read replicas for improved read performance + - Multi-AZ setup for DR (Disaster Recovery) + - Maintenance windows for upgrades + - Scaling capability (vertical and horizontal) + - Storage backed by EBS (gp2 or io1) +- BUT you can’t SSH into your instances -## Amazon Aurora +### RDS Deployments: Read Replicas, Multi-AZ -* Aurora is a proprietary technology from AWS (not open sourced) -* PostgreSQL and MySQL are both supported as Aurora DB -* Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS -* Aurora storage automatically grows in increments of 10GB, up to 64 TB. 
-* Aurora costs more than RDS (20% more) – but is more efficient -* Not in the free tier - -## RDS Deployments: Read Replicas, Multi-AZ - -Read Replicas | Multi-AZ ----- | ---- -Scale the read workload of your DB | Failover in case of AZ outage (high availability) -Can create up to 5 Read Replicas | Data is only read/written to the main database -Data is only written to the main DB | Can only have 1 other AZ as failover +| Read Replicas | Multi-AZ | +| ----------------------------------- | ------------------------------------------------- | +| Scale the read workload of your DB | Failover in case of AZ outage (high availability) | +| Can create up to 5 Read Replicas | Data is only read/written to the main database | +| Data is only written to the main DB | Can only have 1 other AZ as failover | ![Read Replicas | Multi-AZ](/images/read_replicas_multi_AZ.png) -## RDS Deployments: Multi-Region +### RDS Deployments: Multi-Region -* Multi-Region (Read Replicas) - * Disaster recovery in case of region issue - * Local performance for global reads - * Replication cost +- Multi-Region (Read Replicas) + - Disaster recovery in case of region issue + - Local performance for global reads + - Replication cost ![Multi-Region](/images/multi_region.png) +## Amazon Aurora + +- Aurora is a proprietary technology from AWS (not open sourced) +- PostgreSQL and MySQL are both supported as Aurora DB +- Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS +- Aurora storage automatically grows in increments of 10GB, up to 64 TB. 
+- Aurora costs more than RDS (20% more) – but is more efficient +- Not in the free tier + ## Amazon ElastiCache Overview -* The same way RDS is to get managed Relational Databases… -* ElastiCache is to get managed Redis or Memcached -* Caches are in-memory databases with high performance, low latency -* Helps reduce load off databases for read intensive workloads -* AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backup +- The same way RDS is to get managed Relational Databases… +- ElastiCache is to get managed Redis or Memcached +- Caches are in-memory databases with high performance, low latency +- Helps reduce load on databases for read-intensive workloads +- AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backup ## DynamoDB -* Fully Managed Highly available with replication across 3 AZ -* NoSQL database - not a relational database -* Scales to massive workloads, distributed “serverless” database -* Millions of requests per seconds, trillions of row, 100s of TB of storage -* Fast and consistent in performance -* Single-digit millisecond latency – low latency retrieval -* Integrated with IAM for security, authorization and administration -* Low cost and auto scaling capabilities -* Standard & Infrequent Access (IA) Table Class +- Fully managed, highly available with replication across 3 AZs +- NoSQL database - not a relational database +- Scales to massive workloads, distributed “serverless” database +- Millions of requests per second, trillions of rows, 100s of TB of storage +- Fast and consistent in performance +- Single-digit millisecond latency – low latency retrieval +- Integrated with IAM for security, authorization and administration +- Low cost and auto scaling capabilities +- Standard & Infrequent Access (IA) Table Class ### DynamoDB Accelerator - DAX -* Fully Managed in-memory cache for DynamoDB -* 10x performance 
improvement – single- digit millisecond latency to microseconds latency – when accessing your DynamoDB tables -* Secure, highly scalable & highly available -* Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases +- Fully Managed in-memory cache for DynamoDB +- 10x performance improvement – single-digit millisecond latency to microsecond latency – when accessing your DynamoDB tables +- Secure, highly scalable & highly available +- Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases -### DynamoDB – Global Tables +### DynamoDB - Global Tables -* Make a DynamoDB table accessible with low latency in multiple-regions -* Active-Active replication (read/write to any AWS Region) +- Make a DynamoDB table accessible with low latency in multiple regions +- Active-Active replication (read/write to any AWS Region) ## Redshift Overview -* Redshift is based on PostgreSQL, but it’s not used for OLTP (Online Transactional Processing) -* It’s OLAP – online analytical processing (analytics and data warehousing) -* Load data once every hour, not every second -* 10x better performance than other data warehouses, scale to PBs of data -* Columnar storage of data (instead of row based) -* Massively Parallel Query Execution (MPP), highly available -* Pay as you go based on the instances provisioned -* Has a SQL interface for performing the queries -* BI tools such as AWS Quicksight or Tableau integrate with it +- Redshift is based on PostgreSQL, but it’s not used for OLTP (Online Transaction Processing) +- It’s OLAP – online analytical processing (analytics and data warehousing) +- Load data once every hour, not every second +- 10x better performance than other data warehouses, scales to PBs of data +- Columnar storage of data (instead of row-based) +- Massively Parallel Query Execution (MPP), 
highly available +- Pay as you go based on the instances provisioned +- Has a SQL interface for performing the queries +- BI tools such as Amazon QuickSight or Tableau integrate with it ## Amazon EMR -* EMR stands for “Elastic MapReduce” -* EMR helps creating Hadoop clusters (Big Data) to analyze and process vast amount of data -* The clusters can be made of hundreds of EC2 instances -* Also supports Apache Spark, HBase, Presto, Flink -* EMR takes care of all the provisioning and configuration -* Auto-scaling and integrated with Spot instances -* Use cases: data processing, machine learning, web indexing, big data +- EMR stands for “Elastic MapReduce” +- EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data +- The clusters can be made of hundreds of EC2 instances +- Also supports Apache Spark, HBase, Presto, Flink +- EMR takes care of all the provisioning and configuration +- Auto-scaling and integrated with Spot instances +- Use cases: data processing, machine learning, web indexing, big data ## Amazon Athena -* Serverless query service to analyze data stored in Amazon S3 -* Uses standard SQL language to query the files -* Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto) -* Pricing: $5.00 per TB of data scanned -* Use compressed or columnar data for cost-savings (less scan) -* Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc... +- Serverless query service to analyze data stored in Amazon S3 +- Uses standard SQL to query the files +- Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto) +- Pricing: $5.00 per TB of data scanned +- Use compressed or columnar data for cost savings (less data scanned) +- Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc... 
+- **To analyze data in S3 using serverless SQL, use Athena** ## Amazon QuickSight -* Serverless machine learning-powered business intelligence service to create interactive dashboards -* Fast, automatically scalable, embeddable, with per-session pricing -* Use cases: - * Business analytics - * Building visualizations - * Perform ad-hoc analysis - * Get business insights using data -* Integrated with RDS, Aurora, Athena, Redshift, S3… +- Serverless machine learning-powered business intelligence service to create interactive dashboards +- Fast, automatically scalable, embeddable, with per-session pricing +- Use cases: + - Business analytics + - Building visualizations + - Performing ad-hoc analysis + - Getting business insights using data +- Integrated with RDS, Aurora, Athena, Redshift, S3… ## DocumentDB -* Aurora is an “AWS-implementation” of PostgreSQL / MySQL … -* DocumentDB is the same for MongoDB (which is a NoSQL database) -* MongoDB is used to store, query, and index JSON data -* Similar “deployment concepts” as Aurora -* Fully Managed, highly available with replication across 3 AZ -* Aurora storage automatically grows in increments of 10GB, up to 64 TB. +- Aurora is an “AWS-implementation” of PostgreSQL / MySQL … +- DocumentDB is the same for MongoDB (which is a NoSQL database) +- MongoDB is used to store, query, and index JSON data +- Similar “deployment concepts” as Aurora +- Fully managed, highly available with replication across 3 AZs +- DocumentDB storage automatically grows in increments of 10GB, up to 64 TB. 
+- Automatically scales to workloads with millions of requests per second ## Amazon Neptune -* Fully managed graph database -* A popular graph dataset would be a social network - * Users have friends - * Posts have comments - * Comments have likes from users - * Users share and like posts… -* Highly available across 3 AZ, with up to 15 read replicas -* Build and run applications working with highly connected datasets – optimized for these complex and hard queries -* Can store up to billions of relations and query the graph with milliseconds latency -* Highly available with replications across multiple AZs -* Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking +- Fully managed graph database +- A popular graph dataset would be a social network + - Users have friends + - Posts have comments + - Comments have likes from users + - Users share and like posts… +- Highly available across 3 AZs, with up to 15 read replicas +- Build and run applications working with highly connected datasets – optimized for these complex and hard queries +- Can store up to billions of relations and query the graph with millisecond latency +- Highly available with replication across multiple AZs +- Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking ## Amazon QLDB -* QLDB stands for ”Quantum Ledger Database” -* A ledger is a book **recording financial transactions** -* Fully Managed, Serverless, High available, Replication across 3 AZ -* Used to **review history of all the changes made to your application data** over time -* **Immutable** system: no entry can be removed or modified, cryptographically verifiable -* 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL -* Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules +- QLDB stands for “Quantum Ledger Database” +- A ledger is a book 
**recording financial transactions** +- Fully managed, serverless, highly available, replication across 3 AZs +- Used to **review the history of all the changes made to your application data** over time +- **Immutable** system: no entry can be removed or modified, cryptographically verifiable +- 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL +- Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules ## Amazon Managed Blockchain -* Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority. -* Amazon Managed Blockchain is a managed service to: - * Join public blockchain networks - * Or create your own scalable private network -* Compatible with the frameworks Hyperledger Fabric & Ethereum +- Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority. 
+- Amazon Managed Blockchain is a managed service to: + - Join public blockchain networks + - Or create your own scalable private network +- Compatible with the frameworks Hyperledger Fabric & Ethereum ## AWS Glue -* Managed extract, transform, and load (ETL) service -* Useful to prepare and transform data for analytics -* Fully serverless service -* Glue Data Catalog: catalog of datasets - * can be used by Athena, Redshift, EMR +- Managed extract, transform, and load (ETL) service +- Useful to prepare and transform data for analytics +- Fully serverless service +- Glue Data Catalog: catalog of datasets + - can be used by Athena, Redshift, EMR -## DMS – Database Migration Service +## DMS - Database Migration Service -* Quickly and securely migrate databases to AWS, resilient, self healing -* The source database remains available during the migration -* Supports: - * Homogeneous migrations: ex Oracle to Oracle - * Heterogeneous migrations: ex Microsoft SQL Server to Aurora +- Quickly and securely migrate databases to AWS, resilient, self healing +- The source database remains available during the migration +- Supports: + - Homogeneous migrations: ex Oracle to Oracle + - Heterogeneous migrations: ex Microsoft SQL Server to Aurora -## Databases & Analytics Summary in AWS +## Databases & Analytics Summary -* Relational Databases - OLTP: RDS & Aurora (SQL) -* Differences between Multi-AZ, Read Replicas, Multi-Region -* In-memory Database: ElastiCache -* Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB) -* Warehouse - OLAP: Redshift (SQL) -* Hadoop Cluster: EMR -* Athena: query data on Amazon S3 (serverless & SQL) -* QuickSight: dashboards on your data (serverless) -* DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database) -* Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable) -* Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains -* Glue: Managed ETL (Extract Transform Load) and Data 
Catalog service -* Database Migration: DMS -* Neptune: graph database \ No newline at end of file +- Relational Databases - OLTP: RDS & Aurora (SQL) +- Differences between Multi-AZ, Read Replicas, Multi-Region +- In-memory Database: ElastiCache +- Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB) +- Warehouse - OLAP: Redshift (SQL) +- Hadoop Cluster: EMR +- Athena: query data on Amazon S3 (serverless & SQL) +- QuickSight: dashboards on your data (serverless) +- DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database) +- Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable) +- Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains +- Glue: Managed ETL (Extract Transform Load) and Data Catalog service +- Database Migration: DMS +- Neptune: graph database diff --git a/sections/deploying.md b/sections/deploying.md index 47df87f..0e41e26 100644 --- a/sections/deploying.md +++ b/sections/deploying.md @@ -1,221 +1,243 @@ # Deploying and Managing Infrastructure at Scale -## What is CloudFormation +- [Deploying and Managing Infrastructure at Scale](#deploying-and-managing-infrastructure-at-scale) + - [What is CloudFormation?](#what-is-cloudformation) + - [Benefits of AWS CloudFormation](#benefits-of-aws-cloudformation) + - [CloudFormation Stack Designer](#cloudformation-stack-designer) + - [AWS Cloud Development Kit (CDK)](#aws-cloud-development-kit-cdk) + - [Developer problems on AWS](#developer-problems-on-aws) + - [AWS Elastic Beanstalk Overview](#aws-elastic-beanstalk-overview) + - [Elastic Beanstalk - Health Monitoring](#elastic-beanstalk---health-monitoring) + - [AWS CodeDeploy](#aws-codedeploy) + - [AWS CodeCommit](#aws-codecommit) + - [AWS CodeBuild](#aws-codebuild) + - [AWS CodePipeline](#aws-codepipeline) + - [AWS CodeArtifact](#aws-codeartifact) + - [AWS CodeStar](#aws-codestar) + - [AWS Cloud9](#aws-cloud9) + - [AWS Systems Manager (SSM)](#aws-systems-manager-ssm) + - [How Systems 
Manager works](#how-systems-manager-works) + - [Systems Manager - SSM Session Manager](#systems-manager---ssm-session-manager) + - [AWS OpsWorks](#aws-opsworks) + - [Deployment - Summary](#deployment---summary) + - [Developer Services - Summary](#developer-services---summary) -* CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported). -* For example, within a CloudFormation template, you say: - * I want a security group - * I want two EC2 instances using this security group - * I want an S3 bucket - * I want a load balancer (ELB) in front of these machines -* Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify +## What is CloudFormation? + +- CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported). +- For example, within a CloudFormation template, you say: + - I want a security group + - I want two EC2 instances using this security group + - I want an S3 bucket + - I want a load balancer (ELB) in front of these machines +- Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify ### Benefits of AWS CloudFormation -* Infrastructure as code - * No resources are manually created, which is excellent for control - * Changes to the infrastructure are reviewed through code -* Cost - * Each resources within the stack is tagged with an identifier so you can easily see how much a stack costs you - * You can estimate the costs of your resources using the CloudFormation template - * Savings strategy: In Dev, you could automation deletion of templates at 5 PM and recreated at 8 AM, safely -* Productivity - * Ability to destroy and re-create an infrastructure on the cloud on the fly - * Automated generation of Diagram for your templates! 
- * Declarative programming (no need to figure out ordering and orchestration) -* Don’t re-invent the wheel - * Leverage existing templates on the web! - * Leverage the documentation -* Supports (almost) all AWS resources: - * Everything we’ll see in this course is supported - * You can use “custom resources” for resources that are not supported +- Infrastructure as code + - No resources are manually created, which is excellent for control + - Changes to the infrastructure are reviewed through code +- Cost + - Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you + - You can estimate the costs of your resources using the CloudFormation template + - Savings strategy: In Dev, you could automate the deletion of templates at 5 PM and their re-creation at 8 AM, safely +- Productivity + - Ability to destroy and re-create an infrastructure on the cloud on the fly + - Automated generation of diagrams for your templates! + - Declarative programming (no need to figure out ordering and orchestration) +- Don’t re-invent the wheel + - Leverage existing templates on the web! 
+ - Leverage the documentation +- Supports (almost) all AWS resources: + - Everything we’ll see in this course is supported + - You can use “custom resources” for resources that are not supported ### CloudFormation Stack Designer -* Example: WordPress CloudFormation Stack -* We can see all the resources -* We can see the relations between the components +- Example: WordPress CloudFormation Stack +- We can see all the resources +- We can see the relations between the components ## AWS Cloud Development Kit (CDK) -* Define your cloud infrastructure using a familiar language: - * JavaScript/TypeScript, Python, Java, and .NET -* The code is “compiled” into a CloudFormation template (JSON/YAML) -* You can therefore deploy infrastructure and application runtime code together - * Great for Lambda functions - * Great for Docker containers in ECS / EKS +- Define your cloud infrastructure using a familiar language: + - JavaScript/TypeScript, Python, Java, and .NET +- The code is “compiled” into a CloudFormation template (JSON/YAML) +- You can therefore deploy infrastructure and application runtime code together + - Great for Lambda functions + - Great for Docker containers in ECS / EKS ## Developer problems on AWS -* Managing infrastructure -* Deploying Code -* Configuring all the databases, load balancers, etc -* Scaling concerns -* Most web apps have the same architecture (ALB + ASG) -* All the developers want is for their code to run! -* Possibly, consistently across different applications and environments +- Managing infrastructure +- Deploying Code +- Configuring all the databases, load balancers, etc +- Scaling concerns +- Most web apps have the same architecture (ALB + ASG) +- All the developers want is for their code to run! 
+- Possibly, consistently across different applications and environments ## AWS Elastic Beanstalk Overview -* Elastic Beanstalk is a developer centric view of deploying an application on AWS -* It uses all the component’s we’ve seen before: EC2, ASG, ELB, RDS, etc… -* But it’s all in one view that’s easy to make sense of! -* We still have full control over the configuration -* Beanstalk = Platform as a Service (PaaS) -* Beanstalk is free but you pay for the underlying instances -* Managed service - * Instance configuration / OS is handled by Beanstalk - * Deployment strategy is configurable but performed by Elastic Beanstalk - * Capacity provisioning - * Load balancing & auto-scaling -* Application health-monitoring & responsiveness -* Just the application code is the responsibility of the developer -* Three architecture models: - * Single Instance deployment: good for dev - * LB + ASG: great for production or pre-production web applications - * ASG only: great for non-web apps in production (workers, etc..) +- Elastic Beanstalk is a developer-centric view of deploying an application on AWS +- It uses all the components we’ve seen before: EC2, ASG, ELB, RDS, etc… +- But it’s all in one view that’s easy to make sense of! +- We still have full control over the configuration +- Beanstalk = Platform as a Service (PaaS) +- Beanstalk is free but you pay for the underlying instances +- Managed service + - Instance configuration / OS is handled by Beanstalk + - Deployment strategy is configurable but performed by Elastic Beanstalk + - Capacity provisioning + - Load balancing & auto-scaling +- Application health-monitoring & responsiveness +- Just the application code is the responsibility of the developer +- Three architecture models: + - Single Instance deployment: good for dev + - LB + ASG: great for production or pre-production web applications + - ASG only: great for non-web apps in production (workers, etc.) 
-* Support for many platforms: - * Go - * Java SE - * Java with Tomcat - * .NET on Windows Server with IIS - * Node.js - * PHP - * Python - * Ruby - * Packer Builder - * Single Container Docker - * Multi-Container Docker - * Preconfigured Docker +- Support for many platforms: + - Go + - Java SE + - Java with Tomcat + - .NET on Windows Server with IIS + - Node.js + - PHP + - Python + - Ruby + - Packer Builder + - Single Container Docker + - Multi-Container Docker + - Preconfigured Docker -### Elastic Beanstalk – Health Monitoring +### Elastic Beanstalk - Health Monitoring -* Health agent pushes metrics to CloudWatch -* Checks for app health, publishes health events +- Health agent pushes metrics to CloudWatch +- Checks for app health, publishes health events ## AWS CodeDeploy -* We want to deploy our application automatically -* Works with EC2 Instances -* Works with On-Premises Servers -* Hybrid service -* Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent +- We want to deploy our application automatically +- Works with EC2 Instances +- Works with On-Premises Servers +- Hybrid service +- Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent ## AWS CodeCommit -* Before pushing the application code to servers, it needs to be stored somewhere -* Developers usually store code in a repository, using the Git technology -* A famous public offering is GitHub, AWS’ competing product is CodeCommit -* CodeCommit: - * Source-control service that hosts Git-based repositories - * Makes it easy to collaborate with others on code - * The code changes are automatically versioned -* Benefits: - * Fully managed - * Scalable & highly available - * Private, Secured, Integrated with AWS +- Before pushing the application code to servers, it needs to be stored somewhere +- Developers usually store code in a repository, using the Git technology +- A famous public offering is GitHub, AWS’ competing product 
is CodeCommit +- CodeCommit: + - Source-control service that hosts Git-based repositories + - Makes it easy to collaborate with others on code + - The code changes are automatically versioned +- Benefits: + - Fully managed + - Scalable & highly available + - Private, Secured, Integrated with AWS ## AWS CodeBuild -* Code building service in the cloud (name is obvious) -* Compiles source code, run tests, and produces packages that are ready to be deployed (by CodeDeploy for example) -* Benefits: - * Fully managed, serverless - * Continuously scalable & highly available - * Secure - * Pay-as-you-go pricing – only pay for the build time +- Code building service in the cloud (name is obvious) +- Compiles source code, run tests, and produces packages that are ready to be deployed (by CodeDeploy for example) +- Benefits: + - Fully managed, serverless + - Continuously scalable & highly available + - Secure + - Pay-as-you-go pricing – only pay for the build time ## AWS CodePipeline -* Orchestrate the different steps to have the code automatically pushed to production -* Code => Build => Test => Provision => Deploy -* Basis for CICD (Continuous Integration & Continuous Delivery) -* Benefits: - * Fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, GitHub, 3rd-party services (GitHub…) & custom plugins… - * Fast delivery & rapid updates +- Orchestrate the different steps to have the code automatically pushed to production +- Code => Build => Test => Provision => Deploy +- Basis for CICD (Continuous Integration & Continuous Delivery) +- Benefits: + - Fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, GitHub, 3rd-party services (GitHub…) & custom plugins… + - Fast delivery & rapid updates -* CodePipeline: orchestration layer - * CodeCommit => CodeBuild => CodeDeploy => Elastic Beanstalk +- CodePipeline: orchestration layer + - CodeCommit => CodeBuild => CodeDeploy => Elastic 
Beanstalk ## AWS CodeArtifact -* Software packages depend on each other to be built (also called code dependencies), and new ones are created -* Storing and retrieving these dependencies is called artifact management -* Traditionally you need to setup your own artifact management system -* CodeArtifact is a secure, scalable, and cost-effective artifact management for software development -* Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet -* Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact +- Software packages depend on each other to be built (also called code dependencies), and new ones are created +- Storing and retrieving these dependencies is called artifact management +- Traditionally you need to setup your own artifact management system +- CodeArtifact is a secure, scalable, and cost-effective artifact management for software development +- Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet +- Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact ## AWS CodeStar -* Unified UI to easily manage software development activities in one place -* “Quick way” to get started to correctly set-up CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Elastic Beanstalk, EC2, etc… -* Can edit the code ”in-the-cloud” using AWS Cloud9 +- Unified UI to easily manage software development activities in one place +- “Quick way” to get started to correctly set-up CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Elastic Beanstalk, EC2, etc… +- Can edit the code ”in-the-cloud” using AWS Cloud9 ## AWS Cloud9 -* AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code -* “Classic” IDE (like IntelliJ, Visual Studio Code…) are downloaded on a computer before being used -* A cloud IDE can be used within a web browser, meaning you can work on your projects from your office, home, or 
anywhere with internet with no setup necessary -* AWS Cloud9 also allows for code collaboration in real-time (pair programming) +- AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code +- “Classic” IDE (like IntelliJ, Visual Studio Code…) are downloaded on a computer before being used +- A cloud IDE can be used within a web browser, meaning you can work on your projects from your office, home, or anywhere with internet with no setup necessary +- AWS Cloud9 also allows for code collaboration in real-time (pair programming) ## AWS Systems Manager (SSM) -* Helps you manage your EC2 and On-Premises systems at scale -* Another Hybrid AWS service -* Get operational insights about the state of your infrastructure -* Suite of 10+ products -* Most important features are: - * Patching automation for enhanced compliance - * Run commands across an entire fleet of servers - * Store parameter configuration with the SSM Parameter Store -* Works for both Windows and Linux OS +- Helps you manage your EC2 and On-Premises systems at scale +- Another Hybrid AWS service +- Get operational insights about the state of your infrastructure +- Suite of 10+ products +- Most important features are: + - Patching automation for enhanced compliance + - Run commands across an entire fleet of servers + - Store parameter configuration with the SSM Parameter Store +- Works for both Windows and Linux OS ### How Systems Manager works -* We need to install the SSM agent onto the systems we control -* Installed by default on Amazon Linux AMI & some Ubuntu AMI -* If an instance can’t be controlled with SSM, it’s probably an issue with the SSM agent! -* Thanks to the SSM agent, we can run commands, patch & configure our servers +- We need to install the SSM agent onto the systems we control +- Installed by default on Amazon Linux AMI & some Ubuntu AMI +- If an instance can’t be controlled with SSM, it’s probably an issue with the SSM agent! 
+- Thanks to the SSM agent, we can run commands, patch & configure our servers -### Systems Manager – SSM Session Manager +### Systems Manager - SSM Session Manager -* Allows you to start a secure shell on your EC2 and on-premises servers -* No SSH access, bastion hosts, or SSH keys needed -* No port 22 needed (better security) -* Supports Linux, macOS, and Windows -* Send session log data to S3 or CloudWatch Logs +- Allows you to start a secure shell on your EC2 and on-premises servers +- No SSH access, bastion hosts, or SSH keys needed +- No port 22 needed (better security) +- Supports Linux, macOS, and Windows +- Send session log data to S3 or CloudWatch Logs ## AWS OpsWorks -* Chef & Puppet help you perform server configuration automatically, or repetitive actions -* They work great with EC2 & On-Premises VM -* AWS OpsWorks = Managed Chef & Puppet -* It’s an alternative to AWS SSM -* Only provision standard AWS resources: - * EC2 Instances, Databases, Load Balancers, EBS volumes… -* **Chef or Puppet needed => AWS OpsWorks** +- Chef & Puppet help you perform server configuration automatically, or repetitive actions +- They work great with EC2 & On-Premises VM +- AWS OpsWorks = Managed Chef & Puppet +- It’s an alternative to AWS SSM +- Only provision standard AWS resources: + - EC2 Instances, Databases, Load Balancers, EBS volumes… +- **Chef or Puppet needed => AWS OpsWorks** ## Deployment - Summary -* CloudFormation: (AWS only) - * Infrastructure as Code, works with almost all of AWS resources - * Repeat across Regions & Accounts -* Beanstalk: (AWS only) - * Platform as a Service (PaaS), limited to certain programming languages or Docker - * Deploy code consistently with a known architecture: ex, ALB + EC2 + RDS -* CodeDeploy (hybrid): deploy & upgrade any application onto servers -* Systems Manager (hybrid): patch, configure and run commands at scale -* OpsWorks (hybrid): managed Chef and Puppet in AWS +- CloudFormation: (AWS only) + - Infrastructure as Code, 
works with almost all of AWS resources + - Repeat across Regions & Accounts +- Beanstalk: (AWS only) + - Platform as a Service (PaaS), limited to certain programming languages or Docker + - Deploy code consistently with a known architecture: ex, ALB + EC2 + RDS +- CodeDeploy (hybrid): deploy & upgrade any application onto servers +- Systems Manager (hybrid): patch, configure and run commands at scale +- OpsWorks (hybrid): managed Chef and Puppet in AWS ## Developer Services - Summary -* CodeCommit: Store code in private git repository (version controlled) -* CodeBuild: Build & test code in AWS -* CodeDeploy: Deploy code onto servers -* CodePipeline: Orchestration of pipeline (from code to build to deploy) -* CodeArtifact: Store software packages / dependencies on AWS -* CodeStar: Unified view for allowing developers to do CICD and code -* Cloud9: Cloud IDE (Integrated Development Environment) with collab -* AWS CDK: Define your cloud infrastructure using a programming language +- CodeCommit: Store code in private git repository (version controlled) +- CodeBuild: Build & test code in AWS +- CodeDeploy: Deploy code onto servers +- CodePipeline: Orchestration of pipeline (from code to build to deploy) +- CodeArtifact: Store software packages / dependencies on AWS +- CodeStar: Unified view for allowing developers to do CICD and code +- Cloud9: Cloud IDE (Integrated Development Environment) with collab +- AWS CDK: Define your cloud infrastructure using a programming language diff --git a/sections/ec2_storage.md b/sections/ec2_storage.md index 8f5ec80..9c6dd14 100644 --- a/sections/ec2_storage.md +++ b/sections/ec2_storage.md @@ -1,136 +1,154 @@ # EC2 Instance Storage -* [EBS volumes](#ebs-volume) -* [EFS: network file system, can be attached to 100s of instances in a region](#efs-elastic-file-system) -* [EFS-IA: cost-optimized storage class for infrequent accessed files](#efs-infrequent-access-efs-ia) -* [FSx for Windows: Network File System for Windows 
servers](#amazon-fsx-for-windows-file-server) -* [FSx for Lustre: High Performance Computing Linux file system](#amazon-fsx-for-lustre) +- [EC2 Instance Storage](#ec2-instance-storage) + - [EBS Volumes](#ebs-volumes) + - [What’s an EBS Volume?](#whats-an-ebs-volume) + - [EBS Volume](#ebs-volume) + - [EBS – Delete on Termination attribute](#ebs--delete-on-termination-attribute) + - [EBS Snapshots](#ebs-snapshots) + - [EBS Snapshots Features](#ebs-snapshots-features) + - [EFS: Elastic File System](#efs-elastic-file-system) + - [EFS Infrequent Access (EFS-IA)](#efs-infrequent-access-efs-ia) + - [Amazon FSx – Overview](#amazon-fsx--overview) + - [Amazon FSx for Windows File Server](#amazon-fsx-for-windows-file-server) + - [Amazon FSx for Lustre](#amazon-fsx-for-lustre) + - [EC2 Instance Store](#ec2-instance-store) + - [Shared Responsibility Model for EC2 Storage](#shared-responsibility-model-for-ec2-storage) + - [AMI Overview](#ami-overview) + - [AMI Process (from an EC2 instance)](#ami-process-from-an-ec2-instance) + - [EC2 Image Builder](#ec2-image-builder) + +- EBS: Elastic Block Store, Volume is a network drive you can attach to your instances while they run +- EFS: network file system, can be attached to 100s of instances in a region +- EFS-IA: cost-optimized storage class for infrequent accessed files +- FSx for Windows: Network File System for Windows servers +- FSx for Lustre: High Performance Computing Linux file system ## EBS Volumes ### What’s an EBS Volume? 
-* An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run -* It allows your instances to persist data, even after their termination -* They can only be mounted to one instance at a time (at the CCP level) -* They are bound to a specific availability zone -* Analogy: Think of them as a “network USB stick” -* Free tier: 30 GB of free EBS storage of type General Purpose (SSD) or Magnetic per month +- An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run +- It allows your instances to persist data, even after their termination +- They can only be mounted to one instance at a time (at the CCP level) +- They are bound to a specific availability zone +- Analogy: Think of them as a “network USB stick” +- Free tier: 30 GB of free EBS storage of type General Purpose (SSD) or Magnetic per month ### EBS Volume -* It’s a network drive (i.e. not a physical drive) - * It uses the network to communicate the instance, which means there might be a bit of latency - * It can be detached from an EC2 instance and attached to another one quickly -* It’s locked to an Availability Zone (AZ) - * An EBS Volume in us-east-1a cannot be attached to us-east-1b - * To move a volume across, you first need to snapshot it -* Have a provisioned capacity (size in GBs, and IOPS) - * You get billed for all the provisioned capacity - * You can increase the capacity of the drive over time +- It’s a network drive (i.e. 
not a physical drive) + - It uses the network to communicate with the instance, which means there might be a bit of latency + - It can be detached from an EC2 instance and attached to another one quickly +- It’s locked to an Availability Zone (AZ) + - An EBS Volume in us-east-1a cannot be attached to us-east-1b + - To move a volume across, you first need to snapshot it +- Has a provisioned capacity (size in GBs, and IOPS) + - You get billed for all the provisioned capacity + - You can increase the capacity of the drive over time ### EBS – Delete on Termination attribute -* Controls the EBS behaviour when an EC2 instance terminates - * By default, the root EBS volume is deleted (attribute enabled) - * By default, any other attached EBS volume is not deleted (attribute disabled) -* This can be controlled by the AWS console / AWS CLI -* Use case: preserve root volume when instance is terminated +- Controls the EBS behaviour when an EC2 instance terminates + - By default, the root EBS volume is deleted (attribute enabled) + - By default, any other attached EBS volume is not deleted (attribute disabled) +- This can be controlled by the AWS console / AWS CLI +- Use case: preserve root volume when instance is terminated ### EBS Snapshots -* Make a backup (snapshot) of your EBS volume at a point in time -* Not necessary to detach volume to do snapshot, but recommended -* Can copy snapshots across AZ or Region +- Make a backup (snapshot) of your EBS volume at a point in time +- Not necessary to detach volume to do snapshot, but recommended +- Can copy snapshots across AZ or Region ### EBS Snapshots Features -* EBS Snapshot Archive - * Move a Snapshot to an ”archive tier” that is 75% cheaper - * Takes within 24 to 72 hours for restoring the archive -* Recycle Bin for EBS Snapshots - * Setup rules to retain deleted snapshots so you can recover them after an accidental deletion - * Specify retention (from 1 day to 1 year) +- EBS Snapshot Archive + - Move a Snapshot to an ”archive
tier” that is 75% cheaper + - Takes within 24 to 72 hours for restoring the archive +- Recycle Bin for EBS Snapshots + - Setup rules to retain deleted snapshots so you can recover them after an accidental deletion + - Specify retention (from 1 day to 1 year) ## EFS: Elastic File System -* Managed NFS (network file system) that can be mounted on 100s of EC2 -* EFS works with Linux EC2 instances in multi-AZ -* Highly available, scalable, expensive (3x gp2), pay per use, no capacity planning +- Managed NFS (network file system) that can be mounted on 100s of EC2 +- EFS works with Linux EC2 instances in multi-AZ +- Highly available, scalable, expensive (3x gp2), pay per use, no capacity planning ## EFS Infrequent Access (EFS-IA) -* Storage class that is cost-optimized for files not accessed every day -* Up to 92% lower cost compared to EFS Standard -* EFS will automatically move your files to EFS-IA based on the last time they were accessed -* Enable EFS-IA with a Lifecycle Policy -* Example: move files that are not accessed for 60 days to EFS-IA -* Transparent to the applications accessing EFS +- Storage class that is cost-optimized for files not accessed every day +- Up to 92% lower cost compared to EFS Standard +- EFS will automatically move your files to EFS-IA based on the last time they were accessed +- Enable EFS-IA with a Lifecycle Policy +- Example: move files that are not accessed for 60 days to EFS-IA +- Transparent to the applications accessing EFS ## Amazon FSx – Overview -* Launch 3rd party high-performance file systems on AWS -* Fully managed service - * FSx for Lustre - * FSx for Windows File Server - * FSx for NetApp ONTAP +- Launch 3rd party high-performance file systems on AWS +- Fully managed service + - FSx for Lustre + - FSx for Windows File Server + - FSx for NetApp ONTAP ### Amazon FSx for Windows File Server -* A fully managed, highly reliable, and scalable Windows native shared file system -* Built on Windows File Server -* Supports SMB 
protocol & Windows NTFS -* Integrated with Microsoft Active Directory -* Can be accessed from AWS or your on-premise infrastructure +- A fully managed, highly reliable, and scalable Windows native shared file system +- Built on Windows File Server +- Supports SMB protocol & Windows NTFS +- Integrated with Microsoft Active Directory +- Can be accessed from AWS or your on-premise infrastructure ### Amazon FSx for Lustre -* A fully managed, high-performance, scalable file storage for High Performance Computing (HPC) -* The name Lustre is derived from “Linux” and “cluster” -* Machine Learning, Analytics, Video Processing, Financial Modeling -* Scales up to 100s GB/s, millions of IOPS, sub-ms latencies +- A fully managed, high-performance, scalable file storage for High Performance Computing (HPC) +- The name Lustre is derived from “Linux” and “cluster” +- Machine Learning, Analytics, Video Processing, Financial Modeling +- Scales up to 100s GB/s, millions of IOPS, sub-ms latencies ## EC2 Instance Store -* EBS volumes are network drives with good but “limited” performance -* If you need a high-performance hardware disk, use EC2 Instance Store -* Better I/O performance -* EC2 Instance Store lose their storage if they’re stopped (ephemeral) -* Good for buffer / cache / scratch data / temporary content -* Risk of data loss if hardware fails -* Backups and Replication are your responsibility +- EBS volumes are network drives with good but “limited” performance +- If you need a high-performance hardware disk, use EC2 Instance Store +- Better I/O performance +- EC2 Instance Store lose their storage if they’re stopped (ephemeral) +- Good for buffer / cache / scratch data / temporary content +- Risk of data loss if hardware fails +- Backups and Replication are your responsibility ## Shared Responsibility Model for EC2 Storage -AWS | USER ----- | ---- -Infrastructure | Setting up backup / snapshot procedures -Replication for data for EBS volumes & EFS drives | Setting up data 
encryption -Replacing faulty hardware | Responsibility of any data on the drives -Ensuring their employees cannot access your data | Understanding the risk of using EC2 Instance Store +| AWS | USER | +| ------------------------------------------------- | -------------------------------------------------- | +| Infrastructure | Setting up backup / snapshot procedures | +| Replication for data for EBS volumes & EFS drives | Setting up data encryption | +| Replacing faulty hardware | Responsibility of any data on the drives | +| Ensuring their employees cannot access your data | Understanding the risk of using EC2 Instance Store | ## AMI Overview -* AMI = Amazon Machine Image -* AMI are a customization of an EC2 instance - * You add your own software, configuration, operating system, monitoring… - * Faster boot / configuration time because all your software is pre-packaged -* AMI are built for a specific region (and can be copied across regions) -* You can launch EC2 instances from: - * A Public AMI: AWS provided - * Your own AMI: you make and maintain them yourself - * An AWS Marketplace AMI: an AMI someone else made (and potentially sells) +- AMI = Amazon Machine Image +- AMI are a customization of an EC2 instance + - You add your own software, configuration, operating system, monitoring… + - Faster boot / configuration time because all your software is pre-packaged +- AMI are built for a specific region (and can be copied across regions) +- You can launch EC2 instances from: + - A Public AMI: AWS provided + - Your own AMI: you make and maintain them yourself + - An AWS Marketplace AMI: an AMI someone else made (and potentially sells) ### AMI Process (from an EC2 instance) -* Start an EC2 instance and customize it -* Stop the instance (for data integrity) -* Build an AMI – this will also create EBS snapshots -* Launch instances from other AMIs +- Start an EC2 instance and customize it +- Stop the instance (for data integrity) +- Build an AMI – this will also create 
EBS snapshots +- Launch instances from other AMIs ## EC2 Image Builder -* Used to automate the creation of Virtual Machines or container images -* => Automate the creation, maintain, validate and test EC2 AMIs -* Can be run on a schedule (weekly, whenever packages are updated, etc…) -* Free service (only pay for the underlying resources) +- Used to automate the creation of Virtual Machines or container images +- => Automate the creation, maintain, validate and test EC2 AMIs +- Can be run on a schedule (weekly, whenever packages are updated, etc…) +- Free service (only pay for the underlying resources) diff --git a/sections/other_compute.md b/sections/other_compute.md index 29f82a0..de010b4 100644 --- a/sections/other_compute.md +++ b/sections/other_compute.md @@ -1,173 +1,192 @@ # Other Compute -What is Docker? +- [Other Compute](#other-compute) + - [What is Docker?](#what-is-docker) + - [Where Docker images are stored?](#where-docker-images-are-stored) + - [Docker versus Virtual Machines](#docker-versus-virtual-machines) + - [ECS](#ecs) + - [Fargate](#fargate) + - [ECR](#ecr) + - [What’s serverless?](#whats-serverless) + - [Why AWS Lambda ?](#why-aws-lambda-) + - [Benefits of AWS Lambda](#benefits-of-aws-lambda) + - [AWS Lambda language support](#aws-lambda-language-support) + - [AWS Lambda Pricing: example](#aws-lambda-pricing-example) + - [Amazon API Gateway](#amazon-api-gateway) + - [AWS Batch](#aws-batch) + - [Batch vs Lambda](#batch-vs-lambda) + - [Amazon Lightsail](#amazon-lightsail) + - [Lambda Summary](#lambda-summary) + - [Other Compute Summary](#other-compute-summary) -* Docker is a software development platform to deploy apps -* Apps are packaged in containers that can be run on any OS -* Apps run the same, regardless of where they’re run - * Any machine - * No compatibility issues - * Predictable behavior - * Less work - * Easier to maintain and deploy - * Works with any language, any OS, any technology -* Scale containers up and down very quickly 
(seconds) +## What is Docker? -Where Docker images are stored? +- Docker is a software development platform to deploy apps +- Apps are packaged in containers that can be run on any OS +- Apps run the same, regardless of where they’re run + - Any machine + - No compatibility issues + - Predictable behavior + - Less work + - Easier to maintain and deploy + - Works with any language, any OS, any technology +- Scale containers up and down very quickly (seconds) -* Docker images are stored in Docker Repositories -* Public: Docker Hub - * Find base images for many technologies or OS: - * Ubuntu - * MySQL - * NodeJS, Java… -* Private: Amazon ECR (Elastic Container Registry) +### Where Docker images are stored? -## Docker versus Virtual Machines +- Docker images are stored in Docker Repositories +- Public: Docker Hub + - Find base images for many technologies or OS: + - Ubuntu + - MySQL + - NodeJS, Java… +- Private: Amazon ECR (Elastic Container Registry) -* Docker is ”sort of” a virtualization technology, but not exactly -* Resources are shared with the host => many containers on one server +### Docker versus Virtual Machines + +- Docker is ”sort of” a virtualization technology, but not exactly +- Resources are shared with the host => many containers on one server ## ECS -* ECS = Elastic Container Service -* Launch Docker containers on AWS -* You must provision & maintain the infrastructure (the EC2 instances) -* AWS takes care of starting / stopping containers -* Has integrations with the Application Load Balancer +- ECS = Elastic Container Service +- Launch Docker containers on AWS +- You must provision & maintain the infrastructure (the EC2 instances) +- AWS takes care of starting / stopping containers +- Has integrations with the Application Load Balancer ## Fargate -* Launch Docker containers on AWS -* You do not provision the infrastructure (no EC2 instances to manage) – simpler! 
-* Serverless offering -* AWS just runs containers for you based on the CPU / RAM you need +- Launch Docker containers on AWS +- You do not provision the infrastructure (no EC2 instances to manage) – simpler! +- Serverless offering +- AWS just runs containers for you based on the CPU / RAM you need ## ECR -* Elastic Container Registry -* Private Docker Registry on AWS -* This is where you store your Docker images so they can be run by ECS or Fargate +- Elastic Container Registry +- Private Docker Registry on AWS +- This is where you store your Docker images so they can be run by ECS or Fargate ## What’s serverless? -* Serverless is a new paradigm in which the developers don’t have to manage servers anymore… -* They just deploy code -* They just deploy… functions ! -* Initially... Serverless == FaaS (Function as a Service) -* Serverless was pioneered by AWS Lambda but now also includes anything that’s managed: “databases, messaging, storage, etc.” -* Serverless does not mean there are no servers… -* it means you just don’t manage / provision / see them +- Serverless is a new paradigm in which the developers don’t have to manage servers anymore… +- They just deploy code +- They just deploy… functions ! +- Initially... Serverless == FaaS (Function as a Service) +- Serverless was pioneered by AWS Lambda but now also includes anything that’s managed: “databases, messaging, storage, etc.” +- Serverless does not mean there are no servers… +- it means you just don’t manage / provision / see them ## Why AWS Lambda ? -EC2 | Lambda ----- | ---- -Virtual Servers in the Cloud | Virtual functions – no servers to manage! -Limited by RAM and CPU | Limited by time - short executions -Continuously running | Run on-demand -Scaling means intervention to add / remove servers | Scaling is automated! 
+| EC2 | Lambda | +| -------------------------------------------------- | ----------------------------------------- | +| Virtual Servers in the Cloud | Virtual functions – no servers to manage! | +| Limited by RAM and CPU | Limited by time - short executions | +| Continuously running | Run on-demand | +| Scaling means intervention to add / remove servers | Scaling is automated! | -## Benefits of AWS Lambda +### Benefits of AWS Lambda -* Easy Pricing: - * Pay per request and compute time - * Free tier of 1,000,000 AWS Lambda requests and 400,000 GBs of compute time -* Integrated with the whole AWS suite of services -* Event-Driven: functions get invoked by AWS when needed -* Integrated with many programming languages -* Easy monitoring through AWS CloudWatch -* Easy to get more resources per functions (up to 10GB of RAM!) -* Increasing RAM will also improve CPU and network! +- Easy Pricing: + - Pay per request and compute time + - Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time +- Integrated with the whole AWS suite of services +- Event-Driven: functions get invoked by AWS when needed +- Integrated with many programming languages +- Easy monitoring through AWS CloudWatch +- Easy to get more resources per function (up to 10GB of RAM!) +- Increasing RAM will also improve CPU and network!
-## AWS Lambda language support +### AWS Lambda language support -* Node.js (JavaScript) -* Python -* Java (Java 8 compatible) -* C# (.NET Core) -* Golang -* C# / Powershell -* Ruby -* Custom Runtime API (community supported, example Rust) -* Lambda Container Image - * The container image must implement the Lambda Runtime API - * ECS / Fargate is preferred for running arbitrary Docker images +- Node.js (JavaScript) +- Python +- Java (Java 8 compatible) +- C# (.NET Core) +- Golang +- C# / Powershell +- Ruby +- Custom Runtime API (community supported, example Rust) +- Lambda Container Image + - The container image must implement the Lambda Runtime API + - ECS / Fargate is preferred for running arbitrary Docker images -## AWS Lambda Pricing: example +### AWS Lambda Pricing: example -* You can find overall pricing information here: -* Pay per calls: - * First 1,000,000 requests are free - * $0.20 per 1 million requests thereafter ($0.0000002 per request) -* Pay per duration: (in increment of 1 ms) - * 400,000 GB-seconds of compute time per month for FREE - * == 400,000 seconds if function is 1GB RAM - * == 3,200,000 seconds if function is 128 MB RAM - * After that $1.00 for 600,000 GB-seconds -* It is usually **very cheap** to run AWS Lambda so it’s **very popular** +- You can find overall pricing information here: +- Pay per calls: + - First 1,000,000 requests are free + - $0.20 per 1 million requests thereafter ($0.0000002 per request) +- Pay per duration: (in increment of 1 ms) + - 400,000 GB-seconds of compute time per month for FREE + - == 400,000 seconds if function is 1GB RAM + - == 3,200,000 seconds if function is 128 MB RAM + - After that $1.00 for 600,000 GB-seconds +- It is usually **very cheap** to run AWS Lambda so it’s **very popular** ## Amazon API Gateway -* Example: building a serverless API -* Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs -* Serverless and scalable -* Supports RESTful APIs and 
WebSocket APIs -* Support for security, user authentication, API throttling, API keys, monitoring. +- Example: building a serverless API +- Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs +- Serverless and scalable +- Supports RESTful APIs and WebSocket APIs +- Support for security, user authentication, API throttling, API keys, monitoring. ## AWS Batch -* Fully managed batch processing at any scale -* Efficiently run 100,000s of computing batch jobs on AWS -* A “batch” job is a job with a start and an end (opposed to continuous) -* Batch will dynamically launch EC2 instances or Spot Instances -* AWS Batch provisions the right amount of compute / memory -* You submit or schedule batch jobs and AWS Batch does the rest! -* Batch jobs are defined as Docker images and run on ECS -* Helpful for cost optimizations and focusing less on the infrastructure +- Fully managed batch processing at any scale +- Efficiently run 100,000s of computing batch jobs on AWS +- A “batch” job is a job with a start and an end (opposed to continuous) +- Batch will dynamically launch EC2 instances or Spot Instances +- AWS Batch provisions the right amount of compute / memory +- You submit or schedule batch jobs and AWS Batch does the rest! 
+- Batch jobs are defined as Docker images and run on ECS +- Helpful for cost optimizations and focusing less on the infrastructure ## Batch vs Lambda -Batch | Lambda ----- | ---- -No time limit | Time limit -Any runtime as long as it’s packaged as a Docker image | Limited runtime -Rely on EBS / instance store for disk space | Limited temporary disk space -Relies on EC2 (can be managed by AWS) | Serverless +| Batch | Lambda | +| ------------------------------------------------------ | ---------------------------- | +| No time limit | Time limit | +| Any runtime as long as it’s packaged as a Docker image | Limited runtime | +| Rely on EBS / instance store for disk space | Limited temporary disk space | +| Relies on EC2 (can be managed by AWS) | Serverless | ## Amazon Lightsail -* Virtual servers, storage, databases, and networking -* Low & predictable pricing -* Simpler alternative to using EC2, RDS, ELB, EBS, Route 53… -* Great for people with little cloud experience! -* Can setup notifications and monitoring of your Lightsail resources -* Use cases: - * Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…) - * Websites (templates for WordPress, Magento, Plesk, Joomla) - * Dev / Test environment -* Has high availability but no auto-scaling, limited AWS integrations +- Virtual servers, storage, databases, and networking +- Low & predictable pricing +- Simpler alternative to using EC2, RDS, ELB, EBS, Route 53… +- Great for people with little cloud experience! 
+- Can setup notifications and monitoring of your Lightsail resources +- Use cases: + - Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…) + - Websites (templates for WordPress, Magento, Plesk, Joomla) + - Dev / Test environment +- Has high availability but no auto-scaling, limited AWS integrations ## Lambda Summary -* Lambda is Serverless, Function as a Service, seamless scaling, reactive -* Lambda Billing: - * By the time run x by the RAM provisioned - * By the number of invocations -* Language Support: many programming languages except (arbitrary) Docker -* Invocation time: up to 15 minutes -* Use cases: - * Create Thumbnails for images uploaded onto S3 - * Run a Serverless cron job -* API Gateway: expose Lambda functions as HTTP API +- Lambda is Serverless, Function as a Service, seamless scaling, reactive +- Lambda Billing: + - By the time run x by the RAM provisioned + - By the number of invocations +- Language Support: many programming languages except (arbitrary) Docker +- Invocation time: up to 15 minutes +- Use cases: + - Create Thumbnails for images uploaded onto S3 + - Run a Serverless cron job +- API Gateway: expose Lambda functions as HTTP API ## Other Compute Summary -* Docker: container technology to run applications -* ECS: run Docker containers on EC2 instances -* Fargate: -* Run Docker containers without provisioning the infrastructure -* Serverless offering (no EC2 instances) -* ECR: Private Docker Images Repository -* Batch: run batch jobs on AWS across managed EC2 instances -* Lightsail: predictable & low pricing for simple application & DB stacks +- Docker: container technology to run applications +- ECS: run Docker containers on EC2 instances +- Fargate: +- Run Docker containers without provisioning the infrastructure +- Serverless offering (no EC2 instances) +- ECR: Private Docker Images Repository +- Batch: run batch jobs on AWS across managed EC2 instances +- Lightsail: predictable & low pricing for simple application 
& DB stacks diff --git a/sections/s3.md b/sections/s3.md index 7e8d994..4eabec1 100644 --- a/sections/s3.md +++ b/sections/s3.md @@ -1,71 +1,109 @@ # Amazon S3 +- [Amazon S3](#amazon-s3) + - [S3 Use cases](#s3-use-cases) + - [Amazon S3 Overview - Buckets](#amazon-s3-overview---buckets) + - [Amazon S3 Overview - Objects](#amazon-s3-overview---objects) + - [S3 Security](#s3-security) + - [S3 Bucket Policies](#s3-bucket-policies) + - [Bucket settings for Block Public Access](#bucket-settings-for-block-public-access) + - [S3 Websites](#s3-websites) + - [S3 - Versioning](#s3---versioning) + - [S3 Access Logs](#s3-access-logs) + - [S3 Replication (CRR & SRR)](#s3-replication-crr--srr) + - [S3 Storage Classes](#s3-storage-classes) + - [S3 Durability and Availability](#s3-durability-and-availability) + - [S3 Standard General Purpose](#s3-standard-general-purpose) + - [S3 Storage Classes - Infrequent Access](#s3-storage-classes---infrequent-access) + - [S3 Standard Infrequent Access (S3 Standard-IA)](#s3-standard-infrequent-access-s3-standard-ia) + - [S3 One Zone Infrequent Access (S3 One Zone-IA)](#s3-one-zone-infrequent-access-s3-one-zone-ia) + - [Amazon S3 Glacier Storage Classes](#amazon-s3-glacier-storage-classes) + - [Amazon S3 Glacier Instant Retrieval](#amazon-s3-glacier-instant-retrieval) + - [Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier)](#amazon-s3-glacier-flexible-retrieval-formerly-amazon-s3-glacier) + - [Amazon S3 Glacier Deep Archive - for long term storage](#amazon-s3-glacier-deep-archive---for-long-term-storage) + - [S3 Intelligent-Tiering](#s3-intelligent-tiering) + - [S3 Object Lock & Glacier Vault Lock](#s3-object-lock--glacier-vault-lock) + - [Shared Responsibility Model for S3](#shared-responsibility-model-for-s3) + - [AWS Snow Family](#aws-snow-family) + - [Data Migrations with AWS Snow Family](#data-migrations-with-aws-snow-family) + - [Time to Transfer](#time-to-transfer) + - [Snowball Edge (for data 
transfers)](#snowball-edge-for-data-transfers) + - [AWS Snowcone](#aws-snowcone) + - [AWS Snowmobile](#aws-snowmobile) + - [Snow Family - Usage Process](#snow-family---usage-process) + - [What is Edge Computing?](#what-is-edge-computing) + - [Snow Family - Edge Computing](#snow-family---edge-computing) + - [AWS OpsHub](#aws-opshub) + - [Hybrid Cloud for Storage](#hybrid-cloud-for-storage) + - [AWS Storage Gateway](#aws-storage-gateway) + - [Amazon S3 - Summary](#amazon-s3---summary) + ## S3 Use cases -* Backup and storage -* Disaster Recovery -* Archive -* Hybrid Cloud storage -* Application hosting -* Media hosting -* Data lakes & big data analytics -* Software delivery -* Static website +- Backup and storage +- Disaster Recovery +- Archive +- Hybrid Cloud storage +- Application hosting +- Media hosting +- Data lakes & big data analytics +- Software delivery +- Static website ## Amazon S3 Overview - Buckets -* Amazon S3 allows people to store objects (files) in “buckets” (directories) -* Buckets must have a globally unique name (across all regions all accounts) -* Buckets are defined at the region level -* S3 looks like a global service but buckets are created in a region -* Naming convention - * No uppercase - * No underscore - * 3-63 characters long - * Not an IP - * Must start with lowercase letter or number +- Amazon S3 allows people to store objects (files) in “buckets” (directories) +- Buckets must have a globally unique name (across all regions all accounts) +- Buckets are defined at the region level +- S3 looks like a global service but buckets are created in a region +- Naming convention + - No uppercase + - No underscore + - 3-63 characters long + - Not an IP + - Must start with lowercase letter or number ## Amazon S3 Overview - Objects -* Objects (files) have a Key -* The key is the FULL path: - * s3://my-bucket/my_file.txt - * s3://my-bucket/my_folder1/another_folder/my_file.txt -* The key is composed of **prefix** + **object name** - * 
s3://my-bucket/my_folder1/another_folder/my_file.txt -* There’s no concept of “directories” within buckets (although the UI will trick you to think otherwise) -* Just keys with very long names that contain slashes (“/”) -* Object values are the content of the body: - * Max Object Size is 5TB (5000GB) - * If uploading more than 5GB, must use “multi-part upload” -* Metadata (list of text key / value pairs – system or user metadata) - * Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle - * Version ID (if versioning is enabled) +- Objects (files) have a Key +- The key is the FULL path: + - s3://my-bucket/my_file.txt + - s3://my-bucket/my_folder1/another_folder/my_file.txt +- The key is composed of **prefix** + **object name** + - s3://my-bucket/my_folder1/another_folder/my_file.txt +- There’s no concept of “directories” within buckets (although the UI will trick you to think otherwise) +- Just keys with very long names that contain slashes (“/”) +- Object values are the content of the body: + - Max Object Size is 5TB (5000GB) + - If uploading more than 5GB, must use “multi-part upload” +- Metadata (list of text key / value pairs – system or user metadata) + - Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle + - Version ID (if versioning is enabled) ## S3 Security -* **User based** - * IAM policies - which API calls should be allowed for a specific user from IAM console -* **Resource Based** - * Bucket Policies - bucket wide rules from the S3 console - allows cross account - * Object Access Control List (ACL) – finer grain - * Bucket Access Control List (ACL) – less common -* **Note:** an IAM principal can access an S3 object if - * the user IAM permissions allow it OR the resource policy ALLOWS it - * AND there’s no explicit DENY -* **Encryption:** encrypt objects in Amazon S3 using encryption keys +- **User based** + - IAM policies - which API calls should be allowed for a specific user from IAM console +- 
**Resource Based** + - Bucket Policies - bucket wide rules from the S3 console - allows cross account + - Object Access Control List (ACL) – finer grain + - Bucket Access Control List (ACL) – less common +- **Note:** an IAM principal can access an S3 object if + - the user IAM permissions allow it OR the resource policy ALLOWS it + - AND there’s no explicit DENY +- **Encryption:** encrypt objects in Amazon S3 using encryption keys -S3 Bucket Policies +## S3 Bucket Policies -* JSON based policies - * Resources: buckets and objects - * Actions: Set of API to Allow or Deny - * Effect: Allow / Deny +- JSON based policies + - Resources: buckets and objects + - Actions: Set of API to Allow or Deny + - Effect: Allow / Deny Principal: The account or user to apply the policy to -* Use S3 bucket for policy to: - * Grant public access to the bucket - * Force objects to be encrypted at upload - * Grant access to another account (Cross Account) +- Use S3 bucket for policy to: + - Grant public access to the bucket + - Force objects to be encrypted at upload + - Grant access to another account (Cross Account) ```json { @@ -88,215 +126,216 @@ S3 Bucket Policies ## Bucket settings for Block Public Access -* Block all public access: On - * Block public access to buckets and objects granted through new access control lists (ACLS): On - * Block public access to buckets and objects granted through any access control lists (ACLS): On - * Block public access to buckets and objects granted through new public bucket or access point policies: On - * Block public and cross-account access to buckets and objects through any public bucket or access point policies: On +- Block all public access: On + - Block public access to buckets and objects granted through new access control lists (ACLS): On + - Block public access to buckets and objects granted through any access control lists (ACLS): On + - Block public access to buckets and objects granted through new public bucket or access point 
policies: On + - Block public and cross-account access to buckets and objects through any public bucket or access point policies: On -* These settings were created to prevent company data leaks -* If you know your bucket should never be public, leave these on -* Can be set at the account level +- These settings were created to prevent company data leaks +- If you know your bucket should never be public, leave these on +- Can be set at the account level ## S3 Websites -* S3 can host static websites and have them accessible on the www -* The website URL will be: -* bucket-name.s3-website-AWS-region.amazonaws.com +- S3 can host static websites and have them accessible on the www +- The website URL will be: +- bucket-name.s3-website-AWS-region.amazonaws.com OR -* bucket-name.s3-website.AWS-region.amazonaws.com -* **If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!** +- bucket-name.s3-website.AWS-region.amazonaws.com +- **If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!** -## S3 -Versioning +## S3 - Versioning -* You can version your files in Amazon S3 -* It is enabled at the bucket level -* Same key overwrite will increment the “version”: 1, 2, 3…. -* It is best practice to version your buckets - * Protect against unintended deletes (ability to restore a version) - * Easy roll back to previous version -* Notes: - * Any file that is not versioned prior to enabling versioning will have version “null” - * Suspending versioning does not delete the previous versions +- You can version your files in Amazon S3 +- It is enabled at the bucket level +- Same key overwrite will increment the “version”: 1, 2, 3…. 
+- It is best practice to version your buckets + - Protect against unintended deletes (ability to restore a version) + - Easy roll back to previous version +- Notes: + - Any file that is not versioned prior to enabling versioning will have version “null” + - Suspending versioning does not delete the previous versions ## S3 Access Logs -* For audit purpose, you may want to log all access to S3 buckets -* Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket -* That data can be analyzed using data analysis tools… -* Very helpful to come down to the root cause of an issue, or audit usage, view suspicious patterns, etc… +- For audit purpose, you may want to log all access to S3 buckets +- Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket +- That data can be analyzed using data analysis tools… +- Very helpful to come down to the root cause of an issue, or audit usage, view suspicious patterns, etc… ## S3 Replication (CRR & SRR) -* Must enable versioning in source and destination -* Cross Region Replication (CRR) -* Same Region Replication (SRR) -* Buckets can be in different accounts -* Copying is asynchronous -* Must give proper IAM permissions to S3 -* CRR - Use cases: compliance, lower latency access, replication across accounts -* SRR – Use cases: log aggregation, live replication between production and test accounts +- Must enable versioning in source and destination +- Cross Region Replication (CRR) +- Same Region Replication (SRR) +- Buckets can be in different accounts +- Copying is asynchronous +- Must give proper IAM permissions to S3 +- CRR - Use cases: compliance, lower latency access, replication across accounts +- SRR – Use cases: log aggregation, live replication between production and test accounts ## S3 Storage Classes -* [Amazon S3 Standard - General Purpose](#s3-standard-general-purpose) -* [Amazon S3 Standard - Infrequent Access 
(IA)](#s3-standard-infrequent-access-s3-standard-ia) -* [Amazon S3 One Zone - Infrequent Access](#s3-one-zone-infrequent-access-s3-one-zone-ia) -* [Amazon S3 Glacier Instant Retrieval](#amazon-s3-glacier-instant-retrieval) -* [Amazon S3 Glacier Flexible Retrieval](#amazon-s3-glacier-flexible-retrieval-formerly-amazon-s3-glacier) -* [Amazon S3 Glacier Deep Archive](#amazon-s3-glacier-deep-archive-–-for-long-term-storage) -* [Amazon S3 Intelligent Tiering](#s3-intelligent-tiering) +- [Amazon S3 Standard - General Purpose](#s3-standard-general-purpose) +- [Amazon S3 Standard - Infrequent Access (IA)](#s3-standard-infrequent-access-s3-standard-ia) +- [Amazon S3 One Zone - Infrequent Access](#s3-one-zone-infrequent-access-s3-one-zone-ia) +- [Amazon S3 Glacier Instant Retrieval](#amazon-s3-glacier-instant-retrieval) +- [Amazon S3 Glacier Flexible Retrieval](#amazon-s3-glacier-flexible-retrieval-formerly-amazon-s3-glacier) +- [Amazon S3 Glacier Deep Archive](#amazon-s3-glacier-deep-archive---for-long-term-storage) +- [Amazon S3 Intelligent Tiering](#s3-intelligent-tiering) -* Can move between classes manually or using S3 Lifecycle configurations +- Can move between classes manually or using S3 Lifecycle configurations -## S3 Durability and Availability +### S3 Durability and Availability -* Durability: - * High durability (99.999999999%, 11 9’s) of objects across multiple AZ - * If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years - * Same for all storage classes -* Availability: - * Measures how readily available a service is - * Varies depending on storage class - * Example: S3 standard has 99.99% availability = not available 53 minutes a year +- Durability: + - High durability (99.999999999%, 11 9’s) of objects across multiple AZs + - If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years + - Same for all storage
classes +- Availability: + - Measures how readily available a service is + - Varies depending on storage class + - Example: S3 standard has 99.99% availability = not available 53 minutes a year -## S3 Standard General Purpose +### S3 Standard General Purpose -* 99.99% Availability -* Used for frequently accessed data -* Low latency and high throughput -* Sustain 2 concurrent facility failures -* Use Cases: Big Data analytics, mobile & gaming applications, content distribution… +- 99.99% Availability +- Used for frequently accessed data +- Low latency and high throughput +- Sustain 2 concurrent facility failures +- Use Cases: Big Data analytics, mobile & gaming applications, content distribution… -## S3 Storage Classes – Infrequent Access +### S3 Storage Classes - Infrequent Access -* For data that is less frequently accessed, but requires rapid access when needed -* Lower cost than S3 Standard +- For data that is less frequently accessed, but requires rapid access when needed +- Lower cost than S3 Standard -### S3 Standard Infrequent Access (S3 Standard-IA) +#### S3 Standard Infrequent Access (S3 Standard-IA) -* 99.9% Availability -* Use cases: Disaster Recovery, backups +- 99.9% Availability +- Use cases: Disaster Recovery, backups -### S3 One Zone Infrequent Access (S3 One Zone-IA) +#### S3 One Zone Infrequent Access (S3 One Zone-IA) -* High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed -* 99.5% Availability -* Use Cases: Storing secondary backup copies of on-premise data, or data you can recreate +- High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed +- 99.5% Availability +- Use Cases: Storing secondary backup copies of on-premise data, or data you can recreate -## Amazon S3 Glacier Storage Classes +### Amazon S3 Glacier Storage Classes -* Low-cost object storage meant for archiving / backup -* Pricing: price for storage + object retrieval cost +- Low-cost object storage meant for archiving / backup +- 
Pricing: price for storage + object retrieval cost -### Amazon S3 Glacier Instant Retrieval +#### Amazon S3 Glacier Instant Retrieval -* Millisecond retrieval, great for data accessed once a quarter -* Minimum storage duration of 90 days +- Millisecond retrieval, great for data accessed once a quarter +- Minimum storage duration of 90 days -### Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier) +#### Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier) -* Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free -* Minimum storage duration of 90 days +- Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) – free +- Minimum storage duration of 90 days -### Amazon S3 Glacier Deep Archive – for long term storage +#### Amazon S3 Glacier Deep Archive - for long term storage -* Standard (12 hours), Bulk (48 hours) -* Minimum storage duration of 180 days +- Standard (12 hours), Bulk (48 hours) +- Minimum storage duration of 180 days -## S3 Intelligent-Tiering +### S3 Intelligent-Tiering -* Small monthly monitoring and auto-tiering fee -* Moves objects automatically between Access Tiers based on usage -* There are no retrieval charges in S3 Intelligent-Tiering -* Frequent Access tier (automatic): default tier -* Infrequent Access tier (automatic): objects not accessed for 30 days -* Archive Instant Access tier (automatic): objects not accessed for 90 days -* Archive Access tier (optional): configurable from 90 days to 700+ days -* Deep Archive Access tier (optional): config. 
from 180 days to 700+ days +- Small monthly monitoring and auto-tiering fee +- Moves objects automatically between Access Tiers based on usage +- There are no retrieval charges in S3 Intelligent-Tiering +- Frequent Access tier (automatic): default tier +- Infrequent Access tier (automatic): objects not accessed for 30 days +- Archive Instant Access tier (automatic): objects not accessed for 90 days +- Archive Access tier (optional): configurable from 90 days to 700+ days +- Deep Archive Access tier (optional): config. from 180 days to 700+ days ## S3 Object Lock & Glacier Vault Lock -* S3 Object Lock - * Adopt a WORM (Write Once Read Many) model - * Block an object version deletion for a specified amount of time -* Glacier Vault Lock - * Adopt a WORM (Write Once Read Many) model - * Lock the policy for future edits (can no longer be changed) - * Helpful for compliance and data retention +- S3 Object Lock + - Adopt a WORM (Write Once Read Many) model + - Block an object version deletion for a specified amount of time +- Glacier Vault Lock + - Adopt a WORM (Write Once Read Many) model + - Lock the policy for future edits (can no longer be changed) + - Helpful for compliance and data retention ## Shared Responsibility Model for S3 -AWS | YOU ----- | ---- -Infrastructure (global security, durability, availability, sustain concurrent loss of data in two facilities) | S3 Versioning, S3 Bucket Policies, S3 Replication Setup -Configuration and vulnerability analysis | Logging and Monitoring, S3 Storage Classes -Compliance validation | Data encryption at rest and in transit +| AWS | YOU | +| ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- | +| Infrastructure (global security, durability, availability, sustain concurrent loss of data in two facilities) | S3 Versioning, S3 Bucket Policies, S3 Replication Setup | +| Configuration and vulnerability analysis | 
Logging and Monitoring, S3 Storage Classes | +| Compliance validation | Data encryption at rest and in transit | ## AWS Snow Family -* Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS -* Data migration: - * Snowcone - * Snowball Edge - * Snowmobile -* Edge computing: - * Snowcone - * Snowball Edge +- Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS +- Data migration: + - Snowcone + - Snowball Edge + - Snowmobile +- Edge computing: + - Snowcone + - Snowball Edge -## Data Migrations with AWS Snow Family +### Data Migrations with AWS Snow Family -* **AWS Snow Family: offline devices to perform data migrations** If it takes more than a week to transfer over the network, use Snowball devices! +- **AWS Snow Family: offline devices to perform data migrations** If it takes more than a week to transfer over the network, use Snowball devices! -* Challenges: - * Limited connectivity - * Limited bandwidth - * High network cost - * Shared bandwidth (can’t maximize the line) - * Connection stability +- Challenges: + - Limited connectivity + - Limited bandwidth + - High network cost + - Shared bandwidth (can’t maximize the line) + - Connection stability -## Time to Transfer +### Time to Transfer -Data | 100 Mbps | 1Gbps | 10Gbps -10 TB | 12 days | 30 hours | 3 hours -100 TB | 124 days | 12 days | 30 hours -1 PB | 3 years | 124 days | 12 days +| Data | 100 Mbps | 1Gbps | 10Gbps | +| ------ | -------- | -------- | -------- | +| 10 TB | 12 days | 30 hours | 3 hours | +| 100 TB | 124 days | 12 days | 30 hours | +| 1 PB | 3 years | 124 days | 12 days | -## Snowball Edge (for data transfers) +### Snowball Edge (for data transfers) -* Physical data transport solution: move TBs or PBs of data in or out of AWS -* Alternative to moving data over the network (and paying network fees) -* Pay per data transfer job -* Provide block storage and Amazon S3-compatible object 
storage -* Snowball Edge Storage Optimized - * 80 TB of HDD capacity for block volume and S3 compatible object storage -* Snowball Edge Compute Optimized - * 42 TB of HDD capacity for block volume and S3 compatible object storage -* Use cases: large data cloud migrations, DC decommission, disaster recovery +- Physical data transport solution: move TBs or PBs of data in or out of AWS +- Alternative to moving data over the network (and paying network fees) +- Pay per data transfer job +- Provide block storage and Amazon S3-compatible object storage +- Snowball Edge Storage Optimized + - 80 TB of HDD capacity for block volume and S3 compatible object storage +- Snowball Edge Compute Optimized + - 42 TB of HDD capacity for block volume and S3 compatible object storage +- Use cases: large data cloud migrations, DC decommission, disaster recovery -## AWS Snowcone +### AWS Snowcone -* Small, portable computing, anywhere, rugged & secure, withstands harsh environments -* Light (4.5 pounds, 2.1 kg) -* Device used for edge computing, storage, and data transfer -* **8 TBs of usable storage** -* Use Snowcone where Snowball does not fit (space-constrained environment) -* Must provide your own battery / cables -* Can be sent back to AWS offline, or connect it to internet and use **AWS DataSync** to send data +- Small, portable computing, anywhere, rugged & secure, withstands harsh environments +- Light (4.5 pounds, 2.1 kg) +- Device used for edge computing, storage, and data transfer +- **8 TBs of usable storage** +- Use Snowcone where Snowball does not fit (space-constrained environment) +- Must provide your own battery / cables +- Can be sent back to AWS offline, or connect it to internet and use **AWS DataSync** to send data -## AWS Snowmobile +### AWS Snowmobile -* Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs) -* Each Snowmobile has 100 PB of capacity (use multiple in parallel) -* High security: temperature controlled, GPS, 24/7 video surveillance -* **Better 
than Snowball if you transfer more than 10 PB** +- Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs) +- Each Snowmobile has 100 PB of capacity (use multiple in parallel) +- High security: temperature controlled, GPS, 24/7 video surveillance +- **Better than Snowball if you transfer more than 10 PB** -Properties | Snowcone | Snowball Edge Storage Optimized | Snowmobile ----- | ---- | ---- | ---- -Storage Capacity | 8 TB usable | 80 TB usable | < 100 PB -Migration Size | Up to 24 TB, online and offline | Up to petabytes, offline | Up to exabytes, offline +| Properties | Snowcone | Snowball Edge Storage Optimized | Snowmobile | +| ---------------- | ------------------------------- | ------------------------------- | ----------------------- | +| Storage Capacity | 8 TB usable | 80 TB usable | < 100 PB | +| Migration Size | Up to 24 TB, online and offline | Up to petabytes, offline | Up to exabytes, offline | -## Snow Family – Usage Process +### Snow Family - Usage Process 1. Request Snowball devices from the AWS console for delivery 2. Install the snowball client / AWS OpsHub on your servers @@ -307,78 +346,78 @@ Migration Size | Up to 24 TB, online and offline | Up to petabytes, offline | Up ## What is Edge Computing? -* Process data while it’s being created on an edge location - * A truck on the road, a ship on the sea, a mining station underground... -* These locations may have - * Limited / no internet access - * Limited / no easy access to computing power -* We setup a **Snowball Edge / Snowcone** device to do edge computing -* Use cases of Edge Computing: - * Preprocess data - * Machine learning at the edge - * Transcoding media streams -* Eventually (if need be) we can ship back the device to AWS (for transferring data for example) +- Process data while it’s being created on an edge location + - A truck on the road, a ship on the sea, a mining station underground... 
+- These locations may have + - Limited / no internet access + - Limited / no easy access to computing power +- We setup a **Snowball Edge / Snowcone** device to do edge computing +- Use cases of Edge Computing: + - Preprocess data + - Machine learning at the edge + - Transcoding media streams +- Eventually (if need be) we can ship back the device to AWS (for transferring data for example) -## Snow Family – Edge Computing +## Snow Family - Edge Computing -* **Snowcone (smaller)** - * 2 CPUs, 4 GB of memory, wired or wireless access - * USB-C power using a cord or the optional battery -* **Snowball Edge – Compute Optimized** - * 52 vCPUs, 208 GiB of RAM - * Optional GPU (useful for video processing or machine learning) - * 42 TB usable storage -* **Snowball Edge – Storage Optimized** - * Up to 40 vCPUs, 80 GiB of RAM - * Object storage clustering available -* All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass) -* Long-term deployment options: 1 and 3 years discounted pricing +- **Snowcone (smaller)** + - 2 CPUs, 4 GB of memory, wired or wireless access + - USB-C power using a cord or the optional battery +- **Snowball Edge – Compute Optimized** + - 52 vCPUs, 208 GiB of RAM + - Optional GPU (useful for video processing or machine learning) + - 42 TB usable storage +- **Snowball Edge – Storage Optimized** + - Up to 40 vCPUs, 80 GiB of RAM + - Object storage clustering available +- All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass) +- Long-term deployment options: 1 and 3 years discounted pricing ## AWS OpsHub -* Historically, to use Snow Family devices, you needed a CLI (Command Line Interface tool) -* Today, you can use **AWS OpsHub** (a software you install on your computer / laptop) to manage your Snow Family Device - * Unlocking and configuring single or clustered devices - * Transferring files - * Launching and managing instances running on Snow Family Devices - * Monitor device metrics (storage capacity, active 
instances on your device) - * Launch compatible AWS services on your devices (ex: Amazon EC2 instances, AWS DataSync, Network File System (NFS)) +- Historically, to use Snow Family devices, you needed a CLI (Command Line Interface tool) +- Today, you can use **AWS OpsHub** (software you install on your computer / laptop) to manage your Snow Family devices + - Unlocking and configuring single or clustered devices + - Transferring files + - Launching and managing instances running on Snow Family devices + - Monitoring device metrics (storage capacity, active instances on your device) + - Launching compatible AWS services on your devices (ex: Amazon EC2 instances, AWS DataSync, Network File System (NFS)) ## Hybrid Cloud for Storage -* AWS is pushing for ”hybrid cloud” - * Part of your infrastructure is on-premises - * Part of your infrastructure is on the cloud -* This can be due to - * Long cloud migrations - * Security requirements - * Compliance requirements - * IT strategy -* S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-premise? -* AWS Storage Gateway! +- AWS is pushing for “hybrid cloud” + - Part of your infrastructure is on-premises + - Part of your infrastructure is on the cloud +- This can be due to + - Long cloud migrations + - Security requirements + - Compliance requirements + - IT strategy +- S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-premises? +- AWS Storage Gateway!
## AWS Storage Gateway -* Bridge between on-premise data and cloud data in S3 -* Hybrid storage service to allow on- premises to seamlessly use the AWS Cloud -* Use cases: disaster recovery, backup & restore, tiered storage -* Types of Storage Gateway: - * File Gateway - * Volume Gateway - * Tape Gateway -* No need to know the types at the exam +- Bridge between on-premises data and cloud data in S3 +- Hybrid storage service that lets on-premises environments seamlessly use the AWS Cloud +- Use cases: disaster recovery, backup & restore, tiered storage +- Types of Storage Gateway: + - File Gateway + - Volume Gateway + - Tape Gateway +- No need to know the types for the exam -## Amazon S3 – Summary +## Amazon S3 - Summary -* Buckets vs Objects: global unique name, tied to a region -* S3 security: IAM policy, S3 Bucket Policy (public access), S3 Encryption -* S3 Websites: host a static website on Amazon S3 -* S3 Versioning: multiple versions for files, prevent accidental deletes -* S3 Access Logs: log requests made within your S3 bucket -* S3 Replication: same-region or cross-region, must enable versioning -* S3 Storage Classes: Standard, IA, 1Z-IA, Intelligent, Glacier, Glacier Deep Archive -* S3 Lifecycle Rules: transition objects between classes -* S3 Glacier Vault Lock / S3 Object Lock: WORM (Write Once Read Many) -* Snow Family: import data onto S3 through a physical device, edge computing -* OpsHub: desktop application to manage Snow Family devices -* Storage Gateway: hybrid solution to extend on-premises storage to S3 \ No newline at end of file +- Buckets vs Objects: globally unique name, tied to a region +- S3 security: IAM policy, S3 Bucket Policy (public access), S3 Encryption +- S3 Websites: host a static website on Amazon S3 +- S3 Versioning: multiple versions for files, prevent accidental deletes +- S3 Access Logs: log requests made within your S3 bucket +- S3 Replication: same-region or cross-region, must enable versioning +- S3 Storage Classes: Standard, IA, 1Z-IA,
Intelligent, Glacier, Glacier Deep Archive +- S3 Lifecycle Rules: transition objects between classes +- S3 Glacier Vault Lock / S3 Object Lock: WORM (Write Once Read Many) +- Snow Family: import data onto S3 through a physical device, edge computing +- OpsHub: desktop application to manage Snow Family devices +- Storage Gateway: hybrid solution to extend on-premises storage to S3
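
The 99.99% availability example and the Time to Transfer table in the S3 notes above can be sanity-checked with a little arithmetic. The Python sketch below is not part of the patch, and its function names are hypothetical. It assumes decimal units (1 TB = 10^12 bytes) and a link used at full line rate with no protocol overhead, so its transfer times are lower bounds and come out shorter than the table's figures, which presumably allow for real-world overhead.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
SECONDS_PER_DAY = 24 * 60 * 60


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Expected minutes per year a service is unavailable at this availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def ideal_transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Lower-bound transfer time in days.

    Assumption: 1 TB = 10**12 bytes, and the link is fully used with
    zero overhead, which real networks will not achieve.
    """
    bits = terabytes * 10**12 * 8
    seconds = bits / (megabits_per_second * 10**6)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Availability figures quoted in the storage class notes.
    for name, pct in [
        ("S3 Standard", 99.99),
        ("S3 Standard-IA", 99.9),
        ("S3 One Zone-IA", 99.5),
    ]:
        print(f"{name}: {yearly_downtime_minutes(pct):.0f} minutes/year unavailable")

    # First row of the Time to Transfer table, at ideal line rate.
    for mbps in (100, 1_000, 10_000):
        print(f"10 TB at {mbps} Mbps: {ideal_transfer_days(10, mbps):.2f} days")
```

At 99.99% the first function gives roughly 53 minutes per year, which matches the figure quoted under S3 Durability and Availability.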