[Modified] Table Of Contents added

This commit is contained in:
kananinirav
2022-08-16 10:20:01 +09:00
parent bfe63bf998
commit a2ec3e9877
6 changed files with 949 additions and 825 deletions

View File

@@ -4,16 +4,15 @@
### Table of contents
- AWS Fundamentals
  - [What is Cloud Computing?](sections/cloud_computing.md)
  - [IAM: Identity Access & Management](sections/iam.md)
  - [EC2: Virtual Machines](sections/ec2.md)
  - [EC2 Instance Storage](sections/ec2_storage.md)
  - [Elastic Load Balancing & Auto Scaling Groups](sections/elb_asg.md)
  - [Amazon S3](sections/s3.md)
  - [Databases & Analytics](sections/databases.md)
  - [Other Compute Section](sections/other_compute.md)
  - [Deploying and Managing Infrastructure at Scale Section](sections/deploying.md)
### Contributors

View File

@@ -1,37 +1,64 @@
# Databases & Analytics
- [Databases & Analytics](#databases--analytics)
  - [Databases Intro](#databases-intro)
  - [Relational Databases](#relational-databases)
  - [NoSQL Databases](#nosql-databases)
    - [NoSQL data example: JSON](#nosql-data-example-json)
  - [Databases & Shared Responsibility on AWS](#databases--shared-responsibility-on-aws)
  - [AWS RDS Overview](#aws-rds-overview)
    - [Advantage over using RDS versus deploying DB on EC2](#advantage-over-using-rds-versus-deploying-db-on-ec2)
    - [RDS Deployments: Read Replicas, Multi-AZ](#rds-deployments-read-replicas-multi-az)
    - [RDS Deployments: Multi-Region](#rds-deployments-multi-region)
  - [Amazon Aurora](#amazon-aurora)
  - [Amazon ElastiCache Overview](#amazon-elasticache-overview)
  - [DynamoDB](#dynamodb)
    - [DynamoDB Accelerator - DAX](#dynamodb-accelerator---dax)
    - [DynamoDB - Global Tables](#dynamodb---global-tables)
  - [Redshift Overview](#redshift-overview)
  - [Amazon EMR](#amazon-emr)
  - [Amazon Athena](#amazon-athena)
  - [Amazon QuickSight](#amazon-quicksight)
  - [DocumentDB](#documentdb)
  - [Amazon Neptune](#amazon-neptune)
  - [Amazon QLDB](#amazon-qldb)
  - [Amazon Managed Blockchain](#amazon-managed-blockchain)
  - [AWS Glue](#aws-glue)
  - [DMS - Database Migration Service](#dms---database-migration-service)
  - [Databases & Analytics Summary](#databases--analytics-summary)
## Databases Intro
- Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits
- Sometimes, you want to store data in a database…
  - You can structure the data
  - You build indexes to efficiently query / search through the data
  - You define relationships between your datasets
- Databases are optimized for a purpose and come with different features, shapes and constraints
## Relational Databases
- Looks just like Excel spreadsheets, with links between them!
- Can use the SQL language to perform queries / lookups
## NoSQL Databases
- NoSQL = non-SQL = non-relational databases
- NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications.
- Benefits:
  - Flexibility: easy to evolve data model
  - Scalability: designed to scale-out by using distributed clusters
  - High-performance: optimized for a specific data model
  - Highly functional: types optimized for the data model
- Examples: Key-value, document, graph, in-memory, search databases
### NoSQL data example: JSON
- JSON = JavaScript Object Notation
- JSON is a common form of data that fits into a NoSQL model
- Data can be nested
- Fields can change over time
- Support for new types: arrays, etc…
A minimal illustrative document (hypothetical fields) showing nesting and arrays:
```json
{
  "name": "John",
  "age": 30,
  "cars": ["Ford", "BMW", "Fiat"],
  "address": {
    "type": "house",
    "number": 23
  }
}
```
@@ -52,213 +79,213 @@
## Databases & Shared Responsibility on AWS
- AWS offers services to manage different databases
- Benefits include:
  - Quick Provisioning, High Availability, Vertical and Horizontal Scaling
  - Automated Backup & Restore, Operations, Upgrades
  - Operating System Patching is handled by AWS
  - Monitoring, alerting
- Note: many database technologies can be run on EC2, but you must handle resiliency, backup, patching, high availability, fault tolerance, and scaling yourself
## AWS RDS Overview
- RDS stands for Relational Database Service
- It's a managed DB service for databases that use SQL as a query language.
- It allows you to create databases in the cloud that are managed by AWS:
  - Postgres
  - MySQL
  - MariaDB
  - Oracle
  - Microsoft SQL Server
  - **Aurora (AWS Proprietary database)**
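A minimal boto3 sketch of creating one of the engines above through the API instead of the console (all identifiers and credentials are hypothetical; assumes AWS credentials and a default region are configured):

```python
import boto3

rds = boto3.client("rds")

# Create a small managed MySQL instance (names/values are illustrative)
rds.create_db_instance(
    DBInstanceIdentifier="demo-db",
    Engine="mysql",                 # or postgres, mariadb, oracle-ee, sqlserver-ex…
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,            # GiB of EBS-backed storage
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
)
```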
### Advantage over using RDS versus deploying DB on EC2
- RDS is a managed service:
  - Automated provisioning, OS patching
  - Continuous backups and restore to specific timestamp (Point in Time Restore)!
  - Monitoring dashboards
  - Read replicas for improved read performance
  - Multi AZ setup for DR (Disaster Recovery)
  - Maintenance windows for upgrades
  - Scaling capability (vertical and horizontal)
  - Storage backed by EBS (gp2 or io1)
- BUT you can't SSH into your instances
### RDS Deployments: Read Replicas, Multi-AZ
| Read Replicas | Multi-AZ |
| ----------------------------------- | ------------------------------------------------- |
| Scale the read workload of your DB | Failover in case of AZ outage (high availability) |
| Can create up to 5 Read Replicas | Data is only read/written to the main database |
| Data is only written to the main DB | Can only have 1 other AZ as failover |
![Read Replicas | Multi-AZ](/images/read_replicas_multi_AZ.png)
### RDS Deployments: Multi-Region
- Multi-Region (Read Replicas)
  - Disaster recovery in case of region issue
  - Local performance for global reads
  - Replication cost
![Multi-Region](/images/multi_region.png)
## Amazon Aurora
- Aurora is a proprietary technology from AWS (not open sourced)
- PostgreSQL and MySQL are both supported as Aurora DB
- Aurora is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS, and over 3x the performance of Postgres on RDS
- Aurora storage automatically grows in increments of 10GB, up to 64 TB.
- Aurora costs more than RDS (20% more) but is more efficient
- Not in the free tier
## Amazon ElastiCache Overview
- In the same way that RDS gives you managed relational databases…
- ElastiCache gives you managed Redis or Memcached
- Caches are in-memory databases with high performance, low latency
- Helps reduce load off databases for read intensive workloads
- AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backup
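A minimal cache-aside sketch against an ElastiCache Redis endpoint, using the third-party `redis` Python client (the endpoint is hypothetical and must be reachable from inside the VPC):

```python
import redis  # pip install redis

# Hypothetical ElastiCache Redis cluster endpoint
r = redis.Redis(host="demo.abc123.0001.use1.cache.amazonaws.com", port=6379)

# Cache-aside: try the in-memory cache first, fall back to the DB on a miss
user = r.get("user:42")
if user is None:
    user = b"...result of the real RDS query..."  # placeholder for a DB lookup
    r.set("user:42", user, ex=300)                # keep it cached for 5 minutes
```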
## DynamoDB
- Fully managed, highly available with replication across 3 AZs
- NoSQL database - not a relational database
- Scales to massive workloads, distributed “serverless” database
- Millions of requests per second, trillions of rows, 100s of TB of storage
- Fast and consistent in performance
- Single-digit millisecond low-latency retrieval
- Integrated with IAM for security, authorization and administration
- Low cost and auto scaling capabilities
- Standard & Infrequent Access (IA) Table Class
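A minimal boto3 sketch of the key/value model above (table and attribute names are hypothetical; assumes the table already exists with `user_id` as its partition key):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table

# Items are schemaless apart from the key attributes
table.put_item(Item={"user_id": "u-123", "name": "Alice", "plan": "free"})

# Single-digit-millisecond lookup by key
resp = table.get_item(Key={"user_id": "u-123"})
print(resp.get("Item"))
```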
### DynamoDB Accelerator - DAX
- Fully managed in-memory cache for DynamoDB
- 10x performance improvement: from single-digit millisecond latency down to microsecond latency when accessing your DynamoDB tables
- Secure, highly scalable & highly available
- Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases
### DynamoDB - Global Tables
- Make a DynamoDB table accessible with low latency in multiple regions
- Active-Active replication (read/write to any AWS Region)
## Redshift Overview
- Redshift is based on PostgreSQL, but it's not used for OLTP (Online Transactional Processing)
- It's OLAP: Online Analytical Processing (analytics and data warehousing)
- Load data once every hour, not every second
- 10x better performance than other data warehouses, scales to PBs of data
- Columnar storage of data (instead of row based)
- Massively Parallel Query Execution (MPP), highly available
- Pay as you go based on the instances provisioned
- Has a SQL interface for performing the queries
- BI tools such as AWS QuickSight or Tableau integrate with it
## Amazon EMR
- EMR stands for “Elastic MapReduce”
- EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
- The clusters can be made of hundreds of EC2 instances
- Also supports Apache Spark, HBase, Presto, Flink
- EMR takes care of all the provisioning and configuration
- Auto-scaling and integrated with Spot instances
- Use cases: data processing, machine learning, web indexing, big data
## Amazon Athena
- Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL language to query the files
- Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
- Pricing: $5.00 per TB of data scanned
- Use compressed or columnar data for cost-savings (less scan)
- Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...
- **To analyze data in S3 using serverless SQL, use Athena**
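A minimal boto3 sketch of querying S3 data with Athena (database, table, and results bucket are hypothetical):

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "my_db"},                       # hypothetical
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"}, # hypothetical
)
# Athena runs asynchronously: poll get_query_execution() until the state
# is SUCCEEDED, then fetch rows with get_query_results()
print(resp["QueryExecutionId"])
```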
## Amazon QuickSight
- Serverless machine learning-powered business intelligence service to create interactive dashboards
- Fast, automatically scalable, embeddable, with per-session pricing
- Use cases:
  - Business analytics
  - Building visualizations
  - Perform ad-hoc analysis
  - Get business insights using data
- Integrated with RDS, Aurora, Athena, Redshift, S3…
## DocumentDB
- Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
- DocumentDB is the same for MongoDB (which is a NoSQL database)
- MongoDB is used to store, query, and index JSON data
- Similar “deployment concepts” as Aurora
- Fully managed, highly available with replication across 3 AZs
- DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
- Automatically scales to workloads with millions of requests per second
## Amazon Neptune
- Fully managed graph database
- A popular graph dataset would be a social network:
  - Users have friends
  - Posts have comments
  - Comments have likes from users
  - Users share and like posts…
- Highly available across 3 AZs, with up to 15 read replicas
- Build and run applications working with highly connected datasets, optimized for these complex and hard queries
- Can store up to billions of relations and query the graph with milliseconds latency
- Highly available with replication across multiple AZs
- Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
## Amazon QLDB
- QLDB stands for “Quantum Ledger Database”
- A ledger is a book **recording financial transactions**
- Fully managed, serverless, highly available, replication across 3 AZs
- Used to **review history of all the changes made to your application data** over time
- **Immutable** system: no entry can be removed or modified, cryptographically verifiable
- 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
- Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
## Amazon Managed Blockchain
- Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
- Amazon Managed Blockchain is a managed service to:
  - Join public blockchain networks
  - Or create your own scalable private network
- Compatible with the frameworks Hyperledger Fabric & Ethereum
## AWS Glue
- Managed extract, transform, and load (ETL) service
- Useful to prepare and transform data for analytics
- Fully serverless service
- Glue Data Catalog: catalog of datasets
  - can be used by Athena, Redshift, EMR
## DMS - Database Migration Service
- Quickly and securely migrate databases to AWS, resilient, self healing
- The source database remains available during the migration
- Supports:
  - Homogeneous migrations: ex Oracle to Oracle
  - Heterogeneous migrations: ex Microsoft SQL Server to Aurora
## Databases & Analytics Summary
- Relational Databases - OLTP: RDS & Aurora (SQL)
  - Differences between Multi-AZ, Read Replicas, Multi-Region
- In-memory Database: ElastiCache
- Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
- Warehouse - OLAP: Redshift (SQL)
- Hadoop Cluster: EMR
- Athena: query data on Amazon S3 (serverless & SQL)
- QuickSight: dashboards on your data (serverless)
- DocumentDB: “Aurora for MongoDB” (JSON NoSQL database)
- Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
- Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
- Glue: Managed ETL (Extract Transform Load) and Data Catalog service
- Database Migration: DMS
- Neptune: graph database

View File

@@ -1,221 +1,243 @@
# Deploying and Managing Infrastructure at Scale
- [Deploying and Managing Infrastructure at Scale](#deploying-and-managing-infrastructure-at-scale)
  - [What is CloudFormation?](#what-is-cloudformation)
    - [Benefits of AWS CloudFormation](#benefits-of-aws-cloudformation)
    - [CloudFormation Stack Designer](#cloudformation-stack-designer)
  - [AWS Cloud Development Kit (CDK)](#aws-cloud-development-kit-cdk)
  - [Developer problems on AWS](#developer-problems-on-aws)
  - [AWS Elastic Beanstalk Overview](#aws-elastic-beanstalk-overview)
    - [Elastic Beanstalk - Health Monitoring](#elastic-beanstalk---health-monitoring)
  - [AWS CodeDeploy](#aws-codedeploy)
  - [AWS CodeCommit](#aws-codecommit)
  - [AWS CodeBuild](#aws-codebuild)
  - [AWS CodePipeline](#aws-codepipeline)
  - [AWS CodeArtifact](#aws-codeartifact)
  - [AWS CodeStar](#aws-codestar)
  - [AWS Cloud9](#aws-cloud9)
  - [AWS Systems Manager (SSM)](#aws-systems-manager-ssm)
    - [How Systems Manager works](#how-systems-manager-works)
    - [Systems Manager - SSM Session Manager](#systems-manager---ssm-session-manager)
  - [AWS OpsWorks](#aws-opsworks)
  - [Deployment - Summary](#deployment---summary)
  - [Developer Services - Summary](#developer-services---summary)
## What is CloudFormation?
- CloudFormation is a declarative way of outlining your AWS Infrastructure, for any resources (most of them are supported).
- For example, within a CloudFormation template, you say:
  - I want a security group
  - I want two EC2 instances using this security group
  - I want an S3 bucket
  - I want a load balancer (ELB) in front of these machines
- Then CloudFormation creates those for you, in the right order, with the exact configuration that you specify (see the sketch below)
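A minimal sketch of the idea: declare resources, hand the template to CloudFormation, and let it work out ordering. Stack and resource names are hypothetical; a real template would list the security group, instances, bucket and ELB the same way:

```python
import json
import boto3

# Declarative template with a single resource; CloudFormation works out
# creation order and exact configuration for anything you add here
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "MyBucket": {"Type": "AWS::S3::Bucket"},
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))
```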
### Benefits of AWS CloudFormation
- Infrastructure as code
  - No resources are manually created, which is excellent for control
  - Changes to the infrastructure are reviewed through code
- Cost
  - Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
  - You can estimate the costs of your resources using the CloudFormation template
  - Savings strategy: in Dev, you could automate deletion of templates at 5 PM and recreation at 8 AM, safely
- Productivity
  - Ability to destroy and re-create an infrastructure on the cloud on the fly
  - Automated generation of Diagram for your templates!
  - Declarative programming (no need to figure out ordering and orchestration)
- Don't re-invent the wheel
  - Leverage existing templates on the web!
  - Leverage the documentation
- Supports (almost) all AWS resources:
  - Everything we'll see in this course is supported
  - You can use “custom resources” for resources that are not supported
### CloudFormation Stack Designer
- Example: WordPress CloudFormation Stack
- We can see all the resources
- We can see the relations between the components
## AWS Cloud Development Kit (CDK)
- Define your cloud infrastructure using a familiar language:
  - JavaScript/TypeScript, Python, Java, and .NET
- The code is “compiled” into a CloudFormation template (JSON/YAML)
- You can therefore deploy infrastructure and application runtime code together
  - Great for Lambda functions
  - Great for Docker containers in ECS / EKS
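A minimal CDK v2 sketch in Python (stack and bucket names are hypothetical); `cdk synth` / `cdk deploy` compile this into a CloudFormation template and deploy it:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DemoStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # One construct line becomes a full CloudFormation resource
        s3.Bucket(self, "DemoBucket", versioned=True)

app = App()
DemoStack(app, "DemoStack")
app.synth()  # emits the CloudFormation template
```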
## Developer problems on AWS
- Managing infrastructure
- Deploying Code
- Configuring all the databases, load balancers, etc
- Scaling concerns
- Most web apps have the same architecture (ALB + ASG)
- All the developers want is for their code to run!
- Possibly, consistently across different applications and environments
## AWS Elastic Beanstalk Overview
- Elastic Beanstalk is a developer-centric view of deploying an application on AWS
- It uses all the components we've seen before: EC2, ASG, ELB, RDS, etc…
- But it's all in one view that's easy to make sense of!
- We still have full control over the configuration
- Beanstalk = Platform as a Service (PaaS)
- Beanstalk is free but you pay for the underlying instances
- Managed service:
  - Instance configuration / OS is handled by Beanstalk
  - Deployment strategy is configurable but performed by Elastic Beanstalk
  - Capacity provisioning
  - Load balancing & auto-scaling
  - Application health-monitoring & responsiveness
- Just the application code is the responsibility of the developer
- Three architecture models:
  - Single Instance deployment: good for dev
  - LB + ASG: great for production or pre-production web applications
  - ASG only: great for non-web apps in production (workers, etc..)
- Support for many platforms:
  - Go
  - Java SE
  - Java with Tomcat
  - .NET on Windows Server with IIS
  - Node.js
  - PHP
  - Python
  - Ruby
  - Packer Builder
  - Single Container Docker
  - Multi-Container Docker
  - Preconfigured Docker
### Elastic Beanstalk - Health Monitoring
- Health agent pushes metrics to CloudWatch
- Checks for app health, publishes health events
## AWS CodeDeploy
- We want to deploy our application automatically
- Works with EC2 Instances
- Works with On-Premises Servers
- Hybrid service
- Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent
## AWS CodeCommit
- Before pushing the application code to servers, it needs to be stored somewhere
- Developers usually store code in a repository, using the Git technology
- A famous public offering is GitHub; AWS' competing product is CodeCommit
- CodeCommit:
  - Source-control service that hosts Git-based repositories
  - Makes it easy to collaborate with others on code
  - The code changes are automatically versioned
- Benefits:
  - Fully managed
  - Scalable & highly available
  - Private, Secured, Integrated with AWS
## AWS CodeBuild
- Code building service in the cloud (name is obvious)
- Compiles source code, runs tests, and produces packages that are ready to be deployed (by CodeDeploy for example)
- Benefits:
  - Fully managed, serverless
  - Continuously scalable & highly available
  - Secure
  - Pay-as-you-go pricing: only pay for the build time
## AWS CodePipeline
- Orchestrate the different steps to have the code automatically pushed to production
- Code => Build => Test => Provision => Deploy
- Basis for CICD (Continuous Integration & Continuous Delivery)
- Benefits:
  - Fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, 3rd-party services (GitHub…) & custom plugins…
  - Fast delivery & rapid updates
- CodePipeline: orchestration layer
- CodeCommit => CodeBuild => CodeDeploy => Elastic Beanstalk
## AWS CodeArtifact
- Software packages depend on each other to be built (also called code dependencies), and new ones are created
- Storing and retrieving these dependencies is called artifact management
- Traditionally you need to set up your own artifact management system
- CodeArtifact is a secure, scalable, and cost-effective artifact management service for software development
- Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet
- Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact
## AWS CodeStar
- Unified UI to easily manage software development activities in one place
- “Quick way” to get started to correctly set-up CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Elastic Beanstalk, EC2, etc…
- Can edit the code “in-the-cloud” using AWS Cloud9
## AWS Cloud9
- AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code
- “Classic” IDEs (like IntelliJ, Visual Studio Code…) are downloaded onto a computer before being used
- A cloud IDE can be used within a web browser, meaning you can work on your projects from your office, home, or anywhere with internet, with no setup necessary
- AWS Cloud9 also allows for code collaboration in real-time (pair programming)
## AWS Systems Manager (SSM)
- Helps you manage your EC2 and On-Premises systems at scale
- Another Hybrid AWS service
- Get operational insights about the state of your infrastructure
- Suite of 10+ products
- Most important features are:
  - Patching automation for enhanced compliance
  - Run commands across an entire fleet of servers
  - Store parameter configuration with the SSM Parameter Store
- Works for both Windows and Linux OS
### How Systems Manager works
- We need to install the SSM agent onto the systems we control
- Installed by default on Amazon Linux AMI & some Ubuntu AMIs
- If an instance can't be controlled with SSM, it's probably an issue with the SSM agent!
- Thanks to the SSM agent, we can run commands, patch & configure our servers (see the sketch below)
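A minimal boto3 sketch of running a command across a fleet through the SSM agent (the instance ID is hypothetical; `AWS-RunShellScript` is an AWS-managed document):

```python
import boto3

ssm = boto3.client("ssm")

# Targets must run the SSM agent and have an instance role allowing SSM
resp = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],            # hypothetical
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["sudo yum update -y"]},
)
print(resp["Command"]["CommandId"])
```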
### Systems Manager - SSM Session Manager
- Allows you to start a secure shell on your EC2 and on-premises servers
- No SSH access, bastion hosts, or SSH keys needed
- No port 22 needed (better security)
- Supports Linux, macOS, and Windows
- Send session log data to S3 or CloudWatch Logs
## AWS OpsWorks
- Chef & Puppet help you perform server configuration automatically, or repetitive actions
- They work great with EC2 & On-Premises VMs
- AWS OpsWorks = Managed Chef & Puppet
- It's an alternative to AWS SSM
- Only provisions standard AWS resources:
  - EC2 Instances, Databases, Load Balancers, EBS volumes…
- **Chef or Puppet needed => AWS OpsWorks**
## Deployment - Summary
- CloudFormation: (AWS only)
  - Infrastructure as Code, works with almost all of AWS resources
  - Repeat across Regions & Accounts
- Beanstalk: (AWS only)
  - Platform as a Service (PaaS), limited to certain programming languages or Docker
  - Deploy code consistently with a known architecture: ex, ALB + EC2 + RDS
- CodeDeploy (hybrid): deploy & upgrade any application onto servers
- Systems Manager (hybrid): patch, configure and run commands at scale
- OpsWorks (hybrid): managed Chef and Puppet in AWS
## Developer Services - Summary
- CodeCommit: Store code in private git repository (version controlled)
- CodeBuild: Build & test code in AWS
- CodeDeploy: Deploy code onto servers
- CodePipeline: Orchestration of pipeline (from code to build to deploy)
- CodeArtifact: Store software packages / dependencies on AWS
- CodeStar: Unified view for allowing developers to do CICD and code
- Cloud9: Cloud IDE (Integrated Development Environment) with collaboration
- AWS CDK: Define your cloud infrastructure using a programming language

View File

@@ -1,136 +1,154 @@
# EC2 Instance Storage
- [EC2 Instance Storage](#ec2-instance-storage)
  - [EBS Volumes](#ebs-volumes)
    - [What's an EBS Volume?](#whats-an-ebs-volume)
    - [EBS Volume](#ebs-volume)
    - [EBS Delete on Termination attribute](#ebs--delete-on-termination-attribute)
    - [EBS Snapshots](#ebs-snapshots)
    - [EBS Snapshots Features](#ebs-snapshots-features)
  - [EFS: Elastic File System](#efs-elastic-file-system)
  - [EFS Infrequent Access (EFS-IA)](#efs-infrequent-access-efs-ia)
  - [Amazon FSx Overview](#amazon-fsx--overview)
    - [Amazon FSx for Windows File Server](#amazon-fsx-for-windows-file-server)
    - [Amazon FSx for Lustre](#amazon-fsx-for-lustre)
  - [EC2 Instance Store](#ec2-instance-store)
  - [Shared Responsibility Model for EC2 Storage](#shared-responsibility-model-for-ec2-storage)
  - [AMI Overview](#ami-overview)
    - [AMI Process (from an EC2 instance)](#ami-process-from-an-ec2-instance)
  - [EC2 Image Builder](#ec2-image-builder)
- EBS: Elastic Block Store, a network drive you can attach to your instances while they run
- EFS: network file system, can be attached to 100s of instances in a region
- EFS-IA: cost-optimized storage class for infrequent accessed files
- FSx for Windows: Network File System for Windows servers
- FSx for Lustre: High Performance Computing Linux file system
## EBS Volumes
### What's an EBS Volume?
- An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run
- It allows your instances to persist data, even after their termination
- They can only be mounted to one instance at a time (at the CCP level)
- They are bound to a specific availability zone
- Analogy: Think of them as a “network USB stick”
- Free tier: 30 GB of free EBS storage of type General Purpose (SSD) or Magnetic per month
### EBS Volume
- It's a network drive (i.e. not a physical drive)
- It uses the network to communicate with the instance, which means there might be a bit of latency
- It can be detached from an EC2 instance and attached to another one quickly
- It's locked to an Availability Zone (AZ)
  - An EBS Volume in us-east-1a cannot be attached to us-east-1b
  - To move a volume across, you first need to snapshot it
- It has a provisioned capacity (size in GBs, and IOPS)
  - You get billed for all the provisioned capacity
  - You can increase the capacity of the drive over time
### EBS Delete on Termination attribute
- Controls the EBS behaviour when an EC2 instance terminates
- By default, the root EBS volume is deleted (attribute enabled)
- By default, any other attached EBS volume is not deleted (attribute disabled)
- This can be controlled by the AWS console / AWS CLI
- Use case: preserve root volume when instance is terminated
### EBS Snapshots
- Make a backup (snapshot) of your EBS volume at a point in time
- Not necessary to detach volume to do snapshot, but recommended
- Can copy snapshots across AZ or Region
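A minimal boto3 sketch of snapshotting a volume and copying the snapshot to another Region (volume ID and Regions are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Point-in-time backup of a volume (hypothetical ID)
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                           Description="nightly backup")

# Copying runs against the destination Region's endpoint
ec2_west = boto3.client("ec2", region_name="us-west-2")
ec2_west.copy_snapshot(SourceRegion="us-east-1",
                       SourceSnapshotId=snap["SnapshotId"])
```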
### EBS Snapshots Features
- EBS Snapshot Archive
  - Move a Snapshot to an “archive tier” that is 75% cheaper
  - Takes 24 to 72 hours to restore from the archive
- Recycle Bin for EBS Snapshots
  - Setup rules to retain deleted snapshots so you can recover them after an accidental deletion
  - Specify retention (from 1 day to 1 year)
## EFS: Elastic File System
- Managed NFS (network file system) that can be mounted on 100s of EC2
- EFS works with Linux EC2 instances in multi-AZ
- Highly available, scalable, expensive (3x gp2), pay per use, no capacity planning
## EFS Infrequent Access (EFS-IA)
- Storage class that is cost-optimized for files not accessed every day
- Up to 92% lower cost compared to EFS Standard
- EFS will automatically move your files to EFS-IA based on the last time they were accessed
- Enable EFS-IA with a Lifecycle Policy
  - Example: move files that are not accessed for 60 days to EFS-IA (see the sketch below)
- Transparent to the applications accessing EFS
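A minimal boto3 sketch of the 60-day lifecycle example above (the file system ID is hypothetical):

```python
import boto3

efs = boto3.client("efs")

# Lifecycle policy: files untouched for 60 days transition to EFS-IA
efs.put_lifecycle_configuration(
    FileSystemId="fs-0123456789abcdef0",  # hypothetical
    LifecyclePolicies=[{"TransitionToIA": "AFTER_60_DAYS"}],
)
```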
## Amazon FSx Overview
- Launch 3rd party high-performance file systems on AWS
- Fully managed service
- FSx for Lustre
- FSx for Windows File Server
- FSx for NetApp ONTAP
### Amazon FSx for Windows File Server
- A fully managed, highly reliable, and scalable Windows native shared file system
- Built on Windows File Server
- Supports SMB protocol & Windows NTFS
- Integrated with Microsoft Active Directory
- Can be accessed from AWS or your on-premises infrastructure
### Amazon FSx for Lustre
- A fully managed, high-performance, scalable file storage for High Performance Computing (HPC)
- The name Lustre is derived from “Linux” and “cluster”
- Machine Learning, Analytics, Video Processing, Financial Modeling
- Scales up to 100s GB/s, millions of IOPS, sub-ms latencies
## EC2 Instance Store
- EBS volumes are network drives with good but “limited” performance
- If you need a high-performance hardware disk, use EC2 Instance Store
- Better I/O performance
- EC2 Instance Stores lose their storage if they're stopped (ephemeral)
- Good for buffer / cache / scratch data / temporary content
- Risk of data loss if hardware fails
- Backups and Replication are your responsibility
## Shared Responsibility Model for EC2 Storage
| AWS | USER |
| ------------------------------------------------- | -------------------------------------------------- |
| Infrastructure | Setting up backup / snapshot procedures |
| Replication for data for EBS volumes & EFS drives | Setting up data encryption |
| Replacing faulty hardware | Responsibility of any data on the drives |
| Ensuring their employees cannot access your data | Understanding the risk of using EC2 Instance Store |
## AMI Overview
- AMI = Amazon Machine Image
- AMIs are a customization of an EC2 instance
  - You add your own software, configuration, operating system, monitoring…
  - Faster boot / configuration time because all your software is pre-packaged
- AMIs are built for a specific region (and can be copied across regions)
- You can launch EC2 instances from:
  - A Public AMI: AWS provided
  - Your own AMI: you make and maintain them yourself
  - An AWS Marketplace AMI: an AMI someone else made (and potentially sells)
### AMI Process (from an EC2 instance)
- Start an EC2 instance and customize it
- Stop the instance (for data integrity)
- Build an AMI (this will also create EBS snapshots)
- Launch instances from other AMIs (see the sketch below)
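A minimal boto3 sketch of the "build an AMI" step (instance ID and names are hypothetical):

```python
import boto3

ec2 = boto3.client("ec2")

# Create an AMI from a stopped, customized instance; EBS snapshots are
# created behind the scenes
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",   # hypothetical
    Name="web-server-golden-ami",
    Description="customized web server base image",
)
print(image["ImageId"])  # launch new instances from this AMI
```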
## EC2 Image Builder
- Used to automate the creation of Virtual Machines or container images
- Automates the creation, maintenance, validation and testing of EC2 AMIs
- Can be run on a schedule (weekly, whenever packages are updated, etc…)
- Free service (only pay for the underlying resources)

View File

@@ -1,173 +1,192 @@
# Other Compute
- [Other Compute](#other-compute)
  - [What is Docker?](#what-is-docker)
    - [Where Docker images are stored?](#where-docker-images-are-stored)
    - [Docker versus Virtual Machines](#docker-versus-virtual-machines)
  - [ECS](#ecs)
  - [Fargate](#fargate)
  - [ECR](#ecr)
  - [What's serverless?](#whats-serverless)
  - [Why AWS Lambda ?](#why-aws-lambda-)
    - [Benefits of AWS Lambda](#benefits-of-aws-lambda)
    - [AWS Lambda language support](#aws-lambda-language-support)
    - [AWS Lambda Pricing: example](#aws-lambda-pricing-example)
  - [Amazon API Gateway](#amazon-api-gateway)
  - [AWS Batch](#aws-batch)
  - [Batch vs Lambda](#batch-vs-lambda)
  - [Amazon Lightsail](#amazon-lightsail)
  - [Lambda Summary](#lambda-summary)
  - [Other Compute Summary](#other-compute-summary)
## What is Docker?
- Docker is a software development platform to deploy apps
- Apps are packaged in containers that can be run on any OS
- Apps run the same, regardless of where they're run:
  - Any machine
  - No compatibility issues
  - Predictable behavior
- Less work
- Easier to maintain and deploy
- Works with any language, any OS, any technology
- Scale containers up and down very quickly (seconds)
### Where Docker images are stored?
- Docker images are stored in Docker Repositories
  - Public: Docker Hub <https://hub.docker.com/>
    - Find base images for many technologies or OS:
      - Ubuntu
      - MySQL
      - NodeJS, Java…
  - Private: Amazon ECR (Elastic Container Registry)
### Docker versus Virtual Machines
- Docker is “sort of” a virtualization technology, but not exactly
- Resources are shared with the host => many containers on one server
## ECS
- ECS = Elastic Container Service
- Launch Docker containers on AWS
- You must provision & maintain the infrastructure (the EC2 instances)
- AWS takes care of starting / stopping containers
- Has integrations with the Application Load Balancer
## Fargate
- Launch Docker containers on AWS
- You do not provision the infrastructure (no EC2 instances to manage) => simpler!
- Serverless offering
- AWS just runs containers for you based on the CPU / RAM you need
## ECR
- Elastic Container Registry
- Private Docker Registry on AWS
- This is where you store your Docker images so they can be run by ECS or Fargate
## What's serverless?
- Serverless is a new paradigm in which the developers don't have to manage servers anymore…
  - They just deploy code
  - They just deploy… functions!
- Initially... Serverless == FaaS (Function as a Service)
- Serverless was pioneered by AWS Lambda but now also includes anything that's managed: “databases, messaging, storage, etc.”
- Serverless does not mean there are no servers…
  - it means you just don't manage / provision / see them
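To make “just deploy functions” concrete, here is a minimal sketch of a Python Lambda handler; the event field is illustrative:

```python
# A minimal AWS Lambda handler: AWS invokes this function on demand with the
# triggering event; there are no servers to provision or manage.
def lambda_handler(event, context):
    name = event.get("name", "world")  # "name" is an illustrative event field
    return {"message": f"Hello, {name}!"}
```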
## Why AWS Lambda ?
| EC2                                                | Lambda                                    |
| -------------------------------------------------- | ----------------------------------------- |
| Virtual Servers in the Cloud                       | Virtual functions - no servers to manage! |
| Limited by RAM and CPU                             | Limited by time - short executions        |
| Continuously running                               | Run on-demand                             |
| Scaling means intervention to add / remove servers | Scaling is automated!                     |
### Benefits of AWS Lambda
- Easy Pricing:
  - Pay per request and compute time
  - Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time
- Integrated with the whole AWS suite of services
- Event-Driven: functions get invoked by AWS when needed
- Integrated with many programming languages
- Easy monitoring through AWS CloudWatch
- Easy to get more resources per function (up to 10GB of RAM!)
- Increasing RAM will also improve CPU and network!
### AWS Lambda language support
- Node.js (JavaScript)
- Python
- Java (Java 8 compatible)
- C# (.NET Core)
- Golang
- C# / PowerShell
- Ruby
- Custom Runtime API (community supported, e.g. Rust)
- Lambda Container Image
  - The container image must implement the Lambda Runtime API
  - ECS / Fargate is preferred for running arbitrary Docker images
### AWS Lambda Pricing: example
- You can find overall pricing information here: <https://aws.amazon.com/lambda/pricing/>
- Pay per calls:
  - First 1,000,000 requests are free
  - $0.20 per 1 million requests thereafter ($0.0000002 per request)
- Pay per duration: (in increments of 1 ms)
  - 400,000 GB-seconds of compute time per month for FREE
  - == 400,000 seconds if function is 1GB RAM
  - == 3,200,000 seconds if function is 128 MB RAM
  - After that $1.00 for 600,000 GB-seconds
- It is usually **very cheap** to run AWS Lambda so it's **very popular** (see the worked example below)
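Using the figures above, a back-of-the-envelope estimate for a hypothetical workload (3M requests, 500 ms each at 1 GB RAM) works out like this; the rates are the example ones from this section, so check the pricing page for current numbers:

```python
# Rough Lambda cost estimate using the example rates above (illustrative only).
requests = 3_000_000   # invocations this month (hypothetical workload)
duration_s = 0.5       # average duration per invocation, in seconds
memory_gb = 1.0        # RAM provisioned for the function

# Pay per call: first 1M requests free, then $0.20 per million
request_cost = max(requests - 1_000_000, 0) / 1_000_000 * 0.20

# Pay per duration: first 400,000 GB-seconds free, then $1.00 per 600,000 GB-s
gb_seconds = requests * duration_s * memory_gb
duration_cost = max(gb_seconds - 400_000, 0) / 600_000 * 1.00

print(f"requests: ${request_cost:.2f}, compute: ${duration_cost:.2f}")
# -> requests: $0.40, compute: $1.83
```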
## Amazon API Gateway
- Example: building a serverless API
- Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs
- Serverless and scalable
- Supports RESTful APIs and WebSocket APIs
- Support for security, user authentication, API throttling, API keys, monitoring.
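With the common Lambda proxy integration, API Gateway forwards the HTTP request to a Lambda function and expects a response shaped like the sketch below; the route and payload are illustrative:

```python
import json

# Handler for an API Gateway (Lambda proxy) integration: the HTTP request
# arrives as `event`, and the return value must carry statusCode / headers / body.
def lambda_handler(event, context):
    body = {"path": event.get("path"), "message": "served without servers"}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```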
## AWS Batch
- Fully managed batch processing at any scale
- Efficiently run 100,000s of computing batch jobs on AWS
- A “batch” job is a job with a start and an end (as opposed to continuous)
- Batch will dynamically launch EC2 instances or Spot Instances
- AWS Batch provisions the right amount of compute / memory
- You submit or schedule batch jobs and AWS Batch does the rest!
- Batch jobs are defined as Docker images and run on ECS
- Helpful for cost optimizations and focusing less on the infrastructure
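A minimal sketch of submitting a job with boto3; the job queue and job definition names are assumptions (they must already exist in your account):

```python
import boto3

batch = boto3.client("batch")

# Submit a batch job; AWS Batch launches the compute (EC2 / Spot) and runs the
# job's Docker image on ECS. Names below are hypothetical placeholders.
response = batch.submit_job(
    jobName="nightly-report",
    jobQueue="my-job-queue",
    jobDefinition="my-job-definition",
)
print(response["jobId"])
```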
## Batch vs Lambda
| Batch                                                  | Lambda                       |
| ------------------------------------------------------ | ---------------------------- |
| No time limit                                          | Time limit                   |
| Any runtime as long as it's packaged as a Docker image | Limited runtime              |
| Relies on EBS / instance store for disk space          | Limited temporary disk space |
| Relies on EC2 (can be managed by AWS)                  | Serverless                   |
## Amazon Lightsail
- Virtual servers, storage, databases, and networking
- Low & predictable pricing
- Simpler alternative to using EC2, RDS, ELB, EBS, Route 53…
- Great for people with little cloud experience!
- Can set up notifications and monitoring of your Lightsail resources
- Use cases:
  - Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…)
  - Websites (templates for WordPress, Magento, Plesk, Joomla)
  - Dev / Test environment
- Has high availability but no auto-scaling, limited AWS integrations
## Lambda Summary
- Lambda is Serverless, Function as a Service, seamless scaling, reactive
- Lambda Billing:
  - By the time run multiplied by the RAM provisioned
  - By the number of invocations
- Language Support: many programming languages except (arbitrary) Docker
- Invocation time: up to 15 minutes
- Use cases:
  - Create thumbnails for images uploaded onto S3
  - Run a Serverless cron job
- API Gateway: expose Lambda functions as HTTP API
## Other Compute Summary
- Docker: container technology to run applications
- ECS: run Docker containers on EC2 instances
- Fargate:
  - Run Docker containers without provisioning the infrastructure
  - Serverless offering (no EC2 instances)
- ECR: Private Docker Images Repository
- Batch: run batch jobs on AWS across managed EC2 instances
- Lightsail: predictable & low pricing for simple application & DB stacks

# Amazon S3
- [Amazon S3](#amazon-s3)
  - [S3 Use cases](#s3-use-cases)
  - [Amazon S3 Overview - Buckets](#amazon-s3-overview---buckets)
  - [Amazon S3 Overview - Objects](#amazon-s3-overview---objects)
  - [S3 Security](#s3-security)
  - [S3 Bucket Policies](#s3-bucket-policies)
  - [Bucket settings for Block Public Access](#bucket-settings-for-block-public-access)
  - [S3 Websites](#s3-websites)
  - [S3 - Versioning](#s3---versioning)
  - [S3 Access Logs](#s3-access-logs)
  - [S3 Replication (CRR & SRR)](#s3-replication-crr--srr)
  - [S3 Storage Classes](#s3-storage-classes)
    - [S3 Durability and Availability](#s3-durability-and-availability)
    - [S3 Standard General Purpose](#s3-standard-general-purpose)
    - [S3 Storage Classes - Infrequent Access](#s3-storage-classes---infrequent-access)
      - [S3 Standard Infrequent Access (S3 Standard-IA)](#s3-standard-infrequent-access-s3-standard-ia)
      - [S3 One Zone Infrequent Access (S3 One Zone-IA)](#s3-one-zone-infrequent-access-s3-one-zone-ia)
    - [Amazon S3 Glacier Storage Classes](#amazon-s3-glacier-storage-classes)
      - [Amazon S3 Glacier Instant Retrieval](#amazon-s3-glacier-instant-retrieval)
      - [Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier)](#amazon-s3-glacier-flexible-retrieval-formerly-amazon-s3-glacier)
      - [Amazon S3 Glacier Deep Archive - for long term storage](#amazon-s3-glacier-deep-archive---for-long-term-storage)
    - [S3 Intelligent-Tiering](#s3-intelligent-tiering)
  - [S3 Object Lock & Glacier Vault Lock](#s3-object-lock--glacier-vault-lock)
  - [Shared Responsibility Model for S3](#shared-responsibility-model-for-s3)
  - [AWS Snow Family](#aws-snow-family)
    - [Data Migrations with AWS Snow Family](#data-migrations-with-aws-snow-family)
    - [Time to Transfer](#time-to-transfer)
    - [Snowball Edge (for data transfers)](#snowball-edge-for-data-transfers)
    - [AWS Snowcone](#aws-snowcone)
    - [AWS Snowmobile](#aws-snowmobile)
    - [Snow Family - Usage Process](#snow-family---usage-process)
  - [What is Edge Computing?](#what-is-edge-computing)
  - [Snow Family - Edge Computing](#snow-family---edge-computing)
  - [AWS OpsHub](#aws-opshub)
  - [Hybrid Cloud for Storage](#hybrid-cloud-for-storage)
  - [AWS Storage Gateway](#aws-storage-gateway)
  - [Amazon S3 - Summary](#amazon-s3---summary)
## S3 Use cases
- Backup and storage
- Disaster Recovery
- Archive
- Hybrid Cloud storage
- Application hosting
- Media hosting
- Data lakes & big data analytics
- Software delivery
- Static website
## Amazon S3 Overview - Buckets
- Amazon S3 allows people to store objects (files) in “buckets” (directories)
- Buckets must have a globally unique name (across all regions and all accounts)
- Buckets are defined at the region level
  - S3 looks like a global service but buckets are created in a region
- Naming convention
  - No uppercase
  - No underscore
  - 3-63 characters long
  - Not an IP
  - Must start with lowercase letter or number
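A minimal sketch of creating a bucket with boto3; the bucket name is illustrative and must be globally unique:

```python
import boto3

s3 = boto3.client("s3")

# Buckets are created in a specific region, even though S3 looks global.
# Note: for us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket="my-globally-unique-bucket-name",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```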
## Amazon S3 Overview - Objects
- Objects (files) have a Key
- The key is the FULL path:
  - s3://my-bucket/my_file.txt
  - s3://my-bucket/my_folder1/another_folder/my_file.txt
- The key is composed of **prefix** + **object name**
  - s3://my-bucket/my_folder1/another_folder/my_file.txt
- There's no concept of “directories” within buckets (although the UI will trick you to think otherwise)
  - Just keys with very long names that contain slashes (“/”)
- Object values are the content of the body:
  - Max Object Size is 5TB (5000GB)
  - If uploading more than 5GB, must use “multi-part upload”
- Metadata (list of text key / value pairs - system or user metadata)
- Tags (Unicode key / value pair - up to 10) - useful for security / lifecycle
- Version ID (if versioning is enabled)
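A minimal sketch of uploading an object with boto3; `upload_file` transparently switches to multi-part upload above a size threshold (paths and names are illustrative):

```python
import boto3

s3 = boto3.client("s3")

# The key is the full path (prefix + object name); "folders" are just slashes.
# upload_file uses multi-part upload automatically for large files.
s3.upload_file(
    Filename="backup.tar.gz",                       # local file (illustrative)
    Bucket="my-bucket",
    Key="my_folder1/another_folder/backup.tar.gz",  # prefix + object name
)
```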
## S3 Security
- **User based**
  - IAM policies - which API calls should be allowed for a specific user from IAM console
- **Resource Based**
  - Bucket Policies - bucket wide rules from the S3 console - allows cross account
  - Object Access Control List (ACL) - finer grain
  - Bucket Access Control List (ACL) - less common
- **Note:** an IAM principal can access an S3 object if
  - the user IAM permissions allow it OR the resource policy ALLOWS it
  - AND there's no explicit DENY
- **Encryption:** encrypt objects in Amazon S3 using encryption keys
## S3 Bucket Policies
- JSON based policies
  - Resources: buckets and objects
  - Actions: Set of API to Allow or Deny
  - Effect: Allow / Deny
  - Principal: The account or user to apply the policy to
- Use S3 bucket policies to:
  - Grant public access to the bucket (example policy below, with an illustrative bucket name)
  - Force objects to be encrypted at upload
  - Grant access to another account (Cross Account)
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::examplebucket/*"]
    }
  ]
}
```
## Bucket settings for Block Public Access
- Block all public access: On
  - Block public access to buckets and objects granted through new access control lists (ACLs): On
  - Block public access to buckets and objects granted through any access control lists (ACLs): On
  - Block public access to buckets and objects granted through new public bucket or access point policies: On
  - Block public and cross-account access to buckets and objects through any public bucket or access point policies: On
- These settings were created to prevent company data leaks
- If you know your bucket should never be public, leave these on
- Can be set at the account level
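A minimal sketch of turning all four Block Public Access settings on for a bucket with boto3 (the bucket name is illustrative):

```python
import boto3

s3 = boto3.client("s3")

# Turn on all four Block Public Access settings to prevent accidental leaks.
s3.put_public_access_block(
    Bucket="my-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```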
## S3 Websites
- S3 can host static websites and have them accessible on the www
- The website URL will be either:
  - bucket-name.s3-website-AWS-region.amazonaws.com
  - bucket-name.s3-website.AWS-region.amazonaws.com
- **If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!**
## S3 - Versioning
- You can version your files in Amazon S3
- It is enabled at the bucket level
- Same key overwrite will increment the “version”: 1, 2, 3…
- It is best practice to version your buckets
  - Protect against unintended deletes (ability to restore a version)
  - Easy roll back to previous version
- Notes:
  - Any file that is not versioned prior to enabling versioning will have version “null”
  - Suspending versioning does not delete the previous versions
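A minimal sketch of enabling versioning on a bucket with boto3 (the bucket name is illustrative):

```python
import boto3

s3 = boto3.client("s3")

# Versioning is enabled at the bucket level; "Suspended" stops creating new
# versions but keeps the existing ones.
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```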
## S3 Access Logs
- For audit purposes, you may want to log all access to S3 buckets
- Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
- That data can be analyzed using data analysis tools…
- Very helpful to get to the root cause of an issue, audit usage, view suspicious patterns, etc…
## S3 Replication (CRR & SRR)
- Must enable versioning in source and destination
- Cross Region Replication (CRR)
- Same Region Replication (SRR)
- Buckets can be in different accounts
- Copying is asynchronous
- Must give proper IAM permissions to S3
- CRR - Use cases: compliance, lower latency access, replication across accounts
- SRR - Use cases: log aggregation, live replication between production and test accounts
## S3 Storage Classes
- [Amazon S3 Standard - General Purpose](#s3-standard-general-purpose)
- [Amazon S3 Standard - Infrequent Access (IA)](#s3-standard-infrequent-access-s3-standard-ia)
- [Amazon S3 One Zone - Infrequent Access](#s3-one-zone-infrequent-access-s3-one-zone-ia)
- [Amazon S3 Glacier Instant Retrieval](#amazon-s3-glacier-instant-retrieval)
- [Amazon S3 Glacier Flexible Retrieval](#amazon-s3-glacier-flexible-retrieval-formerly-amazon-s3-glacier)
- [Amazon S3 Glacier Deep Archive](#amazon-s3-glacier-deep-archive--for-long-term-storage)
- [Amazon S3 Intelligent Tiering](#s3-intelligent-tiering)
- Can move between classes manually or using S3 Lifecycle configurations
### S3 Durability and Availability
- Durability:
  - High durability (99.999999999%, 11 9s) of objects across multiple AZ
  - If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years
  - Same for all storage classes
- Availability:
  - Measures how readily available a service is
  - Varies depending on storage class
  - Example: S3 standard has 99.99% availability = not available 53 minutes a year
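The “53 minutes a year” figure follows directly from the availability percentage:

```python
# Downtime implied by an availability percentage (back-of-the-envelope).
availability = 0.9999                  # S3 Standard: 99.99%
minutes_per_year = 365 * 24 * 60
downtime = (1 - availability) * minutes_per_year
print(f"{downtime:.0f} minutes/year")  # -> 53 minutes/year
```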
### S3 Standard General Purpose
- 99.99% Availability
- Used for frequently accessed data
- Low latency and high throughput
- Sustain 2 concurrent facility failures
- Use Cases: Big Data analytics, mobile & gaming applications, content distribution…
### S3 Storage Classes - Infrequent Access
- For data that is less frequently accessed, but requires rapid access when needed
- Lower cost than S3 Standard
#### S3 Standard Infrequent Access (S3 Standard-IA)
- 99.9% Availability
- Use cases: Disaster Recovery, backups
#### S3 One Zone Infrequent Access (S3 One Zone-IA)
- High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
- 99.5% Availability
- Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate
### Amazon S3 Glacier Storage Classes
- Low-cost object storage meant for archiving / backup
- Pricing: price for storage + object retrieval cost
#### Amazon S3 Glacier Instant Retrieval
- Millisecond retrieval, great for data accessed once a quarter
- Minimum storage duration of 90 days
#### Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier)
- Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours, free)
- Minimum storage duration of 90 days
#### Amazon S3 Glacier Deep Archive - for long term storage
- Standard (12 hours), Bulk (48 hours)
- Minimum storage duration of 180 days
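Objects in the Glacier Flexible Retrieval / Deep Archive classes must be restored before they can be read; a minimal boto3 sketch, where the bucket, key, and tier are illustrative:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to restore an archived object for 7 days using the Bulk tier
# (Expedited / Standard / Bulk trade retrieval speed against cost).
s3.restore_object(
    Bucket="my-bucket",
    Key="archives/2015/logs.tar.gz",
    RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
)
```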
### S3 Intelligent-Tiering
- Small monthly monitoring and auto-tiering fee
- Moves objects automatically between Access Tiers based on usage
- There are no retrieval charges in S3 Intelligent-Tiering
- Frequent Access tier (automatic): default tier
- Infrequent Access tier (automatic): objects not accessed for 30 days
- Archive Instant Access tier (automatic): objects not accessed for 90 days
- Archive Access tier (optional): configurable from 90 days to 700+ days
- Deep Archive Access tier (optional): configurable from 180 days to 700+ days
## S3 Object Lock & Glacier Vault Lock
- S3 Object Lock
  - Adopt a WORM (Write Once Read Many) model
  - Block an object version deletion for a specified amount of time
- Glacier Vault Lock
  - Adopt a WORM (Write Once Read Many) model
  - Lock the policy for future edits (can no longer be changed)
- Helpful for compliance and data retention
## Shared Responsibility Model for S3
| AWS | YOU |
| ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| Infrastructure (global security, durability, availability, sustain concurrent loss of data in two facilities) | S3 Versioning, S3 Bucket Policies, S3 Replication Setup |
| Configuration and vulnerability analysis | Logging and Monitoring, S3 Storage Classes |
| Compliance validation | Data encryption at rest and in transit |
## AWS Snow Family
- Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS
- Data migration:
  - Snowcone
  - Snowball Edge
  - Snowmobile
- Edge computing:
  - Snowcone
  - Snowball Edge
### Data Migrations with AWS Snow Family
- **AWS Snow Family: offline devices to perform data migrations.** If it takes more than a week to transfer over the network, use Snowball devices!
- Challenges:
  - Limited connectivity
  - Limited bandwidth
  - High network cost
  - Shared bandwidth (can't maximize the line)
  - Connection stability
### Time to Transfer
| Data   | 100 Mbps | 1 Gbps   | 10 Gbps  |
| ------ | -------- | -------- | -------- |
| 10 TB  | 12 days  | 30 hours | 3 hours  |
| 100 TB | 124 days | 12 days  | 30 hours |
| 1 PB   | 3 years  | 124 days | 12 days  |
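The table's figures are roughly what `size / bandwidth` gives with an assumed ~80% effective line utilization (the utilization factor is an assumption, not an AWS figure):

```python
# Naive transfer-time estimate; the 80% utilization factor is an assumption.
def transfer_days(terabytes: float, line_gbps: float, utilization: float = 0.8) -> float:
    bits = terabytes * 1e12 * 8  # TB -> bits (decimal units)
    seconds = bits / (line_gbps * 1e9 * utilization)
    return seconds / 86_400

print(f"{transfer_days(100, 1):.0f} days")   # -> 12 days for 100 TB over 1 Gbps
print(f"{transfer_days(10, 0.1):.0f} days")  # -> 12 days for 10 TB over 100 Mbps
```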
### Snowball Edge (for data transfers)
- Physical data transport solution: move TBs or PBs of data in or out of AWS
- Alternative to moving data over the network (and paying network fees)
- Pay per data transfer job
- Provides block storage and Amazon S3-compatible object storage
- Snowball Edge Storage Optimized
  - 80 TB of HDD capacity for block volume and S3 compatible object storage
- Snowball Edge Compute Optimized
  - 42 TB of HDD capacity for block volume and S3 compatible object storage
- Use cases: large data cloud migrations, DC decommission, disaster recovery
### AWS Snowcone
- Small, portable computing, anywhere, rugged & secure, withstands harsh environments
- Light (4.5 pounds, 2.1 kg)
- Device used for edge computing, storage, and data transfer
- **8 TBs of usable storage**
- Use Snowcone where Snowball does not fit (space-constrained environment)
- Must provide your own battery / cables
- Can be sent back to AWS offline, or connect it to the internet and use **AWS DataSync** to send data
### AWS Snowmobile
- Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TBs)
- Each Snowmobile has 100 PB of capacity (use multiple in parallel)
- High security: temperature controlled, GPS, 24/7 video surveillance
- **Better than Snowball if you transfer more than 10 PB**
| Properties | Snowcone | Snowball Edge Storage Optimized | Snowmobile |
| ---------------- | ------------------------------- | ------------------------------- | ----------------------- |
| Storage Capacity | 8 TB usable | 80 TB usable | < 100 PB |
| Migration Size | Up to 24 TB, online and offline | Up to petabytes, offline | Up to exabytes, offline |
### Snow Family - Usage Process
1. Request Snowball devices from the AWS console for delivery
2. Install the snowball client / AWS OpsHub on your servers
3. Connect the Snowball to your servers and copy files using the client
4. Ship back the device when you're done (goes to the right AWS facility)
5. Data will be loaded into an S3 bucket
6. Snowball is completely wiped
## What is Edge Computing?
- Process data while it's being created on an edge location
  - A truck on the road, a ship on the sea, a mining station underground...
- These locations may have
  - Limited / no internet access
  - Limited / no easy access to computing power
- We set up a **Snowball Edge / Snowcone** device to do edge computing
- Use cases of Edge Computing:
  - Preprocess data
  - Machine learning at the edge
  - Transcoding media streams
- Eventually (if need be) we can ship back the device to AWS (for transferring data for example)
## Snow Family - Edge Computing
- **Snowcone (smaller)**
  - 2 CPUs, 4 GB of memory, wired or wireless access
  - USB-C power using a cord or the optional battery
- **Snowball Edge Compute Optimized**
  - 52 vCPUs, 208 GiB of RAM
  - Optional GPU (useful for video processing or machine learning)
  - 42 TB usable storage
- **Snowball Edge Storage Optimized**
  - Up to 40 vCPUs, 80 GiB of RAM
  - Object storage clustering available
- All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass)
- Long-term deployment options: 1 and 3 years discounted pricing
## AWS OpsHub
- Historically, to use Snow Family devices, you needed a CLI (Command Line Interface tool)
- Today, you can use **AWS OpsHub** (software you install on your computer / laptop) to manage your Snow Family Device
  - Unlocking and configuring single or clustered devices
  - Transferring files
  - Launching and managing instances running on Snow Family Devices
  - Monitoring device metrics (storage capacity, active instances on your device)
  - Launching compatible AWS services on your devices (ex: Amazon EC2 instances, AWS DataSync, Network File System (NFS))
## Hybrid Cloud for Storage
- AWS is pushing for “hybrid cloud”
  - Part of your infrastructure is on-premises
  - Part of your infrastructure is on the cloud
- This can be due to
  - Long cloud migrations
  - Security requirements
  - Compliance requirements
  - IT strategy
- S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-premises?
  - AWS Storage Gateway!
## AWS Storage Gateway
- Bridge between on-premises data and cloud data in S3
- Hybrid storage service to allow on-premises to seamlessly use the AWS Cloud
- Use cases: disaster recovery, backup & restore, tiered storage
- Types of Storage Gateway:
  - File Gateway
  - Volume Gateway
  - Tape Gateway
- No need to know the types for the exam
## Amazon S3 - Summary
- Buckets vs Objects: globally unique name, tied to a region
- S3 security: IAM policy, S3 Bucket Policy (public access), S3 Encryption
- S3 Websites: host a static website on Amazon S3
- S3 Versioning: multiple versions for files, prevent accidental deletes
- S3 Access Logs: log requests made within your S3 bucket
- S3 Replication: same-region or cross-region, must enable versioning
- S3 Storage Classes: Standard, IA, 1Z-IA, Intelligent, Glacier, Glacier Deep Archive
- S3 Lifecycle Rules: transition objects between classes
- S3 Glacier Vault Lock / S3 Object Lock: WORM (Write Once Read Many)
- Snow Family: import data onto S3 through a physical device, edge computing
- OpsHub: desktop application to manage Snow Family devices
- Storage Gateway: hybrid solution to extend on-premises storage to S3