From 6f328932d64cede2269100b9a19b260a91be29ee Mon Sep 17 00:00:00 2001
From: kananinirav <30398499+kananinirav@users.noreply.github.com>
Date: Sat, 13 Aug 2022 17:33:13 +0900
Subject: [PATCH] s3 doc added

---
 README.md |   3 +-
 s3.md     | 384 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 386 insertions(+), 1 deletion(-)
 create mode 100644 s3.md

diff --git a/README.md b/README.md
index 5eaedc9..934e54e 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,8 @@
   - [IAM: Identity Access & Management](/iam.md)
   - [EC2: Virtual Machines](/ec2.md)
   - [EC2 Instance Storage](/ec2_storage.md)
-  - [Elastic Load Balancing & Auto Scaling Groups Section](/elb_asg.md)
+  - [Elastic Load Balancing & Auto Scaling Groups](/elb_asg.md)
+  - [Amazon S3](/s3.md)
 
 ### Contributors

diff --git a/s3.md b/s3.md
new file mode 100644
index 0000000..65e1080
--- /dev/null
+++ b/s3.md
@@ -0,0 +1,384 @@
# Amazon S3

## S3 Use cases

* Backup and storage
* Disaster Recovery
* Archive
* Hybrid Cloud storage
* Application hosting
* Media hosting
* Data lakes & big data analytics
* Software delivery
* Static website

## Amazon S3 Overview - Buckets

* Amazon S3 allows people to store objects (files) in “buckets” (directories)
* Buckets must have a globally unique name (across all regions and all accounts)
* Buckets are defined at the region level
* S3 looks like a global service, but buckets are created in a region
* Naming convention:
  * No uppercase
  * No underscore
  * 3-63 characters long
  * Must not be an IP address
  * Must start with a lowercase letter or number

## Amazon S3 Overview - Objects

* Objects (files) have a Key
* The key is the FULL path:
  * s3://my-bucket/my_file.txt
  * s3://my-bucket/my_folder1/another_folder/my_file.txt
* The key is composed of **prefix** + **object name**
  * s3://my-bucket/my_folder1/another_folder/my_file.txt
* There’s no concept of “directories” within buckets (although the UI will trick you into thinking otherwise)
* Just keys with very long names that contain slashes (“/”)
* Object values are the content of the body:
  * Max object size is 5 TB (5,000 GB)
  * If uploading more than 5 GB, you must use “multi-part upload”
* Metadata (list of text key / value pairs – system or user metadata)
  * Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle
  * Version ID (if versioning is enabled)

## S3 Security

* **User based**
  * IAM policies - which API calls should be allowed for a specific user, from the IAM console
* **Resource Based**
  * Bucket Policies - bucket-wide rules from the S3 console - allows cross account
  * Object Access Control List (ACL) – finer grain
  * Bucket Access Control List (ACL) – less common
* **Note:** an IAM principal can access an S3 object if
  * the user’s IAM permissions ALLOW it OR the resource policy ALLOWS it
  * AND there’s no explicit DENY
* **Encryption:** encrypt objects in Amazon S3 using encryption keys

## S3 Bucket Policies

* JSON based policies
  * Resources: buckets and objects
  * Actions: set of API calls to Allow or Deny
  * Effect: Allow / Deny
  * Principal: the account or user to apply the policy to
* Use S3 bucket policies to:
  * Grant public access to the bucket
  * Force objects to be encrypted at upload
  * Grant access to another account (Cross Account)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::examplebucket/*"
      ]
    }
  ]
}
```
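As a quick illustration (not part of the original course notes), here is a minimal boto3 sketch that attaches a policy like the one above to a bucket. The bucket name `examplebucket` is a placeholder, and the call assumes your AWS credentials are already configured:

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "examplebucket"  # placeholder -- bucket names must be globally unique

# The same public-read policy as the JSON document above, built as a dict.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }
    ],
}

# put_bucket_policy expects the policy document as a JSON string.
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

Note that S3 rejects a public policy like this one while the Block Public Access settings described next are enabled for the bucket.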
## Bucket settings for Block Public Access

* Block all public access: On
  * Block public access to buckets and objects granted through new access control lists (ACLs): On
  * Block public access to buckets and objects granted through any access control lists (ACLs): On
  * Block public access to buckets and objects granted through new public bucket or access point policies: On
  * Block public and cross-account access to buckets and objects through any public bucket or access point policies: On
* These settings were created to prevent company data leaks
* If you know your bucket should never be public, leave these on
* Can be set at the account level

## S3 Websites

* S3 can host static websites and have them accessible on the Internet
* The website URL will be (depending on the region):
  * bucket-name.s3-website-AWS-region.amazonaws.com
  * bucket-name.s3-website.AWS-region.amazonaws.com
* **If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!**

## S3 Versioning

* You can version your files in Amazon S3
* It is enabled at the bucket level (see the sketch after this section)
* Overwriting the same key will increment the “version”: 1, 2, 3…
* It is best practice to version your buckets
  * Protects against unintended deletes (ability to restore a version)
  * Easy roll back to a previous version
* Notes:
  * Any file that is not versioned prior to enabling versioning will have version “null”
  * Suspending versioning does not delete the previous versions
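The bucket-level switch is a single API call. Below is a minimal boto3 sketch (the bucket name `my-bucket` is a placeholder for a bucket you own) that enables versioning and shows the same key accumulating versions:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder -- an existing bucket you own

# Enable versioning at the bucket level.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Overwriting the same key now creates a new version
# instead of replacing the object.
s3.put_object(Bucket=bucket, Key="my_file.txt", Body=b"version 1")
s3.put_object(Bucket=bucket, Key="my_file.txt", Body=b"version 2")

# List both versions of the key.
response = s3.list_object_versions(Bucket=bucket, Prefix="my_file.txt")
for version in response.get("Versions", []):
    print(version["Key"], version["VersionId"], version["IsLatest"])
```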
## S3 Access Logs

* For audit purposes, you may want to log all access to S3 buckets
* Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
* That data can be analyzed using data analysis tools…
* Very helpful to drill down to the root cause of an issue, audit usage, view suspicious patterns, etc…

## S3 Replication (CRR & SRR)

* Must enable versioning in the source and destination buckets
* Cross Region Replication (CRR)
* Same Region Replication (SRR)
* Buckets can be in different accounts
* Copying is asynchronous
* Must give proper IAM permissions to S3
* CRR – Use cases: compliance, lower latency access, replication across accounts
* SRR – Use cases: log aggregation, live replication between production and test accounts

## S3 Storage Classes

* [Amazon S3 Standard - General Purpose](#s3-standard-general-purpose)
* [Amazon S3 Standard - Infrequent Access (IA)](#s3-standard-infrequent-access-s3-standard-ia)
* [Amazon S3 One Zone - Infrequent Access](#s3-one-zone-infrequent-access-s3-one-zone-ia)
* [Amazon S3 Glacier Instant Retrieval](#amazon-s3-glacier-instant-retrieval)
* [Amazon S3 Glacier Flexible Retrieval](#amazon-s3-glacier-flexible-retrieval-formerly-amazon-s3-glacier)
* [Amazon S3 Glacier Deep Archive](#amazon-s3-glacier-deep-archive--for-long-term-storage)
* [Amazon S3 Intelligent Tiering](#s3-intelligent-tiering)

* Can move between classes manually or using S3 Lifecycle configurations (see the sketch below)
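As a hedged illustration of such a lifecycle configuration (the bucket name, prefix, rule ID, and day counts below are all made up for the example), a boto3 sketch:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder

# Transition objects under logs/ to Standard-IA after 30 days,
# archive them to Glacier Flexible Retrieval after 90 days,
# and delete them after one year.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```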
## S3 Durability and Availability

* Durability:
  * High durability (99.999999999%, 11 9’s) of objects across multiple AZs
  * If you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years
  * Same for all storage classes
* Availability:
  * Measures how readily available a service is
  * Varies depending on the storage class
  * Example: S3 Standard has 99.99% availability = not available 53 minutes a year

## S3 Standard General Purpose

* 99.99% availability
* Used for frequently accessed data
* Low latency and high throughput
* Can sustain 2 concurrent facility failures
* Use cases: big data analytics, mobile & gaming applications, content distribution…

## S3 Storage Classes – Infrequent Access

* For data that is less frequently accessed, but requires rapid access when needed
* Lower cost than S3 Standard

### S3 Standard Infrequent Access (S3 Standard-IA)

* 99.9% availability
* Use cases: Disaster Recovery, backups

### S3 One Zone Infrequent Access (S3 One Zone-IA)

* High durability (99.999999999%) in a single AZ; data is lost if the AZ is destroyed
* 99.5% availability
* Use cases: storing secondary backup copies of on-premises data, or data you can recreate

## Amazon S3 Glacier Storage Classes

* Low-cost object storage meant for archiving / backup
* Pricing: price for storage + object retrieval cost

### Amazon S3 Glacier Instant Retrieval

* Millisecond retrieval, great for data accessed once a quarter
* Minimum storage duration of 90 days

### Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier)

* Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours – free)
* Minimum storage duration of 90 days

### Amazon S3 Glacier Deep Archive – for long-term storage

* Standard (12 hours), Bulk (48 hours) retrieval
* Minimum storage duration of 180 days

## S3 Intelligent-Tiering

* Small monthly monitoring and auto-tiering fee
* Moves objects automatically between access tiers based on usage
* There are no retrieval charges in S3 Intelligent-Tiering
* Frequent Access tier (automatic): default tier
* Infrequent Access tier (automatic): objects not accessed for 30 days
* Archive Instant Access tier (automatic): objects not accessed for 90 days
* Archive Access tier (optional): configurable from 90 days to 700+ days
* Deep Archive Access tier (optional): configurable from 180 days to 700+ days

## S3 Object Lock & Glacier Vault Lock

* S3 Object Lock
  * Adopt a WORM (Write Once Read Many) model
  * Block an object version’s deletion for a specified amount of time
* Glacier Vault Lock
  * Adopt a WORM (Write Once Read Many) model
  * Lock the policy against future edits (it can no longer be changed)
  * Helpful for compliance and data retention

## Shared Responsibility Model for S3

AWS | YOU
---- | ----
Infrastructure (global security, durability, availability, sustaining concurrent loss of data in two facilities) | S3 Versioning, S3 Bucket Policies, S3 Replication setup
Configuration and vulnerability analysis | Logging and monitoring, S3 storage classes
Compliance validation | Data encryption at rest and in transit

## AWS Snow Family

* Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS
* Data migration:
  * Snowcone
  * Snowball Edge
  * Snowmobile
* Edge computing:
  * Snowcone
  * Snowball Edge

## Data Migrations with AWS Snow Family

* **AWS Snow Family: offline devices to perform data migrations**
* Rule of thumb: if it takes more than a week to transfer over the network, use Snowball devices!
* Challenges of network transfers:
  * Limited connectivity
  * Limited bandwidth
  * High network cost
  * Shared bandwidth (can’t maximize the line)
  * Connection stability

## Time to Transfer

Data | 100 Mbps | 1 Gbps | 10 Gbps
---- | ---- | ---- | ----
10 TB | 12 days | 30 hours | 3 hours
100 TB | 124 days | 12 days | 30 hours
1 PB | 3 years | 124 days | 12 days
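These figures are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below (not from the original notes) assumes the link runs at its full nominal rate with no overhead, which is why it comes out somewhat lower than the table, roughly 9 days for 10 TB at 100 Mbps versus the table’s 12 days:

```python
# Back-of-the-envelope transfer times for the table above,
# assuming the link runs at its full nominal rate.
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, gbps: float) -> float:
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    return bits / (gbps * 1e9) / SECONDS_PER_DAY


for tb in (10, 100, 1000):  # 1 PB = 1,000 TB
    for rate in (0.1, 1, 10):  # 100 Mbps, 1 Gbps, 10 Gbps
        print(f"{tb:>5} TB @ {rate:>4} Gbps: {transfer_days(tb, rate):7.1f} days")
```

The table’s larger figures presumably allow for real-world protocol overhead and shared links.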
## Snowball Edge (for data transfers)

* Physical data transport solution: move TBs or PBs of data in or out of AWS
* Alternative to moving data over the network (and paying network fees)
* Pay per data transfer job
* Provides block storage and Amazon S3-compatible object storage
* Snowball Edge Storage Optimized
  * 80 TB of HDD capacity for block volumes and S3-compatible object storage
* Snowball Edge Compute Optimized
  * 42 TB of HDD capacity for block volumes and S3-compatible object storage
* Use cases: large data cloud migrations, DC decommissioning, disaster recovery

## AWS Snowcone

* Small, portable computing, anywhere, rugged & secure, withstands harsh environments
* Light (4.5 pounds, 2.1 kg)
* Device used for edge computing, storage, and data transfer
* **8 TB of usable storage**
* Use Snowcone where Snowball does not fit (space-constrained environments)
* Must provide your own battery / cables
* Can be sent back to AWS offline, or connected to the internet to send data using **AWS DataSync**

## AWS Snowmobile

* Transfer exabytes of data (1 EB = 1,000 PB = 1,000,000 TB)
* Each Snowmobile has 100 PB of capacity (use multiple in parallel)
* High security: temperature controlled, GPS, 24/7 video surveillance
* **Better than Snowball if you transfer more than 10 PB**

Property | Snowcone | Snowball Edge Storage Optimized | Snowmobile
---- | ---- | ---- | ----
Storage Capacity | 8 TB usable | 80 TB usable | < 100 PB
Migration Size | Up to 24 TB, online and offline | Up to petabytes, offline | Up to exabytes, offline

## Snow Family – Usage Process

1. Request Snowball devices from the AWS console for delivery
2. Install the Snowball client / AWS OpsHub on your servers
3. Connect the Snowball to your servers and copy files using the client
4. Ship the device back when you’re done (it goes to the right AWS facility)
5. Data will be loaded into an S3 bucket
6. The Snowball is completely wiped

## What is Edge Computing?

* Process data while it’s being created at an edge location
  * A truck on the road, a ship on the sea, a mining station underground...
* These locations may have
  * Limited / no internet access
  * Limited / no easy access to computing power
* We set up a **Snowball Edge / Snowcone** device to do edge computing
* Use cases of edge computing:
  * Preprocessing data
  * Machine learning at the edge
  * Transcoding media streams
* Eventually (if need be) we can ship the device back to AWS (for transferring data, for example)

## Snow Family – Edge Computing

* **Snowcone (smaller)**
  * 2 CPUs, 4 GB of memory, wired or wireless access
  * USB-C power using a cord or the optional battery
* **Snowball Edge – Compute Optimized**
  * 52 vCPUs, 208 GiB of RAM
  * Optional GPU (useful for video processing or machine learning)
  * 42 TB usable storage
* **Snowball Edge – Storage Optimized**
  * Up to 40 vCPUs, 80 GiB of RAM
  * Object storage clustering available
* All: can run EC2 instances & AWS Lambda functions (using AWS IoT Greengrass)
* Long-term deployment options: 1- and 3-year discounted pricing

## AWS OpsHub

* Historically, to use Snow Family devices, you needed a CLI (Command Line Interface) tool
* Today, you can use **AWS OpsHub** (software you install on your computer / laptop) to manage your Snow Family devices
  * Unlocking and configuring single or clustered devices
  * Transferring files
  * Launching and managing instances running on Snow Family devices
  * Monitoring device metrics (storage capacity, active instances on your device)
  * Launching compatible AWS services on your devices (e.g. Amazon EC2 instances, AWS DataSync, Network File System (NFS))

## Hybrid Cloud for Storage

* AWS is pushing for “hybrid cloud”
  * Part of your infrastructure is on-premises
  * Part of your infrastructure is in the cloud
* This can be due to
  * Long cloud migrations
  * Security requirements
  * Compliance requirements
  * IT strategy
* S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose S3 data on-premises?
* AWS Storage Gateway!

## AWS Storage Gateway

* Bridge between on-premises data and cloud data in S3
* Hybrid storage service that allows on-premises systems to seamlessly use the AWS Cloud
* Use cases: disaster recovery, backup & restore, tiered storage
* Types of Storage Gateway:
  * File Gateway
  * Volume Gateway
  * Tape Gateway
* No need to know the types for the exam

## Amazon S3 – Summary

* Buckets vs Objects: globally unique name, tied to a region
* S3 security: IAM policies, S3 Bucket Policies (public access), S3 Encryption
* S3 Websites: host a static website on Amazon S3
* S3 Versioning: multiple versions of files, prevents accidental deletes
* S3 Access Logs: log requests made to your S3 bucket
* S3 Replication: same-region or cross-region, must enable versioning
* S3 Storage Classes: Standard, IA, 1Z-IA, Intelligent-Tiering, Glacier, Glacier Deep Archive
* S3 Lifecycle Rules: transition objects between classes
* S3 Glacier Vault Lock / S3 Object Lock: WORM (Write Once Read Many)
* Snow Family: import data into S3 through a physical device, edge computing
* OpsHub: desktop application to manage Snow Family devices
* Storage Gateway: hybrid solution to extend on-premises storage to S3
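To close, one more hedged boto3 sketch tying back to the object notes at the top (objects can be up to 5 TB; uploads above 5 GB must use multi-part upload). The high-level transfer API switches to multipart automatically past a configurable threshold; the file and bucket names below are placeholders:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart for anything above 100 MB, uploading 25 MB parts.
# (S3 itself requires multipart for objects larger than 5 GB.)
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
)

# upload_file transparently performs a multipart upload when the
# file size crosses the threshold.
s3.upload_file(
    Filename="backup.tar",  # placeholder local file
    Bucket="my-bucket",     # placeholder bucket
    Key="backups/backup.tar",
    Config=config,
)
```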