[Modify/Add] Add Databases & Analytics, and Other Compute Service Doc.

2024-10-20 17:46:26 +09:00
parent 4224f41b73
commit daad911e23
6 changed files with 510 additions and 6 deletions
--- a/README.md
+++ b/README.md
@@ -21,6 +21,10 @@
  - Scalability & High Availability, Vertical Scalability, Horizontal Scalability, High Availability, High Availability & Scalability For EC2, Scalability vs Elasticity (vs Agility), What is load balancing?, What’s an Auto Scaling Group?
 - [Amazon S3](./sections/s3.md)
  - S3 Use cases, Amazon S3 Overview - Buckets, Amazon S3 Overview - Objects, S3 Websites, S3 Storage Classes, S3 Object Lock & Glacier Vault Lock, Shared Responsibility Model for S3, AWS Snow Family, What is Edge Computing?, Snow Family - Edge Computing, AWS OpsHub, Hybrid Cloud for Storage, AWS Storage Gateway
 - [Databases & Analytics](./sections/databases.md)
  - Databases Intro, Relational Databases, NoSQL Databases, Databases & Shared Responsibility on AWS, AWS RDS Overview, Amazon Aurora, Amazon ElastiCache Overview, DynamoDB, Redshift Overview, Amazon EMR, Amazon Athena, Amazon QuickSight, DocumentDB, Amazon Neptune, Amazon QLDB
 - [Other Compute Section](./sections/other_compute.md)
  - What is Docker?, ECS, Fargate, ECR, What’s serverless?, Why AWS Lambda ?, Amazon API Gateway, AWS Batch, Batch vs Lambda, Amazon Lightsail, Lambda Summary
 ## Practice Exams ( dumps )
--- a/sections/databases.md
+++ b/sections/databases.md
@@ -0,0 +1,306 @@
 # Databases & Analytics
 - [Databases \& Analytics](#databases--analytics)
  - [Databases Intro](#databases-intro)
  - [Relational Databases (SQL)](#relational-databases-sql)
  - [NoSQL Databases](#nosql-databases)
    - [NoSQL data example: JSON](#nosql-data-example-json)
  - [Databases \& Shared Responsibility on AWS](#databases--shared-responsibility-on-aws)
  - [AWS RDS Overview](#aws-rds-overview)
    - [Advantage over using RDS versus deploying DB on EC2](#advantage-over-using-rds-versus-deploying-db-on-ec2)
    - [RDS Deployments](#rds-deployments)
    - [RDS Deployments: Read Replicas, Multi-AZ](#rds-deployments-read-replicas-multi-az)
    - [RDS Deployments: Multi-Region](#rds-deployments-multi-region)
  - [Amazon Aurora](#amazon-aurora)
  - [Amazon ElastiCache Overview](#amazon-elasticache-overview)
  - [DynamoDB](#dynamodb)
    - [DynamoDB Accelerator (DAX)](#dynamodb-accelerator-dax)
    - [DynamoDB Global Tables](#dynamodb-global-tables)
  - [Redshift Overview](#redshift-overview)
  - [Amazon EMR (Elastic MapReduce)](#amazon-emr-elastic-mapreduce)
  - [Amazon Athena](#amazon-athena)
  - [Amazon QuickSight](#amazon-quicksight)
  - [DocumentDB (with MongoDB Compatibility)](#documentdb-with-mongodb-compatibility)
  - [Amazon Neptune](#amazon-neptune)
  - [Amazon QLDB](#amazon-qldb)
  - [Amazon Managed Blockchain](#amazon-managed-blockchain)
  - [AWS Glue](#aws-glue)
  - [DMS - Database Migration Service](#dms---database-migration-service)
  - [Databases \& Analytics Summary](#databases--analytics-summary)
 ## Databases Intro
 - Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits
 - Sometimes, you want to store data in a database…
 - You can structure the data
 - You build indexes to efficiently query / search through the data
 - You define relationships between your datasets
 - Databases are optimized for a purpose and come with different features, shapes and constraints
 - **Managed Databases**: AWS takes care of maintenance, backups, and security for databases.
 - **Benefits**: Reduced operational complexity, built-in high availability, disaster recovery, scalability, and enhanced security.
 - **Types**:
  - **Relational Databases** (SQL)
  - **NoSQL Databases**
  - **Data Warehousing**
  - **In-memory Caching**
 ## Relational Databases (SQL)
 - **Structured Data**: Stored in predefined schema tables, managed with SQL.
 - **Use Cases**: Transactional applications, financial systems.
 - **Examples**: MySQL, PostgreSQL, Oracle, SQL Server, MariaDB.
 ## NoSQL Databases
 - **Flexible Schema**: No predefined schema, designed for fast and scalable data storage.
 - **Use Cases**: Real-time applications, IoT, mobile apps.
 - Benefits:
  - Flexibility: easy to evolve data model
  - Scalability: designed to scale-out by using distributed clusters
  - High-performance: optimized for a specific data model
  - Highly functional: types optimized for the data model
 - **Examples**: DynamoDB, MongoDB (DocumentDB); types include key-value, document, graph, in-memory, and search databases
 ### NoSQL data example: JSON
 - JSON is a common form of data that fits into a NoSQL model
 - Data can be nested
 - Fields can change over time
 - Support for new types: arrays, etc…
 ```json
 {
  "name": "Abc",
  "age": 30,
  "cars": [
    "Ford",
    "BMW",
    "Fiat"
  ],
  "address": {
    "type": "house",
    "number": 23,
    "street": "Abc Road"
  }
 }
 ```
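A minimal Python sketch of the points above, using only the standard library: the document is parsed as-is, nested fields are read directly, and a new field is added without any schema migration. The `email` field is a hypothetical addition for illustration.

```python
import json

# The document from the example above, stored as a JSON string.
raw = """
{
  "name": "Abc",
  "age": 30,
  "cars": ["Ford", "BMW", "Fiat"],
  "address": {"type": "house", "number": 23, "street": "Abc Road"}
}
"""

person = json.loads(raw)

# Nested data is reached by walking the structure; no joins are needed.
street = person["address"]["street"]
second_car = person["cars"][1]

# Fields can change over time: add a new (hypothetical) field to this one
# document without touching any other document or a table schema.
person["email"] = "abc@example.com"

print(street, second_car, sorted(person))
```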
 ## Databases & Shared Responsibility on AWS
 | **AWS Responsibility**                      | **Customer Responsibility**                      |
 | ------------------------------------------- | ------------------------------------------------ |
 | Infrastructure management, backups, patches | Data security, encryption, access controls (IAM) |
 | Availability and failover                   | Data management, monitoring, performance tuning  |
 ## AWS RDS Overview
 - **RDS (Relational Database Service)**: Fully managed service for relational databases.
  - It’s a managed DB service for databases that use SQL as a query language.
  - Supports **MySQL**, **PostgreSQL**, **MariaDB**, **Oracle**, **SQL Server**.
  - Handles **backup**, **patching**, **high availability** (Multi-AZ), and **scaling**.
 ### Advantage over using RDS versus deploying DB on EC2
 - RDS is a managed service:
  - Automated provisioning, OS patching
  - Continuous backups and restore to specific timestamp (Point in Time Restore)!
  - Monitoring dashboards
  - Read replicas for improved read performance
  - Multi AZ setup for DR (Disaster Recovery)
  - Maintenance windows for upgrades
  - Scaling capability (vertical and horizontal)
  - Storage backed by EBS (gp2 or io1)
 - BUT you can’t SSH into your instances
 ### RDS Deployments
 - **Read Replicas**: Improves read performance, **asynchronous** replication.
 - **Multi-AZ**: Automatic failover, high availability for production environments.
 - **Multi-Region**: Disaster recovery across regions, global availability.
 ### RDS Deployments: Read Replicas, Multi-AZ
 | Read Replicas                       | Multi-AZ                                          |
 | ----------------------------------- | ------------------------------------------------- |
 | Scale the read workload of your DB  | Failover in case of AZ outage (high availability) |
 | Can create up to 15 Read Replicas   | Data is only read/written to the main database    |
 | Data is only written to the main DB | Can only have 1 other AZ as failover              |
 ![Read Replicas Multi-AZ](../images/read_replicas_multi_AZ.png)
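The read replica column of the table can be sketched from the application's side: writes always go to the main database, while reads are spread across replicas. This is an illustrative sketch only; the `ReplicaRouter` class and the hostnames are hypothetical, and the statement classification is deliberately naive.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads when no replica exists.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


router = ReplicaRouter(
    primary="primary.example.internal",  # hypothetical hostnames
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Multi-AZ does not appear in the sketch on purpose: the standby is not used for traffic, so the application keeps talking to the same main database endpoint before and after a failover.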
 ### RDS Deployments: Multi-Region
 - Multi-Region (Read Replicas)
  - Disaster recovery in case of region issue
  - Local performance for global reads
  - Replication cost
 ![Multi-Region](../images/multi_region.png)
 ## Amazon Aurora
 - **Amazon Aurora**: High-performance RDS database.
  - Compatible with **MySQL** and **PostgreSQL**.
  - **5x faster** than MySQL, **3x faster** than PostgreSQL.
  - **Auto-scaling** storage up to **64 TB**.
  - Supports **Multi-AZ** and up to **15 read replicas**.
  - Great for **enterprise-grade** applications requiring high availability and performance.
  - Aurora costs more than RDS (20% more) – but is more efficient
 ## Amazon ElastiCache Overview
 - **ElastiCache**: In-memory data caching service.
  - **Redis**: Advanced key-value store with replication and persistence.
  - **Memcached**: Simple, memory-only caching service.
  - Reduces database load and speeds up applications by **caching frequent queries**.
  - Caches are in-memory databases with high performance, low latency
  - AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backup
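How a cache reduces database load can be sketched with the common cache-aside pattern: look in the cache first, and only query the database on a miss. In this sketch a plain dictionary stands in for Redis or Memcached, and `slow_query` is a hypothetical stand-in for a real database call.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, query the database and cache the result."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db  # slow lookup, e.g. a SQL query
        self._ttl = ttl_seconds
        self._store = {}           # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self._load(key)
        self._store[key] = (time.monotonic() + self._ttl, value)
        return value


def slow_query(user_id):
    # Stand-in for a database round trip.
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(slow_query)
cache.get(42)  # miss: goes to the "database"
cache.get(42)  # hit: served from memory
print(cache.hits, cache.misses)
```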
 ## DynamoDB
 - Fully managed, serverless NoSQL database.
 - Supports key-value and document data models.
 - Automatically scales based on demand.
 - Provides high availability and durability with replication across 3 AZ
 - Millions of requests per second, trillions of rows, 100s of TB of storage
 - Fast and consistent in performance
 - Single-digit millisecond latency – low latency retrieval
 - Integrated with IAM for security, authorization and administration
 - Low cost and auto scaling capabilities
 - Standard & Infrequent Access (IA) Table Class
 ### DynamoDB Accelerator (DAX)
 - In-memory caching for DynamoDB.
 - **10x faster** read performance: from single-digit millisecond latency to microsecond latency when accessing your DynamoDB tables
 - Secure, highly scalable & highly available
 - Ideal for use cases where **low-latency reads** are critical.
 ### DynamoDB Global Tables
 - Multi-region replication for **global** applications.
 - **Low-latency** reads and writes across multiple regions.
 - Ensures data availability globally with **multi-master replication**.
 ## Redshift Overview
 - Managed data warehousing service.
 - Optimized for **online analytical processing (OLAP)** and big data analytics.
 - Uses **columnar storage** for fast query performance.
 - 10x better performance than other data warehouses, scale to PBs of data
 - Columnar storage of data (instead of row based)
 - Supports integration with **BI tools** (QuickSight, Tableau).
 - Massively Parallel Query Execution (MPP), highly available.
 - Has a SQL interface for performing the queries.
 - Pay as you go based on the instances provisioned, with **reserved instances** for cost savings.
 - Designed for **massive datasets**.
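Why columnar storage suits analytical queries can be shown with a toy comparison. The sketch below holds the same three made-up records in a row layout and a column layout, then sums one field. It only illustrates the access pattern and says nothing about how Redshift stores data internally.

```python
# Same three records in a row layout and in a column layout.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

columns = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 50.0],
}

# Row layout: every whole record is visited to sum one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

An aggregate over one field touches a single list in the column layout, which is the property OLAP workloads benefit from when tables have many columns.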
 ## Amazon EMR (Elastic MapReduce)
 - Managed big data processing service.
 - Uses **Hadoop**, **Apache Spark**, and **Hive** for processing large data sets.
 - Ideal for **data transformation**, **machine learning**, and **ETL** (Extract, Transform, Load).
 - Integration with **S3**, **DynamoDB**, and **Redshift**.
 - The clusters can be made of hundreds of EC2 instances
 - EMR takes care of all the provisioning and configuration
 - Auto-scaling and integrated with Spot instances
 - Use cases: data processing, machine learning, web indexing, big data
 ## Amazon Athena
 - Serverless query service
 - Use **SQL** to query structured and unstructured data stored in **S3**.
 - No infrastructure to manage, pay-per-query.
 - Supports various formats like **CSV**, **JSON**, **Parquet**, and **ORC**.
 - Pricing: $5.00 per TB of data scanned
 - Use compressed or columnar data for cost-savings (less scan)
 - Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...
 - Exam tip: to analyze data in S3 using serverless SQL, use Athena
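The pay-per-scan model can be turned into a quick estimate. This sketch uses the $5.00 per TB figure quoted above and ignores any minimum charge or rounding rules, so treat the result as a rough approximation. The 10% scan ratio for columnar, compressed data is a hypothetical number chosen for illustration.

```python
PRICE_PER_TB_USD = 5.00  # price quoted in the notes above; check current pricing


def athena_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated cost of one query from the amount of data it scans."""
    terabytes = bytes_scanned / 1024**4
    return terabytes * price_per_tb


full_scan = athena_query_cost(2 * 1024**4)  # 2 TB of raw CSV
# Hypothetical: columnar + compressed data lets the query scan 10% as much.
reduced_scan = athena_query_cost(0.2 * 1024**4)

print(f"${full_scan:.2f} vs ${reduced_scan:.2f}")
```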
 ## Amazon QuickSight
 - Business Intelligence (BI) tool for data visualization.
 - Serverless machine learning-powered business intelligence service to create interactive dashboards
 - Fast, automatically scalable, embeddable, with per-session pricing
 - Supports data from S3, Redshift, RDS, and other AWS data sources.
 - **Pay-per-session** pricing model for cost efficiency.
 - Use cases:
  - Business analytics
  - Building visualizations
  - Perform ad-hoc analysis
  - Get business insights using data
 ## DocumentDB (with MongoDB Compatibility)
 - Managed document database, **MongoDB-compatible**.
 - DocumentDB is to MongoDB (a NoSQL database) what Aurora is to MySQL / PostgreSQL: an AWS implementation of the engine
 - Highly scalable and durable with **Multi-AZ**.
 - Built for **JSON** document storage.
 - DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB.
 - Automatically scales to workloads with millions of requests per second
 - Use cases: Content management, cataloging, and mobile backends.
 ## Amazon Neptune
 - Fully managed graph database
 - A popular graph dataset would be a social network
  - Users have friends
  - Posts have comments
  - Comments have likes from users
  - Users share and like posts…
 - Highly available across 3 AZ, with up to 15 read replicas
 - Build and run applications working with highly connected datasets – optimized for these complex and hard queries
 - Can store up to billions of relations and query the graph with milliseconds latency
 - Highly available with replications across multiple AZs
 - Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
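The kind of question a graph database is built for ("who is within two hops of this user?") can be sketched with a breadth-first search over an adjacency map. This is plain Python on a hypothetical social graph, not Neptune's query API. It only shows the traversal that such queries boil down to.

```python
from collections import deque

# Hypothetical social graph: user -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable in at most max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return seen - {start}


print(sorted(within_hops(friends, "alice", 2)))
```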
 ## Amazon QLDB
 - QLDB stands for “Quantum Ledger Database”
 - A ledger is a book **recording financial transactions**
 - Fully managed, serverless, highly available, replication across 3 AZs
 - Used to **review history of all the changes made to your application data** over time
 - **Immutable** system: no entry can be removed or modified, cryptographically verifiable
 - 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
 - Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
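What "immutable, cryptographically verifiable" means can be sketched with a hash chain: each journal entry stores a hash that also covers the previous entry's hash, so changing any past entry is detectable. This is a generic illustration of the idea using SHA-256, not a description of how QLDB is implemented. The `Journal` class is hypothetical.

```python
import hashlib
import json


def entry_hash(prev_hash, data):
    """Hash the entry's data together with the previous entry's hash."""
    payload = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class Journal:
    """Append-only log where each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self.entries = []  # list of (data, hash)

    def append(self, data):
        prev = self.entries[-1][1] if self.entries else ""
        self.entries.append((data, entry_hash(prev, data)))

    def verify(self):
        prev = ""
        for data, stored in self.entries:
            if entry_hash(prev, data) != stored:
                return False
            prev = stored
        return True


journal = Journal()
journal.append({"account": "A", "delta": 100})
journal.append({"account": "A", "delta": -30})
print(journal.verify())

# Tampering with an earlier entry breaks verification.
journal.entries[0] = ({"account": "A", "delta": 1000}, journal.entries[0][1])
print(journal.verify())
```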
 ## Amazon Managed Blockchain
 - Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
 - Amazon Managed Blockchain is a managed service to:
  - Join public blockchain networks
  - Or create your own scalable private network
 - Compatible with the frameworks Hyperledger Fabric & Ethereum
 ## AWS Glue
 - Managed extract, transform, and load (ETL) service
 - Useful to prepare and transform data for analytics
 - Fully serverless service
 - Glue Data Catalog: catalog of datasets
  - can be used by Athena, Redshift, EMR
 ## DMS - Database Migration Service
 - Quickly and securely migrate databases to AWS, resilient, self healing
 - The source database remains available during the migration
 - Supports:
  - Homogeneous migrations: ex Oracle to Oracle
  - Heterogeneous migrations: ex Microsoft SQL Server to Aurora
 ## Databases & Analytics Summary
 - Relational Databases - OLTP: RDS & Aurora (SQL)
 - Differences between Multi-AZ, Read Replicas, Multi-Region
 - In-memory Database: ElastiCache
 - Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
 - Warehouse - OLAP: Redshift (SQL)
 - Hadoop Cluster: EMR
 - Athena: query data on Amazon S3 (serverless & SQL)
 - QuickSight: dashboards on your data (serverless)
 - DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
 - Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
 - Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
 - Glue: Managed ETL (Extract Transform Load) and Data Catalog service
 - Database Migration: DMS
 - Neptune: graph database
--- a/sections/elb_asg.md
+++ b/sections/elb_asg.md
@@ -58,7 +58,7 @@
 ## Scalability vs Elasticity (vs Agility)
 | **Term**        | **Definition**                                                                                                    |
-|--------------------|--------------------------------------------------------------------------------------------------|
+| --------------- | ----------------------------------------------------------------------------------------------------------------- |
 | **Scalability** | Ability to increase or decrease the capacity to handle varying levels of traffic or load.                         |
 | **Elasticity**  | Automatically adjusts resources up or down based on the load in real-time, preventing under or over-provisioning. |
 | **Agility**     | The ability to deploy and manage resources quickly and efficiently in response to changing demands.               |
--- a/sections/iam.md
+++ b/sections/iam.md
@@ -38,6 +38,7 @@
 - **Users**: Represent individual identities that interact with AWS services. Users have unique credentials (username, password, access keys).
 - **Groups**: Logical grouping of users to simplify permission management.
  - Permissions assigned to a group are automatically inherited by its users.
 - **Flexibility in user management**: in IAM, users do not have to belong to a group, and a user can belong to multiple groups. This lets administrators manage access permissions in a granular and efficient manner. For example, a user could belong to both the “QAs” group and the “Developers” group, inheriting permissions from both.
 | **IAM Users**                                              | **IAM Groups**                                           |
 |------------------------------------------------------------|----------------------------------------------------------|
@@ -55,6 +56,8 @@
 ### IAM Policies Inheritance
 ![IAM Policies Inheritance](../images/IAM_Policies_inheritance.png)
 - Policies are evaluated together for a user, including:
  - **Directly attached policies**.
  - **Group policies**.
--- a/sections/other_compute.md
+++ b/sections/other_compute.md
@@ -0,0 +1,192 @@
 # Other Compute
 ## What is Docker?
 - Docker is a software development platform to deploy apps
 - Apps are packaged in containers that can be run on any OS
 - Apps run the same, regardless of where they’re run
  - Any machine
  - No compatibility issues
  - Predictable behavior
  - Less work
  - Easier to maintain and deploy
  - Works with any language, any OS, any technology
 - Scale containers up and down very quickly (seconds)
 ### Where are Docker images stored?
 - **Docker Hub**: Centralized public repository for storing Docker images.
 - Public: Docker Hub <https://hub.docker.com/>
 - Private: **Amazon ECR (Elastic Container Registry)**: AWS service for storing, managing, and deploying container images.
 ### Docker versus Virtual Machines
 - Docker is “sort of” a virtualization technology, but not exactly
 - Resources are shared with the host => many containers on one server
 | **Docker Containers**                  | **Virtual Machines (VMs)**                |
 | -------------------------------------- | ----------------------------------------- |
 | Lightweight, shares the host OS kernel | Heavier, includes full OS                 |
 | Starts in seconds                      | Slower startup (minutes)                  |
 | Portable, fast scaling                 | Not as portable, more resource-intensive  |
 | Best for microservices & modern apps   | Best for running multiple OS environments |
 ## ECS (Elastic Container Service)
 - Fully managed container orchestration service.
 - Supports Docker containers.
 - Launch Docker containers on AWS
 - AWS takes care of starting / stopping containers
 - **Two launch modes**: **EC2** (self-managed instances) and **Fargate** (serverless).
 - Provides integration with IAM, VPC, ELB, and ECR.
 ## Fargate
 - Serverless compute engine for containers, works with ECS and EKS.
 - No need to manage EC2 instances.
 - Pay for resources used (vCPU and memory).
 - AWS just runs containers for you based on the CPU / RAM you need
 ## ECR (Elastic Container Registry)
 - Fully managed Docker container registry.
 - Stores, manages, and secures Docker images.
 - Integrated with **ECS**, **EKS**, and **Fargate** for easy deployment.
 - This is where you store your Docker images so they can be run by ECS or Fargate
 ## What’s Serverless?
 - No need to provision, scale, or manage servers.
 - Resources are automatically provisioned and scaled by AWS.
 - Serverless is a new paradigm in which the developers don’t have to manage servers anymore…
 - They just deploy code
 - They just deploy… functions!
 - Initially... Serverless == FaaS (Function as a Service)
 - Serverless was pioneered by AWS Lambda but now also includes anything that’s managed: “databases, messaging, storage, etc.”
 - Serverless does not mean there are no servers…
 - it means you just don’t manage / provision / see them
 - Ideal for event-driven and stateless applications.
 ## Why AWS Lambda?
 - Serverless compute service to run code without managing infrastructure.
 - Executes code in response to events (e.g., API calls, file uploads).
 - Scales automatically and you only pay for usage.
 | EC2                                                | Lambda                                    |
 | -------------------------------------------------- | ----------------------------------------- |
 | Virtual Servers in the Cloud                       | Virtual functions – no servers to manage! |
 | Limited by RAM and CPU                             | Limited by time - short executions        |
 | Continuously running                               | Run on-demand                             |
 | Scaling means intervention to add / remove servers | Scaling is automated!                     |
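To make "virtual functions" concrete, here is a minimal sketch of a Python handler reacting to a file upload event. The `(event, context)` signature is the standard shape of a Python Lambda handler. The sample event is a trimmed-down, hand-written approximation of an S3 notification, and the bucket and key names are hypothetical.

```python
def lambda_handler(event, context):
    """Entry point Lambda calls for each event; returns the uploaded objects."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        uploaded.append(f"s3://{bucket}/{key}")
    return {"processed": len(uploaded), "objects": uploaded}


# Local invocation with a trimmed-down, hand-written event.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "photo.jpg"}}}
    ]
}
print(lambda_handler(sample_event, context=None))
```

There is no server loop or port binding in the code: the function only exists while an event is being handled, which is what the EC2 vs Lambda table contrasts.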
 ### Benefits of AWS Lambda
 - **No server management**: AWS handles the infrastructure.
 - **Automatic scaling**: Scales based on event triggers.
 - **Flexible scaling**: Runs from a few requests per day to thousands per second.
 - **Event-driven architecture**: Ideal for apps that need to respond to events.
 - Easy Pricing:
  - Pay per request and compute time
  - Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time
 - Integrated with the whole AWS suite of services
 - Event-Driven: functions get invoked by AWS when needed
 - Integrated with many programming languages
 - Easy monitoring through AWS CloudWatch
 - Easy to get more resources per function (up to 10GB of RAM!)
 - Increasing RAM will also improve CPU and network!
 ### AWS Lambda Language Support
 - Node.js
 - Python
 - Ruby
 - Java
 - Go
 - .NET Core
 - Custom runtime via the Runtime API (community supported, example Rust)
 - Lambda Container Image
  - The container image must implement the Lambda Runtime API
  - ECS / Fargate is preferred for running arbitrary Docker images
 ### AWS Lambda Pricing: Example
 - Based on number of requests and execution time.
 - You can find overall pricing information here: <https://aws.amazon.com/lambda/pricing/>
 - First **1 million requests/month** are free.
 - After that, **$0.20 per million requests**.
 - **Execution duration**: $0.00001667 for every GB-second used (first 400,000 GB-seconds free per month).
 - Pay per duration (in increments of 1 ms):
  - 400,000 GB-seconds of compute time per month for FREE
  - == 400,000 seconds if function is 1GB RAM
  - == 3,200,000 seconds if function is 128 MB RAM
  - After that, $1.00 for 60,000 GB-seconds
 - It is usually **very cheap** to run AWS Lambda so it’s **very popular**
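The pricing rules above can be combined into a rough monthly estimate. This sketch uses only the figures quoted in these notes (free tier, $0.20 per million requests, $0.00001667 per GB-second) and ignores everything else on the pricing page. The workload in the example is hypothetical.

```python
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.00001667


def lambda_monthly_cost(requests, avg_duration_ms, memory_mb):
    """Estimate a monthly bill from request count, duration, and memory size."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - FREE_REQUESTS)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    duration_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + duration_cost


# Hypothetical workload: 3M requests/month, 200 ms each, 512 MB of memory.
cost = lambda_monthly_cost(3_000_000, 200, 512)
print(f"${cost:.2f}")
```

In this example the compute time (300,000 GB-seconds) stays inside the free tier, so only the 2 million requests beyond the free allowance are billed.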
 ## Amazon API Gateway
 - Managed service for creating, publishing, and monitoring REST, HTTP, and WebSocket APIs.
 - Integrates with AWS Lambda for fully serverless APIs.
 - Serverless and scalable
 - Support for security, user authentication, API throttling, API keys, monitoring.
 - **Throttling**, **caching**, and **authorization** features built-in.
 ## AWS Batch
 - Fully managed service for running batch processing workloads.
 - Dynamically provisions compute resources based on job requirements.
 - Suitable for large-scale data processing, such as machine learning and rendering tasks.
 - Efficiently run 100,000s of computing batch jobs on AWS
 - A “batch” job is a job with a start and an end (as opposed to continuous)
 - Batch will dynamically launch EC2 instances or Spot Instances
 - AWS Batch provisions the right amount of compute / memory
 - You submit or schedule batch jobs and AWS Batch does the rest!
 - Batch jobs are defined as Docker images and run on ECS
 - Helpful for cost optimizations and focusing less on the infrastructure
 ## Batch vs Lambda
 | **AWS Batch**                               | **AWS Lambda**                             |
 | ------------------------------------------- | ------------------------------------------ |
 | Designed for **batch processing**           | Designed for **event-driven** architecture |
 | Handles large-scale compute jobs            | Executes short-lived functions             |
 | Custom EC2 instances or Fargate tasks       | Fully serverless, no server management     |
 | Jobs may take minutes to hours              | Max execution time of 15 minutes           |
 | Rely on EBS / instance store for disk space | Limited temporary disk space               |
 ## Amazon Lightsail
 - Virtual servers, storage, databases, and networking
 - Low & predictable pricing
 - Simpler alternative to using EC2, RDS, ELB, EBS, Route 53…
 - Great for people with little cloud experience!
 - Can set up notifications and monitoring of your Lightsail resources
 - Use cases:
  - Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js…)
  - Websites (templates for WordPress, Magento, Plesk, Joomla)
  - Dev / Test environment
 - Has high availability but no auto-scaling, limited AWS integrations
 ## Lambda Summary
 - Lambda is Serverless, Function as a Service, seamless scaling, reactive
 - Lambda Billing:
  - By execution time multiplied by the RAM provisioned
  - By the number of invocations
 - Language Support: many programming languages except (arbitrary) Docker
 - Invocation time: up to 15 minutes
 - Use cases:
  - Create Thumbnails for images uploaded onto S3
  - Run a Serverless cron job
 - API Gateway: expose Lambda functions as HTTP API
 ## Other Compute Summary
 - Docker: container technology to run applications
 - ECS: run Docker containers on EC2 instances
 - Fargate:
  - Run Docker containers without provisioning the infrastructure
  - Serverless offering (no EC2 instances)
 - ECR: Private Docker Images Repository
 - Batch: run batch jobs on AWS across managed EC2 instances
 - Lightsail: predictable & low pricing for simple application & DB stacks
--- a/sections/s3.md
+++ b/sections/s3.md
@@ -421,4 +421,3 @@
 - Snow Family: import data onto S3 through a physical device, edge computing
 - OpsHub: desktop application to manage Snow Family devices
 - Storage Gateway: hybrid solution to extend on-premises storage to S3