[Modify/Add] Add Cloud Monitoring Doc.

2024-12-19 22:23:11 +09:00
parent 7b88370b2a
commit 538a5617dc
2 changed files with 245 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -31,6 +31,8 @@
  - Why make a global application?, Amazon Route 53 Overview, Route 53 Routing Policies, AWS CloudFront, AWS Global Accelerator, AWS Outposts, AWS WaveLength, AWS Local Zones
 - [Cloud Integration](sections/cloud_integration.md)
  - Amazon SQS - Simple Queue Service, Amazon Kinesis, Amazon SNS, Amazon MQ
 - [Cloud Monitoring](./sections/cloud_monitoring.md)
  - Amazon CloudWatch, AWS CloudTrail, AWS X-Ray, Amazon CodeGuru, AWS Status - Service Health Dashboard, AWS Personal Health Dashboard
 ## Practice Exams ( dumps )
--- a/sections/cloud_monitoring.md
+++ b/sections/cloud_monitoring.md
@@ -0,0 +1,243 @@
 # Cloud Monitoring
 - [Cloud Monitoring](#cloud-monitoring)
  - [Amazon CloudWatch](#amazon-cloudwatch)
    - [Important Metrics](#important-metrics)
    - [Amazon CloudWatch Alarms](#amazon-cloudwatch-alarms)
    - [Amazon CloudWatch Logs](#amazon-cloudwatch-logs)
      - [CloudWatch Logs for EC2](#cloudwatch-logs-for-ec2)
    - [Amazon CloudWatch Events](#amazon-cloudwatch-events)
    - [Amazon EventBridge](#amazon-eventbridge)
  - [AWS CloudTrail](#aws-cloudtrail)
    - [CloudTrail Events](#cloudtrail-events)
    - [CloudTrail Insights Events](#cloudtrail-insights-events)
    - [CloudTrail Events Retention](#cloudtrail-events-retention)
  - [AWS X-Ray](#aws-x-ray)
    - [AWS X-Ray advantages](#aws-x-ray-advantages)
  - [Amazon CodeGuru](#amazon-codeguru)
    - [Amazon CodeGuru Reviewer](#amazon-codeguru-reviewer)
    - [Amazon CodeGuru Profiler](#amazon-codeguru-profiler)
  - [AWS Status - Service Health Dashboard](#aws-status---service-health-dashboard)
  - [AWS Personal Health Dashboard](#aws-personal-health-dashboard)
  - [Cloud Monitoring Summary](#cloud-monitoring-summary)
 ## Amazon CloudWatch
 - A monitoring and observability service for AWS resources and applications.
 - Enables real-time monitoring of AWS resources, applications, and custom metrics.
 - A metric is a variable to monitor (CPUUtilization, NetworkIn, etc.).
 - You can build CloudWatch dashboards from metrics.
 **Key Features:**
 - Collect and track metrics.
 - Set alarms and take automated actions.
 - Store and access logs for troubleshooting.
 ### Important Metrics
 - **EC2 Instances:** CPU utilization, disk I/O, network I/O.
  - Default metrics every 5 minutes
  - Option for Detailed Monitoring ($$$): metrics every 1 minute
 - **EBS Volumes:** Disk reads/writes.
 - **RDS Databases:** CPU utilization, free storage space, read/write IOPS.
 - **S3 Buckets:** Number of requests (AllRequests), latency, and errors.
 - **Lambda Functions:** Invocation count, error count, duration.
 - **Billing:** Total Estimated Charge (only in us-east-1).
 - **Service Limits:** How much you’ve been using a service API.
 - **Custom metrics**: push your own metrics
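
To make "push your own metrics" concrete, here is a minimal sketch of the request parameters for a custom metric. The namespace, metric name, and dimension are hypothetical; the dictionary follows the shape of a PutMetricData request, and no call to AWS is made.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, dimensions):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


# Hypothetical application metric: number of orders waiting in a queue.
datum = build_metric_datum(
    name="PendingOrders",
    value=42,
    unit="Count",
    dimensions={"Service": "checkout"},
)

# Parameters as they could be handed to an SDK call such as boto3's
# cloudwatch.put_metric_data(**request). Nothing is sent in this sketch.
request = {"Namespace": "MyApp/Orders", "MetricData": [datum]}
```

Keeping the payload construction separate from the SDK call makes it easy to check locally before any metric is published.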
 ### Amazon CloudWatch Alarms
 - Trigger notifications or automated actions when a metric exceeds a threshold.
 - Examples:
  - Send an alert if EC2 CPU utilization exceeds 80%.
  - Scale out EC2 instances based on demand.
  - EC2 Actions: stop, terminate, reboot or recover an EC2 instance
  - SNS notifications: send a notification into an SNS topic
 - Various options for evaluating the metric (sampling, %, max, min, etc.)
 - Example: create a billing alarm on the CloudWatch Billing metric
 - Alarm States: OK, INSUFFICIENT_DATA, ALARM
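
The three alarm states can be illustrated with a small sketch. This is a simplified model, not CloudWatch's actual evaluation logic: it assumes a "greater than threshold" alarm that fires only when every one of the last `evaluation_periods` datapoints breaches, and it treats any missing datapoint as insufficient data. The function name and sample values are made up for illustration.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return OK, ALARM, or INSUFFICIENT_DATA for a 'greater than threshold' alarm.

    Simplified rule: look at the most recent `evaluation_periods` datapoints;
    the alarm fires only if every one of them exceeds the threshold.
    A missing datapoint is represented by None.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(p is None for p in recent):
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in recent):
        return "ALARM"
    return "OK"


# EC2 CPU utilization samples (percent), one per period, oldest first.
print(alarm_state([40, 85, 90, 95], threshold=80, evaluation_periods=3))
print(alarm_state([85, 90, 70], threshold=80, evaluation_periods=3))
print(alarm_state([85], threshold=80, evaluation_periods=3))
```

Requiring several consecutive breaching periods is a common way to avoid alerting on a single short spike.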
 ### Amazon CloudWatch Logs
 - Centralized logging for AWS services and applications.
 - CloudWatch Logs can collect logs from:
  - Elastic Beanstalk: collection of logs from application
  - ECS: collection from containers
  - AWS Lambda: collection from function logs
  - CloudTrail: based on a filter
  - CloudWatch log agents: on EC2 machines or on-premises servers
  - Route53: Log DNS queries
 - Enables real-time monitoring of logs
 - Adjustable CloudWatch Logs retention
 #### CloudWatch Logs for EC2
 - By default, no logs from your EC2 instance will go to CloudWatch
 - You need to run a CloudWatch agent on EC2 to push the log files you want
 - Make sure IAM permissions are correct
 - The CloudWatch log agent can be set up on-premises too
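
As a rough sketch of "push the log files you want", the snippet below builds a small agent configuration that ships one log file. The file path and log group name are hypothetical, and the JSON layout is assumed to follow the unified CloudWatch agent's `logs` section, so check it against the agent documentation before using it.

```python
import json

# Hypothetical log file and log group names; adjust to your application.
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",
                        "log_group_name": "myapp-application-logs",
                        # Placeholder the agent is assumed to replace with the instance ID.
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    }
}

# Serialize to the JSON text that would be saved as the agent's config file.
config_json = json.dumps(agent_config, indent=2)
print(config_json)
```

The instance (or on-premises server) still needs IAM permissions that allow writing to CloudWatch Logs, as noted above.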
 ### Amazon CloudWatch Events
 - Delivers a stream of system events describing changes in AWS resources.
 - Example: Trigger a Lambda function when an EC2 instance state changes.
 - Schedule: Cron jobs (scheduled scripts)
  - Schedule Every hour => Trigger script on Lambda function
 - Event Pattern: Event rules to react to a service doing something
  - IAM Root User Sign in Event => SNS Topic with Email Notification
 - Trigger Lambda functions, send SQS/SNS messages
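
The difference between a schedule rule and an event pattern rule can be sketched as follows. The pattern mirrors the EC2 state-change example above. The `matches` helper is a deliberately simplified stand-in for the service's matching logic (exact-value matching only), and the instance ID is made up.

```python
def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event.

    A list in the pattern means "the event value must be one of these";
    a nested dict is matched recursively. Real event patterns support more
    operators (prefix, numeric ranges, and so on) that are left out here.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Event pattern: react when an EC2 instance enters the "stopped" state.
ec2_stopped_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

# Hypothetical event with a made-up instance ID.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

# A schedule rule uses an expression instead of a pattern.
hourly_schedule = "rate(1 hour)"

print(matches(ec2_stopped_pattern, sample_event))
```

A rule that matches would then forward the event to its target, for example a Lambda function or an SNS topic.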
 ### Amazon EventBridge
 - EventBridge is the next evolution of CloudWatch Events
 - Default event bus: generated by AWS services (CloudWatch Events)
 - Partner event bus: receive events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0…)
 - Custom Event buses: for your own applications
 - Schema Registry: model event schema
 - EventBridge has a different name to mark the new capabilities
 - The CloudWatch Events name will be replaced with EventBridge
 ## AWS CloudTrail
 - Tracks and logs API calls made in your AWS account for auditing and governance.
 - Useful for security analysis, compliance, and operational troubleshooting.
 - CloudTrail is enabled by default!
 - Get a history of events / API calls made within your AWS account via:
  - Console
  - SDK
  - CLI
  - AWS Services
 - Can put logs from CloudTrail into CloudWatch Logs or S3
 - A trail can be applied to All Regions (default) or a single Region.
 - If a resource is deleted in AWS, investigate CloudTrail first!
 **Key Features:**
 - Logs API calls across AWS services, including CLI, SDK, and Management Console.
 - Tracks who made the call, when, and from where.
 ### CloudTrail Events
 - Management Events:
  - Operations that are performed on resources in your AWS account
  - Examples:
    - Configuring security (IAM AttachRolePolicy)
    - Configuring rules for routing data (Amazon EC2 CreateSubnet)
    - Setting up logging (AWS CloudTrail CreateTrail)
  - By default, trails are configured to log management events.
  - Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
 - Data Events:
  - By default, data events are not logged (because they are high-volume operations)
  - Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
  - AWS Lambda function execution activity (the Invoke API)
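
The management/data and read/write splits can be sketched as a filter over event records. The records below are hypothetical and heavily trimmed; the `eventCategory` and `readOnly` field names are assumed to match those in CloudTrail's record format.

```python
# Hypothetical, trimmed-down CloudTrail records.
records = [
    {"eventSource": "iam.amazonaws.com", "eventName": "AttachRolePolicy",
     "readOnly": False, "eventCategory": "Management"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeSubnets",
     "readOnly": True, "eventCategory": "Management"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "readOnly": True, "eventCategory": "Data"},
    {"eventSource": "s3.amazonaws.com", "eventName": "DeleteObject",
     "readOnly": False, "eventCategory": "Data"},
]


def select(records, category, read_only):
    """Return event names in the given category, split by read vs. write."""
    return [
        r["eventName"]
        for r in records
        if r["eventCategory"] == category and r["readOnly"] is read_only
    ]


print(select(records, "Management", read_only=False))  # management write events
print(select(records, "Data", read_only=True))         # data read events
```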
 ### CloudTrail Insights Events
 - Enable CloudTrail Insights to detect unusual activity in your account:
  - Inaccurate resource provisioning
  - Hitting service limits
  - Bursts of AWS IAM actions
  - Gaps in periodic maintenance activity
 - CloudTrail Insights analyzes normal management events to create a baseline
 - It then continuously analyzes write events to detect unusual patterns:
  - Anomalies appear in the CloudTrail console
  - Event is sent to Amazon S3
  - An EventBridge event is generated (for automation needs)
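
The baseline-then-detect idea can be sketched with a simple statistical rule. This is only an illustration, not the algorithm CloudTrail Insights actually uses: it assumes per-minute counts of write API calls and flags any count more than three standard deviations above the baseline mean. All numbers are made up.

```python
from statistics import mean, stdev


def find_anomalies(baseline_counts, recent_counts, sigmas=3.0):
    """Flag recent per-minute write-call counts far above the baseline.

    Illustrative rule only: a count is unusual when it exceeds the baseline
    mean by more than `sigmas` standard deviations.
    Returns a list of (index, count) pairs.
    """
    limit = mean(baseline_counts) + sigmas * stdev(baseline_counts)
    return [(i, c) for i, c in enumerate(recent_counts) if c > limit]


# Hypothetical write-call counts per minute during normal operation.
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
# Recent activity containing a burst, for example of IAM actions.
recent = [5, 6, 40, 5]

print(find_anomalies(baseline, recent))
```

In the real service, a detected anomaly would surface in the console, be delivered to S3, and generate an EventBridge event, as listed above.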
 ### CloudTrail Events Retention
 - Events are stored for 90 days in CloudTrail
 - To keep events beyond this period, log them to S3 and use Athena
 ## AWS X-Ray
 - Helps analyze and debug distributed applications by providing request tracing.
 - Without tracing, debugging in production typically means:
  - Testing locally
  - Adding log statements everywhere
  - Re-deploying to production
 **Key Features:**
 - Trace requests across AWS services and custom applications.
 - Identify performance bottlenecks and errors.
 - Visualize service maps to understand dependencies.
 ### AWS X-Ray advantages
 - Troubleshooting performance (bottlenecks)
 - Understand dependencies in a microservice architecture
 - Pinpoint service issues
 - Review request behavior
 - Find errors and exceptions
 - Are we meeting our time SLA?
 - Where am I throttled?
 - Identify users that are impacted
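
To show how trace data answers questions like "where is the bottleneck?" and "where am I throttled?", here is a small sketch over a flattened list of segments. The service names and timings are hypothetical; the `start_time`, `end_time`, and `throttle` field names are assumed to follow X-Ray segment documents.

```python
# Hypothetical segments from a single traced request (times in epoch seconds).
trace = [
    {"name": "frontend", "start_time": 1000.0, "end_time": 1000.25},
    {"name": "orders-service", "start_time": 1000.25, "end_time": 1000.75},
    {"name": "payments-db", "start_time": 1000.75, "end_time": 1002.5, "throttle": True},
]


def slowest_segment(segments):
    """Return (name, duration_seconds) of the segment that took longest."""
    durations = [(s["name"], s["end_time"] - s["start_time"]) for s in segments]
    return max(durations, key=lambda item: item[1])


def throttled(segments):
    """Return the names of segments marked as throttled."""
    return [s["name"] for s in segments if s.get("throttle", False)]


print(slowest_segment(trace))
print(throttled(trace))
```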
 ## Amazon CodeGuru
 - Code review and performance profiling service.
 - Provides suggestions to improve the performance of applications.
 - Identifies the most costly lines of applications.
 - It is based on machine learning models long used at Amazon.
 - Identifies code errors and risks with automatic code reviews.
 - CodeGuru Reviewer: automated code reviews for static code analysis (development)
 - CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)
 ### Amazon CodeGuru Reviewer
 - Uses machine learning to identify:
  - Security vulnerabilities.
  - Code inefficiencies.
  - Best practices violations.
 - Provides recommendations to improve code quality.
 - Supports Java and Python
 - Integrates with GitHub, Bitbucket, and AWS CodeCommit
 ### Amazon CodeGuru Profiler
 - Helps understand the runtime behavior of your application
 - Example: identify if your application is consuming excessive CPU capacity on a logging routine
 - Features:
  - Identify and remove code inefficiencies
  - Improve application performance (e.g., reduce CPU utilization)
  - Decrease compute costs
  - Provides a heap summary (identify which objects are using up memory)
  - Anomaly Detection
 - Supports applications running on AWS or on-premises
 - Minimal overhead on application
 ## AWS Status - Service Health Dashboard
 - Service Health Dashboard is the single place to learn about the availability and operations of AWS services.
 - You can view the overall status of AWS services, and you can sign in to view personalized communications about your particular AWS account or organization.
 - Shows the health of all services in all Regions
 - Shows historical information for each day
 - Has an RSS feed you can subscribe to
 - <https://status.aws.amazon.com/>
 ## AWS Personal Health Dashboard
 - AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
 - While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
 - The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.
 - Global service <https://phd.aws.amazon.com/>
 - Shows how AWS outages directly impact you & your AWS resources
 - Keywords: alerts, remediation, proactive notifications, scheduled activities
 ## Cloud Monitoring Summary
 | **Service**               | **Key Features**                                                                   |
 | ------------------------- | ---------------------------------------------------------------------------------- |
 | Amazon CloudWatch         | Metrics, Alarms, Logs, Events, EventBridge.                                        |
 |                           | - Metrics: monitor the performance of AWS services and billing metrics             |
 |                           | - Alarms: automate notifications, perform EC2 actions, notify SNS based on metric  |
 |                           | - Logs: collect log files from EC2 instances, servers, Lambda functions…           |
 |                           | - Events (or EventBridge): react to events in AWS, or trigger a rule on a schedule |
 | AWS CloudTrail            | Tracks API calls, detects unusual activity.                                        |
 | CloudTrail Insights       | automated analysis of your CloudTrail Events                                       |
 | AWS X-Ray                 | Trace requests made through your distributed applications                          |
 | Amazon CodeGuru           | automated code reviews and application performance recommendations                 |
 | Service Health Dashboard  | status of all AWS services across all regions                                      |
 | Personal Health Dashboard | AWS events that impact your infrastructure                                         |