[Modify/Add] Add Cloud Monitoring Doc.
@@ -31,6 +31,8 @@
 - Why make a global application?, Amazon Route 53 Overview, Route 53 Routing Policies, AWS CloudFront, AWS Global Accelerator, AWS Outposts, AWS WaveLength, AWS Local Zones
 - [Cloud Integration](sections/cloud_integration.md)
 - Amazon SQS - Simple Queue Service, Amazon Kinesis, Amazon SNS, Amazon MQ
+- [Cloud Monitoring](./sections/cloud_monitoring.md)
+- Amazon CloudWatch, AWS CloudTrail, AWS X-Ray, Amazon CodeGuru, AWS Status - Service Health Dashboard, AWS Personal Health Dashboard
 
 ## Practice Exams ( dumps )
 

sections/cloud_monitoring.md (new file, 243 lines)
@@ -0,0 +1,243 @@

# Cloud Monitoring

- [Cloud Monitoring](#cloud-monitoring)
  - [Amazon CloudWatch](#amazon-cloudwatch)
    - [Important Metrics](#important-metrics)
    - [Amazon CloudWatch Alarms](#amazon-cloudwatch-alarms)
    - [Amazon CloudWatch Logs](#amazon-cloudwatch-logs)
      - [CloudWatch Logs for EC2](#cloudwatch-logs-for-ec2)
    - [Amazon CloudWatch Events](#amazon-cloudwatch-events)
    - [Amazon EventBridge](#amazon-eventbridge)
  - [AWS CloudTrail](#aws-cloudtrail)
    - [CloudTrail Events](#cloudtrail-events)
    - [CloudTrail Insights Events](#cloudtrail-insights-events)
    - [CloudTrail Events Retention](#cloudtrail-events-retention)
  - [AWS X-Ray](#aws-x-ray)
    - [AWS X-Ray advantages](#aws-x-ray-advantages)
  - [Amazon CodeGuru](#amazon-codeguru)
    - [Amazon CodeGuru Reviewer](#amazon-codeguru-reviewer)
    - [Amazon CodeGuru Profiler](#amazon-codeguru-profiler)
  - [AWS Status - Service Health Dashboard](#aws-status---service-health-dashboard)
  - [AWS Personal Health Dashboard](#aws-personal-health-dashboard)
  - [Cloud Monitoring Summary](#cloud-monitoring-summary)

## Amazon CloudWatch

- A monitoring and observability service for AWS resources and applications.
- Enables real-time monitoring of AWS resources, applications, and custom metrics.
- A metric is a variable to monitor (CPUUtilization, NetworkIn, etc.).
- You can build CloudWatch dashboards from metrics.

**Key Features:**

- Collect and track metrics.
- Set alarms and take automated actions.
- Store and access logs for troubleshooting.

### Important Metrics

- **EC2 Instances:** CPU utilization, disk I/O, network I/O.
  - Default metrics every 5 minutes.
  - Option for Detailed Monitoring (extra cost): metrics every 1 minute.
- **EBS Volumes:** disk reads/writes.
- **RDS Databases:** CPU utilization, free storage space, read/write IOPS.
- **S3 Buckets:** number of requests (AllRequests), latency, and errors.
- **Lambda Functions:** invocation count, error count, duration.
- **Billing:** Total Estimated Charge (only in us-east-1).
- **Service Limits:** how much you have been using a service API.
- **Custom Metrics:** push your own metrics.

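Custom metrics are pushed with the CloudWatch `PutMetricData` API. The sketch below only builds the keyword arguments that boto3's `cloudwatch.put_metric_data` accepts, so it runs without AWS credentials. The namespace `MyApp/Checkout`, the metric `OrdersPlaced`, and the `Environment` dimension are hypothetical; the check on the `AWS/` prefix reflects that this prefix is reserved for AWS services.

```python
from datetime import datetime, timezone


def build_custom_metric_request(namespace, metric_name, value, unit="Count",
                                dimensions=None, high_resolution=False):
    """Build kwargs for boto3's cloudwatch.put_metric_data (nothing is sent here)."""
    # Namespaces starting with "AWS/" are reserved for AWS services.
    if namespace.startswith("AWS/"):
        raise ValueError("custom metrics cannot use the reserved 'AWS/' prefix")
    datum = {
        "MetricName": metric_name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
        # 1 = high-resolution metric, 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical application metric.
request = build_custom_metric_request(
    "MyApp/Checkout", "OrdersPlaced", 3, dimensions={"Environment": "prod"}
)
# With credentials configured, this would publish the datapoint:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
print(request["Namespace"], request["MetricData"][0]["MetricName"])
```

Keeping request construction separate from the boto3 call makes the payload easy to inspect and test before anything is sent to AWS.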
### Amazon CloudWatch Alarms

- Trigger notifications or automated actions when a metric crosses a threshold.
- Examples:
  - Send an alert if EC2 CPU utilization exceeds 80%.
  - Scale out EC2 instances based on demand.
- EC2 actions: stop, terminate, reboot, or recover an EC2 instance.
- SNS notifications: send a notification to an SNS topic.
- Various options (sampling, %, max, min, etc.).
- Example: create a billing alarm on the CloudWatch Billing metric.
- Alarm states: OK, INSUFFICIENT_DATA, ALARM.

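Alarms are created with the `PutMetricAlarm` API. This sketch builds boto3 `cloudwatch.put_metric_alarm` arguments for the two examples above: CPU above 80% and a billing alarm. The instance ID, the SNS topic ARN, and the 6-hour billing period are hypothetical or assumed values. `evaluate_state` is a deliberately simplified model of how the three alarm states arise; CloudWatch's real evaluation also handles missing data and "M out of N" datapoints.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold=80.0,
                    period_s=300, evaluation_periods=2):
    """Kwargs for boto3's cloudwatch.put_metric_alarm: CPU above threshold -> SNS."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_s,  # 300 s matches default (5-minute) EC2 monitoring
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify an SNS topic on ALARM
    }


def build_billing_alarm(sns_topic_arn, threshold_usd, period_s=21600):
    """Billing alarm; the Billing metric is only available in us-east-1."""
    return {
        "AlarmName": f"billing-over-{threshold_usd:g}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": period_s,  # assumed 6-hour period
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def evaluate_state(datapoints, threshold, evaluation_periods):
    """Simplified alarm-state logic (ignores missing-data and M-of-N settings)."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"


TOPIC = "arn:aws:sns:us-east-1:123456789012:ops-alerts"  # hypothetical topic
cpu_alarm = build_cpu_alarm("i-0123456789abcdef0", TOPIC)
print(evaluate_state([72.0, 85.5, 91.2], 80.0, 2))  # ALARM
```

With credentials, `boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm)` would create the alarm; for the billing alarm the client must target `us-east-1`.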
### Amazon CloudWatch Logs

- Centralized logging for AWS services and applications.
- CloudWatch Logs can collect logs from:
  - Elastic Beanstalk: logs from applications
  - ECS: logs from containers
  - AWS Lambda: function logs
  - CloudTrail: based on a filter
  - CloudWatch log agents: on EC2 machines or on-premises servers
  - Route 53: DNS query logs
- Enables real-time monitoring of logs.
- Adjustable CloudWatch Logs retention.

#### CloudWatch Logs for EC2

- By default, no logs from your EC2 instance go to CloudWatch.
- You need to run the CloudWatch agent on EC2 to push the log files you want.
- Make sure the IAM permissions are correct.
- The CloudWatch log agent can be set up on-premises too.

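The CloudWatch agent reads a JSON configuration file that says which files to ship. A minimal sketch that generates the `logs` section of that file; the file paths and log group names are hypothetical, and `{instance_id}` is a placeholder the agent substitutes at runtime. Where the file lives depends on how the agent is installed, and the instance role needs permission to write to CloudWatch Logs (AWS provides the managed policy `CloudWatchAgentServerPolicy` for this).

```python
import json


def build_agent_logs_config(log_files):
    """Return the "logs" section of a CloudWatch agent config.

    log_files maps a file path on the instance to a CloudWatch log group name.
    """
    collect_list = [
        {
            "file_path": path,
            "log_group_name": group,
            # The agent replaces {instance_id} with the EC2 instance ID.
            "log_stream_name": "{instance_id}",
        }
        for path, group in log_files.items()
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


config = build_agent_logs_config({
    "/var/log/nginx/access.log": "nginx-access",  # hypothetical paths and groups
    "/var/log/myapp/app.log": "myapp",
})
print(json.dumps(config, indent=2))
```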
### Amazon CloudWatch Events

- Delivers a stream of system events describing changes in AWS resources.
- Example: trigger a Lambda function when an EC2 instance changes state.
- Schedule: cron jobs (scheduled scripts)
  - Schedule every hour => trigger a script on a Lambda function
- Event pattern: event rules that react to a service doing something
  - IAM root user sign-in event => SNS topic with email notification
- Targets include Lambda functions, SQS queues, and SNS topics.

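A rule either matches events against a JSON *event pattern* or fires on a *schedule expression*. The sketch writes the two example rules above as patterns and adds a toy matcher. The matcher is an assumption for illustration: it implements only exact-value matching, while real patterns also support prefix, numeric, and other content filters. The field values follow the documented EC2 state-change and console sign-in event formats; the sample event is hypothetical.

```python
def matches(pattern, event):
    """Simplified EventBridge-style matching (exact values only).

    Every key in the pattern must exist in the event; a list in the pattern
    means "the event value must be one of these"; nested dicts recurse.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Event pattern: EC2 instance stopped or terminated (e.g. target a Lambda function).
EC2_STOPPED = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Event pattern: root user console sign-in (e.g. target an SNS topic with email).
ROOT_SIGN_IN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}

# Schedule rule: either expression fires once per hour.
HOURLY_EXPRESSIONS = ["rate(1 hour)", "cron(0 * * * ? *)"]

sample = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches(EC2_STOPPED, sample))   # True
print(matches(ROOT_SIGN_IN, sample))  # False
```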
### Amazon EventBridge

- EventBridge is the next evolution of CloudWatch Events.
- Default event bus: events generated by AWS services (CloudWatch Events)
- Partner event bus: receive events from SaaS services or applications (Zendesk, Datadog, Segment, Auth0…)
- Custom event buses: for your own applications
- Schema Registry: model event schemas
- EventBridge has a different name to mark the new capabilities.
- The CloudWatch Events name will be replaced with EventBridge.

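Your own applications publish to a custom event bus with the `PutEvents` API. A minimal sketch that builds the arguments for boto3's `events.put_events`; the bus name `orders-bus`, the source `com.example.orders`, and the `OrderPlaced` detail type are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_put_events_request(bus_name, source, detail_type, detail):
    """Kwargs for boto3's events.put_events, publishing one custom event."""
    return {
        "Entries": [
            {
                "EventBusName": bus_name,
                "Source": source,              # identifies your application
                "DetailType": detail_type,     # rules can route on this field
                "Detail": json.dumps(detail),  # must be a JSON string, not a dict
                "Time": datetime.now(timezone.utc),
            }
        ]
    }


request = build_put_events_request(
    "orders-bus", "com.example.orders", "OrderPlaced",
    {"orderId": "1234", "amount": 42.5},
)
# boto3.client("events").put_events(**request)
```

A rule on the same bus could then match these events with a pattern such as `{"source": ["com.example.orders"], "detail-type": ["OrderPlaced"]}`.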
## AWS CloudTrail

- Tracks and logs API calls made in your AWS account for auditing and governance.
- Useful for security analysis, compliance, and operational troubleshooting.
- CloudTrail is enabled by default!
- Get a history of events / API calls made within your AWS account by:
  - Console
  - SDK
  - CLI
  - AWS services
- Can send CloudTrail logs to CloudWatch Logs or S3.
- A trail can be applied to all Regions (default) or a single Region.
- If a resource is deleted in AWS, investigate CloudTrail first!

**Key Features:**

- Logs API calls across AWS services, including the CLI, SDK, and Management Console.
- Tracks who made the call, when, and from where.

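"Investigate CloudTrail first" usually means querying the event history with the `LookupEvents` API. This sketch builds boto3 `cloudtrail.lookup_events` arguments filtered by API name and summarizes a response into who/when/what. `DeleteBucket` is a real S3 API name used as an example; the response below is fabricated sample data for illustration, shaped like the `Events` list the API returns.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # CloudTrail keeps the event history for 90 days


def build_lookup_request(event_name, days_back=7, now=None):
    """Kwargs for boto3's cloudtrail.lookup_events, filtered by API name."""
    if not 1 <= days_back <= RETENTION_DAYS:
        raise ValueError(f"days_back must be between 1 and {RETENTION_DAYS}")
    now = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
        "StartTime": now - timedelta(days=days_back),
        "EndTime": now,
        "MaxResults": 50,  # maximum page size for this API
    }


def who_did_it(response):
    """Summarize a lookup_events response as (time, user, event name) tuples."""
    return [
        (e["EventTime"], e.get("Username", "unknown"), e["EventName"])
        for e in response.get("Events", [])
    ]


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
request = build_lookup_request("DeleteBucket", days_back=7, now=now)
# response = boto3.client("cloudtrail").lookup_events(**request)
fake_response = {  # hypothetical sample, not real output
    "Events": [{"EventTime": now, "Username": "alice", "EventName": "DeleteBucket"}]
}
print(who_did_it(fake_response))
```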
### CloudTrail Events

- Management Events:
  - Operations that are performed on resources in your AWS account
  - Examples:
    - Configuring security (IAM AttachRolePolicy)
    - Configuring rules for routing data (Amazon EC2 CreateSubnet)
    - Setting up logging (AWS CloudTrail CreateTrail)
  - By default, trails are configured to log management events.
  - Can separate Read Events (that don't modify resources) from Write Events (that may modify resources)
- Data Events:
  - By default, data events are not logged (because they are high-volume operations)
  - Amazon S3 object-level activity (e.g., GetObject, DeleteObject, PutObject): can separate Read and Write Events
  - AWS Lambda function execution activity (the Invoke API)

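Because data events are off by default, a trail must opt into them through event selectors. A sketch that builds boto3 `cloudtrail.put_event_selectors` arguments; the trail and bucket names are hypothetical. It assumes the documented conventions that a bucket ARN ending in `/` covers every object in that bucket and that the value `arn:aws:lambda` covers all Lambda functions.

```python
VALID_RW = {"ReadOnly", "WriteOnly", "All"}


def build_event_selectors(trail_name, bucket_names=(), read_write_type="All",
                          log_all_lambda=False):
    """Kwargs for boto3's cloudtrail.put_event_selectors.

    Management events stay on (the default); S3 object-level and Lambda Invoke
    data events are opted into explicitly because they are high volume.
    """
    if read_write_type not in VALID_RW:
        raise ValueError(f"read_write_type must be one of {sorted(VALID_RW)}")
    data_resources = []
    if bucket_names:
        data_resources.append({
            "Type": "AWS::S3::Object",
            # Trailing "/" selects every object in the bucket.
            "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
        })
    if log_all_lambda:
        data_resources.append(
            {"Type": "AWS::Lambda::Function", "Values": ["arn:aws:lambda"]}
        )
    return {
        "TrailName": trail_name,
        "EventSelectors": [{
            "ReadWriteType": read_write_type,
            "IncludeManagementEvents": True,
            "DataResources": data_resources,
        }],
    }


request = build_event_selectors("my-trail", ["my-bucket"], read_write_type="WriteOnly")
# boto3.client("cloudtrail").put_event_selectors(**request)
```

`ReadWriteType` applies to every event the selector covers, management events included, so `WriteOnly` here also drops read-only management events from this trail.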
### CloudTrail Insights Events

- Enable CloudTrail Insights to detect unusual activity in your account, such as:
  - inaccurate resource provisioning
  - hitting service limits
  - bursts of AWS IAM actions
  - gaps in periodic maintenance activity
- CloudTrail Insights analyzes normal management events to create a baseline, then continuously analyzes write events to detect unusual patterns.
- Detected anomalies (Insights events):
  - appear in the CloudTrail console
  - are sent to Amazon S3
  - generate an EventBridge event (for automation needs)

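Insights is enabled per trail through insight selectors. A minimal sketch that builds boto3 `cloudtrail.put_insight_selectors` arguments; the trail name is hypothetical. `ApiCallRateInsight` looks for unusual volumes of write management API calls, and `ApiErrorRateInsight` looks for unusual rates of API errors.

```python
INSIGHT_TYPES = ("ApiCallRateInsight", "ApiErrorRateInsight")


def build_insight_selectors(trail_name, insight_types=INSIGHT_TYPES):
    """Kwargs for boto3's cloudtrail.put_insight_selectors."""
    unknown = set(insight_types) - set(INSIGHT_TYPES)
    if unknown:
        raise ValueError(f"unknown insight types: {sorted(unknown)}")
    return {
        "TrailName": trail_name,
        "InsightSelectors": [{"InsightType": t} for t in insight_types],
    }


request = build_insight_selectors("my-trail")
# boto3.client("cloudtrail").put_insight_selectors(**request)
```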
### CloudTrail Events Retention

- Events are stored for 90 days in CloudTrail.
- To keep events beyond this period, log them to S3 and query them with Athena.

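Once a trail delivers logs to S3, Athena can query them with SQL. This sketch checks whether an event falls outside the 90-day history and builds boto3 `athena.start_query_execution` arguments. It assumes an Athena table defined over the trail's S3 logs with AWS's documented CloudTrail column names (`eventtime` stored as an ISO-8601 string, `eventname`, `useridentity`, `sourceipaddress`); the table, database, and results bucket are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

CLOUDTRAIL_RETENTION = timedelta(days=90)


def needs_s3_archive(event_time, now):
    """True if an event is older than CloudTrail's 90-day event history."""
    return now - event_time > CLOUDTRAIL_RETENTION


def build_athena_request(table, database, output_s3, event_name, since):
    """Kwargs for boto3's athena.start_query_execution over a CloudTrail table."""
    # Only allow API-name characters, since the value is spliced into SQL.
    if not re.fullmatch(r"[A-Za-z0-9]+", event_name):
        raise ValueError("unexpected characters in event_name")
    query = (
        "SELECT eventtime, eventname, useridentity.arn AS caller, sourceipaddress "
        f"FROM {table} "
        f"WHERE eventname = '{event_name}' "
        f"AND eventtime >= '{since.strftime('%Y-%m-%dT%H:%M:%SZ')}' "
        "ORDER BY eventtime DESC"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
old = now - timedelta(days=120)
print(needs_s3_archive(old, now))  # True
request = build_athena_request(
    "cloudtrail_logs", "security", "s3://my-athena-results/",  # hypothetical names
    "DeleteBucket", since=old,
)
# boto3.client("athena").start_query_execution(**request)
```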
## AWS X-Ray

- Helps analyze and debug distributed applications by providing request tracing.
- Without X-Ray, debugging in production traditionally means:
  - Test locally
  - Add log statements everywhere
  - Re-deploy in production

**Key Features:**

- Trace requests across AWS services and custom applications.
- Identify performance bottlenecks and errors.
- Visualize service maps to understand dependencies.

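X-Ray reconstructs a request's path from *segment documents* that share a trace ID: one segment per service, with subsegments for downstream calls. In practice the X-Ray SDK or agent generates these; the sketch builds simplified ones by hand to show what bottleneck analysis works from. Service names and timings are hypothetical, and each document carries only the minimum fields (`name`, `id`, `trace_id`, `start_time`, `end_time`).

```python
import json
import os
import time


def new_trace_id(now=None):
    """Trace ID: version '1', 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_subsegment(name, start, end):
    """A downstream call (database, HTTP service, ...) made while handling a request."""
    return {"name": name, "id": os.urandom(8).hex(), "start_time": start, "end_time": end}


def new_segment(name, trace_id, start, end, subsegments=()):
    """Segment document for one service handling one request."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),  # 16 hex digits
        "trace_id": trace_id,
        "start_time": start,
        "end_time": end,
        "subsegments": list(subsegments),
    }


def slowest_subsegment(segment):
    """The bottleneck: the downstream call that took the longest, or None."""
    subs = segment.get("subsegments", [])
    return max(subs, key=lambda s: s["end_time"] - s["start_time"], default=None)


t0 = 1_700_000_000.0  # hypothetical request start (epoch seconds)
segment = new_segment(
    "checkout-api", new_trace_id(t0), t0, t0 + 1.2,
    [new_subsegment("DynamoDB", t0 + 0.1, t0 + 0.3),
     new_subsegment("payment-service", t0 + 0.3, t0 + 1.1)],
)
print(slowest_subsegment(segment)["name"])  # payment-service
document = json.dumps(segment)  # the JSON string format X-Ray ingests
# boto3.client("xray").put_trace_segments(TraceSegmentDocuments=[document])
```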
### AWS X-Ray advantages

- Troubleshooting performance (bottlenecks)
- Understand dependencies in a microservice architecture
- Pinpoint service issues
- Review request behavior
- Find errors and exceptions
- Are we meeting our time SLA?
- Where am I throttled?
- Identify users that are impacted

## Amazon CodeGuru

- Code review and performance profiling service.
- Provides suggestions to improve the performance of applications.
- Identifies the most costly lines of code in applications.
- Based on machine learning models long used at Amazon.
- Identifies code errors and risks with automatic code reviews.
- CodeGuru Reviewer: automated code reviews for static code analysis (development)
- CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)

### Amazon CodeGuru Reviewer

- Uses machine learning to identify:
  - Security vulnerabilities.
  - Code inefficiencies.
  - Best practice violations.
- Provides recommendations to improve code quality.
- Supports Java and Python.
- Integrates with GitHub, Bitbucket, and AWS CodeCommit.

### Amazon CodeGuru Profiler

- Helps you understand the runtime behavior of your application.
- Example: identify whether your application is consuming excessive CPU capacity on a logging routine.
- Features:
  - Identify and remove code inefficiencies
  - Improve application performance (e.g., reduce CPU utilization)
  - Decrease compute costs
  - Provides a heap summary (identifies which objects are using up memory)
  - Anomaly detection
  - Supports applications running on AWS or on-premises
  - Minimal overhead on the application

## AWS Status - Service Health Dashboard

- The Service Health Dashboard is the single place to learn about the availability and operations of AWS services.
- You can view the overall status of AWS services, and you can sign in to view personalized communications about your particular AWS account or organization.
- Shows the health of all services in all Regions.
- Shows historical information for each day.
- Has an RSS feed you can subscribe to.
- <https://status.aws.amazon.com/>

## AWS Personal Health Dashboard

- AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
- While the Service Health Dashboard displays the general status of AWS services, the Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
- The dashboard displays relevant and timely information to help you manage events in progress, and provides proactive notifications to help you plan for scheduled activities.
- Global service: <https://phd.aws.amazon.com/>
- Shows how AWS outages directly impact you and your AWS resources.
- Alerts, remediation, proactive notifications, and scheduled activities.

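The events shown in the Personal Health Dashboard are also delivered to EventBridge, so alerts and scheduled-change notices can be routed to SNS or Lambda automatically. A sketch that builds a rule for boto3's `events.put_rule`; the rule name is hypothetical, and the `service` and `eventTypeCategory` detail fields are assumed from the documented AWS Health event format.

```python
import json


def health_event_pattern(services=None, categories=None):
    """EventBridge pattern for AWS Health events, optionally narrowed.

    services: e.g. ["EC2"]; categories: e.g. ["issue", "scheduledChange"].
    """
    pattern = {"source": ["aws.health"], "detail-type": ["AWS Health Event"]}
    detail = {}
    if services:
        detail["service"] = list(services)
    if categories:
        detail["eventTypeCategory"] = list(categories)
    if detail:
        pattern["detail"] = detail
    return pattern


def build_rule_request(rule_name, pattern):
    """Kwargs for boto3's events.put_rule; attach an SNS target via put_targets."""
    return {"Name": rule_name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}


request = build_rule_request(
    "health-ec2-alerts",  # hypothetical rule name
    health_event_pattern(services=["EC2"], categories=["issue", "scheduledChange"]),
)
# boto3.client("events").put_rule(**request)
```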
## Cloud Monitoring Summary

| **Service**               | **Key Features**                                                                    |
| ------------------------- | ----------------------------------------------------------------------------------- |
| Amazon CloudWatch         | Metrics, Alarms, Logs, Events, EventBridge.                                         |
|                           | - Metrics: monitor the performance of AWS services and billing metrics              |
|                           | - Alarms: automate notifications, perform EC2 actions, notify SNS based on a metric |
|                           | - Logs: collect log files from EC2 instances, servers, Lambda functions…            |
|                           | - Events (or EventBridge): react to events in AWS, or trigger a rule on a schedule  |
| AWS CloudTrail            | Tracks API calls, detects unusual activity.                                         |
| CloudTrail Insights       | Automated analysis of your CloudTrail events                                        |
| AWS X-Ray                 | Trace requests made through your distributed applications                           |
| Amazon CodeGuru           | Automated code reviews and application performance recommendations                 |
| Service Health Dashboard  | Status of all AWS services across all Regions                                       |
| Personal Health Dashboard | AWS events that impact your infrastructure                                          |