AWS-CCP-Notes/sections/cloud_monitoring.md
2022-08-22 23:13:04 +09:00

# Cloud Monitoring
- [Cloud Monitoring](#cloud-monitoring)
  - [Amazon CloudWatch](#amazon-cloudwatch)
    - [Important Metrics](#important-metrics)
    - [Amazon CloudWatch Alarms](#amazon-cloudwatch-alarms)
    - [Amazon CloudWatch Logs](#amazon-cloudwatch-logs)
      - [CloudWatch Logs for EC2](#cloudwatch-logs-for-ec2)
    - [Amazon CloudWatch Events](#amazon-cloudwatch-events)
    - [Amazon EventBridge](#amazon-eventbridge)
  - [AWS CloudTrail](#aws-cloudtrail)
    - [CloudTrail Events](#cloudtrail-events)
    - [CloudTrail Insights Events](#cloudtrail-insights-events)
    - [CloudTrail Events Retention](#cloudtrail-events-retention)
  - [AWS X-Ray](#aws-x-ray)
    - [AWS X-Ray advantages](#aws-x-ray-advantages)
  - [Amazon CodeGuru](#amazon-codeguru)
    - [Amazon CodeGuru Reviewer](#amazon-codeguru-reviewer)
    - [Amazon CodeGuru Profiler](#amazon-codeguru-profiler)
  - [AWS Status - Service Health Dashboard](#aws-status---service-health-dashboard)
  - [AWS Personal Health Dashboard](#aws-personal-health-dashboard)
  - [Cloud Monitoring Summary](#cloud-monitoring-summary)
## Amazon CloudWatch
- CloudWatch provides metrics for every service in AWS
- A metric is a variable to monitor (CPUUtilization, NetworkIn, etc.)
- Metrics have timestamps
- Can create CloudWatch dashboards of metrics
### Important Metrics
- EC2 instances: CPU Utilization, Status Checks, Network (not RAM)
  - Default metrics every 5 minutes
  - Option for Detailed Monitoring ($$$): metrics every 1 minute
- EBS volumes: Disk Read/Writes
- S3 buckets: BucketSizeBytes, NumberOfObjects, AllRequests
- Billing: Total Estimated Charge (only in us-east-1)
- Service Limits: how much you've been using a service API
- Custom metrics: push your own metrics
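
To make "push your own metrics" concrete, here is a minimal sketch that builds a request in the shape accepted by the CloudWatch `PutMetricData` API. The namespace `MyApp/Custom`, the metric name `MemoryUtilization`, and the instance ID are hypothetical placeholders, not values from these notes. The request is only built, not sent, so the snippet runs without AWS credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None, timestamp=None):
    """Build one entry for the MetricData list of a PutMetricData request."""
    datum = {
        "MetricName": name,
        # Metrics have timestamps; default to "now" in UTC.
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


# Hypothetical example: RAM usage is not a default EC2 metric,
# so it is a typical candidate for a custom metric.
request = {
    "Namespace": "MyApp/Custom",  # assumed namespace
    "MetricData": [
        build_metric_datum(
            "MemoryUtilization",
            63.5,
            unit="Percent",
            dimensions={"InstanceId": "i-0123456789abcdef0"},  # placeholder ID
        )
    ],
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**request)
print(request["MetricData"][0]["MetricName"], request["MetricData"][0]["Value"])
```
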
### Amazon CloudWatch Alarms
- Alarms are used to trigger notifications for any metric
- Alarm actions:
  - Auto Scaling: increase or decrease EC2 instances “desired” count
  - EC2 Actions: stop, terminate, reboot or recover an EC2 instance
  - SNS notifications: send a notification into an SNS topic
- Various options (sampling, %, max, min, etc.)
- Can choose the period on which to evaluate an alarm
- Example: create a billing alarm on the CloudWatch Billing metric
- Alarm States: OK, INSUFFICIENT_DATA, ALARM
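
The three alarm states can be illustrated with a toy model. This is a simplified sketch, not CloudWatch's actual evaluation logic: it assumes a "greater than threshold" alarm, one aggregated value per period, and it ignores options such as missing-data treatment. The function name `alarm_state` is hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "ALARM", "OK" or "INSUFFICIENT_DATA" for a greater-than alarm.

    datapoints: one aggregated value per period, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        # Not enough periods collected yet to decide either way.
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        # Every one of the most recent periods breached the threshold.
        return "ALARM"
    return "OK"


# CPUUtilization above 80% for 3 consecutive periods
print(alarm_state([40, 85, 90, 95], threshold=80, evaluation_periods=3))
print(alarm_state([85, 90, 60], threshold=80, evaluation_periods=3))
print(alarm_state([90], threshold=80, evaluation_periods=3))
```
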
### Amazon CloudWatch Logs
- CloudWatch Logs can collect logs from:
  - Elastic Beanstalk: collection of logs from application
  - ECS: collection from containers
  - AWS Lambda: collection from function logs
  - CloudTrail based on filter
  - CloudWatch log agents: on EC2 machines or on-premises servers
  - Route 53: Log DNS queries
- Enables real-time monitoring of logs
- Adjustable CloudWatch Logs retention
#### CloudWatch Logs for EC2
- By default, no logs from your EC2 instance will go to CloudWatch
- You need to run a CloudWatch agent on EC2 to push the log files you want
- Make sure IAM permissions are correct
- The CloudWatch log agent can be set up on-premises too
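
As a rough illustration of "push the log files you want", the sketch below generates the `logs` section of a CloudWatch agent configuration file. The layout (`logs_collected` / `files` / `collect_list`) follows the agent's JSON format as I understand it, so check it against the agent documentation before use. The file path and log group name are hypothetical.

```python
import json


def agent_logs_config(files):
    """files: mapping of log file path -> CloudWatch Logs log group name."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": path,
                            "log_group_name": group,
                            # Placeholder the agent replaces with the instance ID
                            "log_stream_name": "{instance_id}",
                        }
                        for path, group in files.items()
                    ]
                }
            }
        }
    }


# Hypothetical application log and log group
config = agent_logs_config({"/var/log/myapp/app.log": "myapp-application-logs"})
print(json.dumps(config, indent=2))
```

The instance (or on-premises server) still needs IAM permissions to write to CloudWatch Logs; the configuration alone is not enough.
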
### Amazon CloudWatch Events
- Schedule: Cron jobs (scheduled scripts)
  - Schedule Every hour => Trigger script on Lambda function
- Event Pattern: Event rules to react to a service doing something
  - IAM Root User Sign in Event => SNS Topic with Email Notification
- Trigger Lambda functions, send SQS/SNS messages
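
The two rule types can be sketched as data: a schedule expression and an event pattern. The matcher below is a deliberately simplified stand-in for the real rule engine (it only handles exact-value lists and nested objects), and the sample events are hypothetical. The pattern for a root user console sign-in is written from memory of the documented event shape, so treat it as an assumption to verify.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event.

    A list in the pattern means "the event value must be one of these";
    a dict means "descend into the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Schedule rule: run every hour (target could be a Lambda function)
HOURLY_SCHEDULE = "rate(1 hour)"

# Event pattern rule: root user signs in to the console (target could be an SNS topic)
ROOT_SIGN_IN_PATTERN = {
    "detail-type": ["AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}

# Hypothetical, heavily trimmed sample events
root_event = {
    "source": "aws.signin",
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}},
}
iam_user_event = {
    "source": "aws.signin",
    "detail-type": "AWS Console Sign In via CloudTrail",
    "detail": {"eventName": "ConsoleLogin", "userIdentity": {"type": "IAMUser"}},
}

print(matches(ROOT_SIGN_IN_PATTERN, root_event))
print(matches(ROOT_SIGN_IN_PATTERN, iam_user_event))
```
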
### Amazon EventBridge
- EventBridge is the next evolution of CloudWatch Events
- Default event bus: generated by AWS services (CloudWatch Events)
- Partner event bus: receive events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0…)
- Custom Event buses: for your own applications
- Schema Registry: model event schema
- EventBridge has a different name to mark the new capabilities
- The CloudWatch Events name will be replaced with EventBridge
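
For the custom event bus case, an application publishes its own events. The sketch below builds one entry in the shape the `PutEvents` API expects; the bus name `orders-bus`, the source `com.example.orders`, and the event payload are all hypothetical. Nothing is sent, so it runs without AWS access.

```python
import json


def build_event_entry(bus_name, source, detail_type, detail):
    """Build one entry for the Entries list of a PutEvents request."""
    return {
        "EventBusName": bus_name,
        "Source": source,
        "DetailType": detail_type,
        # Detail must be a JSON string, not a dict
        "Detail": json.dumps(detail),
    }


# Hypothetical custom event from your own application
entry = build_event_entry(
    bus_name="orders-bus",
    source="com.example.orders",
    detail_type="OrderPlaced",
    detail={"orderId": "1234", "total": 42.5},
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("events").put_events(Entries=[entry])
print(entry["DetailType"], entry["Detail"])
```
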
## AWS CloudTrail
- Provides governance, compliance and audit for your AWS Account
- CloudTrail is enabled by default!
- Get a history of events / API calls made within your AWS Account by:
  - Console
  - SDK
  - CLI
  - AWS Services
- Can put logs from CloudTrail into CloudWatch Logs or S3
- A trail can be applied to All Regions (default) or a single Region.
- If a resource is deleted in AWS, investigate CloudTrail first!
### CloudTrail Events
- Management Events:
  - Operations that are performed on resources in your AWS account
  - Examples:
    - Configuring security (IAM AttachRolePolicy)
    - Configuring rules for routing data (Amazon EC2 CreateSubnet)
    - Setting up logging (AWS CloudTrail CreateTrail)
  - By default, trails are configured to log management events.
  - Can separate Read Events (that don't modify resources) from Write Events (that may modify resources)
- Data Events:
  - By default, data events are not logged (because they are high-volume operations)
  - Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
  - AWS Lambda function execution activity (the Invoke API)
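
The management/data and read/write split can be sketched by classifying event records. The sample records below are hypothetical and trimmed to four fields; `readOnly` and `managementEvent` are fields of CloudTrail records as far as I know, but real records carry many more.

```python
# Hypothetical, trimmed CloudTrail-style records
SAMPLE_RECORDS = [
    {"eventSource": "iam.amazonaws.com", "eventName": "AttachRolePolicy",
     "readOnly": False, "managementEvent": True},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances",
     "readOnly": True, "managementEvent": True},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "readOnly": True, "managementEvent": False},
    {"eventSource": "s3.amazonaws.com", "eventName": "DeleteObject",
     "readOnly": False, "managementEvent": False},
]


def classify(record):
    """Return (kind, access), e.g. ("management", "write")."""
    kind = "management" if record.get("managementEvent") else "data"
    access = "read" if record.get("readOnly") else "write"
    return kind, access


def select(records, kind, access):
    """Names of the events that fall into the given category."""
    return [r["eventName"] for r in records if classify(r) == (kind, access)]


print(select(SAMPLE_RECORDS, "management", "write"))
print(select(SAMPLE_RECORDS, "data", "write"))
```
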
### CloudTrail Insights Events
- Enable CloudTrail Insights to detect unusual activity in your account:
  - Inaccurate resource provisioning
  - Hitting service limits
  - Bursts of AWS IAM actions
  - Gaps in periodic maintenance activity
- CloudTrail Insights analyzes normal management events to create a baseline
- It then continuously analyzes write events to detect unusual patterns:
  - Anomalies appear in the CloudTrail console
  - Event is sent to Amazon S3
  - An EventBridge event is generated (for automation needs)
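
The baseline-then-detect idea can be illustrated with a toy example. This is not the algorithm CloudTrail Insights uses (which is not described in these notes); it is a plain mean and standard deviation check over hypothetical per-minute counts of write API calls, flagging both bursts and gaps.

```python
from statistics import mean, pstdev


def unusual_periods(baseline_counts, recent_counts, sigmas=3.0):
    """Indexes of recent periods that deviate strongly from the baseline."""
    average = mean(baseline_counts)
    spread = pstdev(baseline_counts)
    allowed = sigmas * spread
    return [
        index
        for index, count in enumerate(recent_counts)
        if abs(count - average) > allowed
    ]


# Hypothetical write-event counts per minute
baseline = [10, 12, 9, 11, 10, 8, 12, 10]  # "normal" activity
recent = [11, 9, 60, 10]                   # a burst in the third minute

print(unusual_periods(baseline, recent))
```
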
### CloudTrail Events Retention
- Events are stored for 90 days in CloudTrail
- To keep events beyond this period, log them to S3 and use Athena
## AWS X-Ray
- Debugging in Production, the good old way:
  - Test locally
  - Add log statements everywhere
  - Re-deploy in production
- Log formats differ across applications, which makes log analysis hard
- Debugging: one big monolith “easy”, distributed services “hard”
- No common view of your entire architecture
### AWS X-Ray advantages
- Troubleshooting performance (bottlenecks)
- Understand dependencies in a microservice architecture
- Pinpoint service issues
- Review request behavior
- Find errors and exceptions
- Are we meeting our time SLA?
- Where am I being throttled?
- Identify users that are impacted
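
A few of the questions above (bottlenecks, throttling, errors) can be sketched over trace data. The records below are hypothetical and loosely modeled on X-Ray segment documents, which carry start and end times plus `error`, `throttle`, and `fault` flags; real segments use epoch timestamps and many more fields.

```python
# Hypothetical downstream calls made while serving one request (times in seconds)
SEGMENTS = [
    {"name": "dynamodb", "start_time": 0.020, "end_time": 0.060, "throttle": True},
    {"name": "payments-service", "start_time": 0.070, "end_time": 0.900, "fault": True},
    {"name": "inventory-service", "start_time": 0.905, "end_time": 0.930},
]


def duration(segment):
    """Elapsed time of one segment in seconds."""
    return segment["end_time"] - segment["start_time"]


def slowest(segments):
    """The segment that took the longest: a likely bottleneck."""
    return max(segments, key=duration)


def flagged(segments, flag):
    """Names of segments where the given flag (error, throttle, fault) is set."""
    return [s["name"] for s in segments if s.get(flag)]


print("bottleneck:", slowest(SEGMENTS)["name"])
print("throttled:", flagged(SEGMENTS, "throttle"))
print("faults:", flagged(SEGMENTS, "fault"))
```
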
## Amazon CodeGuru
- An ML-powered service for automated code reviews and application performance recommendations
- Provides two functionalities:
  - CodeGuru Reviewer: automated code reviews for static code analysis (development)
  - CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)
### Amazon CodeGuru Reviewer
- Identify critical issues, security vulnerabilities, and hard-to-find bugs
- Example: common coding best practices, resource leaks, security detection, input validation
- Uses Machine Learning and automated reasoning
- Hard-learned lessons across millions of code reviews on 1000s of open-source and Amazon repositories
- Supports Java and Python
- Integrates with GitHub, Bitbucket, and AWS CodeCommit
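
To give a feel for the "resource leaks" category mentioned above, here is a generic Python example of a leak and its fix. It only illustrates the kind of issue a static code review looks for; whether CodeGuru Reviewer flags this exact snippet is an assumption, and the function names are hypothetical.

```python
import os
import tempfile


def count_lines_leaky(path):
    # The file handle is never closed explicitly. If reading raises an
    # exception, the handle stays open until it is garbage collected.
    handle = open(path)
    return len(handle.readlines())


def count_lines_safe(path):
    # The context manager closes the file on every exit path.
    with open(path) as handle:
        return sum(1 for _ in handle)


# Small demo on a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

leaky_count = count_lines_leaky(tmp.name)
safe_count = count_lines_safe(tmp.name)
os.remove(tmp.name)

print(leaky_count, safe_count)
```
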
### Amazon CodeGuru Profiler
- Helps understand the runtime behavior of your application
- Example: identify if your application is consuming excessive CPU capacity on a logging routine
- Features:
  - Identify and remove code inefficiencies
  - Improve application performance (e.g., reduce CPU utilization)
  - Decrease compute costs
  - Provides heap summary (identify which objects are using up memory)
  - Anomaly Detection
- Supports applications running on AWS or on-premises
- Minimal overhead on application
## AWS Status - Service Health Dashboard
- Shows the health of all services across all regions
- Shows historical information for each day
- Has an RSS feed you can subscribe to
- <https://status.aws.amazon.com/>
## AWS Personal Health Dashboard
- AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
- While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
- The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.
- Global service <https://phd.aws.amazon.com/>
- Shows how AWS outages directly impact you & your AWS resources
- Alert, remediation, proactive, scheduled activities
## Cloud Monitoring Summary
- CloudWatch:
  - Metrics: monitor the performance of AWS services and billing metrics
  - Alarms: automate notification, perform EC2 action, notify to SNS based on metric
  - Logs: collect log files from EC2 instances, servers, Lambda functions…
  - Events (or EventBridge): react to events in AWS, or trigger a rule on a schedule
- CloudTrail: audit API calls made within your AWS account
- CloudTrail Insights: automated analysis of your CloudTrail Events
- X-Ray: trace requests made through your distributed applications
- Service Health Dashboard: status of all AWS services across all regions
- Personal Health Dashboard: AWS events that impact your infrastructure
- Amazon CodeGuru: automated code reviews and application performance recommendations