diff --git a/README.md b/README.md
index 35b8d73..d14392e 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@
- [Deploying and Managing Infrastructure at Scale](sections/deploying.md)
- [Global Infrastructure](sections/global_infrastructure.md)
- [Cloud Integration](sections/cloud_integration.md)
+- [Cloud Monitoring](sections/cloud_monitoring.md)
### Contributors
diff --git a/sections/cloud_monitoring.md b/sections/cloud_monitoring.md
new file mode 100644
index 0000000..9f86a84
--- /dev/null
+++ b/sections/cloud_monitoring.md
@@ -0,0 +1,215 @@
+# Cloud Monitoring
+
+- [Cloud Monitoring](#cloud-monitoring)
+ - [Amazon CloudWatch](#amazon-cloudwatch)
+ - [Important Metrics](#important-metrics)
+ - [Amazon CloudWatch Alarms](#amazon-cloudwatch-alarms)
+ - [Amazon CloudWatch Logs](#amazon-cloudwatch-logs)
+ - [CloudWatch Logs for EC2](#cloudwatch-logs-for-ec2)
+ - [Amazon CloudWatch Events](#amazon-cloudwatch-events)
+ - [Amazon EventBridge](#amazon-eventbridge)
+ - [AWS CloudTrail](#aws-cloudtrail)
+ - [CloudTrail Events](#cloudtrail-events)
+ - [CloudTrail Insights Events](#cloudtrail-insights-events)
+ - [CloudTrail Events Retention](#cloudtrail-events-retention)
+ - [AWS X-Ray](#aws-x-ray)
+ - [AWS X-Ray advantages](#aws-x-ray-advantages)
+ - [Amazon CodeGuru](#amazon-codeguru)
+ - [Amazon CodeGuru Reviewer](#amazon-codeguru-reviewer)
+ - [Amazon CodeGuru Profiler](#amazon-codeguru-profiler)
+ - [AWS Status - Service Health Dashboard](#aws-status---service-health-dashboard)
+ - [AWS Personal Health Dashboard](#aws-personal-health-dashboard)
+ - [Cloud Monitoring Summary](#cloud-monitoring-summary)
+
+## Amazon CloudWatch
+
+- CloudWatch provides metrics for every service in AWS
+- A metric is a variable to monitor (CPUUtilization, NetworkIn, etc.)
+- Metric data points have timestamps
+- You can create CloudWatch dashboards to visualize metrics
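+
+To make "metric" and "timestamp" concrete, here is a minimal plain-Python sketch (no AWS calls) of a metric as timestamped data points rolled up into fixed periods. The `Datapoint` type and the `average_per_period` helper are hypothetical illustrations, not CloudWatch APIs. CloudWatch does this aggregation for you when you pick a period and a statistic, which is what the 5-minute vs 1-minute monitoring periods below refer to.
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
+
+
+@dataclass
+class Datapoint:
+    """One timestamped value of a metric such as CPUUtilization (hypothetical type)."""
+    timestamp: datetime
+    value: float
+
+
+def average_per_period(points, period_seconds):
+    """Group data points into fixed periods and average each group (hypothetical helper)."""
+    buckets = {}
+    for p in points:
+        epoch = int(p.timestamp.timestamp())
+        start = epoch - epoch % period_seconds  # align to the period boundary
+        buckets.setdefault(start, []).append(p.value)
+    return {
+        datetime.fromtimestamp(start, tz=timezone.utc): sum(vals) / len(vals)
+        for start, vals in sorted(buckets.items())
+    }
+
+
+# Hypothetical CPU readings at minutes 0, 2 and 6
+t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
+points = [Datapoint(t0 + timedelta(minutes=m), v) for m, v in [(0, 10.0), (2, 30.0), (6, 50.0)]]
+print(average_per_period(points, 300))  # 5-minute periods, like basic monitoring
+```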
+
+### Important Metrics
+
+- EC2 instances: CPU Utilization, Status Checks, Network (not RAM: memory needs the CloudWatch agent or a custom metric)
+ - Default metrics every 5 minutes
+ - Option for Detailed Monitoring ($$$): metrics every 1 minute
+- EBS volumes: Disk Read/Writes
+- S3 buckets: BucketSizeBytes, NumberOfObjects, AllRequests
+- Billing: Total Estimated Charge (EstimatedCharges metric, only in us-east-1)
+- Service Limits: how much you’ve been using a service API
+- Custom metrics: push your own metrics
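+
+A minimal sketch of pushing a custom metric, assuming boto3 as the client. It only builds the keyword arguments for the real `put_metric_data` call, so it runs without AWS credentials. The `MyApp` namespace, the `MemoryUsedPercent` metric name and the instance ID are hypothetical. Memory is a typical candidate, since EC2 does not report RAM usage by default.
+
+```python
+from datetime import datetime, timezone
+
+
+def build_put_metric_data_params(namespace, name, value, unit="Count", dimensions=None):
+    """Keyword arguments for CloudWatch PutMetricData (boto3: put_metric_data)."""
+    return {
+        "Namespace": namespace,  # custom namespaces must not start with "AWS/"
+        "MetricData": [
+            {
+                "MetricName": name,
+                "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
+                "Timestamp": datetime.now(timezone.utc),
+                "Value": float(value),
+                "Unit": unit,
+            }
+        ],
+    }
+
+
+# Hypothetical namespace, metric name and instance ID
+params = build_put_metric_data_params(
+    "MyApp", "MemoryUsedPercent", 72.5, unit="Percent", dimensions={"InstanceId": "i-0123456789abcdef0"}
+)
+# With boto3 installed and credentials configured, this would be sent as:
+#   boto3.client("cloudwatch").put_metric_data(**params)
+```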
+
+### Amazon CloudWatch Alarms
+
+- Alarms are used to trigger notifications for any metric
+- Alarm actions:
+ - Auto Scaling: increase or decrease EC2 instances “desired” count
+ - EC2 Actions: stop, terminate, reboot or recover an EC2 instance
+ - SNS notifications: send a notification into an SNS topic
+- Various options (sampling, %, max, min, etc…)
+- Can choose the period on which to evaluate an alarm
+- Example: create a billing alarm on the CloudWatch Billing metric
+- Alarm states: OK, INSUFFICIENT_DATA, ALARM
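+
+The billing alarm example could look like this, assuming boto3. The sketch only builds the arguments for the real `put_metric_alarm` call. The alarm name, the threshold and the SNS topic ARN are hypothetical. Billing data lives in the `AWS/Billing` namespace under `EstimatedCharges` (us-east-1 only), and billing alerts must first be enabled in the Billing console preferences.
+
+```python
+def build_billing_alarm_params(threshold_usd, sns_topic_arn):
+    """Keyword arguments for CloudWatch PutMetricAlarm (boto3: put_metric_alarm)."""
+    return {
+        "AlarmName": f"billing-over-{threshold_usd}-usd",  # hypothetical naming scheme
+        "Namespace": "AWS/Billing",
+        "MetricName": "EstimatedCharges",
+        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
+        "Statistic": "Maximum",
+        "Period": 21600,  # evaluate over 6-hour periods
+        "EvaluationPeriods": 1,
+        "Threshold": float(threshold_usd),
+        "ComparisonOperator": "GreaterThanThreshold",
+        "AlarmActions": [sns_topic_arn],  # on entering ALARM, publish to this SNS topic
+    }
+
+
+# Hypothetical threshold and topic ARN
+params = build_billing_alarm_params(50, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
+# Billing metrics exist only in us-east-1, so the client must target that region:
+#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
+```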
+
+### Amazon CloudWatch Logs
+
+- CloudWatch Logs can collect logs from:
+ - Elastic Beanstalk: collection of logs from application
+ - ECS: collection from containers
+ - AWS Lambda: collection from function logs
+ - CloudTrail based on filter
+ - CloudWatch log agents: on EC2 machines or on-premises servers
+ - Route53: Log DNS queries
+- Enables real-time monitoring of logs
+- Adjustable CloudWatch Logs retention
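+
+Retention is set per log group. By default, log groups never expire, so setting a retention period keeps stored logs from growing indefinitely. This is a minimal sketch assuming boto3's `put_retention_policy`. The log group name and the `expires_after` helper are hypothetical.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+
+def retention_params(log_group, days):
+    """Keyword arguments for CloudWatch Logs PutRetentionPolicy (boto3: put_retention_policy).
+
+    Only specific day values are accepted (1, 3, 5, 7, 14, 30, ...);
+    check the API reference before using an arbitrary number.
+    """
+    return {"logGroupName": log_group, "retentionInDays": days}
+
+
+def expires_after(event_time, days):
+    """Approximate moment a log event becomes eligible for deletion (hypothetical helper)."""
+    return event_time + timedelta(days=days)
+
+
+params = retention_params("/aws/lambda/my-function", 30)  # hypothetical function name
+# boto3.client("logs").put_retention_policy(**params)
+print(expires_after(datetime(2024, 1, 1, tzinfo=timezone.utc), 30).date())
+```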
+
+#### CloudWatch Logs for EC2
+
+- By default, no logs from your EC2 instance will go to CloudWatch
+- You need to run a CloudWatch agent on EC2 to push the log files you want
+- Make sure IAM permissions are correct
+- The CloudWatch log agent can be set up on-premises too
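+
+This is a minimal sketch of the agent side. The CloudWatch agent reads a JSON configuration in which the `logs.logs_collected.files.collect_list` section lists the files to ship. The `build_agent_log_config` helper is hypothetical and only generates that section. The file paths and log group names are example values. For the IAM part, the instance role typically needs the AWS managed policy `CloudWatchAgentServerPolicy`.
+
+```python
+import json
+
+
+def build_agent_log_config(files):
+    """Build the "logs" section of a CloudWatch agent JSON config (hypothetical helper).
+
+    `files` maps a local file path to the log group it should be shipped to.
+    """
+    return {
+        "logs": {
+            "logs_collected": {
+                "files": {
+                    "collect_list": [
+                        {
+                            "file_path": path,
+                            "log_group_name": group,
+                            "log_stream_name": "{instance_id}",  # agent substitutes the EC2 instance ID
+                        }
+                        for path, group in files.items()
+                    ]
+                }
+            }
+        }
+    }
+
+
+# Example paths and group names (hypothetical)
+config = build_agent_log_config({"/var/log/messages": "ec2-system", "/var/log/myapp/app.log": "myapp"})
+print(json.dumps(config, indent=2))
+```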
+
+### Amazon CloudWatch Events
+
+- Schedule: Cron jobs (scheduled scripts)
+ - Schedule Every hour => Trigger script on Lambda function
+- Event Pattern: Event rules to react to a service doing something
+ - IAM Root User Sign in Event => SNS Topic with Email Notification
+- Trigger Lambda functions, send SQS/SNS messages
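+
+The root sign-in example can be expressed as an event pattern. The `matches` function below is a deliberately simplified, hypothetical matcher that shows how a rule decides whether an event fires. It handles exact-value lists and nested objects only. Real EventBridge patterns also support prefix, numeric and anything-but filters. The hourly example uses the `rate(...)` schedule-expression syntax, and the rule name is hypothetical.
+
+```python
+def matches(pattern, event):
+    """Simplified EventBridge-style matching (hypothetical): every pattern key must exist
+    in the event, list values mean "one of these", dict values are matched recursively."""
+    for key, expected in pattern.items():
+        if key not in event:
+            return False
+        actual = event[key]
+        if isinstance(expected, dict):
+            if not isinstance(actual, dict) or not matches(expected, actual):
+                return False
+        elif actual not in expected:
+            return False
+    return True
+
+
+# Pattern for "IAM root user signed in to the console"
+root_sign_in = {
+    "detail-type": ["AWS Console Sign In via CloudTrail"],
+    "detail": {"userIdentity": {"type": ["Root"]}},
+}
+
+# Trimmed-down sample event (real events carry many more fields)
+event = {
+    "source": "aws.signin",
+    "detail-type": "AWS Console Sign In via CloudTrail",
+    "detail": {"userIdentity": {"type": "Root"}, "eventName": "ConsoleLogin"},
+}
+print(matches(root_sign_in, event))
+
+# Hourly schedule from the first example (hypothetical rule name)
+hourly_rule = {"Name": "hourly-script", "ScheduleExpression": "rate(1 hour)"}
+```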
+
+### Amazon EventBridge
+
+- EventBridge is the next evolution of CloudWatch Events
+- Default event bus: generated by AWS services (CloudWatch Events)
+- Partner event bus: receive events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0…)
+- Custom Event buses: for your own applications
+- Schema Registry: model event schema
+- EventBridge has a different name to mark the new capabilities
+- The CloudWatch Events name will be replaced with EventBridge
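+
+This is a minimal sketch of publishing to a custom event bus, assuming boto3's `put_events`. The bus name, source and detail type are hypothetical. `Detail` is sent as a JSON string, not a dict.
+
+```python
+import json
+
+
+def build_put_events_params(bus_name, source, detail_type, detail):
+    """Keyword arguments for EventBridge PutEvents (boto3: put_events) on a custom bus."""
+    return {
+        "Entries": [
+            {
+                "EventBusName": bus_name,
+                "Source": source,  # typically a reverse-DNS name for your application
+                "DetailType": detail_type,
+                "Detail": json.dumps(detail),  # Detail must be a JSON string
+            }
+        ]
+    }
+
+
+# Hypothetical bus, source, detail type and payload
+params = build_put_events_params(
+    "orders-bus", "com.example.orders", "OrderPlaced", {"orderId": "1234", "total": 42.0}
+)
+# boto3.client("events").put_events(**params)
+```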
+
+## AWS CloudTrail
+
+- Provides governance, compliance and audit for your AWS Account
+- CloudTrail is enabled by default!
+- Get a history of events / API calls made within your AWS Account by:
+ - Console
+ - SDK
+ - CLI
+ - AWS Services
+- Can put logs from CloudTrail into CloudWatch Logs or S3
+- A trail can be applied to All Regions (default) or a single Region.
+- If a resource is deleted in AWS, investigate CloudTrail first!
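+
+When you investigate a deleted or terminated resource, you can query CloudTrail's Event history by event name. This is a minimal sketch assuming boto3's `lookup_events`, which returns management events only. The `build_lookup_params` helper is hypothetical and clamps the search window to the 90-day Event history limit.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+EVENT_HISTORY_DAYS = 90  # CloudTrail Event history window
+
+
+def build_lookup_params(event_name, days_back, now=None):
+    """Keyword arguments for CloudTrail LookupEvents (boto3: lookup_events), hypothetical helper."""
+    now = now or datetime.now(timezone.utc)
+    days = min(days_back, EVENT_HISTORY_DAYS)  # older events are not in Event history
+    return {
+        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
+        "StartTime": now - timedelta(days=days),
+        "EndTime": now,
+    }
+
+
+# "Who terminated my instance?" -> look for TerminateInstances calls
+params = build_lookup_params("TerminateInstances", days_back=365)
+# for page in boto3.client("cloudtrail").get_paginator("lookup_events").paginate(**params):
+#     for e in page["Events"]:
+#         print(e["EventTime"], e.get("Username"), e["EventName"])
+```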
+
+### CloudTrail Events
+
+- Management Events:
+ - Operations that are performed on resources in your AWS account
+ - Examples:
+ - Configuring security (IAM AttachRolePolicy)
+ - Configuring rules for routing data (Amazon EC2 CreateSubnet)
+ - Setting up logging (AWS CloudTrail CreateTrail)
+ - By default, trails are configured to log management events.
+ - Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
+- Data Events:
+ - By default, data events are not logged (because they are high-volume operations)
+ - Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
+ - AWS Lambda function execution activity (the Invoke API)
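+
+You must opt in to data events for each trail, and they incur additional charges. This is a minimal sketch assuming boto3's `put_event_selectors`. The trail and bucket names are hypothetical.
+
+```python
+def build_s3_data_event_selectors(trail_name, bucket_name, read_write="All"):
+    """Keyword arguments for CloudTrail PutEventSelectors (boto3: put_event_selectors)
+    that keep management events and add S3 object-level data events for one bucket."""
+    if read_write not in ("ReadOnly", "WriteOnly", "All"):
+        raise ValueError("read_write must be ReadOnly, WriteOnly or All")
+    return {
+        "TrailName": trail_name,
+        "EventSelectors": [
+            {
+                "ReadWriteType": read_write,  # separate Read and Write events
+                "IncludeManagementEvents": True,
+                "DataResources": [
+                    # Trailing slash = every object in the bucket
+                    {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket_name}/"]}
+                ],
+            }
+        ],
+    }
+
+
+# Hypothetical trail and bucket names
+params = build_s3_data_event_selectors("my-trail", "my-bucket", read_write="WriteOnly")
+# boto3.client("cloudtrail").put_event_selectors(**params)
+```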
+
+### CloudTrail Insights Events
+
+- Enable CloudTrail Insights to detect unusual activity in your account:
+ - Inaccurate resource provisioning
+ - Hitting service limits
+ - Bursts of AWS IAM actions
+ - Gaps in periodic maintenance activity
+- CloudTrail Insights analyzes normal management events to create a baseline
+- And then continuously analyzes write events to detect unusual patterns
+ - Anomalies appear in the CloudTrail console
+ - Event is sent to Amazon S3
+ - An EventBridge event is generated (for automation needs)
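+
+The toy sketch below illustrates the "build a baseline, then detect unusual patterns" idea. It flags hours whose write-API call count sits far from a baseline mean. It is NOT the algorithm CloudTrail Insights uses. The data and the 3-standard-deviation threshold are made up. Enabling the real feature takes a single `put_insight_selectors` call, shown as a comment.
+
+```python
+from statistics import mean, stdev
+
+
+def find_anomalies(baseline_counts, new_counts, threshold=3.0):
+    """Toy baseline check (NOT the actual Insights algorithm): flag hours whose
+    write-API call count is more than `threshold` standard deviations from the mean."""
+    mu = mean(baseline_counts)
+    sigma = stdev(baseline_counts) or 1.0  # avoid division by zero on a flat baseline
+    return [
+        (hour, count)
+        for hour, count in enumerate(new_counts)
+        if abs(count - mu) / sigma > threshold
+    ]
+
+
+baseline = [20, 22, 19, 21, 20, 23, 18, 21]  # hypothetical hourly write-call counts
+today = [21, 19, 240, 22]                    # a burst of IAM writes in hour 2
+print(find_anomalies(baseline, today))
+
+# Enabling real Insights on a trail (boto3; trail name is hypothetical):
+# boto3.client("cloudtrail").put_insight_selectors(
+#     TrailName="my-trail", InsightSelectors=[{"InsightType": "ApiCallRateInsight"}])
+```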
+
+### CloudTrail Events Retention
+
+- Events are stored for 90 days in CloudTrail
+- To keep events beyond this period, log them to S3 and use Athena
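+
+A common pattern for older events is an Athena table over the trail's S3 bucket. This minimal sketch only builds the SQL. The `cloudtrail_logs` table, the database and the output bucket are hypothetical. The column names follow the CloudTrail table schema in the Athena documentation.
+
+```python
+def build_athena_query(table, event_name, since_date):
+    """Build a SQL query over CloudTrail logs in S3 exposed as an Athena table (hypothetical helper)."""
+    # Minimal quoting guard; prefer Athena parameterized queries for real use.
+    if "'" in event_name or "'" in since_date:
+        raise ValueError("unexpected quote in input")
+    return (
+        "SELECT eventtime, useridentity.arn, eventsource, eventname "
+        f"FROM {table} "
+        f"WHERE eventname = '{event_name}' AND eventtime >= '{since_date}' "
+        "ORDER BY eventtime"
+    )
+
+
+sql = build_athena_query("cloudtrail_logs", "DeleteBucket", "2023-01-01")
+# boto3.client("athena").start_query_execution(
+#     QueryString=sql,
+#     QueryExecutionContext={"Database": "default"},
+#     ResultConfiguration={"OutputLocation": "s3://my-athena-results/"})
+```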
+
+## AWS X-Ray
+
+- Debugging in Production, the good old way:
+ - Test locally
+ - Add log statements everywhere
+ - Re-deploy in production
+- Log formats differ across applications and log analysis is hard.
+- Debugging: one big monolith “easy”, distributed services “hard”
+- No common view of your entire architecture
+
+### AWS X-Ray advantages
+
+- Troubleshooting performance (bottlenecks)
+- Understand dependencies in a microservice architecture
+- Pinpoint service issues
+- Review request behavior
+- Find errors and exceptions
+- Are we meeting our time SLA?
+- Where am I throttled?
+- Identify users that are impacted
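+
+X-Ray ties together the hops of one request with a trace ID of the form `1-<8 hex digits of epoch seconds>-<24 random hex digits>`, which travels in the `X-Amzn-Trace-Id` header. Each service reports segments with start and end times. The sketch below is hypothetical (no X-Ray SDK, made-up timings). It generates an ID in that format and finds the slowest downstream call, which is the bottleneck question X-Ray answers across a whole service map.
+
+```python
+import os
+import time
+
+
+def new_trace_id(now=None):
+    """Trace ID: version 1, 8 hex digits of epoch seconds, 24 random hex digits."""
+    epoch = int(now if now is not None else time.time())
+    return f"1-{epoch:08x}-{os.urandom(12).hex()}"
+
+
+def slowest_subsegment(segment):
+    """Return (name, duration_seconds) of the slowest downstream call (hypothetical helper)."""
+    return max(
+        ((s["name"], s["end_time"] - s["start_time"]) for s in segment["subsegments"]),
+        key=lambda item: item[1],
+    )
+
+
+# Hand-written segment shaped like an X-Ray segment document (names and timings are made up)
+segment = {
+    "name": "checkout-api",
+    "id": os.urandom(8).hex(),  # 16 hex digits
+    "trace_id": new_trace_id(),
+    "start_time": 100.0,
+    "end_time": 101.2,
+    "subsegments": [
+        {"name": "DynamoDB", "start_time": 100.1, "end_time": 100.2},
+        {"name": "payments-service", "start_time": 100.2, "end_time": 101.1},
+    ],
+}
+print(slowest_subsegment(segment))
+```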
+
+## Amazon CodeGuru
+
+- An ML-powered service for automated code reviews and application performance recommendations
+- Provides two functionalities:
+- CodeGuru Reviewer: automated code reviews for static code analysis (development)
+- CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)
+
+### Amazon CodeGuru Reviewer
+
+- Identify critical issues, security vulnerabilities, and hard-to-find bugs
+- Example: common coding best practices, resource leaks, security detection, input validation
+- Uses Machine Learning and automated reasoning
+- Built on hard-learned lessons across millions of code reviews on 1000s of open-source and Amazon repositories
+- Supports Java and Python
+- Integrates with GitHub, Bitbucket, and AWS CodeCommit
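+
+Here is an illustration of the resource-leak category (this is not actual CodeGuru output). The first function opens a file without guaranteeing that it is closed. The second fixes that with a context manager. Both function names are hypothetical.
+
+```python
+import os
+import tempfile
+
+
+def count_lines_leaky(path):
+    # Resource leak: nothing guarantees the handle is closed; if readlines()
+    # raises, cleanup is left entirely to the garbage collector.
+    f = open(path)
+    return len(f.readlines())
+
+
+def count_lines(path):
+    # Fix: the context manager closes the file on every code path.
+    with open(path) as f:
+        return sum(1 for _ in f)
+
+
+with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
+    tmp.write("a\nb\nc\n")
+print(count_lines(tmp.name))
+os.remove(tmp.name)
+```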
+
+### Amazon CodeGuru Profiler
+
+- Helps understand the runtime behavior of your application
+- Example: identify if your application is consuming excessive CPU capacity on a logging routine
+- Features:
+ - Identify and remove code inefficiencies
+ - Improve application performance (e.g., reduce CPU utilization)
+ - Decrease compute costs
+ - Provides heap summary (identify which objects are using up memory)
+ - Anomaly Detection
+- Supports applications running on AWS or on-premises
+- Minimal overhead on application
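+
+The logging example can be reproduced in miniature. The sketch below uses hypothetical names and counts calls instead of profiling. An f-string builds an expensive debug message on every call even though DEBUG logging is off. Guarding the call with `logger.isEnabledFor` removes that wasted CPU. A profiler such as CodeGuru Profiler is what would surface this kind of hotspot in production.
+
+```python
+import logging
+
+logger = logging.getLogger("orders")  # hypothetical logger name
+logger.setLevel(logging.INFO)  # DEBUG messages are disabled
+
+calls = {"n": 0}
+
+
+def expensive_dump(order):
+    """Stand-in for costly serialization (hypothetical)."""
+    calls["n"] += 1
+    return repr(sorted(order.items()))
+
+
+def process_wasteful(order):
+    # The f-string, and therefore expensive_dump(), runs even though DEBUG is off.
+    logger.debug(f"processing {expensive_dump(order)}")
+
+
+def process_cheap(order):
+    # Only do the expensive work when DEBUG is actually enabled.
+    if logger.isEnabledFor(logging.DEBUG):
+        logger.debug("processing %s", expensive_dump(order))
+
+
+order = {"id": 1, "items": 3}
+for _ in range(1000):
+    process_wasteful(order)
+wasteful_calls = calls["n"]
+calls["n"] = 0
+for _ in range(1000):
+    process_cheap(order)
+print(wasteful_calls, calls["n"])
+```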
+
+## AWS Status - Service Health Dashboard
+
+- Shows the health of all services across all regions
+- Shows historical information for each day
+- Has an RSS feed you can subscribe to
+
+## AWS Personal Health Dashboard
+
+- AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
+- While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
+- The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.
+- Global service
+- Shows how AWS outages directly impact you & your AWS resources
+- Provides alerts, remediation guidance, and proactive notice of scheduled activities
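+
+To route the events shown in the Personal Health Dashboard into your own automation, you can use AWS Health, which also emits them to EventBridge with source `aws.health`. This is a minimal sketch assuming boto3's `put_rule` and `put_targets`. The rule name, the service filter and the SNS topic ARN are hypothetical.
+
+```python
+import json
+
+
+def build_health_rule_params(rule_name, services=None):
+    """Keyword arguments for EventBridge PutRule (boto3: put_rule) matching AWS Health
+    events, optionally limited to some services (hypothetical helper)."""
+    pattern = {"source": ["aws.health"], "detail-type": ["AWS Health Event"]}
+    if services:
+        pattern["detail"] = {"service": services}
+    return {"Name": rule_name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}
+
+
+params = build_health_rule_params("health-to-sns", services=["EC2", "RDS"])
+# events = boto3.client("events")
+# events.put_rule(**params)
+# events.put_targets(Rule="health-to-sns",
+#                    Targets=[{"Id": "sns", "Arn": "arn:aws:sns:us-east-1:123456789012:ops"}])
+```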
+
+## Cloud Monitoring Summary
+
+- CloudWatch:
+ - Metrics: monitor the performance of AWS services and billing metrics
+ - Alarms: automate notifications, perform EC2 actions, or notify SNS based on a metric
+ - Logs: collect log files from EC2 instances, servers, Lambda functions…
+ - Events (or EventBridge): react to events in AWS, or trigger a rule on a schedule
+- CloudTrail: audit API calls made within your AWS account
+- CloudTrail Insights: automated analysis of your CloudTrail Events
+- X-Ray: trace requests made through your distributed applications
+- Service Health Dashboard: status of all AWS services across all regions
+- Personal Health Dashboard: AWS events that impact your infrastructure
+- Amazon CodeGuru: automated code reviews and application performance recommendations