Architecting for the Cloud: AWS Best Practices is a whitepaper recommended as reading for the AWS Solutions Architect Associate certification exam.

AWS Cloud Design Principles

Scalability

There are generally two ways to scale an IT architecture: vertically and horizontally.

Scaling Vertically vs Scaling Horizontally

Scaling Vertically

Increase the specification of an individual resource, for example by upgrading a server with a larger hard drive or better I/O capacity.
Not always cost-effective.
Will eventually reach a limit.
Not always highly available.
Easy to implement and can be sufficient for many use cases, especially in the short term.

Scaling Horizontally

Scaling horizontally takes place through an increase in the number of resources, such as adding more hard drives to a storage array or adding more servers to support an application.
A great way to build internet-scale applications that leverage the elasticity of cloud computing.

Now let’s examine some possible horizontal scaling scenarios.

Stateless Applications

A stateless application is an application that does not need knowledge of previous interactions and does not store session information: given the same input, it produces the same response for any end user.
Stateless applications can be scaled horizontally:

Any of the available compute resources (EC2/Lambda) can serve the request.
The resources do not need to be aware of their peers: all that is required is a way to distribute the workload to them.
Distribute Load to Multiple Nodes
To distribute the workload to multiple nodes, you can use either a push or pull model.
push model
With a push model, you can use Elastic Load Balancing to distribute the workload.
A Network Load Balancer operates at Layer 4 of the Open Systems Interconnection (OSI) model.
With container-based services, you can also use an Application Load Balancer.
Alternatively, you can use Amazon Route 53 to implement a DNS round robin.
In this case, DNS responses return an IP address from a list of valid hosts in a round-robin fashion. While easy to use, this approach does not always work well with the elasticity of cloud computing, because even if you set low time to live (TTL) values for your DNS records, caching DNS resolvers are outside the control of Amazon Route 53 and might not always respect your settings.
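To make the caching problem concrete, here is a minimal simulation in plain Python. It is only a sketch: `RoundRobinDns`, `CachingResolver`, and the IP addresses are hypothetical names invented for illustration, not Route 53 APIs.

```python
import itertools


class RoundRobinDns:
    """Toy authoritative DNS: answers with the next host in the list."""

    def __init__(self, hosts):
        self.set_hosts(hosts)

    def set_hosts(self, hosts):
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)

    def resolve(self):
        return next(self._cycle)


class CachingResolver:
    """Toy downstream resolver that caches one answer and ignores the TTL."""

    def __init__(self, upstream):
        self.upstream = upstream
        self._cached = None

    def resolve(self):
        if self._cached is None:
            self._cached = self.upstream.resolve()
        return self._cached


dns = RoundRobinDns(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
answers = [dns.resolve() for _ in range(4)]  # wraps around after the third host

resolver = CachingResolver(dns)
first = resolver.resolve()
dns.set_hosts(["10.0.0.9"])   # the fleet changes and the old hosts go away
stale = resolver.resolve()    # the cache still hands out the old address
```

In this sketch, clients behind the caching resolver keep being sent to a host that no longer exists, which is the failure mode described above.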
pull model
Instead of a load balancing solution, you can implement a pull model for asynchronous, event-driven workloads.
In a pull model, tasks that need to be performed or data that needs to be processed can be stored as messages in a queue using Amazon Simple Queue Service (Amazon SQS) or as a streaming data solution such as Amazon Kinesis.
Multiple compute resources can then pull and consume those messages, processing them in a distributed fashion.
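The pull model can be sketched locally with Python's standard `queue` module standing in for Amazon SQS and threads standing in for compute resources. The function name `run_workers` and the squaring "work" are placeholders for illustration only.

```python
import queue
import threading


def run_workers(tasks, worker_count=3):
    """Process tasks from a shared queue with several independent workers."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker stops
            outcome = task * task  # stand-in for real processing
            with lock:
                results.append(outcome)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


squares = run_workers(range(10))
```

No worker knows about the others; adding capacity just means starting more consumers. The order of `squares` is not guaranteed, which mirrors the fact that consumers of a real queue should not rely on processing order unless the queue provides it.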

Stateless Component

In practice, most applications maintain some kind of state information. You can still make a portion of your architecture stateless by not storing anything that needs to persist for more than a single request in the local file system.

Using HTTP cookies to store session information

Consider only storing a unique session identifier in an HTTP cookie, and store more detailed session information on the server side. Most programming platforms provide a native session management mechanism.

Server side session storage

User session information is often stored on the local file system by default, which results in a stateful architecture.

A common solution is to store this information in a database. Amazon DynamoDB is a great choice for storing server-side user session information.
When larger files need to be stored, you can place them in a shared storage layer, such as Amazon S3 or Amazon EFS. This avoids introducing stateful components: the workload can be picked up by another EC2 instance when needed.
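A minimal sketch of the idea, assuming a shared store that every web node can reach. The `SessionStore` class here is just a dictionary standing in for something like a DynamoDB table, and `WebNode` is a hypothetical name; neither is an AWS API.

```python
import secrets


class SessionStore:
    """Stand-in for a shared store such as a DynamoDB table (plain dict here)."""

    def __init__(self):
        self._items = {}

    def put(self, session_id, data):
        self._items[session_id] = dict(data)

    def get(self, session_id):
        return dict(self._items.get(session_id, {}))


class WebNode:
    """A stateless web node: keeps nothing between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = secrets.token_urlsafe(16)  # value placed in the cookie
        self.store.put(session_id, {"user": user, "cart": []})
        return session_id

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        session["cart"].append(item)
        self.store.put(session_id, session)
        return session["cart"]


store = SessionStore()
node_a, node_b = WebNode("a", store), WebNode("b", store)
cookie = node_a.login("alice")
cart = node_b.add_to_cart(cookie, "book")  # a different node serves the next request
```

Only the opaque session identifier travels in the cookie; because the session data lives outside the nodes, either node can serve any request.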
Multi-step workflow
You can use AWS Step Functions to centrally store execution history and make these workloads stateless.

Stateful Components

Examples of inevitably stateful components:

Databases
Many legacy applications, which were designed to run on a single server by relying on local compute resources
Use cases that require client devices to maintain a connection to a specific server for prolonged periods. For example, real-time multiplayer gaming is much easier to implement in a non-distributed way, where participants are connected to the same server.
You might still be able to scale those components horizontally by distributing the load to multiple nodes with session affinity: you bind all the transactions of a session to a specific compute resource.
Session Affinity
Session affinity is a load balancer mechanism that, in addition to balancing load, ensures that a series of related requests is routed to the same server.
Limitations of Session Affinity
Existing sessions won’t benefit from the introduction of newly launched compute nodes.
If a compute node is terminated, users bound to it will lose their session-specific data.
Implement Session Affinity Option 1
For HTTP/HTTPS traffic, you can use the sticky session feature of an Application Load Balancer to bind a user’s session to a specific instance.
With this feature, an Application Load Balancer will try to use the same server for that user for the duration of the session.
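The behaviour of session affinity, including both limitations listed earlier, can be illustrated with a toy balancer. This is a simplification under stated assumptions: `StickyBalancer` is a hypothetical class that keeps an in-memory binding table, whereas an Application Load Balancer tracks stickiness with a cookie.

```python
class StickyBalancer:
    """Toy load balancer that pins each session to the node that first served it."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.bindings = {}   # session id -> node
        self._next = 0

    def route(self, session_id):
        node = self.bindings.get(session_id)
        if node not in self.nodes:           # new session, or its node is gone
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            self.bindings[session_id] = node
        return node


lb = StickyBalancer(["node-1", "node-2"])
before = [lb.route("s1"), lb.route("s2"), lb.route("s1")]

lb.nodes.append("node-3")         # scale out
after_scale_out = lb.route("s1")  # existing session stays on its original node

lb.nodes.remove("node-1")         # node terminated
after_failure = lb.route("s1")    # rebound elsewhere; node-local session data is lost
```

Session `s1` never benefits from `node-3`, and once `node-1` is removed the session is rebound to a node that has none of its local state.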

Implement Session Affinity Option 2: Use client-side load balancing

In this model, the clients need a way of discovering valid server endpoints to directly connect to. You can use DNS for that, or you can build a simple discovery API to provide that information to the software running on the client.
A health-checking mechanism also needs to be implemented on the client side.
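A rough sketch of what the client has to do in this model: take a list of discovered endpoints, check their health itself, and connect to one that passes. The hostnames, the `down` set, and `pick_endpoint` are all hypothetical; a real client would get the list from DNS or a discovery API and probe each endpoint over the network.

```python
import random


def pick_endpoint(endpoints, is_healthy, rng=random):
    """Return a random endpoint that passes the client's own health check."""
    candidates = list(endpoints)
    rng.shuffle(candidates)
    for endpoint in candidates:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")


# Hypothetical discovery result and health status, hard-coded for the sketch.
discovered = ["game-1.example.com", "game-2.example.com", "game-3.example.com"]
down = {"game-2.example.com"}

chosen = pick_endpoint(discovered, lambda host: host not in down)
```

Shuffling before checking spreads clients across the healthy endpoints instead of sending everyone to the first entry in the list.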

Distributed Processing

By dividing a task and its data into many small fragments of work, you can execute them in parallel across a set of compute resources.

Offline batch jobs can be horizontally scaled by using distributed data processing engines such as AWS Batch, AWS Glue, and Apache Hadoop.
On AWS, you can use Amazon EMR (Amazon Elastic MapReduce) to run Hadoop workloads on top of a fleet of EC2 instances without the operational complexity.
For real-time processing of streaming data, Amazon Kinesis partitions data into multiple shards that can then be consumed by multiple Amazon EC2 or AWS Lambda resources to achieve elasticity.
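The divide, process in parallel, and combine pattern can be sketched on a single machine with a thread pool. This is only an analogy for what engines such as Hadoop do across a cluster; the function names and the word-count task are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, fragment_size):
    """Cut a list into consecutive fragments of at most fragment_size items."""
    return [items[i:i + fragment_size] for i in range(0, len(items), fragment_size)]


def count_words(lines):
    """Work done on one fragment: count the words in its lines."""
    return sum(len(line.split()) for line in lines)


def parallel_word_count(lines, fragment_size=2, workers=4):
    fragments = split(lines, fragment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, fragments))
    return sum(partial_counts)  # combine the partial results


log_lines = ["a b c", "d e", "f", "g h i j", "k"]
total = parallel_word_count(log_lines)
```

Each fragment is processed independently, so the same structure scales by adding workers, as long as the combine step (here a simple sum) can merge partial results.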

Disposable Resources instead of Fixed Servers

When designing for AWS, you can take advantage of the dynamically provisioned nature of cloud computing.
You can think of servers and other components as temporary resources.

Immutable infrastructure pattern

It solves the configuration drift issue that comes with fixed, long-running servers.

Configuration drift: changes and software patches applied over time can result in untested and heterogeneous configurations across different environments.

Immutable infrastructure pattern: a server—once launched—is never updated. Instead, when there is a problem or need for an update, the problem server is replaced with a new server that has the latest configuration.
This enables resources to always be in a consistent (and tested) state, and makes rollbacks easier to perform. This is more easily supported with stateless architectures.

Instantiating Compute Resources

Make the creation and configuration of new compute nodes and other components an automated and repeatable process, so that you can dispose of resources and launch new ones swiftly.

Bootstrapping

Execute automated bootstrapping actions: scripts that install software/copy data to bring that resource to a particular state.

Set up EC2 instances with user data scripts and cloud-init directives
Script and configuration management tools such as Chef or Puppet
With custom scripts and the AWS APIs, or with AWS CloudFormation support for AWS Lambda-backed custom resources, you can write provisioning logic that acts on almost any AWS resource.
Golden Image
A golden image is a snapshot of a particular state of the resource.
Resources can be launched from a golden image:
EC2 instances (AMI)
Amazon RDS DB instances (instantiated from an Amazon RDS snapshot)
Amazon Elastic Block Store (Amazon EBS) volumes (EBS snapshot)
When compared to the bootstrapping approach, a golden image results in faster start times and removes dependencies on configuration services or third-party repositories. This is important in auto-scaled environments where you want to be able to quickly and reliably launch additional resources as a response to demand changes.

Containers

Docker

Docker allows you to package a piece of software in a Docker image, which is a standardized unit for software development.
These services allow you to deploy and manage multiple containers across a cluster of EC2 instances:

AWS Elastic Beanstalk
Amazon Elastic Container Service (ECS)
AWS Fargate
You can build golden Docker images and use the ECS Container
Kubernetes and Amazon EKS
With Kubernetes and Amazon Elastic Container Service for Kubernetes, you can easily deploy, manage and scale containerized applications.

Hybrid

You can use a combination of Golden Images and Bootstrapping.
Items that do not change often or that introduce external dependencies will typically be part of your golden image. An example of a good candidate is your web server software that would otherwise have to be downloaded from a third-party repository each time you launch an instance.
Items that change often or differ between your various environments can be set up dynamically through bootstrapping actions.

Elastic Beanstalk follows the hybrid model. It provides preconfigured runtime environments, each launched from its own AMI, but also allows you to run bootstrap actions through .ebextensions configuration files and to configure environment variables that parameterize the differences between environments.

Infrastructure as Code

Applications of the disposable resources principles are not limited to the individual resource level. AWS assets are programmable, and you can use code to make your whole infrastructure reusable, maintainable, extensible, and testable:

AWS CloudFormation

AWS CloudFormation templates give you an easy way to create and manage a collection of related AWS resources.
Your CloudFormation templates can live with your application in your version control repository, which allows you to reuse architectures and reliably clone production environments for testing.
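As a small illustration of infrastructure kept as code, the sketch below builds a minimal CloudFormation template as a Python dictionary and serializes it to JSON. The logical ID `AssetsBucket`, the description, and the helper name `build_template` are arbitrary choices for this example; the template declares a single versioned S3 bucket and nothing else.

```python
import json


def build_template(bucket_logical_id="AssetsBucket"):
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: one versioned S3 bucket",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"}
                },
            }
        },
        "Outputs": {
            "BucketName": {"Value": {"Ref": bucket_logical_id}}
        },
    }


template_json = json.dumps(build_template(), indent=2, sort_keys=True)
```

Because the template is plain text, it can be reviewed, versioned, and reused like any other source file; the resulting JSON could be saved next to the application code and used to create the same stack in a test environment.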

Automation

Consider introducing one or more of these types of automation into your application architecture to ensure more resiliency, scalability, and performance.

Serverless Management and Deployment

Deployment pipeline

AWS CodePipeline
AWS CodeBuild
AWS CodeDeploy

Infrastructure Management and Deployment

AWS Elastic Beanstalk
Amazon EC2 auto recovery: create an Amazon CloudWatch alarm that monitors an EC2 instance and automatically recovers it if it becomes impaired. (but you lose your in-memory data)
AWS Systems Manager: automatically collect software inventory, apply OS patches, create system images, and configure Windows and Linux operating systems.
Auto Scaling: maintain application availability and scale your Amazon EC2, Amazon DynamoDB, Amazon ECS, and Amazon Elastic Container Service for Kubernetes (Amazon EKS) capacity.
You can use Auto Scaling to help make sure that you are running the desired number of healthy EC2 instances across multiple Availability Zones.

Alarms and Events

Amazon CloudWatch alarms: send an Amazon Simple Notification Service (Amazon SNS) message when a metric goes beyond a specified threshold.
Amazon CloudWatch Events: a near real-time stream of system events that describe changes in AWS resources. You can route events to one or more targets, such as:
- Lambda functions
- Kinesis streams
- SNS topics
AWS Lambda scheduled events: configure a Lambda function that executes on a regular schedule.
AWS WAF security automations: you can administer AWS Web Application Firewall completely through APIs, which makes security automation easy, enabling rapid rule propagation and fast incident response.

Loose Coupling

As application complexity increases, a desirable attribute of an IT system is that it can be broken into smaller, loosely coupled components.
Reduce dependencies.
A change or a failure in one component should not cascade to other components.

Well-Defined Interfaces

Components interact only through specific, technology-agnostic interfaces, such as RESTful APIs.

microservices architecture

This granular design pattern of decoupling components with interfaces is commonly referred to as a microservices architecture.

Amazon API Gateway

A fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. It handles the tasks involved in accepting and processing API calls, including:

traffic management,
authorization and access control,
monitoring, and
API version management.

Service Discovery

Loose coupling is a crucial element if you want to take advantage of the elasticity of cloud computing, where new resources can be launched or terminated at any point in time.
Those compute resources running smaller services need a way to be addressed/discovered.

Elastic Load Balancing

Each load balancer gets its own hostname, so you can consume a service through a stable endpoint.

DNS and private Amazon Route 53 zones

The particular load balancer’s endpoint can be abstracted and modified at any time.

Amazon Route 53 auto naming

Auto naming lets you automatically create DNS records based on a configuration you define.

Asynchronous Integration

Suitable for any interaction that does not need an immediate response and where an acknowledgement that a request has been registered will suffice.
It involves

one component that generates events
another component that consumes them

The two components do not integrate through direct point-to-point interaction, but usually through an intermediate durable storage layer, such as:

an Amazon SQS (Simple Queue Service) queue
a streaming data platform such as Amazon Kinesis
Cascading Lambda events
AWS Step Functions
Amazon Simple Workflow Service

This approach decouples the two components and introduces additional resiliency. So, for example, if a process that is reading messages from the queue fails, messages can still be added to the queue and processed when the system recovers.

Distributed Systems Best Practices

Build applications that handle component failure in a graceful manner.

A request that fails can be retried with an exponential backoff and jitter strategy, or it can be stored in a queue for later processing. For front-end interfaces, it might be possible to provide alternative or cached content instead of failing completely when, for example, your database server becomes unavailable.
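A minimal sketch of retrying with exponential backoff and jitter. The variant shown picks a uniformly random delay between zero and an exponentially growing cap (often called "full jitter"); the parameter names, defaults, and the `flaky` operation are assumptions made for this example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on failure wait a random, exponentially capped delay and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)  # "full jitter": uniform in [0, cap)


calls = {"count": 0}


def flaky():
    """Hypothetical operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
result = retry_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
```

The randomness matters: without it, many clients that failed at the same moment would all retry at the same moments too, hitting the recovering service in synchronized waves.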

The Amazon Route 53 DNS failover feature also gives you the ability to monitor your website and automatically route your visitors to a backup site if your primary site becomes unavailable. You can host your backup site as a static website on Amazon S3 or as a separate dynamic environment.

Services, Not Servers

Managed Services

AWS managed services provide building blocks that developers can consume to power their applications. Examples of managed services that power your applications:

Amazon SQS for messaging, instead of running your own messaging cluster
Amazon S3 for storage of as much data as you need (up to 5 TB per individual object)
Amazon CloudFront for content delivery
ELB for load balancing
Amazon DynamoDB for NoSQL databases
Amazon CloudSearch for search workloads
Amazon Elastic Transcoder for video encoding
Amazon Simple Email Service (SES) for sending and receiving emails

Serverless Architectures

It is possible to build both event-driven and synchronous services for mobile, web, analytics, CDN business logic, and IoT without managing any server infrastructure.

Create a serverless application

By using Amazon API Gateway, you can develop virtually infinitely scalable synchronous APIs powered by AWS Lambda.
When combined with Amazon S3 for serving static content assets, this pattern can deliver a complete web application.
Authentication and Access Control
You can use Amazon Cognito so that you don’t have to manage a back-end solution to handle user authentication, network state, storage, and sync.
Amazon Cognito provides temporary AWS credentials to your users, allowing the mobile application running on the device to interact directly with IAM-protected AWS services.
AWS IoT
AWS IoT provides a fully managed device gateway that scales automatically with your usage without any operational overhead.
Responsive services at edge locations

AWS Lambda@Edge lets you run Lambda functions at Amazon CloudFront edge locations in response to CloudFront events.

Data Analytics

Amazon Athena is an interactive query service that makes it easy for you to analyze data in Amazon S3 using standard SQL. It is serverless, and you pay only for the queries that you run.

Databases

On AWS, the constraints that traditionally limited the choice of database technology are removed.

Relational Database

Scalability

Scale vertically by upgrading to a larger Amazon RDS DB instance. Also consider Amazon Aurora, which supports higher throughput.
For read-heavy applications, you can also scale horizontally by adding read replicas.

Read replicas are separate database instances that are replicated asynchronously. As a result, they are subject to replication lag and might be missing some of the latest transactions.
Application designers need to consider which queries can tolerate slightly stale data. Those queries can be executed on a read replica, while the remainder should run on the primary node. Read replicas also cannot accept any write queries.
Relational database workloads that need to scale their write capacity beyond the constraints of a single DB instance require a different approach called data partitioning or sharding.

The application’s data access layer needs to be modified to have awareness of how data is split so that it can direct queries to the right instance. In addition, schema changes must be performed across multiple database schemas, so it is worth investing some effort to automate this process.
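One way a data access layer can "know how data is split" is hash-based routing on a partition key. The sketch below assumes three hypothetical shards named `orders-db-0` to `orders-db-2` and a made-up `ShardRouter` class; it is one possible scheme, not the approach prescribed by the whitepaper.

```python
import hashlib


class ShardRouter:
    """Maps a partition key to one of several database shards."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # Use a stable hash; Python's built-in hash() varies between processes.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def run_everywhere(self, statement):
        """Schema changes have to be applied to every shard."""
        return [(shard, statement) for shard in self.shard_names]


router = ShardRouter(["orders-db-0", "orders-db-1", "orders-db-2"])
target = router.shard_for("customer-42")
migration_plan = router.run_everywhere("ALTER TABLE orders ADD COLUMN note TEXT")
```

With simple modulo routing like this, changing the number of shards remaps most keys, so resharding needs a data migration plan; that is part of why automating these operations is worth the effort.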

High Availability

AWS Direct Connect (DX) is highly available when there are two ports open.
ElastiCache is highly available when you group 2 to 6 nodes into a cluster with replicas, where 1 to 5 read-only nodes contain replicated data of the group’s single read/write primary node.
Amazon Redshift is highly available when you build multi-region or multi-Availability Zone (AZ) clusters.

For any production relational database, we recommend using the Amazon RDS Multi-AZ deployment feature, which creates a synchronously replicated standby instance in a different Availability Zone.

Resilient applications can be designed for graceful failure by offering reduced functionality, such as a read-only mode that uses read replicas.
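A small sketch of how a data access layer might combine read replicas with a read-only fallback. Dictionaries stand in for the primary instance and a lagging replica, and the `primary_available` flag is set by hand; `DataAccess` and its methods are hypothetical names.

```python
class DataAccess:
    """Routes writes to the primary and falls back to read-only mode if it is down."""

    def __init__(self, primary, replica):
        self.primary = primary      # dict standing in for the primary instance
        self.replica = replica      # dict standing in for a read replica
        self.primary_available = True

    def write(self, key, value):
        if not self.primary_available:
            raise RuntimeError("service is in read-only mode")
        self.primary[key] = value

    def read(self, key, allow_stale=False):
        if allow_stale or not self.primary_available:
            return self.replica.get(key)   # may lag behind the primary
        return self.primary.get(key)


db = DataAccess(primary={"sku-1": 5}, replica={"sku-1": 4})
fresh = db.read("sku-1")
stale_ok = db.read("sku-1", allow_stale=True)

db.primary_available = False          # simulate a primary outage
degraded = db.read("sku-1")           # still answered, from the replica
```

Callers that can tolerate slightly stale data opt in with `allow_stale=True`; during an outage every read is served from the replica while writes are rejected, so the application keeps working with reduced functionality.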

Anti-Patterns

Consider a NoSQL database if:

Your application primarily indexes and queries data with no need for joins or complex transactions.
You expect a write throughput beyond the constraints of a single instance.
If you have large binary files:
Consider using Amazon S3 to hold the actual files.
Hold only the metadata for the files in your database.

NoSQL Databases

NoSQL databases trade some of the query and transaction capabilities of relational databases for a more flexible data model that seamlessly scales horizontally.

Amazon DynamoDB is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models.

Scalability

NoSQL database engines will typically perform data partitioning and replication to scale both the reads and the writes in a horizontal fashion. They do this transparently, and don’t need the data partitioning logic implemented in the data access layer of your application.

Amazon DynamoDB manages table partitioning automatically, adding new partitions as your table grows in size or as its provisioned read and write capacity changes.
Amazon DynamoDB Accelerator (DAX) is a managed, highly available, in-memory cache for DynamoDB that can deliver significant performance improvements.
High Availability
Amazon DynamoDB synchronously replicates data across three facilities in an AWS Region.
Global Tables are replicated across your selected AWS Regions.
Anti-patterns
If your schema cannot be denormalized, and your application requires joins or complex transactions, consider a relational database instead.
If you have large binary files (audio, video, and image), consider storing the files in Amazon S3 and storing the metadata for the files in your database.

Data Warehouse

A data warehouse is a specialized type of relational database, which is optimized for analysis and reporting of large amounts of data.

It can be used to combine transactional data from disparate sources (such as user behavior in a web application, data from your finance and billing system, or customer relationship management or CRM) to make them available for analysis and decision-making.

On AWS, you can leverage Amazon Redshift, a managed data warehouse service that is designed to operate at less than a tenth the cost of traditional solutions.

Scalability

Amazon Redshift achieves efficient storage and optimum query performance through a combination of

massively parallel processing (MPP),
columnar data storage, and
targeted data compression encoding schemes.

The Amazon Redshift MPP architecture enables you to increase performance by increasing the number of nodes in your data warehouse cluster.
Amazon Redshift Spectrum enables Amazon Redshift SQL queries against exabytes of data in Amazon S3.

High Availability

We recommend that you deploy production workloads in multi-node clusters, so that data that is written to a node is automatically replicated to other nodes within the cluster.
Data is also continuously backed up to Amazon S3.
Amazon Redshift continuously monitors the health of the cluster and automatically rereplicates data from failed drives and replaces nodes as necessary.

Anti-patterns

Because Amazon Redshift is an SQL-based relational database management system (RDBMS), it is compatible with other RDBMS applications and business intelligence tools.
Although Amazon Redshift provides the functionality of a typical RDBMS, including online transaction processing (OLTP) functions, it is not designed for these workloads. If you expect a high concurrency workload that generally involves reading and writing all of the columns for a small number of records at a time, you should instead consider using Amazon RDS or Amazon DynamoDB.

Search

A query is a formal database request, addressed in formal terms to a specific data set. Search, by contrast, enables datasets to be queried that are not precisely structured.
For this reason, applications that require sophisticated search functionality will typically outgrow the capabilities of relational or NoSQL databases.

On AWS, you can choose between Amazon CloudSearch and Amazon Elasticsearch Service (Amazon ES).

Amazon CloudSearch is a managed service that requires little configuration and will scale automatically.
Amazon ES offers an open-source API and gives you more control over the configuration details. Amazon ES has also evolved to become more than just a search solution: it is often used as an analytics engine for use cases such as:
- log analytics
- real-time application monitoring
- click stream analytics.

Scalability

Both Amazon CloudSearch and Amazon ES use data partitioning and replication to scale horizontally.

The difference is that with Amazon CloudSearch, you don’t need to worry about how many partitions and replicas you need because the service automatically handles that.

High Availability

Both Amazon CloudSearch and Amazon ES include features that store data redundantly across Availability Zones.

Graph Databases

A graph database uses graph structures for queries.

A graph is defined as consisting of edges (relationships), which directly relate to nodes (data entities) in the store.
The relationships enable data in the store to be linked together directly, which allows the fast retrieval of hierarchical structures in relational systems.
Graph databases are purpose-built for use cases such as:
- social networking
- recommendation engines
- fraud detection

Amazon Neptune is a fully managed graph database service.

Scalability

Amazon Neptune is a purpose-built, high-performance graph database optimized for processing graph queries.

High Availability

Amazon Neptune has:

read replicas
point-in-time recovery
continuous backup to Amazon S3
replication across Availability Zones
support for encryption at rest and in transit

Managing Increasing Volumes of Data

A data lake is an architectural approach that allows you to store massive amounts of data in a central location so that it’s readily available to be categorized, processed, analyzed, and consumed by diverse groups within your organization.

Since data can be stored as-is, you do not have to convert it to a predefined schema, and you no longer need to know what questions to ask about your data beforehand. This enables you to select the correct technology to meet your specific analytical requirements.

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. It is useful for data lakes, where data in the central lake gets extracted, transformed, and loaded.

Removing Single Points of Failure (High Availability)

Production systems typically come with defined or implicit objectives for uptime.
A system is highly available when it can withstand the failure of an individual component or multiple components, such as hard disks, servers, and network links.

To attain high availability:

automate recovery
reduce disruption at every layer of architecture

Introducing Redundancy

Remove single points of failure with redundancy: have multiple resources for the same task.

Standby Redundancy

In standby redundancy, when a resource fails, functionality is recovered on a secondary resource through a failover process.
The failover process typically requires some time, and during that time the resource remains unavailable. The secondary resource can either be:

launched automatically only when needed (reduce cost)
running idle (to accelerate failover and minimize disruption)

Standby redundancy is often used for stateful components such as relational databases.

Active Redundancy

Requests are distributed to multiple redundant compute resources. When one of them fails, the rest can simply absorb a larger share of the workload. Compared to standby redundancy, active redundancy can achieve better utilization and affect a smaller population when there is a failure.

Detect Failure

You should aim to build as much automation as possible in both detecting and reacting to failure.

You can use services such as ELB and Amazon Route 53 to configure health checks and mask failure by routing traffic to healthy endpoints.
You can replace unhealthy nodes automatically using Auto Scaling or by using the Amazon EC2 auto-recovery feature or services such as AWS Elastic Beanstalk.

Design Good Health Checks

In a typical three-tier application, you configure health checks on ELB. Design your health checks with the objective of reliably assessing the health of the back-end nodes.
- Not a simple TCP health check: it cannot detect the case where the instance itself is healthy but the web server process has crashed. Instead, you should assess whether the web server can return an HTTP 200 response for some simple request.
- Not a deep health check (a test that depends on other layers of your application) at this layer.
- A layered approach is often the best.
A deep health check might be appropriate at the Amazon Route 53 level.
- By running a more holistic check that determines if that environment is able to actually provide the required functionality, you can configure Amazon Route 53 to fail over to a static version of your website until your database is up and running again.
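The layered approach can be sketched as two check functions that return HTTP-style status codes. The function names and the boolean inputs (including the `cache_ok` dependency) are hypothetical; a real check would probe the web server, database, and other dependencies directly.

```python
def shallow_check(web_server_ok):
    """Load balancer level: can this node answer a simple HTTP request?"""
    return 200 if web_server_ok else 503


def deep_check(web_server_ok, database_ok, cache_ok):
    """DNS failover level: can the whole environment do useful work?"""
    healthy = web_server_ok and database_ok and cache_ok
    return 200 if healthy else 503


# Database outage: every node still passes the shallow check, so the load
# balancer keeps them in service, while the deep check fails and could
# trigger a DNS failover to a static site.
node_status = shallow_check(web_server_ok=True)
environment_status = deep_check(web_server_ok=True, database_ok=False, cache_ok=True)
```

Keeping the deep check out of the load balancer layer avoids a situation where a database outage marks every web node unhealthy at once, leaving nothing to serve even cached or degraded content.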

Durable Data Storage

Data replication is the technique that introduces redundant copies of data. It can happen in a few different modes.

Synchronous replication

Only acknowledges a transaction after it has been durably stored in both the primary location and its replicas.

Synchronous replication redundantly stores all updates to your data.
For Amazon S3 objects, you can use versioning to preserve, retrieve, and restore their versions.
Asynchronous Replication
Decouples the primary node from its replicas at the expense of introducing replication lag: changes on the primary node are not immediately reflected on its replicas.
Asynchronous replicas are used to horizontally scale the system’s read capacity for queries that can tolerate that replication lag.
It can also be used to increase data durability when some loss of recent transactions can be tolerated during a failover. For example, you can maintain an asynchronous replica of a database in a separate AWS Region as a disaster recovery solution.

Quorum-based replication

Combines synchronous and asynchronous replication.
Replication to multiple nodes can be managed by defining a minimum number of nodes that must participate in a successful write operation.
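A toy illustration of a write quorum, assuming three in-memory replicas of which one is unavailable. `Replica`, `quorum_write`, and the `write_quorum` parameter are names chosen for this sketch; it deliberately ignores read quorums, versioning, and cleanup of partial writes, which a real system would need.

```python
class Replica:
    """In-memory replica that can be switched off to simulate a failure."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def store(self, key, value):
        if not self.available:
            return False
        self.data[key] = value
        return True


def quorum_write(replicas, key, value, write_quorum):
    """Acknowledge the write only if at least write_quorum replicas stored it."""
    acks = sum(1 for replica in replicas if replica.store(key, value))
    return acks >= write_quorum


nodes = [Replica("a"), Replica("b"), Replica("c", available=False)]
accepted = quorum_write(nodes, "k", "v1", write_quorum=2)   # 2 of 3 acknowledge
rejected = quorum_write(nodes, "k", "v2", write_quorum=3)   # only 2 of 3 can acknowledge
```

Setting the quorum to the total number of replicas behaves like fully synchronous replication, while a quorum of one behaves like asynchronous replication; values in between trade durability against availability.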

RPO and RTO

It is important to understand where each technology you are using fits in these data storage models. Their behavior during various failover or backup/restore scenarios should align to your recovery point objective (RPO) and your recovery time objective (RTO).

Automated Multi-Data Centre Resilience

Infrastructure

Each AWS Region contains multiple distinct locations, or Availability Zones. Each Availability Zone is engineered to be independent from failures in other Availability Zones. An Availability Zone is a data center, and in some cases, an Availability Zone consists of multiple data centers. Availability Zones within a Region provide inexpensive, low-latency network connectivity to other zones in the same Region. This allows you to replicate your data across data centers in a synchronous manner so that failover can be automated and be transparent for your users.

Active Redundancy

A fleet of application servers can be distributed across multiple Availability Zones and be attached to ELB. When the EC2 instances of a particular Availability Zone fail their health checks, ELB stops sending traffic to those nodes.
AWS Auto Scaling ensures that the correct number of EC2 instances are available to run your application, launching and terminating instances based on demand, as defined by your scaling policies.
AWS services that are inherently designed according to the multiple Availability Zone (multi-AZ) principle:
Amazon RDS
Amazon S3 (except for Amazon S3 One Zone-Infrequent Access)
Amazon DynamoDB synchronously replicates data across three facilities in an AWS Region.

Fault Isolation and Traditional Horizontal Scaling

What if every instance is affected?

If a particular request happens to trigger a bug that causes the system to fail over, then the caller may trigger a cascading failure by repeatedly trying the same request against all instances. You need to ISOLATE the fault.

Shuffle Sharding

For example, if you have eight instances for your service, you might create four shards of two instances each (two instances for some redundancy within each shard) and distribute each customer to a specific shard.
You can thus reduce the impact on customers in direct proportion to the number of shards you have.

Route 53 Infima’s Shuffle Sharding builds on a pattern of rapidly diminishing likelihood: the chance that two randomly chosen shards share an increasing number of instances drops quickly.

You get many shard combinations if each shard randomly picks a certain “hand” of instances.
There will be instance overlaps between the shuffled shards, but we can make the client fault tolerant:
By having simple retry logic in the client that causes it to try every endpoint in a Shuffle Shard until one succeeds, we get a dramatic bulkhead effect.

The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function.
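To get a feel for the numbers, the sketch below enumerates every two-instance "hand" that can be dealt from eight instances and assigns customers to hands at random. The instance and customer names, the seed, and the helper functions are made up for the example; this is not the Infima library.

```python
import itertools
import random


def shuffle_shards(instances, shard_size):
    """Every possible combination of shard_size instances is a candidate shard."""
    return list(itertools.combinations(instances, shard_size))


def assign(customers, shards, seed=0):
    """Give each customer one randomly chosen shard (seeded for repeatability)."""
    rng = random.Random(seed)
    return {customer: rng.choice(shards) for customer in customers}


instances = [f"i-{n}" for n in range(8)]
shards = shuffle_shards(instances, shard_size=2)
shard_count = len(shards)              # 28 = C(8, 2) combinations

assignment = assign([f"customer-{n}" for n in range(100)], shards)
poisoned = set(assignment["customer-0"])   # instances taken down by one bad customer
fully_affected = [c for c, shard in assignment.items() if set(shard) <= poisoned]
```

With four fixed shards of two instances, a poisonous customer takes down everything for roughly a quarter of all customers. With 28 shuffled shards, only customers dealt exactly the same pair lose both of their instances; customers who share just one instance still have a working endpoint to retry against.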

Optimize for Cost

Right Sizing

Choose the right instance types for:

Amazon EC2
Amazon RDS
Amazon Redshift
Amazon ES
Choose the right storage solution:
Select the right Amazon S3 storage class
Select the right EBS volume type (magnetic, General Purpose SSD, Provisioned IOPS SSD) for:
- Amazon EC2
- Amazon RDS
- Amazon ES

AWS provides tools to help you identify cost-saving opportunities and keep your resources right-sized:
Cost Explorer (graphs)
AWS Budgets (alerts)
AWS Cost and Usage Reports (detailed line items)
Map your AWS costs and usage into meaningful categories with Cost Categories
Cost Allocation Tags: After you or AWS applies tags to your AWS resources (such as Amazon EC2 instances or Amazon S3 buckets) and you activate the tags in the Billing and Cost Management console, AWS generates a cost allocation report as a comma-separated values (CSV) file with your usage and costs grouped by your active tags.
The AWS Price List API and AWS Price List Service API let you query the prices of AWS services using HTML (AWS Price List API) or JSON (AWS Price List Service API). You can also subscribe to Amazon Simple Notification Service (Amazon SNS) notifications to get alerts when prices for the services change.
Logging Billing and Cost Management API calls with AWS CloudTrail
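To make the idea of grouping costs by tag concrete, here is a small sketch that totals a CSV report per tag value. The report below is a made-up, heavily simplified stand-in: the column names (`ProductName`, `TotalCost`, `user:Project`) and all figures are assumptions, and a real cost allocation report has many more columns.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical, simplified report content (not real billing data).
REPORT = """\
ProductName,TotalCost,user:Project
Amazon Elastic Compute Cloud,12.50,checkout
Amazon Simple Storage Service,3.25,checkout
Amazon Elastic Compute Cloud,7.00,search
Amazon Simple Storage Service,1.10,
"""


def cost_by_tag(report_text, tag_column):
    """Sum the TotalCost column per value of the given tag column."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(report_text)):
        tag_value = row[tag_column] or "(untagged)"
        totals[tag_value] += Decimal(row["TotalCost"])
    return dict(totals)


totals = cost_by_tag(REPORT, "user:Project")
for tag_value, cost in sorted(totals.items()):
    print(tag_value, cost)
```

Rows without a tag value are collected under `(untagged)`, which is often the first place to look for unowned spend.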

Elasticity

Auto Scaling

Plan to implement Auto Scaling for as many Amazon EC2 workloads as possible, so that you horizontally scale up when needed and scale down and automatically reduce your spending when you don’t need that capacity anymore.
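One way to picture a scaling policy is as a proportional rule that resizes the fleet so a metric moves back toward a target. The function below is a simplified sketch under that assumption. It is not the algorithm AWS Auto Scaling uses, and the parameter names are hypothetical.

```python
import math


def desired_capacity(current_capacity, metric_value, target_value, min_size, max_size):
    """Scale the fleet proportionally to how far the metric is from its target.

    The result is clamped to the group's minimum and maximum size.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    proportional = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(max_size, proportional))


# Four instances at 90% average CPU with a 60% target: add capacity.
print(desired_capacity(4, 90, 60, min_size=2, max_size=10))  # 6
# Four instances at 15% average CPU: shrink, but never below the minimum.
print(desired_capacity(4, 15, 60, min_size=2, max_size=10))  # 2
```

The second call is where the savings come from: capacity that is no longer needed is released instead of sitting idle.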

Automate turning off non-productive workloads

Using Amazon CloudWatch alarm actions, you can create alarms that automatically stop, terminate, reboot, or recover your EC2 instances.
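As a sketch of what such an alarm might look like, the function below builds the parameters for a CloudWatch `PutMetricAlarm` request that stops an instance after a sustained period of low CPU. The keys follow the CloudWatch API, but the alarm name, threshold, and duration are assumptions, and the code only builds the dictionary rather than calling AWS.

```python
def idle_stop_alarm_params(instance_id, region, cpu_threshold=5.0, hours=4):
    """Build PutMetricAlarm parameters that stop an instance when it sits idle.

    The instance is treated as idle when its hourly average CPU stays below
    `cpu_threshold` percent for `hours` consecutive hours (assumed values).
    """
    return {
        "AlarmName": f"stop-idle-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 3600,  # one-hour evaluation periods, in seconds
        "EvaluationPeriods": hours,
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in EC2 alarm action that stops the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


# Hypothetical instance ID and region, used only for illustration.
params = idle_stop_alarm_params("i-0123456789abcdef0", "us-east-1")
print(params["AlarmActions"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.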

Consider which compute workloads you could implement on AWS Lambda

With AWS Lambda you never pay for idle or redundant resources.

Replace EC2 workload with AWS managed services

Where possible, replace Amazon EC2 workloads with AWS managed services that don’t require you to make any capacity decisions, such as:

ELB (Elastic Load Balancing)
Amazon CloudFront
Amazon SQS (Simple Queue Service)
Amazon Kinesis Firehose (reliably loads streaming data into data lakes, data stores, and analytics tools)
AWS Lambda
Amazon SES (Simple Email Service)
Amazon CloudSearch
Amazon EFS (Elastic File System)

Or AWS managed services that enable you to easily modify capacity as and when needed, such as:

Amazon DynamoDB
Amazon RDS
Amazon ES

Take Advantage of the Variety of Purchasing Options

Reserved Instances

Ideal for applications with predictable minimum capacity requirements.
You can take advantage of tools such as AWS Trusted Advisor or Amazon EC2 usage reports to identify the compute resources that you use most often and that you should consider reserving.
Other services have reserved capacity options too:

Amazon Redshift
Amazon RDS
Amazon DynamoDB
Amazon CloudFront

Spot Instances

The hourly price for a Spot instance (of each instance type in each Availability Zone) is set by Amazon EC2, and adjusted gradually based on the long-term supply of, and demand for, Spot instances.
Your Spot instance runs whenever capacity is available and the maximum price per hour for your request (bid price) exceeds the Spot price.
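The rule above can be sketched as a tiny simulation. The hourly prices are invented, and the model is deliberately simplified: it assumes capacity is always available and that each running hour is billed at that hour's Spot price.

```python
from decimal import Decimal


def simulate_spot(spot_prices, max_price):
    """Return the hours in which the instance runs and the total cost.

    The instance runs in an hour only when max_price exceeds the Spot price.
    """
    running_hours = [
        hour for hour, price in enumerate(spot_prices) if max_price > price
    ]
    cost = sum((spot_prices[hour] for hour in running_hours), Decimal("0"))
    return running_hours, cost


# Hypothetical hourly Spot prices in USD.
prices = [Decimal(p) for p in ("0.030", "0.032", "0.041", "0.029")]
hours, cost = simulate_spot(prices, Decimal("0.035"))
print(hours, cost)  # [0, 1, 3] 0.091
```

In hour 2 the Spot price rises above the maximum price, so the instance does not run and nothing is charged for that hour in this model.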

See the Amazon EC2 documentation for details on Spot Instance pricing when an instance is interrupted.

Caching

Caching is a technique that stores previously calculated data for future use.

Application Data Caching

Applications can be designed so that they store and retrieve information from fast, managed, in-memory caches.

Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. It supports two in-memory cache engines:
- Memcached
- Redis
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that is designed for high throughput.
DAX adds in-memory acceleration to your DynamoDB tables without requiring you to manage cache invalidation, data population, or cluster management.
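A common way for an application to use an in-memory cache is the cache-aside pattern: check the cache first, fall back to the data store on a miss, then populate the cache. The sketch below uses a tiny in-process `TTLCache` class as a stand-in for a real cache client. The class, the 60-second TTL, and `slow_lookup` are all assumptions made for the example.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for an in-memory cache such as Redis or Memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def get_with_cache(key, cache, load):
    """Cache-aside read: try the cache, fall back to the source, then populate."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value


calls = []


def slow_lookup(key):
    """Pretend database query; records each call so cache hits are visible."""
    calls.append(key)
    return {"id": key, "name": f"product {key}"}


cache = TTLCache(ttl_seconds=60)
first = get_with_cache("p1", cache, slow_lookup)
second = get_with_cache("p1", cache, slow_lookup)
print(len(calls))  # 1
```

The second read is served from the cache, so the backing store is queried only once. With DAX, the application does not manage this population and invalidation logic itself.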

Edge Caching

Copies of static content (images, CSS files, or streaming pre-recorded video) and dynamic content (responsive HTML, live video) can be cached at edge locations of Amazon CloudFront, a CDN with multiple points of presence around the world.

Security

Use AWS features for Defense in Depth

You can build a VPC topology that isolates parts of the infrastructure through the use of subnets, security groups, and routing controls.
Services like AWS WAF, a web application firewall, can help protect your web applications from SQL injection and other vulnerabilities in your application code.
For access control, you can use IAM to define a granular set of policies and assign them to users, groups, and AWS resources.
The AWS Cloud offers many options to protect your data with encryption, whether it is in transit or at rest.
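To show what a granular IAM policy can look like, the sketch below builds a policy document that allows reading objects under a single S3 prefix and nothing else. The bucket name, prefix, and `Sid` are hypothetical. The document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the IAM JSON policy format.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Build an IAM policy allowing s3:GetObject under one bucket prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = read_only_prefix_policy("example-bucket", "reports")
print(json.dumps(policy, indent=2))
```

Anything not explicitly allowed is denied by default, so a narrow `Action` and `Resource` keep the granted access small.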

AWS is responsible for the security of the underlying cloud infrastructure; you are responsible for securing the workloads you deploy in AWS.
For example, when you use services such as Amazon RDS and Amazon ElastiCache, security patches are applied automatically according to your configuration settings. This not only reduces operational overhead for your team, but it could also reduce your exposure to vulnerabilities.

Reduce Privileged Access

In a traditional environment, service accounts are often assigned long-term credentials that are stored in a configuration file. On AWS, you can instead use IAM roles to grant permissions to applications running on EC2 instances through the use of short-term credentials, which are automatically distributed and rotated.
For mobile applications, you can use Amazon Cognito to allow client devices to access AWS resources through temporary tokens with fine-grained permissions.

Security as Code

Golden Environment: capture security frameworks, regulations, and organizational policies in a template. This template is used by AWS CloudFormation and deploys your resources in alignment with your security policy. You can reuse security best practices among multiple projects, as a part of your continuous integration pipeline.
For greater control and security, AWS CloudFormation templates can be imported as products into AWS Service Catalog. This allows you to centrally manage your resources to support consistent governance, security, and compliance requirements, while enabling your users to quickly deploy only the approved IT services they need.
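As one small illustration of capturing a security policy in a template, the sketch below generates a CloudFormation template for an S3 bucket with default encryption and public access blocked. The choice of these two controls and the logical ID are assumptions. The property names are those of the `AWS::S3::Bucket` resource type.

```python
import json


def secure_bucket_template(logical_id="SecureBucket"):
    """Return a CloudFormation template for an encrypted, non-public S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Example golden template: encrypted, non-public bucket",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Encrypt every new object at rest by default.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    },
                    # Block all forms of public access to the bucket.
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                },
            }
        },
    }


template = secure_bucket_template()
print(json.dumps(template, indent=2))
```

Because the policy lives in a function, every project that needs a bucket can reuse the same controls instead of re-implementing them by hand.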

Real-time Auditing

On AWS, you can implement continuous monitoring and automation of controls to minimize exposure to security risks. Services that provide this real-time auditing:

AWS Config
Amazon Inspector
AWS Trusted Advisor

With AWS Config rules you also know if a resource was out of compliance even for a brief period of time, making both point-in-time and period-in-time audits very effective.

You can implement extensive logging for your applications (using Amazon CloudWatch Logs) and for the actual AWS API calls by enabling AWS CloudTrail.

AWS CloudTrail is a web service that records API calls to supported AWS services in your AWS account and delivers a log file to your S3 bucket.

You can use AWS Lambda, Amazon EMR (Amazon Elastic MapReduce), Amazon ES, Amazon Athena, or third-party tools from AWS Marketplace to scan log data to detect events such as unused permissions, privileged account overuse, key usage, anomalous logins, policy violations, and system abuse.
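A scan for anomalous logins could look roughly like the sketch below, which counts failed console sign-ins per user and source IP in a CloudTrail-style log file. The sample records are fabricated for illustration and contain only the few fields the code reads. The threshold of three failures is an arbitrary assumption.

```python
import json
from collections import Counter


def _login(user, ip, outcome):
    """Build a minimal, made-up ConsoleLogin record for the sample log."""
    return {
        "eventName": "ConsoleLogin",
        "sourceIPAddress": ip,
        "userIdentity": {"type": "IAMUser", "userName": user},
        "responseElements": {"ConsoleLogin": outcome},
    }


SAMPLE_LOG = json.dumps({
    "Records": [
        _login("alice", "203.0.113.10", "Failure"),
        _login("alice", "203.0.113.10", "Failure"),
        _login("alice", "203.0.113.10", "Failure"),
        _login("bob", "198.51.100.7", "Success"),
        {"eventName": "RunInstances", "sourceIPAddress": "198.51.100.7"},
    ]
})


def failed_console_logins(log_text):
    """Count failed ConsoleLogin events per (user, source IP) pair."""
    counts = Counter()
    for record in json.loads(log_text).get("Records", []):
        if record.get("eventName") != "ConsoleLogin":
            continue
        outcome = (record.get("responseElements") or {}).get("ConsoleLogin")
        if outcome == "Failure":
            user = record.get("userIdentity", {}).get("userName", "unknown")
            counts[(user, record.get("sourceIPAddress"))] += 1
    return counts


def suspicious(counts, threshold=3):
    """Return the (user, source IP) pairs with at least `threshold` failures."""
    return sorted(key for key, n in counts.items() if n >= threshold)


flagged = suspicious(failed_console_logins(SAMPLE_LOG))
print(flagged)  # [('alice', '203.0.113.10')]
```

The same function could run inside AWS Lambda against log files as they arrive in the S3 bucket, though wiring that up is outside this sketch.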
