Amazon ElastiCache

Amazon ElastiCache is a fully managed service that provides in-memory data stores compatible with Redis and Memcached. It is designed to significantly improve the performance of read-heavy or compute-intensive application workloads. It provides push-button scalability for memory, writes, and reads, and is billed by node size and hours of use. ElastiCache can be used for storing session state, and access to it is controlled by VPC security groups and subnet groups. ElastiCache nodes are deployed in clusters and can span more than one subnet of the same subnet group. Applications connect to ElastiCache clusters using endpoints. You can define maintenance windows during which software patching occurs.

There are two types of ElastiCache engines:

  • Memcached – the simplest model; can run large nodes with multiple cores/threads, can be scaled in and out, and can cache objects such as database query results.
  • Redis – a more complex model that supports encryption, master/slave replication, cross-AZ high availability (HA), automatic failover, and backup/restore.

Use Cases:

The following table describes a few typical use cases for ElastiCache:

| Use Case | Benefit |
| --- | --- |
| Web session store | In cases with load-balanced web servers, store web session information in Redis so if a server is lost, the session info is not lost, and another web server can pick it up |
| Database caching | Use Memcached in front of Amazon RDS to cache popular queries to offload work from RDS and return results faster to users |
| Leaderboards | Use Redis to provide a live leaderboard for millions of users of your mobile app |
| Streaming data dashboards | Provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays |

Exam tip: the key use cases for ElastiCache are offloading reads from a database and storing the results of computations and session state. Also, remember that ElastiCache is an in-memory database and it’s a managed service (so you can’t run it on EC2).

The table below describes the requirements that would determine whether to use the Memcached or Redis engine:

| Memcached | Redis |
| --- | --- |
| Simple, no-frills | You need encryption |
| You need elasticity (scale out and in) | You need HIPAA compliance |
| You need to run multiple CPU cores and threads | Support for clustering |
| You need to cache objects (e.g. database queries) | You need complex data types |
| | You need HA (replication) |
| | Pub/Sub capability |

Memcached:

Memcached is a widely adopted in-memory object caching system that is multi-threaded and can run large nodes. It can be scaled in and out by adding or removing nodes and is ideal as a front-end for data stores such as RDS or DynamoDB. It is a simple, no-frills engine that caches objects such as database query results and can make use of multiple CPU cores and threads.

Use cases:

  • Cache the contents of a DB.
  • Cache data from dynamically generated web pages.
  • Transient session data.
  • High-frequency counters for admission control in high volume web apps.

Amazon ElastiCache Memcached supports up to 100 nodes per region and 1-20 nodes per cluster (soft limits). It can integrate with SNS for node failure/recovery notification and supports auto-discovery for nodes added/removed from the cluster. It scales out/in horizontally by adding/removing nodes and scales up/down vertically by changing the node family/type. However, it does not support multi-AZ failover or replication and does not support snapshots. You can place nodes in different AZs.

With ElastiCache Memcached each node represents a partition of data and nodes in a cluster can span availability zones:

Fig: ElastiCache Memcached cluster with nodes spanning Availability Zones
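As a rough illustration of how each node ends up holding a partition of the data, the sketch below maps keys to nodes on the client side with a simple hash-modulo scheme. The node endpoint names are hypothetical, and real Memcached clients commonly use consistent hashing instead, so treat this as a simplified model rather than the behavior of any particular client.

```python
import hashlib

# Hypothetical node endpoints; a real cluster exposes its own endpoints.
NODES = ["node-a.example:11211", "node-b.example:11211", "node-c.example:11211"]


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick the node that owns a key using hash-modulo partitioning."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


for key in ("session:42", "user:7", "query:top10"):
    print(key, "->", node_for_key(key, NODES))
```

With plain modulo hashing, adding or removing a node remaps most keys, which is one reason clients tend to prefer consistent hashing.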

Redis:

Redis is an open-source, in-memory key-value store that supports more complex data structures such as sorted sets and lists. Data is persistent, so Redis can be used as a data store as well as a cache. Redis is not multi-threaded and scales by adding shards, not nodes. It supports master/slave replication, Multi-AZ for cross-AZ redundancy, automatic failover, and backup/restore. A Redis shard is a subset of the cluster’s keyspace and consists of a primary node and zero or more read replicas. Redis also supports automatic and manual snapshots (stored in S3); backups include cluster data and metadata. You can restore your data by creating a new Redis cluster and populating it from a backup. Automated backups are enabled by default but are deleted automatically when the Redis cluster is deleted. To move a snapshot to another region, you must first export it from ElastiCache; you can then populate a new cluster in the target region from the exported data.

Clustering mode disabled:

You can only have one shard. One shard can have one read/write primary node and 0-5 read-only replicas. You can distribute the replicas over multiple AZs in the same region. Replication from the primary node is asynchronous.

A Redis cluster with cluster mode disabled is represented in the diagram below:

Fig: Redis cluster with cluster mode disabled

Clustering mode enabled

When cluster mode is enabled in Redis on Amazon ElastiCache, a cluster can have up to 15 shards, each with one primary node and up to 5 read-only replicas. It is recommended to take snapshots from the read replicas to avoid slowing down the primary nodes.

Fig: Redis cluster with cluster mode enabled
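To make the idea of a shard owning a subset of the keyspace more concrete, here is a minimal sketch. It relies on the open-source Redis Cluster scheme, which assigns each key to one of 16,384 hash slots using CRC16 of the key modulo 16384. The even, contiguous split of slots across shards in shard_for_slot is an assumption for illustration, and hash tags (the {...} syntax) are ignored for brevity.

```python
HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key (hash tags are not handled in this sketch)."""
    return crc16_xmodem(key.encode("utf-8")) % HASH_SLOTS


def shard_for_slot(slot: int, num_shards: int) -> int:
    """Assumes slots are split into contiguous, near-equal ranges per shard."""
    return slot * num_shards // HASH_SLOTS


for key in ("user:1", "user:2", "leaderboard"):
    slot = key_slot(key)
    print(key, "-> slot", slot, "-> shard", shard_for_slot(slot, 3))
```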

Multi-AZ failover

In case of failures, ElastiCache detects them and automatically promotes the replica that has the lowest replica lag. DNS records remain the same but point to the IP of the new primary, and other replicas start to sync with the new primary. You can have a fully automated, fault-tolerant ElastiCache-Redis implementation by enabling both cluster mode and multi-AZ failover.
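The promotion rule described above (pick the replica with the lowest replication lag) can be sketched as follows. The Replica type and its field names are hypothetical and only model the selection step, not how ElastiCache detects failures or updates DNS.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str           # hypothetical replica identifier
    lag_seconds: float  # how far the replica is behind the primary


def choose_new_primary(replicas: list[Replica]) -> Replica:
    """Return the replica with the lowest replication lag."""
    if not replicas:
        raise ValueError("no replicas available for promotion")
    return min(replicas, key=lambda r: r.lag_seconds)


replicas = [
    Replica("replica-a", 1.8),
    Replica("replica-b", 0.2),
    Replica("replica-c", 0.9),
]
print(choose_new_primary(replicas).name)
```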

The following table compares the Memcached and Redis engines:

| Feature | Memcached | Redis (cluster mode disabled) | Redis (cluster mode enabled) |
| --- | --- | --- | --- |
| Data persistence | No | Yes | Yes |
| Data types | Simple | Complex | Complex |
| Data partitioning | Yes | No | Yes |
| Encryption | No | Yes | Yes |
| High availability (replication) | No | Yes | Yes |
| Multi-AZ | Yes, place nodes in multiple AZs. No failover or replication | Yes, with auto-failover. Uses read replicas (0-5 per shard) | Yes, with auto-failover. Uses read replicas (0-5 per shard) |
| Scaling | Up (node type); out (add nodes) | Single shard (can add replicas) | Add shards |
| Multithreaded | Yes | No | No |
| Backup and restore | No (and no snapshots) | Yes, automatic and manual snapshots | Yes, automatic and manual snapshots |

Caching strategies

There are two caching strategies available: Lazy Loading and Write-Through:

1. Lazy Loading

  • ElastiCache loads data into the cache only when necessary (if a cache miss occurs).
  • Lazy loading avoids filling up the cache with data that won’t be requested.
  • If requested data is in the cache, ElastiCache returns it to the application.
  • If the data is not in the cache or has expired, ElastiCache returns null.
  • The application then fetches the data from the database and writes it into the cache, making it available for next time.
  • Data in the cache can become stale if lazy loading is implemented without other strategies, such as Time To Live (TTL).
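The lazy loading flow above can be sketched in a few lines. This is a minimal illustration in which plain dictionaries stand in for both ElastiCache and the database; the names cache, database, and get_user are made up for the example.

```python
database = {"user:1": "Alice", "user:2": "Bob"}  # stand-in for the real database
cache = {}                                       # stand-in for ElastiCache


def get_user(key):
    """Read a value using lazy loading (populate the cache on a miss)."""
    value = cache.get(key)           # 1. try the cache first
    if value is None:                # 2. cache miss: the cache returned nothing
        value = database.get(key)    # 3. fetch from the database
        if value is not None:
            cache[key] = value       # 4. write it to the cache for next time
    return value


print(get_user("user:1"))  # first call misses the cache and reads the database
print(get_user("user:1"))  # second call is served from the cache
```

Only keys that were actually requested end up in the cache, which is the main appeal of this strategy.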


2. Write Through

  • When using a write-through strategy, the cache is updated whenever a new write or update is made to the underlying database.
  • Allows cache data to remain up to date.
  • This can add wait time to write operations in your application.
  • Without a TTL you can end up with a lot of cached data that is never read.
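A write-through update can be sketched in the same style. Again, dictionaries stand in for the database and the cache, and the function names are hypothetical.

```python
database = {}  # stand-in for the real database
cache = {}     # stand-in for ElastiCache


def save_user(key, value):
    """Write-through: every database write also updates the cache."""
    database[key] = value  # write to the source of truth
    cache[key] = value     # update the cache as part of the same operation


def get_user(key):
    """Reads are served from the cache, which is kept current by save_user."""
    return cache.get(key)


save_user("user:1", "Alice")
save_user("user:1", "Alicia")  # the update replaces the cached value
print(get_user("user:1"))
```

The second write to the cache on every update is the source of the extra write latency mentioned above.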


Dealing with stale data – Time to Live (TTL)

  • The drawbacks of lazy loading and write-through techniques can be mitigated by a TTL.
  • The TTL specifies the number of seconds until the key (data) expires to avoid keeping stale data in the cache.
  • When reading an expired key, the application checks the value in the underlying database.
  • Lazy Loading treats an expired key as a cache miss and causes the application to retrieve the data from the database and subsequently write the data into the cache with a new TTL.
  • Depending on the frequency with which data changes this strategy may not eliminate stale data – but helps to avoid it.
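The interaction between lazy loading and a TTL can be sketched with a small in-process cache. The TTLCache class and get_with_lazy_loading function are illustrative stand-ins, not an ElastiCache API; the clock is injectable only so that expiry is easy to demonstrate.

```python
import time


class TTLCache:
    """Tiny in-memory cache in which every key expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # an expired key behaves like a cache miss
            return None
        return value


def get_with_lazy_loading(cache, database, key):
    """On a miss or an expired key, reload from the database with a fresh TTL."""
    value = cache.get(key)
    if value is None:
        value = database.get(key)
        if value is not None:
            cache.set(key, value)
    return value


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
database = {"price": 100}

print(get_with_lazy_loading(cache, database, "price"))  # loaded from the database
database["price"] = 120
print(get_with_lazy_loading(cache, database, "price"))  # still the cached value
now[0] = 11.0
print(get_with_lazy_loading(cache, database, "price"))  # expired, so reloaded
```

The middle read shows why a TTL reduces stale data without eliminating it: the old value is served until the key expires.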

Exam tip: Compared to DynamoDB Accelerator (DAX), remember that DAX is optimized for DynamoDB specifically and works as a write-through cache; the application does not implement lazy loading itself.

Monitoring and Reporting:

1. Memcached Metrics

The following CloudWatch metrics offer good insight into ElastiCache Memcached performance:

CPUUtilization – This is a host-level metric reported as a percent. Because Memcached is multi-threaded, this metric can be as high as 90%. If you exceed this threshold, scale your cache cluster up by using a larger cache node type, or scale out by adding more cache nodes.

SwapUsage – This is a host-level metric reported in bytes. This metric should not exceed 50 MB. If it does, we recommend that you increase the ConnectionOverhead parameter value.

Evictions – This is a cache engine metric. If you exceed your chosen threshold, scale your cluster up by using a larger node type, or scale out by adding more nodes.

CurrConnections – This is a cache engine metric. An increasing number of CurrConnections might indicate a problem with your application; you will need to investigate the application behavior to address this issue.
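The thresholds above can be turned into a simple check. The sketch below assumes the metric values have already been retrieved (for example from CloudWatch); the function name, the message wording, and the choice of 1 MB = 1024 * 1024 bytes are assumptions made for illustration.

```python
CPU_LIMIT_PERCENT = 90.0
SWAP_LIMIT_BYTES = 50 * 1024 * 1024  # 50 MB, taking 1 MB as 1024 * 1024 bytes (assumption)


def memcached_recommendations(cpu_percent, swap_bytes, evictions, eviction_threshold):
    """Return a list of suggested actions based on the guidance above."""
    advice = []
    if cpu_percent > CPU_LIMIT_PERCENT:
        advice.append("CPUUtilization high: use a larger node type or add nodes")
    if swap_bytes > SWAP_LIMIT_BYTES:
        advice.append("SwapUsage high: increase the ConnectionOverhead parameter")
    if evictions > eviction_threshold:
        advice.append("Evictions high: use a larger node type or add nodes")
    return advice


for line in memcached_recommendations(
    cpu_percent=95.0,
    swap_bytes=60 * 1024 * 1024,
    evictions=200,
    eviction_threshold=100,  # the eviction threshold is chosen by you
):
    print(line)
```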

2. Redis Metrics

The following CloudWatch metrics offer good insight into ElastiCache Redis performance:

EngineCPUUtilization – Provides CPU utilization of the Redis engine thread. Since Redis is single-threaded, you can use this metric to analyze the load of the Redis process itself.

MemoryFragmentationRatio – Indicates the efficiency in the allocation of memory of the Redis engine. Certain thresholds will signify different behaviors. The recommended value is to have fragmentation above 1.0.

CacheHits – The number of successful read-only key lookups in the main dictionary.

CacheMisses – The number of unsuccessful read-only key lookups in the main dictionary.

CacheHitRate – Indicates the usage efficiency of the Redis instance. If the cache hit ratio is lower than ~0.8, it means that a significant number of keys are evicted, expired, or do not exist.

CurrConnections – The number of client connections, excluding connections from read replicas. ElastiCache uses two to four of the connections to monitor the cluster in each case.
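The hit ratio referred to above is the share of lookups that were hits, that is, hits / (hits + misses). A minimal sketch, with a hypothetical function name and made-up sample counts:

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of key lookups that were served from the cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet, so report 0 rather than divide by zero
    return hits / total


rate = cache_hit_rate(hits=720, misses=280)
print(rate)
if rate < 0.8:
    print("Hit ratio is below ~0.8: many keys are evicted, expired, or missing")
```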

Logging and Auditing

All Amazon ElastiCache actions are logged by AWS CloudTrail.

Every event or log entry contains information about who generated the request. The identity information helps you determine the following:

  • Whether the request was made with root or IAM user credentials.
  • Whether the request was made with temporary security credentials for a role or federated user.
  • Whether the request was made by another AWS service.
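As an illustration, the identity information lives in the userIdentity element of a CloudTrail record, whose type field takes values such as Root, IAMUser, AssumedRole, FederatedUser, and AWSService. The sketch below classifies a record accordingly; the sample event is made up and heavily trimmed, and the description strings are only for the example.

```python
CALLER_DESCRIPTIONS = {
    "Root": "root credentials",
    "IAMUser": "IAM user credentials",
    "AssumedRole": "temporary security credentials for a role",
    "FederatedUser": "temporary security credentials for a federated user",
    "AWSService": "another AWS service",
}


def describe_caller(event: dict) -> str:
    """Summarize who made the request recorded in a CloudTrail event."""
    identity_type = event.get("userIdentity", {}).get("type", "Unknown")
    return CALLER_DESCRIPTIONS.get(
        identity_type, "unrecognized identity type: " + identity_type
    )


# Made-up, trimmed-down record for illustration only.
sample_event = {
    "eventSource": "elasticache.amazonaws.com",
    "eventName": "CreateCacheCluster",
    "userIdentity": {"type": "IAMUser", "userName": "example-user"},
}
print(describe_caller(sample_event))
```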

Authorization and Access Control

Access to Amazon ElastiCache requires credentials that AWS can use to authenticate your requests. Those credentials must have permissions to access AWS resources, such as an ElastiCache cache cluster or an Amazon Elastic Compute Cloud (Amazon EC2) instance.

You can use identity-based policies with Amazon ElastiCache to provide the necessary access.

You can use Redis AUTH to require a token (password) with ElastiCache Redis.

With Redis AUTH enabled, clients must supply the token before they are allowed to run commands, which improves data security.

Pricing

Pricing is per Node-hour consumed for each Node Type.

Partial Node-hours consumed are billed as full hours.

There is no charge for data transfer between Amazon EC2 and Amazon ElastiCache within the same Availability Zone.
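Because partial node-hours are rounded up, a rough cost estimate can be sketched as below. The hourly rate is a placeholder rather than a real price, and the function names are hypothetical; check current pricing for the node type and region you use.

```python
import math


def node_hours_billed(hours_used: float) -> int:
    """Partial node-hours are billed as full hours."""
    return math.ceil(hours_used)


def estimate_cost(hours_used: float, nodes: int, hourly_rate: float) -> float:
    """Estimated charge for identical nodes that each ran for hours_used."""
    return node_hours_billed(hours_used) * nodes * hourly_rate


# Placeholder rate of 0.50 per node-hour (not a real price).
print(node_hours_billed(10.2))
print(estimate_cost(hours_used=10.2, nodes=3, hourly_rate=0.50))
```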

High Availability for ElastiCache

  1. Memcached:
    • Because Memcached does not support replication, a node failure will result in data loss.
    • Use multiple nodes to minimize data loss on node failure.
    • Launch multiple nodes across available AZs to minimize data loss on AZ failure.
  2. Redis:
    • Use multiple nodes in each shard and distribute the nodes across multiple AZs.
    • Enable Multi-AZ on the replication group to permit automatic failover if the primary nodes fail.
    • Schedule regular backups of your Redis cluster.

Conclusion:

In conclusion, Amazon ElastiCache provides a fully managed solution for caching frequently accessed data in the cloud. With support for both Redis and Memcached, ElastiCache is a powerful tool for improving application latency and throughput. By offloading reads from databases and storing the results of computations and session state, ElastiCache can significantly improve the performance of read-heavy application workloads. With push-button scalability and automatic failover, ElastiCache is a reliable and easy-to-use solution for managing in-memory data stores in the cloud.