How to accelerate the Data Migration on Redis Cluster?

Topic 3: Asynchronous Data Migration

  1. If an object is small enough (smaller than a specified size, or with zip encoding), the migration falls back to the default behaviour, the same as the RESTORE command: the object is serialised into an RDB-encoded SDS string, then transferred and deserialised on the target Redis instance.

  2. Otherwise (for example, a big object encoded as a QuickList, HashTable or SkipList), it is cut into multiple pieces and migrated in a pipeline.
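
A rough sketch of that decision, just to make the rule concrete. The enum values, the function name and the threshold parameter are made up for illustration and are not taken from the actual patch:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative encodings only; Redis has its own OBJ_ENCODING_* constants. */
typedef enum { ENC_ZIP, ENC_QUICKLIST, ENC_HASHTABLE, ENC_SKIPLIST } encoding_t;
typedef enum { MIGRATE_WHOLE, MIGRATE_CHUNKED } strategy_t;

/* max_whole_bytes stands for the "specified size" mentioned above
 * (assumed here to be a configurable byte count). */
strategy_t choose_strategy(encoding_t enc, size_t approx_bytes,
                           size_t max_whole_bytes) {
    assert(max_whole_bytes > 0);
    if (enc == ENC_ZIP || approx_bytes < max_whole_bytes)
        return MIGRATE_WHOLE;   /* one RDB-encoded payload, like RESTORE */
    return MIGRATE_CHUNKED;     /* cut into pieces and pipeline them */
}
```

With a 4096-byte threshold, a 100-byte hash table goes in one piece, while a 1 MB quicklist is chunked.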


Two new commands are introduced for this design, SLOTSRESTORE-ASYNC and SLOTSRESTORE-ASYNC-ACK. Please ignore the prefix SLOTS.

  1. Command RESTORE-ASYNC has multiple subcommands for different object types or encodings.

  2. Command RESTORE-ASYNC-ACK is used to acknowledge a RESTORE-ASYNC message.

Here are the formats of these two commands.

/* *
 * SLOTSRESTORE-ASYNC select $db
 *                    del    $key
 *                    expire $key $ttl
 *                    object $key $ttl $payload
 *                    string $key $ttl $payload
 *                    list   $key $ttl $hint [$elem1 ...]
 *                    hash   $key $ttl $hint [$hkey1 $hval1 ...]
 *                    dict   $key $ttl $hint [$elem1 ...]
 *                    zset   $key $ttl $hint [$elem1 $score1 ...]
 *
 * SLOTSRESTORE-ASYNC-ACK $errno $message
 * */
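
To show what one of these commands looks like on the wire, here is a minimal sketch that encodes an argument list in the standard Redis multi-bulk format. The helper name encode_multibulk is my own, and it uses strlen, so unlike a real implementation it is not binary safe:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode argc NUL-terminated arguments as a multi-bulk message:
 *   *<argc>\r\n  then  $<len>\r\n<arg>\r\n  for each argument.
 * Returns the number of bytes written, or -1 if buf is too small. */
int encode_multibulk(char *buf, size_t cap, int argc,
                     const char *const *argv) {
    assert(buf != NULL && argv != NULL && argc > 0);
    int n = snprintf(buf, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t off = (size_t)n;
    for (int i = 0; i < argc; i++) {
        n = snprintf(buf + off, cap - off, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - off) return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```

For example, the arguments SLOTSRESTORE-ASYNC, del, mykey become a message that starts with `*3` followed by three length-prefixed strings.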

Let’s take an example. Suppose we’re migrating a key/value pair from node A to node B:

  1. Node A generates the first RESTORE-ASYNC command and sends it to node B to start a migration;

  2. Node B will handle the request from A:

    • Update its database with the payload (an RDB-encoded string, or only part of the data) with a temporary TTL (typically 3x timeout).
      • If the subcommand is list/hash/dict/zset, the corresponding operation (such as RPUSH/HMSET/SADD/ZADD) will be simulated and performed.
    • Respond to A with a RESTORE-ASYNC-ACK message.
  3. The key point is that the response message from B arrives in multi-bulk format, which means it will be decoded and handled as a normal command by node A.

    • If the migration is not finished, the next RESTORE-ASYNC command with more data will be generated.
    • Otherwise, a final RESTORE-ASYNC will be sent to correct the TTL with the real expire time.
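
The sender side of this exchange can be sketched as a tiny state machine that runs on every successful ACK. This is a simplified model, not the real code: migration_t, next_step and the element-count chunking are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STEP_CHUNK, STEP_FIX_TTL, STEP_DONE } step_t;

typedef struct {
    size_t total;    /* elements in the object being migrated */
    size_t sent;     /* elements already sent to node B */
    size_t chunk;    /* max elements per RESTORE-ASYNC command (assumed) */
    int ttl_fixed;   /* non-zero once the final TTL correction was sent */
} migration_t;

/* Decide what node A sends next after an ACK without error.
 * On STEP_CHUNK, *count is the number of elements in the next piece. */
step_t next_step(migration_t *m, size_t *count) {
    assert(m != NULL && count != NULL && m->chunk > 0);
    *count = 0;
    if (m->sent < m->total) {              /* not finished: send more data */
        size_t left = m->total - m->sent;
        *count = left < m->chunk ? left : m->chunk;
        m->sent += *count;
        return STEP_CHUNK;
    }
    if (!m->ttl_fixed) {                   /* finished: replace temporary TTL */
        m->ttl_fixed = 1;
        return STEP_FIX_TTL;
    }
    return STEP_DONE;
}
```

A 10-element list with a chunk size of 4 yields pieces of 4, 4 and 2 elements, then the TTL correction, then nothing more.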

Obviously, the migration can be accelerated by generating more than one request for every acknowledgement received, so the number of in-flight requests grows exponentially. Thus, I also implemented a configurable speed controller to limit CPU and memory usage.


To achieve this purpose, two levels of iterators, singleObjectIterator and batchedObjectIterator, were designed and implemented.

  1. singleObjectIterator is used in the migration of a single key/value pair.

    • It’s used to hold the reference counters of the key and value, record the current cursor/lindex/zindex of the HashTable/QuickList/SkipList, and generate RESTORE-ASYNC commands during the migration.
  2. batchedObjectIterator is the top-level data structure, used to manage the operations on multiple keys.

    • It's cancelable.
    • It will remove all key/value pairs from local database safely once the migration is finished.

We have clusterRedirectClient in Redis Cluster, so we can do the same thing in asynchronous migration: reply to the client with -TRYAGAIN if the current operation is a write operation on a key that is being migrated.
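
A minimal sketch of that check, assuming the set of migrating keys is available as a plain array. The function name and the error text after -TRYAGAIN are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns the error reply to send to the client, or NULL if the
 * command may proceed. Reads are always allowed in this sketch. */
const char *check_migrating(const char *key, bool is_write,
                            const char *const *migrating, size_t n) {
    assert(key != NULL && (migrating != NULL || n == 0));
    if (!is_write) return NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(key, migrating[i]) == 0)
            return "-TRYAGAIN key is being migrated\r\n"; /* hypothetical text */
    }
    return NULL;
}
```

The client is expected to retry later, by which time the key has either moved or the migration was cancelled.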


Topic 4: Asynchronous Deletion

Maybe the biggest challenge of asynchronous deletion is that the modification of a reference counter is not atomic in Redis 3.2.8. Here is how I worked around it:

  1. Copy the implementation of makeObjectShared from Redis 4.0 to Redis 3.2.8 to protect the shared objects, namely shared.integers.
  2. Design the process of deletion very carefully:
    1. Hold the reference counter (call incrRefCount) before any operation.
    2. Complete all other operations (such as dbDelete) before proceeding with the deletion.
    3. Release the reference counter:
      • If the object's reference counter is 1, pass it to a bio thread to call decrRefCount asynchronously; otherwise, call decrRefCount directly in the main thread. (That’s tricky, but works.)
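
The release step can be modelled like this. It is a single-threaded stand-in, not the real bio code: obj_t, lazy_queue_t and release_object are hypothetical, and the queue simply records what a background thread would free. My reading of why the refcount == 1 test works is that the main thread is then the only holder, so nobody else touches the counter after the hand-off.

```c
#include <assert.h>
#include <stddef.h>

#define LAZY_QUEUE_CAP 16

typedef struct { int refcount; } obj_t;   /* stand-in for robj */
typedef struct {
    obj_t *items[LAZY_QUEUE_CAP];         /* stand-in for a bio job queue */
    size_t len;
} lazy_queue_t;
typedef enum { HANDED_TO_BIO, RELEASED_IN_MAIN } release_t;

release_t release_object(obj_t *o, lazy_queue_t *q) {
    assert(o != NULL && q != NULL && o->refcount > 0);
    if (o->refcount == 1) {
        /* Sole owner: the background thread does the final decrement. */
        assert(q->len < LAZY_QUEUE_CAP);
        q->items[q->len++] = o;
        return HANDED_TO_BIO;
    }
    /* Still shared: keep the non-atomic counter on the main thread. */
    o->refcount--;
    return RELEASED_IN_MAIN;
}
```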

That's my implementation of bio thread, and here's the source code.


That's all. :"P
