If an object is small enough (smaller than a specified size, or zip-encoded), the migration falls back to the default behaviour, as the RESTORE command does: the object is serialised into an RDB-encoded SDS string, then transferred and deserialised on the target Redis instance.
Otherwise, for example for a big QuickList-, HashTable- or SkipList-encoded object, it is cut into multiple pieces and migrated in a pipeline.
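The decision described above can be sketched in a few lines of Python. This is only an illustration: the threshold value, the encoding names and the function name are assumptions made for the example, not values taken from the implementation.

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
SMALL_OBJECT_MAX_BYTES = 1024
ZIP_ENCODINGS = {"ziplist"}


@dataclass
class ObjectInfo:
    encoding: str      # e.g. "ziplist", "quicklist", "hashtable", "skiplist"
    size_bytes: int    # estimated size of the value


def choose_strategy(obj, max_bytes=SMALL_OBJECT_MAX_BYTES):
    """Return "single-payload" for small or zip-encoded objects, else "chunked"."""
    if obj.encoding in ZIP_ENCODINGS or obj.size_bytes < max_bytes:
        # Small enough: serialise once and restore in one step, like RESTORE.
        return "single-payload"
    # Big QuickList/HashTable/SkipList: cut into pieces and pipeline them.
    return "chunked"


print(choose_strategy(ObjectInfo("skiplist", 10_000_000)))
```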
Two new commands are introduced for this design, SLOTSRESTORE-ASYNC and SLOTSRESTORE-ASYNC-ACK. Please ignore the prefix SLOTS.
Command RESTORE-ASYNC has multiple subcommands for different object types or encodings.
Command RESTORE-ASYNC-ACK is used to deliver the RESTORE-ASYNC message.
Here are the formats of these two commands.
/* *
 * SLOTSRESTORE-ASYNC select $db
 *                    del    $key
 *                    expire $key $ttl
 *                    object $key $ttl $payload
 *                    string $key $ttl $payload
 *                    list   $key $ttl $hint [$elem1 ...]
 *                    hash   $key $ttl $hint [$hkey1 $hval1 ...]
 *                    dict   $key $ttl $hint [$elem1 ...]
 *                    zset   $key $ttl $hint [$elem1 $score1 ...]
 *
 * SLOTSRESTORE-ASYNC-ACK $errno $message
 * */
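To make the format concrete, here is a minimal Python sketch that encodes such a command the way any Redis command travels on the wire, as a RESP array of bulk strings. The helper names are made up for this example, and $ttl and $hint are passed through as opaque values because the format above does not spell out their units or meaning.

```python
def encode_command(*args):
    """Encode arguments as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def restore_async_list(key, ttl, hint, elems):
    """Build one 'list' subcommand carrying a slice of the elements."""
    return encode_command("SLOTSRESTORE-ASYNC", "list", key, ttl, hint, *elems)


def restore_async_ack(errno, message):
    """Build the acknowledgement; it is encoded like a command as well."""
    return encode_command("SLOTSRESTORE-ASYNC-ACK", errno, message)


print(restore_async_list("mylist", 0, 2, ["a", "b"]))
```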
Let’s take an example. Suppose we’re migrating a key/value pair from A to B:
Node A generates the first RESTORE-ASYNC command and sends it to node B to start the migration.
Node B handles the request from A. For the list/hash/dict/zset subcommands, the corresponding operation (such as RPUSH/HMSET/SADD/ZADD) is simulated and performed.
Node B then replies with RESTORE-ASYNC-ACK. The key point is that the response message from B is in multi-bulk format, which means node A decodes and handles it as a normal command.
On node A, handling that acknowledgement generates the next RESTORE-ASYNC command with more data.
At the end, a RESTORE-ASYNC is sent to correct the TTL with the real expire time.
Obviously, the migration can be accelerated by generating more than one request for every delivered message (the number of requests in flight then grows exponentially). Thus, I also implemented a configurable speed controller to limit the CPU/memory usage.
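The exchange can be simulated with a toy model. In the Python sketch below, everything beyond the subcommand names is an assumption made for illustration: the in-memory NodeB class, the temporary TTL used while data is in flight, the leading del, and the per_ack / max_inflight parameters that stand in for the speed controller.

```python
from collections import deque

TEMP_TTL = 30_000  # hypothetical temporary TTL while the key is in flight


class NodeB:
    """Toy target node: applies each subcommand and returns an ACK tuple."""

    def __init__(self):
        self.db, self.ttl = {}, {}

    def handle(self, cmd):
        op, key, *rest = cmd
        if op == "del":
            self.db.pop(key, None)
        elif op == "list":
            ttl, _hint, *elems = rest
            self.db.setdefault(key, []).extend(elems)  # simulated RPUSH
            self.ttl[key] = ttl
        elif op == "expire":
            self.ttl[key] = rest[0]
        else:
            return ("RESTORE-ASYNC-ACK", 1, "unknown subcommand")
        return ("RESTORE-ASYNC-ACK", 0, "ok")


def migrate_list(key, values, real_ttl, node_b, chunk=2, per_ack=2, max_inflight=4):
    """Node A side: every ACK releases up to per_ack new requests."""
    pending = deque([("del", key)])
    for i in range(0, len(values), chunk):
        pending.append(("list", key, TEMP_TTL, len(values), *values[i:i + chunk]))
    pending.append(("expire", key, real_ttl))  # final TTL correction

    inflight = deque([pending.popleft()])      # the first request starts it all
    sent, peak = 1, 1
    while inflight:
        ack = node_b.handle(inflight.popleft())
        if ack[1] != 0:
            raise RuntimeError(ack[2])
        for _ in range(per_ack):               # growth, capped by max_inflight
            if pending and len(inflight) < max_inflight:
                inflight.append(pending.popleft())
                sent += 1
        peak = max(peak, len(inflight))
    return sent, peak


b = NodeB()
print(migrate_list("k", list("abcdefgh"), 5000, b), b.db["k"], b.ttl["k"])
```

With per_ack=1 the migration proceeds strictly one request at a time; raising it speeds things up, and max_inflight bounds how much work is queued on the target at once.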
To achieve this purpose, two levels of iterators, singleObjectIterator and batchedObjectIterator, are designed and implemented.
singleObjectIterator is used in the migration of a single key/value pair. It keeps the cursor/lindex/zindex of the HashTable/QuickList/SkipList and generates RESTORE-ASYNC commands during the migration.
batchedObjectIterator is the top-level data structure and is used to manage the operations on multiple keys.
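A rough Python sketch of the two levels, restricted to list values for brevity. The class and method names below are illustrative rather than the real C structures, and using the total length as $hint is an assumption.

```python
from collections import deque


class SingleObjectIterator:
    """Walks one list value and remembers the index where it stopped."""

    def __init__(self, key, values, ttl):
        self.key, self.values, self.ttl = key, values, ttl
        self.lindex = 0  # position of the next element to send

    def has_next(self):
        return self.lindex < len(self.values)

    def next_command(self, max_elems):
        chunk = self.values[self.lindex:self.lindex + max_elems]
        self.lindex += len(chunk)
        # Assumption: the hint carries the total number of elements.
        return ("RESTORE-ASYNC", "list", self.key, self.ttl, len(self.values), *chunk)


class BatchedObjectIterator:
    """Top level: drains one SingleObjectIterator per key, in order."""

    def __init__(self, objects):
        self.iters = deque(SingleObjectIterator(k, v, t) for k, v, t in objects)

    def has_next(self):
        return bool(self.iters)

    def next_command(self, max_elems):
        it = self.iters[0]
        cmd = it.next_command(max_elems)
        if not it.has_next():
            self.iters.popleft()  # this key is done, move on to the next one
        return cmd


batch = BatchedObjectIterator([("a", [1, 2, 3], 0), ("b", [4], 0)])
while batch.has_next():
    print(batch.next_command(2))
```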
We have clusterRedirectClient in Redis Cluster, so we can do the same thing in asynchronous migration: reply to the client with -TRYAGAIN if the current operation is a write operation on keys that are being migrated.
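As a sketch of that check in Python: the sample set of write commands, the function name and the text after -TRYAGAIN are all assumptions for illustration; only the -TRYAGAIN prefix comes from the description above.

```python
# A small sample of write commands, not the full command table.
WRITE_COMMANDS = {"SET", "DEL", "RPUSH", "HSET", "SADD", "ZADD", "EXPIRE"}


def check_command(name, keys, migrating_keys):
    """Return an error reply if a write touches a migrating key, else None."""
    is_write = name.upper() in WRITE_COMMANDS
    if is_write and any(k in migrating_keys for k in keys):
        # Hypothetical message text; the client is expected to retry later.
        return "-TRYAGAIN key is being migrated\r\n"
    return None  # reads, and writes on other keys, proceed as usual


migrating = {"user:1"}
print(check_command("SET", ["user:1"], migrating))
print(check_command("GET", ["user:1"], migrating))
```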
Maybe the biggest challenge of asynchronous deletion is that the modification of the reference counter is not atomic in Redis 3.2.8.
Backport makeObjectShared from Redis 4.0 to Redis 3.2.8 to protect the shared objects, namely shared.integers.
Increase the reference counter (incrRefCount) before any operation.
Remove the key from the database (dbDelete) before proceeding with the deletion.
Call decrRefCount asynchronously if its reference counter is 1, or call decrRefCount directly in the main thread. (That’s tricky, but works.)
That's my implementation of bio thread, and here's the source code.
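Separately from the linked source, the deletion rule can be modelled with a toy Python sketch. It is not the author's code: the Obj and LazyFree classes are invented for the example, and the reasoning in the comments (a counter of 1 means nobody else can touch the object, so a background thread may release it without atomic counters) is my reading of the rule.

```python
import queue
import threading


class Obj:
    def __init__(self, payload):
        self.payload, self.refcount, self.freed = payload, 1, False


def incr_ref(o):
    o.refcount += 1


def decr_ref(o):
    o.refcount -= 1
    if o.refcount == 0:
        o.payload, o.freed = None, True  # stand-in for releasing the memory


class LazyFree:
    """One background worker that releases objects handed over to it."""

    def __init__(self):
        self.jobs = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            decr_ref(self.jobs.get())
            self.jobs.task_done()

    def delete(self, db, key):
        o = db.pop(key)          # unlink from the keyspace first
        if o.refcount == 1:      # sole owner: safe to free in the background
            self.jobs.put(o)
            return "async"
        decr_ref(o)              # still shared: stay on the main thread
        return "sync"

    def drain(self):
        self.jobs.join()


lf = LazyFree()
db = {"big": Obj("x" * 1000)}
print(lf.delete(db, "big"))
lf.drain()
```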
That's all. :"P