Understanding Redis Source Code

The Redis Serialization Protocol (RESP)

Newer versions of Redis employ RESP (Redis Serialization Protocol) to transfer requests and responses as binary strings. From a high level, RESP provides an easy-to-implement specification for representing common data types such as strings, integers, and arrays. In RESP, strings are represented as a $ character, followed by the length of the string in decimal format, followed by \r\n, followed by the string data itself and a terminating \r\n. Because the length is given explicitly, the string itself can contain arbitrary characters including \0, \r and \n and can hence be used to represent binary blobs. Integers are represented as a : character, followed by the decimal representation, followed by \r\n. Arrays are represented as a * character, followed by the number of elements of the array in decimal format, followed by \r\n. The elements of the array then follow the array header, and each of them can be of any valid data type.

Both Redis requests and responses are encoded by RESP before they are sent over the connection. For requests, the RESP format is always an array of strings, with the first string being the command and the rest of them being the arguments. Responses, however, can be of any valid RESP data type. The most common form of a reply is a string beginning with either +, indicating that the command has executed successfully, or -, indicating that the command failed to execute.
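To make the request format concrete, the following is a minimal sketch of how a client could encode a command as a RESP array of strings. The helper name resp_encode_request() is hypothetical and is not part of Redis. For brevity it measures arguments with strlen(), so unlike real RESP strings the arguments here cannot contain \0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Redis function): encode argc arguments as a
 * RESP array of strings into out. Returns the number of bytes written
 * (excluding the terminating NUL), or -1 if out is too small. */
int resp_encode_request(char *out, size_t cap, int argc, const char **argv) {
    assert(out != NULL && argv != NULL && argc > 0);
    size_t pos = 0;

    /* Array header: '*' <number of elements> \r\n */
    int n = snprintf(out, cap, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap) return -1;
    pos += (size_t)n;

    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);

        /* String header: '$' <length> \r\n */
        n = snprintf(out + pos, cap - pos, "$%zu\r\n", len);
        if (n < 0 || (size_t)n >= cap - pos) return -1;
        pos += (size_t)n;

        /* String data followed by \r\n; keep one byte for the final NUL. */
        if (len + 2 >= cap - pos) return -1;
        memcpy(out + pos, argv[i], len);
        pos += len;
        memcpy(out + pos, "\r\n", 2);
        pos += 2;
    }
    out[pos] = '\0';
    return (int)pos;
}
```

With the arguments SET, key and value, the sketch produces an array header for three elements followed by three length-prefixed strings.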

The Asynchronous Event (AE) Library

Redis implements an Asynchronous Event Library, or the AE library, to facilitate socket-based communication. The AE library is defined in ae.h and ae.c. At a high level, the AE library works as an event loop, which is also the main event loop of Redis server. The event loop monitors the read and write status of file descriptors using blocking system calls. When one or more file descriptors become readable or writable, the loop will unblock and the corresponding event will be handled by invoking the callback functions. The rest of Redis registers the file descriptors and callback handlers to the AE library such that the operation of the server can be properly driven.

The central data structure of the AE library is struct aeEventLoop (file ae.h) which contains information for event handling and registration. In particular, field events stores an array of registered file descriptors and callback handlers (of type struct aeFileEvent). Field fired stores an array of file descriptors that become readable or writable in the current iteration. There are also two callback functions, namely, beforesleep and aftersleep, that are registered to the event loop object. These two functions are set during server initialization and will be called before and after the blocking system call, respectively.

The System Call Layer

The AE library is compatible with a number of system calls that monitor the status of file descriptors, including evport(), epoll(), kqueue(), and select(), with the preference being in a descending order (selected in file ae.c as a sequence of #ifdefs). In the following sections, we use select() as an example, but the workflow does not really change between the system calls used.

The select-compatible implementation resides in ae_select.c. The file defines struct aeApiState, which contains the file descriptor arrays to be used with the select() system call. The file implements three vital functions. Function aeApiCreate() initializes the aeApiState object and sets the apidata field of the event loop object to point to the initialized object. Function aeApiAddEvent() adds a given file descriptor into the descriptor array, which enables the descriptor to be monitored by the library. Function aeApiPoll() invokes the select() system call with the file descriptor array. The system call is blocking and will return when one or more file descriptors become available, or on a timeout. After the system call returns, the function scans the file descriptor array to determine which descriptors have fired, and inserts them into the fired array of the event loop object. The function also returns the number of fired file descriptors to the caller.
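The three functions described above can be approximated by the following sketch, which monitors descriptors for readability only. All names prefixed with sketch_ are hypothetical; this is a simplified illustration of the pattern under those assumptions and not the code in ae_select.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical, simplified analogue of aeApiState (read set only). */
typedef struct {
    fd_set rfds;  /* descriptors monitored for readability */
    int maxfd;    /* highest registered descriptor, or -1 if none */
} sketch_state;

/* Analogue of aeApiCreate(): start with an empty descriptor set. */
void sketch_init(sketch_state *s) {
    FD_ZERO(&s->rfds);
    s->maxfd = -1;
}

/* Analogue of aeApiAddEvent(): start monitoring fd. */
void sketch_add(sketch_state *s, int fd) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_SET(fd, &s->rfds);
    if (fd > s->maxfd) s->maxfd = fd;
}

/* Analogue of aeApiPoll(): block in select() for at most timeout_ms, then
 * copy the readable descriptors into fired[]. Returns how many fired. */
int sketch_poll(sketch_state *s, int timeout_ms, int *fired, int cap) {
    fd_set ready = s->rfds; /* select() overwrites its argument: use a copy */
    struct timeval tv;
    int n = 0;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (select(s->maxfd + 1, &ready, NULL, NULL, &tv) <= 0) return 0;

    for (int fd = 0; fd <= s->maxfd && n < cap; fd++)
        if (FD_ISSET(fd, &ready)) fired[n++] = fd;
    return n;
}
```

Working on a copy of the descriptor set matters because select() modifies the set it is given, leaving only the ready descriptors in it.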

The Event Handling Layer

The singleton event loop object is initialized during server initialization by calling aeCreateEventLoop() (file ae.c), which initializes the loop object and returns it to the server. The server saves the event loop object in the server object field el.

During the operation of the server, the function aeCreateFileEvent() is called to register new file descriptors and callback handlers to the AE library. This function takes the descriptor to be registered, a mask that selects readable and/or writable events, the callback handler proc (which is a function pointer), and the argument to the callback handler clientData (which, for client sockets, is the object passed at registration time). Note that although only one callback handler is passed to this function, the AE Library internally distinguishes between read handlers and write handlers (as evidenced by the rfileProc and wfileProc fields of struct aeFileEvent). The provided handler is installed as the read handler, the write handler, or both, depending on the mask.

After server initialization, the server enters the main event loop by calling aeMain() (file ae.c). This function loops infinitely and calls aeProcessEvents() in every iteration. Function aeProcessEvents() first invokes the beforesleep callback (which is registered by the server during initialization), then invokes aeApiPoll(), which may be blocked in the kernel. After the call returns, the function then invokes the after-sleep callback aftersleep, and then processes fired events. It simply scans the fired array, and for each file descriptor in the array, invokes its read or write callback handler with clientData as one of the arguments. A mask argument is also passed to the callback handler to indicate whether the fired descriptor is ready for read, write, or both.
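The order of operations within one iteration can be illustrated with a toy loop. Everything prefixed with toy_ or demo_ is hypothetical; the poll member is a stand-in for aeApiPoll() so that the sketch does not depend on real sockets, and the registration mask mirrors the mask parameter of aeCreateFileEvent().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SETSIZE 8
#define TOY_READABLE 1
#define TOY_WRITABLE 2

struct toy_loop;
typedef void toy_file_proc(struct toy_loop *loop, int fd, void *client_data, int mask);
typedef void toy_sleep_proc(struct toy_loop *loop);

typedef struct {
    int mask;                 /* directions this descriptor is registered for */
    toy_file_proc *rproc;     /* read handler */
    toy_file_proc *wproc;     /* write handler */
    void *client_data;
} toy_file_event;

typedef struct { int fd; int mask; } toy_fired_event;

typedef struct toy_loop {
    toy_file_event events[TOY_SETSIZE];  /* indexed by file descriptor */
    toy_fired_event fired[TOY_SETSIZE];
    toy_sleep_proc *beforesleep;
    toy_sleep_proc *aftersleep;
    int (*poll)(struct toy_loop *loop);  /* fills fired[], returns the count */
} toy_loop;

/* Register proc for the directions selected by mask. */
int toy_create_file_event(toy_loop *loop, int fd, int mask,
                          toy_file_proc *proc, void *client_data) {
    if (fd < 0 || fd >= TOY_SETSIZE) return -1;
    toy_file_event *fe = &loop->events[fd];
    fe->mask |= mask;
    if (mask & TOY_READABLE) fe->rproc = proc;
    if (mask & TOY_WRITABLE) fe->wproc = proc;
    fe->client_data = client_data;
    return 0;
}

/* One iteration: before-sleep, poll, after-sleep, then the fired handlers. */
int toy_process_events(toy_loop *loop) {
    if (loop->beforesleep) loop->beforesleep(loop);
    int n = loop->poll(loop);
    if (loop->aftersleep) loop->aftersleep(loop);
    for (int i = 0; i < n; i++) {
        int fd = loop->fired[i].fd, mask = loop->fired[i].mask;
        toy_file_event *fe = &loop->events[fd];
        if (fe->mask & mask & TOY_READABLE) fe->rproc(loop, fd, fe->client_data, mask);
        if (fe->mask & mask & TOY_WRITABLE) fe->wproc(loop, fd, fe->client_data, mask);
    }
    return n;
}

/* Demo callbacks: each appends one letter to a trace string. */
static char toy_trace[8];
static void toy_note(char c) {
    size_t n = strlen(toy_trace);
    if (n + 1 < sizeof(toy_trace)) toy_trace[n] = c;
}
static void demo_before(toy_loop *l) { (void)l; toy_note('B'); }
static void demo_after(toy_loop *l) { (void)l; toy_note('A'); }
static int demo_poll(toy_loop *l) {
    l->fired[0].fd = 3;               /* pretend descriptor 3 became readable */
    l->fired[0].mask = TOY_READABLE;
    toy_note('P');
    return 1;
}
static void demo_read(toy_loop *l, int fd, void *cd, int mask) {
    (void)l; (void)fd; (void)mask;
    *(int *)cd += 1;                  /* count invocations via client_data */
    toy_note('R');
}
```

Running one iteration with the demo callbacks installed leaves the letters B, P, A, R in the trace, matching the order described above.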

Overall, the aeProcessEvents() function implements the main event loop of the Redis server. The server uses the AE Library to multiplex between multiple clients as well as the listening socket, hence implementing the listening path and the read path. In addition, the server uses the before-sleep callback mechanism of the event loop object to implement the write path. Reply messages are sent to the clients within the before-sleep callback, after they have been generated into the reply buffer during the previous iteration’s command processing.

General Workflow

The Listening Path

Binding and Listening

Redis server listens on one or more sockets and accepts connections from the clients. This listening path begins in function initServer() (file server.c) by calling listenToPort(). The function listenToPort() (file server.c) accepts a list of IP addresses to bind to and a single port number. For every address, it invokes anetTcpServer() (file anet.c) to bind the address. The function anetTcpServer() wraps over _anetTcpServer() (file anet.c), which creates a new socket for listening by invoking the system call socket() and then calling anetListen() (file anet.c). Function anetListen() simply invokes the system calls bind() and then listen() to bind the address and start listening. Finally, the newly created file descriptor is returned to the caller. Note that Redis also supports other types of sockets, such as IPv6 and TLS, but we assume IPv4 sockets are used to simplify the discussion.

To summarize:

initServer()–> listenToPort()–(enters anet.c)–> anetTcpServer()–> _anetTcpServer()–> anetListen()–(enters kernel)–> bind() and listen()
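The system call sequence at the end of this path can be sketched as follows for an IPv4 address. The function sketch_tcp_server() is hypothetical, and the use of inet_pton() and SO_REUSEADDR is a simplification chosen for this example rather than a description of how anet.c resolves addresses or configures the socket.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv4 TCP socket, bind it to bindaddr:port
 * and start listening. Returns the descriptor, or -1 on any error. */
int sketch_tcp_server(const char *bindaddr, int port, int backlog) {
    struct sockaddr_in sa;
    int yes = 1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    /* Assumption for this sketch: allow quick rebinding of the same port. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1) goto fail;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, bindaddr, &sa.sin_addr) != 1) goto fail;

    /* bind() followed by listen(), as in the summary above. */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) goto fail;
    if (listen(fd, backlog) == -1) goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

Passing port 0 lets the kernel pick a free port, which is convenient when experimenting with the sketch.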

Accepting New Connections

Later on during server initialization, the listening sockets are registered to the AE Library for monitoring. The path begins with createSocketAcceptHandler() (file server.c), which calls aeCreateFileEvent() to register the listening sockets one by one with the callback handler being acceptTcpHandler(). The callback handler acceptTcpHandler() (file networking.c), as discussed above, will be invoked when the AE Library fires it. The handler calls anetTcpAccept() (file anet.c), which wraps anetGenericAccept() (file anet.c). The latter accepts the connection by invoking the accept() system call. The newly assigned socket for communicating with the client is also returned to the caller for later usage.

To summarize:

initServer()–> createSocketAcceptHandler()–(enters ae.c)–> aeCreateFileEvent()–(via callbacks, enters networking.c)–> acceptTcpHandler()–(enters anet.c)–> anetTcpAccept()–> anetGenericAccept()–(enters kernel)–> accept()

Creating the Connection Object

After the connection is accepted at the OS level, the next step is to initialize the local data structures for keeping the client’s information. This path begins in function acceptTcpHandler() (file networking.c) by calling connCreateAcceptedSocket() (file connection.c). The function wraps connCreateSocket() which allocates a new connection object of type struct connection. The connection object represents the server-side state of the connection. The object stores the file descriptor returned from accept system call. The object also contains two critical callback handlers, the read_handler and the write_handler, which are invoked for reading and writing data from/into the socket. The type field of the connection object defines a series of function pointers that either operate on the connection object itself or on the socket. For example, type->set_read_handler assigns a new read handler to the connection object’s read_handler field, while type->read directly reads the socket using the read system call. The newly allocated connection object is returned to the caller for later usage.

To summarize:

acceptTcpHandler()–> connCreateAcceptedSocket()–(enters connection.c)–> connCreateSocket()
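The split between the connection object and its type field is essentially a table of function pointers. The sketch below shows the idea with two operations only. All toy_ names are hypothetical, and the struct layouts are simplified assumptions rather than the definitions in connection.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_DEMO_BUF 16

struct toy_conn;
typedef void toy_conn_cb(struct toy_conn *conn);

/* Operations shared by all connections of one kind. */
typedef struct toy_conn_type {
    int (*set_read_handler)(struct toy_conn *conn, toy_conn_cb *cb);
    ssize_t (*read)(struct toy_conn *conn, void *buf, size_t len);
} toy_conn_type;

/* Server-side state of one connection. */
typedef struct toy_conn {
    const toy_conn_type *type;
    int fd;                     /* descriptor returned by accept() */
    toy_conn_cb *read_handler;  /* invoked when the descriptor is readable */
    void *private_data;         /* owner-defined pointer */
} toy_conn;

/* Socket flavor: operates on the connection object itself. */
static int toy_sock_set_read_handler(toy_conn *conn, toy_conn_cb *cb) {
    conn->read_handler = cb;
    return 0;
}

/* Socket flavor: operates on the underlying descriptor. */
static ssize_t toy_sock_read(toy_conn *conn, void *buf, size_t len) {
    return read(conn->fd, buf, len);
}

static const toy_conn_type toy_socket_type = {
    toy_sock_set_read_handler,
    toy_sock_read,
};

/* Wrap an already accepted descriptor in a connection object. */
toy_conn toy_conn_create(int fd) {
    toy_conn conn = { &toy_socket_type, fd, NULL, NULL };
    return conn;
}

/* Demo read handler: pulls bytes into the buffer behind private_data. */
static void toy_on_readable(toy_conn *conn) {
    char *dst = conn->private_data;
    ssize_t n = conn->type->read(conn, dst, TOY_DEMO_BUF - 1);
    dst[n > 0 ? n : 0] = '\0';
}
```

Callers never touch the descriptor directly; they go through type, which is what would allow a different connection kind to supply different functions behind the same interface.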

Creating the Client Object

After creating the connection object, the Redis server then proceeds to create the client object. This path also begins in acceptTcpHandler() right after the connection object is returned. The returned object is passed into function acceptCommonHandler() (file networking.c), which first checks whether the connection is valid and legal (e.g., not exceeding the maximum number of concurrent connections), and then calls createClient() to create the client object.

Function createClient() (file networking.c) first creates a struct client object using zmalloc() and then sets the read handler of the connection object for the client to readQueryFromClient by calling connSetReadHandler(). Finally, the function initializes the client’s states including the send and receive data buffer and buffer pointers. The database object that the client operates on is also set to the default one on index zero by calling selectDb().

Function connSetReadHandler() (file connection.c) will indirectly call connSocketSetReadHandler() via the per-connection object type field. Function connSocketSetReadHandler() (file connection.c) stores the callback handler in the connection object’s read_handler field and then registers the file descriptor of the connection to the AE Library via aeCreateFileEvent(). The registered callback handler to the AE Library is function connSocketEventHandler() (file connection.c), which will in turn call read_handler and/or write_handler fields when the file descriptor fires in the AE Library.

Overall, during the client creation process, the file descriptor of the client is registered to the AE Library for monitoring. The callback handler readQueryFromClient will be invoked (after several levels of indirection) when the file descriptor is ready to be read. Note that the client does not register any write handler to the connection object. As a result, at this point the connection is only set up for reading from the client but not vice versa.

To summarize:

acceptTcpHandler()–> acceptCommonHandler()–> createClient()–(enters connection.c)–> connSetReadHandler()–> connSocketSetReadHandler()–(enters ae.c)–> aeCreateFileEvent()–(via callback, enters networking.c)–> readQueryFromClient()

The Read Path

As discussed earlier, when a client is initialized, its file descriptor is registered to the AE Library for read. The callback handler of the registration is function readQueryFromClient(), meaning that this function will be invoked every time some data arrives at the socket and is selected by the AE Library. The handler will be invoked with the connection object as its sole argument, which is passed to the AE Library at registration time.

Function readQueryFromClient() (file networking.c) first checks whether the buffer is big enough for the client message. In most cases, no action is taken, and the function then calls connRead() on the client’s connection object. Inline function connRead() (file connection.h) indirectly calls connSocketRead() via the connection object’s type->read, which in turn invokes the read() system call to pull data out of the socket stream. Note that the destination buffer of the read is the client’s querybuf, and the local variable qblen holds the current length of data in the buffer so that new data is appended after the existing content. The length to be read is calculated as the remaining capacity of the buffer, as evidenced by the local variable readlen.

After data is read from the socket, the handler then invokes processInputBuffer() to parse and dispatch the command. Command parsing and dispatching are covered in the section Command Parsing and Dispatching.

To summarize:

readQueryFromClient()–(enters connection.c)–> connRead()–> connSocketRead()–(enters kernel)–> read()

Command Parsing and Dispatching

The State Machine

Requests are read by a state machine over potentially multiple attempts to read from the connection, due to the fact that a request may not be fully received with a single read operation, especially the long ones. Redis maintains a few state variables in the struct client object (defined in file server.h). The first is multibulklen, which is the number of elements in the request RESP array. The second is bulklen, which is the length of the current RESP string being received. The struct member querybuf is an SDS type string used as the per-connection receiving buffer, with the member qb_pos as the current head of command parsing. The length of the SDS string, on the other hand, represents the current size of data in the receiving buffer.

The received RESP string objects are stored temporarily in the argv field of the client object as an array of robj objects, with the argc field indicating the current number of arguments that are already parsed. The total number of elements in the argv field is indicated by the value of multibulklen.

Reading from the Connection

The input parsing workflow begins with function readQueryFromClient() (file networking.c), which is registered to the AE library as the callback function of the connection and will be invoked when the socket is ready to be read. In the most common case, this function first attempts to allocate at least PROTO_IOBUF_LEN bytes in the receiving buffer by calling sdsMakeRoomFor(), and then computes the number of bytes to read, readlen, by calling sdsavail() on the query buffer object c->querybuf, which returns the number of available bytes after the current valid content in the allocated memory block for the buffer.

After the buffer is set up, the function calls connRead() to actually read from the connection. The connRead() function indirectly invokes connSocketRead() (file connection.c), which in turn invokes the read() system call on the socket descriptor. The return value from the read() system call is also relayed back to the caller as local variable nread.

After the read call returns, the function first checks the return value for any anomalies (both 0 and -1 indicate anomalies). Then the size of the receiving buffer is updated by calling sdsIncrLen, which increases the length of the string object by nread bytes. Lastly, the function processInputBuffer() is called to parse the received content in the buffer. This function may return C_ERR to indicate a parsing failure. If a failure occurs, the function call to beforeNextClient() will close the connection and deallocate the client object, hence terminating the current session.

Parsing Command Header

Function processInputBuffer() (file networking.c) is called every time new data is received from the connection and is responsible for parsing command data and driving the state machine. This function starts parsing from c->qb_pos and continues until the buffer is drained. It first checks whether c->reqtype is zero. If it is zero, the function is parsing a new command rather than resuming one, and it determines whether the command is a multibulk one (i.e., using RESP) or an inline command by checking whether the first character of the buffer is *. In the case of RESP, the request type field of the client object is set to PROTO_REQ_MULTIBULK. For RESP format requests, the function then calls processMultibulkBuffer() to parse the RESP strings. If the command is fully received, then function processMultibulkBuffer() will finish parsing and return C_OK, after which the command is executed by calling processCommandAndResetClient(). Otherwise, the command cannot be parsed yet because more data is to be received. In either case, the receiving buffer is truncated by calling sdsrange() (file sds.c), which shifts the unparsed content of the buffer after c->qb_pos to the beginning, preparing the buffer for the next receiving operation. The client object’s qb_pos field is also reset to zero to indicate that future parsing will start from the first byte of the buffer.

Function processMultibulkBuffer() (file networking.c) implements RESP parsing. The function first checks whether c->multibulklen is zero. If true, then a new command is being parsed, in which case the function reads the number of elements in the array by first verifying that a \r exists in the receiving buffer (meaning that the array header has been fully received), and then parsing the number of elements by calling string2ll() (file util.h) to convert the decimal representation in the array header to an integer. After the array size is parsed, the value is stored in c->multibulklen, and c->argv is initialized accordingly. Future invocations of this function will see a non-zero value for c->multibulklen, in which case it knows that the state machine is currently in the middle of parsing a partially received command and will therefore skip the above step.

Parsing Command Data

After parsing the header, the function then proceeds to parse the array elements one by one, using c->multibulklen as the loop control variable. For every element, the loop checks whether it begins with $, then parses the length of the RESP string using string2ll() after verifying the \r character, and finally reads the string into a newly allocated SDS object. The SDS object is wrapped by an robj object and then put into c->argv. At the end of the loop, c->multibulklen is decremented by one to indicate that one element has been successfully parsed.

However, simple as it seems, the parsing code must also take one possible scenario into consideration, i.e., when the element is too big to be fully received by one read() call. In this scenario, multiple invocations to the receiving function have to be made in order to fully receive the element before copying it to the argv as an SDS string. More specifically, the parsing function uses c->bulklen to store the remaining length of the current string element to be read after the length is parsed from the receiving buffer. Then in the next iteration of readQueryFromClient(), the c->bulklen will be used to determine the size of the receiving buffer such that the buffer can always hold the element in its entirety. The parsing function will not process the element before it is fully received.

After the element is fully received, the function processMultibulkBuffer() then allocates a new SDS string object and wraps it with a robj object by calling createStringObject() with the pointer to the receiving buffer and the length of the element as arguments. The function also implements an optimization here, i.e., if the length of the element exceeds a certain threshold (PROTO_MBULK_BIG_ARG) and the receiving buffer only contains the element’s data, then the receiving buffer will be directly used as the SDS string without redundantly copying its content to a newly created string object. In this case, a new receiving buffer will be created and assigned to the client object.

After fully receiving an element, the robj object will be inserted into the client object’s argv array, and c->argc is incremented by one to indicate that one more element has been parsed. At the end of the parsing loop, c->bulklen will be set to -1 to indicate that the state machine is not in the middle of receiving an element, and c->multibulklen is decremented by one. If c->multibulklen drops to zero after decrementing, then the command has been fully parsed, in which case the function returns C_OK to notify the caller that the command can be processed.

To summarize:

readQueryFromClient()–> processInputBuffer()–> processMultibulkBuffer()
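The resumable nature of this parser is easier to see in code. The following is a minimal sketch of the state machine, reusing the field names multibulklen, bulklen, pos (for qb_pos), argc and argv from the text. The toy_ types and functions are hypothetical; plain malloc() copies stand in for SDS and robj objects, and the large-argument optimization is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ARGS 16

typedef struct {
    const char *buf;    /* receiving buffer (not owned by the parser) */
    size_t len;         /* number of valid bytes in buf */
    size_t pos;         /* parse position, playing the role of qb_pos */
    long multibulklen;  /* elements still to parse; 0 means a new command */
    long bulklen;       /* length of the element being received, -1 if none */
    int argc;
    char *argv[TOY_MAX_ARGS]; /* heap copies, NUL-terminated for convenience */
} toy_parser;

void toy_parser_init(toy_parser *p, const char *buf) {
    memset(p, 0, sizeof(*p));
    p->buf = buf;
    p->bulklen = -1;
}

/* Parse "<prefix><decimal>\r\n" at p->pos. Returns 1 on success, 0 if the
 * line has not fully arrived yet, -1 on a protocol error. */
static int toy_read_len(toy_parser *p, char prefix, long *out) {
    if (p->pos >= p->len) return 0;
    const char *start = p->buf + p->pos;
    if (start[0] != prefix) return -1;
    const char *cr = memchr(start, '\r', p->len - p->pos);
    if (cr == NULL || (size_t)(cr - p->buf) + 1 >= p->len) return 0;
    if (cr[1] != '\n' || cr == start + 1) return -1;

    long v = 0;
    for (const char *d = start + 1; d < cr; d++) {
        if (*d < '0' || *d > '9' || v > 100000) return -1; /* sketch-only cap */
        v = v * 10 + (*d - '0');
    }
    *out = v;
    p->pos = (size_t)(cr - p->buf) + 2;
    return 1;
}

/* Returns 1 when a full command is available in argv, 0 when more data is
 * needed (state is kept for the next call), -1 on a protocol error. */
int toy_parse(toy_parser *p) {
    if (p->multibulklen == 0) {            /* new command: read array header */
        long n;
        int rc = toy_read_len(p, '*', &n);
        if (rc != 1) return rc;
        if (n > TOY_MAX_ARGS) return -1;
        p->multibulklen = n;
        p->argc = 0;
    }
    while (p->multibulklen > 0) {
        if (p->bulklen == -1) {            /* read the element's length */
            long n;
            int rc = toy_read_len(p, '$', &n);
            if (rc != 1) return rc;
            p->bulklen = n;
        }
        /* Wait until the element and its trailing \r\n are fully received. */
        if (p->len - p->pos < (size_t)p->bulklen + 2) return 0;

        char *arg = malloc((size_t)p->bulklen + 1);
        if (arg == NULL) return -1;
        memcpy(arg, p->buf + p->pos, (size_t)p->bulklen);
        arg[p->bulklen] = '\0';
        p->argv[p->argc++] = arg;

        p->pos += (size_t)p->bulklen + 2;
        p->bulklen = -1;
        p->multibulklen--;
    }
    return 1;
}
```

To simulate a request arriving in pieces, a caller can grow len between calls; the parser then resumes from pos with multibulklen and bulklen intact.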

Inline Requests

Alternatively, requests can be sent to and processed by the Redis server in an inline format, where the command and the arguments are separated by one or more space characters and the request is terminated by the newline character \n. The inline request format is more human readable and favors manually generated requests via command line tools such as telnet.

When processing a new command from the client, the Redis server distinguishes between RESP format and inline format by checking whether the first character of the request is * or not. In the latter case, the function processInlineBuffer() is invoked to process the request as an inline request.

Function processInlineBuffer() (file networking.c) first verifies that the entire command has been received in the buffer by checking whether \n is in the buffer. Note that Redis cannot determine the length of the inline request if the request is not fully received, and hence will always report an error back to the client if the \n is not found. In other words, inline requests cannot be received over multiple read() attempts and are therefore not recommended for usages beyond manual testing.

If the \n is found, then the function first creates a new SDS string object aux that only contains the content of the inline request from the receiving buffer, and then calls sdssplitargs() to split aux into a local argv vector containing substrings separated by space characters. The function then initializes c->argc and c->argv with newly created robj objects wrapping the substrings in the local argv vector.

To summarize:

readQueryFromClient()–> processInputBuffer()–> processInlineBuffer()
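The inline path can be sketched in a few lines. The function toy_parse_inline() is hypothetical; it only splits on space characters and does not attempt to reproduce everything sdssplitargs() does. Tolerating a \r before the \n is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split the first line of buf into space-separated
 * arguments. Returns argc, or -1 if no complete line is present yet or an
 * allocation fails. On success *consumed is the line length including \n. */
int toy_parse_inline(const char *buf, size_t len, char **argv, int max_args,
                     size_t *consumed) {
    const char *nl = memchr(buf, '\n', len);
    if (nl == NULL) return -1;             /* the request is not complete */

    size_t linelen = (size_t)(nl - buf);
    *consumed = linelen + 1;
    if (linelen > 0 && buf[linelen - 1] == '\r') linelen--;

    int argc = 0;
    size_t i = 0;
    while (i < linelen && argc < max_args) {
        while (i < linelen && buf[i] == ' ') i++;   /* skip separators */
        size_t start = i;
        while (i < linelen && buf[i] != ' ') i++;   /* scan one argument */
        if (i == start) break;                      /* only spaces were left */

        char *arg = malloc(i - start + 1);
        if (arg == NULL) {
            while (argc > 0) free(argv[--argc]);
            return -1;
        }
        memcpy(arg, buf + start, i - start);
        arg[i - start] = '\0';
        argv[argc++] = arg;
    }
    return argc;
}
```

The caller owns the returned strings and is expected to free them after the command has been handled.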

Command Dispatching

After receiving and parsing the command, the client object’s argc and argv are set to the number of arguments (including the command itself) and the substrings containing argument data, respectively. In this case, the parsing function returns C_OK to the caller function processInputBuffer(), and the command is processed by calling processCommandAndResetClient(), which in turn calls processCommand().

Function processCommand() (file server.c) implements the command dispatching logic as follows. The function first looks up the in-memory command dictionary by calling lookupCommand(), which searches the structure using the first element of the client object’s argv vector as the key. The in-memory command dictionary is initialized from the statically defined global variable redisCommandTable (file commands.c) during server initialization by calling populateCommandTable(). The dictionary is implemented as a dict object stored in the server object as a field named commands. During initialization, function populateCommandTable() traverses the table redisCommandTable, and for every table entry, inserts it into the in-memory command dictionary using the command name as the lookup key.

Function lookupCommand() (file server.c) simply wraps lookupCommandLogic(), which in turn calls dictFetchValue() on server.commands to look up the in-memory command dictionary. If the command is found, it will be returned back to the caller function processCommand() as a pointer to the struct redisCommand object and assigned to the cmd field of the client object. The function then performs command integrity checks such as the number of arguments (the arity check), sets a few flags according to the command’s statically-defined properties, and checks the permissions. If any of the checks fails, the server will reject the command and send an error message back to the client by calling rejectCommand(). At last, after all checks have passed, the command is executed by calling call() on the client object.

Function call() (file server.c) performs a large number of extra checks based on the runtime flags of the command and the server configuration. However, the most critical line of the function is the one that invokes c->cmd->proc(), which is the command handler registered to the struct redisCommand object. The command handler implements the specific command that corresponds to the command string in the request. Individual command handlers can be easily located in the global table redisCommandTable residing in file commands.c.

To summarize:

processInputBuffer()–> processCommandAndResetClient()–(enters server.c)–> processCommand()–> call()–(via callback)–> c->cmd->proc()
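The lookup, arity check and handler invocation can be condensed into the following sketch. All toy_ names are hypothetical. A linear scan over a small static table replaces the dict lookup, and only an exact argument count is checked, which is a simplification assumed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

typedef struct toy_client {
    int argc;
    const char **argv;
    const char *reply;   /* last reply, kept as a plain C string here */
} toy_client;

typedef void toy_command_proc(toy_client *c);

typedef struct {
    const char *name;        /* lookup key */
    toy_command_proc *proc;  /* command handler */
    int arity;               /* expected argc, including the command name */
} toy_command;

static void toy_ping_command(toy_client *c) { c->reply = "+PONG\r\n"; }
static void toy_echo_command(toy_client *c) { c->reply = c->argv[1]; }

/* Statically defined table, in the spirit of redisCommandTable. */
static const toy_command toy_command_table[] = {
    { "ping", toy_ping_command, 1 },
    { "echo", toy_echo_command, 2 },
};

/* Case-insensitive lookup by command name. */
const toy_command *toy_lookup_command(const char *name) {
    size_t n = sizeof(toy_command_table) / sizeof(toy_command_table[0]);
    for (size_t i = 0; i < n; i++)
        if (strcasecmp(name, toy_command_table[i].name) == 0)
            return &toy_command_table[i];
    return NULL;
}

/* Look up argv[0], check the arity, then invoke the handler.
 * Returns 0 if the command ran, -1 if it was rejected. */
int toy_process_command(toy_client *c) {
    const toy_command *cmd = toy_lookup_command(c->argv[0]);
    if (cmd == NULL) {
        c->reply = "-ERR unknown command\r\n";
        return -1;
    }
    if (cmd->arity != c->argc) {
        c->reply = "-ERR wrong number of arguments\r\n";
        return -1;
    }
    cmd->proc(c);   /* the equivalent of invoking c->cmd->proc() */
    return 0;
}
```

Adding a command to the sketch only requires a new handler function and one more table row.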

Command Processing

Command processing starts with the callback function registered in the command table redisCommandTable (file commands.c). Each command has its own handler function, which is called by call() in server.c.

We start with the simplest command SET. According to the command table, the SET command is handled by function setCommand() in t_string.c. The function wraps over setGenericCommand() (file t_string.c), which itself calls into setKey() after performing a few checks. Note that setGenericCommand() accepts its key and value arguments as robj objects. In the case of SET commands, the key and value are from c->argv[1] and c->argv[2], respectively.

Function setKey() (file db.c) writes the key and value into the given redisDb object. The function first checks whether the key already exists in the database. If not, it calls dbAdd() to create a new entry in the database and initializes the key and value as per the request.

Function dbAdd() (file db.c) first duplicates the key object into another sds object by calling sdsdup() on the key’s raw representation (i.e., key->ptr). It then inserts the key into a new entry by calling dictAddRaw() with the newly created key object. Finally, the value is also set to the entry by calling dictSetVal().

Function dictAddRaw() (file dict.c) inserts a new entry with a given key into the dictionary object (which is how the database is implemented). The function first computes the index of the hash bucket that the entry should be inserted into, then creates a new entry object by calling zmalloc() and links the new entry into the hash bucket d->ht_table, and finally sets the key value by calling dictSetKey() (recall that the key object is duplicated).

On the other hand, dictSetVal() (file dict.h) is defined as a macro in the header file. The macro first checks whether the dictionary object needs the value to be duplicated (by checking whether the value duplication callback function (d)->type->valDup is non-NULL), and if so, duplicates the value object by calling valDup(). Otherwise, the value is stored in the entry as is.

The redisDb object passed to functions in db.c is the client’s current database object c->db. This field is initialized in createClient() (file networking.c) to be the default database, i.e., the one on index 0. Redis identifies the databases using an integer as the index, and the object on index zero is the default one. A client’s database object can also be changed using the SELECT command, which is implemented by selectDb() (file db.c) as simply changing c->db to refer to a database on a different index.

To summarize:

call()–(via the command table)–> setCommand()–> setGenericCommand()–(enters db.c)–> setKey()–> dbAdd()–(enters dict.c)–> dictAddRaw()–> dictSetKey()

Sending The Reply

After a command is processed, the reply message is sent back to the client. Replies are generated into the client’s buffer by calling addReply() (file networking.c) with an robj object as the parameter. The function first checks whether the robj object is of string type using macro sdsEncodedObject() (file server.h). If true, then the string contained in the object is added to the reply buffer by calling _addReplyToBufferOrList() with the pointer to the sds string and the length of the string as arguments.

Function _addReplyToBufferOrList() (file networking.c) wraps _addReplyToBuffer() and _addReplyProtoToList(). The former is used if the reply message can fit into the client’s reply buffer. Otherwise, the reply message is appended to a linked list of additional reply buffers.

Function _addReplyToBuffer() (file networking.c) performs the copy from the reply object to the client’s buffer (c->buf) using memcpy(). The buffer pointer c->bufpos is also adjusted accordingly.

To summarize:

addReply()–> _addReplyToBufferOrList()–> _addReplyToBuffer()
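The buffer copy at the end of this path amounts to a bounds check followed by memcpy(). The sketch below follows the description above: the reply is copied only if it fits, and otherwise the caller is expected to fall back to the list. The toy_ names and the small buffer size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_REPLY_CHUNK 32   /* deliberately small, for illustration only */

typedef struct {
    char buf[TOY_REPLY_CHUNK]; /* fixed reply buffer, like c->buf */
    size_t bufpos;             /* next free position, like c->bufpos */
} toy_reply_client;

/* Copy the reply into the fixed buffer if it fits entirely.
 * Returns 0 on success, -1 if the caller must use the list instead. */
int toy_add_reply_to_buffer(toy_reply_client *c, const char *s, size_t len) {
    size_t available = sizeof(c->buf) - c->bufpos;
    if (len > available) return -1;
    memcpy(c->buf + c->bufpos, s, len);
    c->bufpos += len;
    return 0;
}
```

Successive replies are packed back to back, so a single write can later flush several of them at once.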

Reply Objects

Redis defines reply objects for commonly used replies, e.g., "+OK". The reply objects are fields of a struct sharedObjectsStruct object, a statically declared singleton named shared in server.c, and they are robj objects that can be passed to addReply(). The singleton is populated in function createSharedObjects() (file server.c), which initializes it by creating sds string objects using createObject() (file object.c).

The Write Path

After the Redis server processes the command, the reply is generated into the client’s reply buffer by calling addReply(). The function eventually copies data in the reply message to the client’s reply buffer c->buf, which is coupled with c->bufpos to indicate the current write position.

In order to send the reply message back to the client, the Redis server, during initialization, registers a callback function to the AE Library as the before-sleep callback via aeSetBeforeSleepProc(). The callback function being registered is beforeSleep().

Recall that the before-sleep callback, i.e., function beforeSleep() (file server.c), is invoked right before the AE Library invokes select() (or other multiplexing system calls). The function invokes handleClientsWithPendingWritesUsingThreads() (file networking.c), whose name may be somewhat misleading because Redis, by default, is not multi-threaded. However, the function body shows that it simply wraps over handleClientsWithPendingWrites() if multi-threading is disabled (by checking server.io_threads_num, a configuration variable defined in config.c).

Function handleClientsWithPendingWrites() (file networking.c) traverses the list server.clients_pending_write, which contains clients that have reply messages to send. This list is populated at the beginning of addReply() by calling prepareClientToWrite() (file networking.c). For every client in the list, the function calls writeToClient(), which wraps over _writeToClient(). Function _writeToClient() (file networking.c) further calls connWrite() on the client’s connection object, which indirectly calls connSocketWrite() via the connection’s type field. The write path terminates at function connSocketWrite() (file connection.c), which invokes the write() system call on the connection’s file descriptor. Note that connWrite() might be invoked several times for a single buffer because write() may accept fewer bytes than requested (which is completely normal).

To summarize:

beforeSleep()–(enters networking.c)–> handleClientsWithPendingWritesUsingThreads()–> handleClientsWithPendingWrites()–> writeToClient()–> _writeToClient()–(enters connection.c)–> connWrite()–> connSocketWrite()–(enters kernel)–> write()
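The reason the write may need several attempts is that write() can accept fewer bytes than requested. A minimal sketch of a loop that copes with this is shown below. The function toy_write_pending() and its sentlen progress counter are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write buf[*sentlen .. bufpos) to fd, advancing *sentlen as bytes are
 * accepted. Returns 0 when everything was written, 1 if the descriptor
 * would block (progress is kept in *sentlen), -1 on any other error. */
int toy_write_pending(int fd, const char *buf, size_t bufpos, size_t *sentlen) {
    while (*sentlen < bufpos) {
        ssize_t n = write(fd, buf + *sentlen, bufpos - *sentlen);
        if (n > 0) {
            *sentlen += (size_t)n;     /* partial writes simply advance */
            continue;
        }
        if (n == -1 && errno == EINTR) continue;    /* interrupted: retry */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 1;
        return -1;
    }
    return 0;
}
```

Keeping the progress counter outside the function lets a later call resume where the previous one stopped.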

Configuration

Specifying the Configurations

Redis server configuration is set up during server initialization. Configuration capability is implemented in file config.c. The file contains a configuration table named configs, which stores all configurations. Each element of the configs table is of type struct standardConfig, which consists of a name, an alias, flags, and two type-dependent objects. Both name and alias can be used as the option key. The type-dependent objects, namely interface and data, define the data storage of the configuration value and the interface functions for setting, getting, and initializing the configuration option. In particular, the data object contains a pointer to the location that the configuration value should be written to after it is read. It also contains the default value of the configuration, which is used when the option is not explicitly given.

The configs table is just an array of struct standardConfig objects where configurable options are defined using the pre-defined macros. The macros are straightforward to use and the existing table is a good reference.

Reading Configuration Options

Redis server supports two forms of configuration. It is either provided via a configuration file or given directly on the command line. In the former case, the file should be organized into lines, where each non-empty line not starting with # specifies the value of a configuration option. The first token of the line (character string ending with a space) is treated as the option key, and the rest of the line is treated as the value. In the case of command line options, the option key is given by prefixing the key with --, and the option value follows the key. Multiple values can be given for a single key, with space characters separating them. These command line values will be concatenated to form the actual value during server initialization.

The server reads the configuration options in three stages. In the first stage, it initializes all options defined in the configs table to their default values by calling initServerConfig() (file server.c) in the main function. This function in turn calls initConfigValues() in file config.c, which simply iterates over all configuration entries in the configs table and writes the default value to the pointer stored in each entry (all of which point to fields of the singleton server object). Function initServerConfig() also sets the default values for non-configurable fields in the server object by directly assigning to them.

Then in the second stage, the main function of the server parses the command line options. If the configuration file name is given, it must be the first argument (i.e., argv[1]). Otherwise, all command line options will be treated as option keys and values. The server iterates over the argv vector, treating every entry that begins with "--" as a key, and the entries between keys as values belonging to the preceding key. The parsed keys and values are concatenated into an sds string, such that each line of the string represents a configurable option. As mentioned earlier, if multiple values are specified for a key, all the values will be concatenated and appear on the same line, separated by a space.
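The second stage can be modeled in a few lines. This is a minimal sketch of the idea only, not a port of the C code: argv_to_config() is a hypothetical helper, the argument list is assumed to exclude the program name and any configuration file name, and values that appear before the first key are simply ignored.

```python
def argv_to_config(args):
    """Turn ['--port', '7777', '--save', '900', '1'] into config-file lines."""
    lines = []
    for arg in args:
        if arg.startswith("--"):
            # A new key starts a new line; strip the leading dashes.
            lines.append([arg[2:]])
        elif lines:
            # Anything else is a value belonging to the most recent key.
            lines[-1].append(arg)
    # One line per option: key first, then its values separated by spaces.
    return "".join(" ".join(tokens) + "\n" for tokens in lines)


config_text = argv_to_config(["--port", "7777", "--save", "900", "1"])
```

The resulting string has the same shape as a configuration file, which is what allows both sources to be handled by a single parser in the final stage.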

To summarize:

main()–> initServerConfig()–(enters config.c)–> initConfigValues()

Parsing Configuration Options

In the final stage, the main function invokes loadServerConfig() (file config.c), passing the configuration file name (if given) and the sds string parsed from the command line options as arguments. The function will first search for the file (or files, if the file name contains wildcards), then read the file, and append the sds string storing the command line options to the file content. Since options given by the command line are processed after those in the configuration file, the command line options have higher priority and can hence override those in the configuration file.

The combined sds string containing the configuration file content and command line options is then passed to function loadServerConfigFromString() (file config.c). The function parses the string by first splitting it into lines using the sds utility function sdssplitlen(). Then the function splits each line that is neither empty nor begins with # into tokens using sdssplitargs(). Next, the function searches the configs table to look up the option key, which is the first token of the line. If the configuration entry is found in the table, the value is set by calling interface.set() of the entry. The setter function will convert the value or values into the correct type and then write them to the pointer stored in the configuration entry.

To summarize:

main()–(enters config.c)–> loadServerConfig()–> loadServerConfigFromString()
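The parsing stage and the override behavior can be sketched as follows. Everything here is illustrative: the CONFIGS dictionary is a hypothetical stand-in for the configs table with only three options, Python's shlex.split() stands in for sdssplitargs(), and the server object is modeled as a plain dictionary.

```python
import shlex

# Hypothetical stand-in for the configs table: name -> (setter, default).
CONFIGS = {
    "port": (int, 6379),
    "databases": (int, 16),
    "appendonly": (lambda v: v == "yes", False),
}


def load_config_from_string(text):
    # Stage one: every option starts at its default value.
    server = {name: default for name, (_, default) in CONFIGS.items()}
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = shlex.split(line)
        key, values = tokens[0].lower(), tokens[1:]
        if key not in CONFIGS:
            raise ValueError(f"line {lineno}: unknown option {key!r}")
        setter, _ = CONFIGS[key]
        # The setter converts the textual value into the right type.
        server[key] = setter(" ".join(values))
    return server


# Command line options are appended after the file content, so they win.
file_content = "# sample file\nport 7000\nappendonly yes\n"
cmdline = "port 7777\n"
server = load_config_from_string(file_content + cmdline)
```

Because lines are applied in order, the later "port 7777" line overwrites the earlier "port 7000" line, while options that never appear keep their defaults.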

Data Structures

The Dict Object

Dictionary objects lie at the core of Redis as the database itself is implemented as a struct dict object. The struct is defined in dict.h and implemented in dict.c and it is quite simple. The dict object merely implements a standard chained hash table with incremental rehashing.

The Data Structure

Entries in the dict object are implemented as struct dictEntry objects. The object contains a key pointer, a value field that can be a pointer, an integer, a floating point number, etc., and a metadata field. The definition of the metadata field depends on the type of the dict object, and it makes the dictEntry variable-sized. However, the metadata field is largely irrelevant to the operation of the dict object. The struct dictEntry objects in the same bucket are linked together as a linked list via the next pointer.

The dict object contains two instances of hash tables, stored in fields ht_table, ht_used, and ht_size_exp, each of which is a two-element array. Field ht_table stores the two bucket arrays, with each bucket being a pointer to a linked list of struct dictEntry objects. Field ht_used tracks the number of entries in each of the two hash tables. Field ht_size_exp stores the log2 of the size of each bucket array (hash table sizes are always powers of two).

Incremental Rehashing

When the number of entries exceeds a certain threshold (currently when the load factor grows above 1, or when rehashing is disabled but the load factor grows above 5), the hash table in the dict object will be resized via the rehashing operation. The rehashing operation iterates over entries in the first instance of the hash table (on index 0) and moves them to the second instance of the hash table (on index 1). If a rehashing is going on, insert operations will directly insert the new key into the second instance. Read operations, on the contrary, have to check both hash tables because the entry can reside in either of them depending on the rehashing progress.

Rehashes are triggered by function _dictKeyIndex(), which computes key hash values. The function calls _dictExpandIfNeeded() to check for rehashing conditions. If the load factor of the hash table exceeds the threshold, it will call dictExpand() to initiate the rehashing. The second argument passed to dictExpand() is d->ht_used[0] + 1, which is rounded up to the next power of two, so the new table is typically twice as large as the previous one.

Function dictExpand() simply wraps _dictExpand(). The latter allocates the bucket array of the second hash table instance by calling zcalloc() and assigns it to d->ht_table[1]. In addition, the ht_used field of the second table is set to zero, and its ht_size_exp field is set to the log2 of the new size. Finally, the function sets d->rehashidx to zero, indicating that a rehashing is in progress. The value will be reset back to -1 after the rehashing completes.

On basically every hash table operation, the macro dictIsRehashing() (file dict.h) is called to check if the table is currently under rehashing. The macro simply checks whether d->rehashidx is -1 or not. If rehashing is in progress, then the function _dictRehashStep() is called to incrementally rehash a few buckets from the first hash table to the second one.

Function _dictRehashStep() wraps dictRehash(). The latter rehashes entries by removing them from the first hash table and inserting them into the second hash table, with the values of d->ht_used being adjusted accordingly. Given an argument n, the function returns when the first hash table becomes empty, when n buckets have been rehashed, or when n * 10 empty buckets have been skipped in the current invocation. The field d->rehashidx stores the index of the next bucket to be rehashed and is hence incremented for every bucket visited.

Rehashing is completed when d->ht_used for the first table drops to zero. In this case, the first table’s bucket array is deallocated by calling zfree(), and the second hash table is moved into the first table’s slot. Field d->rehashidx is also reset to -1 so that no further rehashing will be attempted.

Dict Operations

The lookup operation on the dict object is implemented in dictFind(). This function first calls dictHashKey() to compute the key hash value, then uses the hash value to find the bucket, and finally walks the entry linked list of the bucket and compares hash values and keys against the entries. Note that if rehashing is in progress, then both hash table instances will be checked. Otherwise, only the first instance is checked.

The insert operation is implemented in dictAddRaw(). This function first checks whether the key already exists (in both tables) by calling _dictKeyIndex(). _dictKeyIndex() returns the index of the bucket on the first hash table if no rehashing is in progress, or returns the index on the second hash table otherwise. Besides, the key will not be inserted if it already exists in any of the two tables. If the check passes, a new entry object is allocated and linked to the head of the bucket.

Deletion is implemented in dictGenericDelete(). This function locates the bucket using the hash value, walks the linked list, and unlinks the entry from the list if the hash value and key match. The function also takes an argument, nofree, which indicates whether the entry unlinked from the dictionary should be freed or not. If the caller needs the deleted entry (as is the case with dictUnlink()) then this value should be set to 1. Otherwise it is set to zero, as in dictDelete().

More complicated operations are also available on the dict object. However, these composite operations are just combinations of the above three primitives and can be easily understood.
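The interplay between the three primitives and incremental rehashing is easier to see in a compact model. The sketch below is a simplified Python rendition, not a translation of dict.c: ToyDict and its method names are hypothetical, buckets are Python lists instead of linked lists, each operation moves exactly one non-empty bucket, and the expansion threshold is simply a load factor of one.

```python
class ToyDict:
    """Chained hash table with two tables and step-wise rehashing."""

    def __init__(self, size=4):
        self.tables = [[[] for _ in range(size)], None]
        self.used = [0, 0]
        self.rehashidx = -1  # -1 means no rehash in progress

    def is_rehashing(self):
        return self.rehashidx != -1

    def _bucket(self, t, key):
        table = self.tables[t]
        return table[hash(key) & (len(table) - 1)]  # size is a power of two

    def _rehash_step(self):
        """Move one non-empty bucket from table 0 to table 1."""
        if not self.is_rehashing():
            return
        old = self.tables[0]
        while self.rehashidx < len(old) and not old[self.rehashidx]:
            self.rehashidx += 1  # skip buckets that are already empty
        if self.rehashidx < len(old):
            for entry in old[self.rehashidx]:
                self._bucket(1, entry[0]).append(entry)
                self.used[0] -= 1
                self.used[1] += 1
            old[self.rehashidx] = []
            self.rehashidx += 1
        if self.used[0] == 0:
            # Finished: table 1 takes over slot 0.
            self.tables = [self.tables[1], None]
            self.used = [self.used[1], 0]
            self.rehashidx = -1

    def _expand_if_needed(self):
        if not self.is_rehashing() and self.used[0] >= len(self.tables[0]):
            self.tables[1] = [[] for _ in range(len(self.tables[0]) * 2)]
            self.rehashidx = 0

    def _lookup(self, key):
        for t in (0, 1):
            for entry in self._bucket(t, key):
                if entry[0] == key:
                    return t, entry
            if not self.is_rehashing():
                break  # table 1 only matters while rehashing
        return None

    def find(self, key):
        self._rehash_step()
        hit = self._lookup(key)
        return hit[1][1] if hit else None

    def add(self, key, val):
        self._rehash_step()
        self._expand_if_needed()
        if self._lookup(key):
            return False  # key already present in one of the tables
        t = 1 if self.is_rehashing() else 0
        self._bucket(t, key).insert(0, [key, val])  # link at bucket head
        self.used[t] += 1
        return True

    def delete(self, key):
        self._rehash_step()
        hit = self._lookup(key)
        if not hit:
            return False
        t, entry = hit
        self._bucket(t, key).remove(entry)
        self.used[t] -= 1
        return True
```

Note how new keys always go to the second table while a rehash is in progress, so the first table can only shrink and the rehash is guaranteed to finish.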

Dict Types

Every dict object also has a type object of type struct dictType, which is accessed via d->type. The type object consists of function pointers that handle particular key and value types. For example, hashFunction defines the hash function for the key type. keyDup and valDup define the duplication functions for keys and values. These two callbacks are used by the dictSetKey() and dictSetVal() macros to duplicate the key and value (if one is provided). Similarly, keyDestructor and valDestructor, when provided, are used to deallocate keys and values when entries are deallocated.

The Database Object

Initialization

The database object is the top-level data structure in Redis, which maps keys to values. Database objects are initialized when the server is initialized in initServer() (file server.c). Users can specify the number of databases in the configuration file using the databases option. This option is registered in the options table configs (file config.c), and when the configuration is applied, it sets the server.dbnum field to the value given by the user (or to the default of 16 otherwise).

During initialization, the databases are created as an array of redisDb objects using zmalloc() and stored in the server.db field. Later in the same function, the databases are initialized.

Selection

Each client is assigned a database when created, which is a reference to the database object in the server object. By default, Redis assigns database zero to each newly created client in createClient(). In addition, clients can also switch database using the SELECT command, which is handled by the selectCommand() function (file db.c). The function parses the only argument as the database index and then invokes selectDb() to change the client’s current database reference.

Database Type

The database object contains a dict instance for key-value mapping, with the type being dbDictType (file server.c). The type object has all callbacks being set except key and value duplication functions, meaning that when a key-value pair is inserted into the database, the function that inserts it must duplicate the object if necessary. Besides, database value objects are reference counted, as indicated by the destructor callback function dictObjectDestructor() (file server.c). This function calls decrRefCount() (file object.c) on the value object. If the reference count drops to zero, decrRefCount() will then deallocate the value object based on its type using a switch block.

Key objects, on the contrary, are not reference counted. The destructor callback function for database keys is dictSdsDestructor() (file server.c), which simply deallocates the key string object by calling sdsfree() (file sds.c).

The Simple Dynamic String (SDS) Library

Redis encapsulates strings and binary data into a data type called the sds type. sds is an efficient and compact library for representing strings and arbitrary binary data. The implementation is in sds.h and sds.c.

Memory Layout

The sds type objects are referred to using the type name sds, which, surprisingly, is typedef’ed as char *. An sds type pointer, therefore, points to the beginning of the null-terminated string. However, compared with standard C language strings, the sds object also has a header that is located before the sds pointer. The header stores the length and the allocated buffer size of the string and can be accessed by moving the pointer backward.

An sds header consists of three fields, i.e., a len field storing the length of the string (excluding the terminating '\0'), an alloc field storing the size of the allocated buffer (excluding the terminating '\0'), and a flag field storing the header type. Headers can be one of four types, namely sdshdr8, sdshdr16, sdshdr32, and sdshdr64. These four types differ from each other by using different integer types for the len and alloc fields. The 8-bit flag field is placed at the end of the header and can therefore be accessed via the sds pointer by subtracting one from it (e.g., sds[-1]). The flag field stores the header type, which must be read first in order to determine the size of the header.

Macro SDS_HDR(), given an sds pointer and the header type, returns the pointer to the header. Macro SDS_HDR_VAR() is translated into a variable definition of name sh, which points to the header of the given sds object.
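The header-before-pointer layout can be mimicked with Python's struct module. This is a sketch under stated assumptions rather than the real layout code: it models only the one-byte-per-field header (in the spirit of sdshdr8), the names toy_sdsnew() and toy_sdslen() are hypothetical, the "pointer" is just an offset into a bytes object, and the type tag value is illustrative.

```python
import struct

SDS_TYPE_8 = 1                 # illustrative header type tag
HDR8 = struct.Struct("<BBB")   # len, alloc, flag (one byte each)


def toy_sdsnew(data):
    """Return (memory, ptr) where ptr is the offset of the first data byte."""
    header = HDR8.pack(len(data), len(data), SDS_TYPE_8)
    memory = header + data + b"\0"   # header, payload, terminating zero
    return memory, HDR8.size


def toy_sdslen(memory, ptr):
    flag = memory[ptr - 1]           # like s[-1] in C: read the type first
    if flag != SDS_TYPE_8:
        raise ValueError("only one header type is modeled in this sketch")
    # Knowing the type gives the header size, so step back to its start.
    length, _alloc, _flag = HDR8.unpack_from(memory, ptr - HDR8.size)
    return length


memory, ptr = toy_sdsnew(b"hello")
```

The caller only ever holds ptr, which points at the payload; the length is recovered by looking backward from it, first at the flag byte and then at the rest of the header.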

SDS Object Creation

sds objects are created by calling sdsnewlen() or sdstrynewlen(), both wrapping the actual creation function _sdsnewlen(). Function _sdsnewlen() takes a pointer to the string or binary data, the length of data, and a flag trymalloc indicating whether malloc() failure should cause a panic. The function first computes the type of header to use by calling sdsReqType(). For shorter strings, we can encode its length and allocated buffer size with smaller integers, and therefore, the shorter header can be used to save space. It then allocates the storage for the sds object by calling s_malloc_usable(). The allocated size is the data size, plus header size, plus the trailing zero, although malloc may return a slightly larger buffer due to binned allocation, which is put in usable. Then the header is initialized with the length of data and the allocated size. Finally, the data is copied into the buffer using memcpy(). The sds type pointer is returned back to the caller. The pointer can be directly used as a C language string without any typecast or pointer arithmetic.

The SDS library also implements common string operations such as string copy, concatenation, trimming, etc. Their implementations are rather straightforward. Besides, the library provides functions to convert other types into sds objects, e.g., sdsfromlonglong(), and to print into an sds object from a format string just like snprintf().

Linked List

Redis contains a standard doubly linked list implementation in file adlist.h and adlist.c. The source code is simple and easy to understand with very little to cover. However, it is worth noting that Redis’s linked list object carries three callback functions, namely, dup, free, and match. These three functions will duplicate, deallocate, or compare for equality on the value object (value field of each node), respectively. As a result, the list object can be duplicated, deallocated, and searched for a particular key using the interface functions listDup(), listRelease(), and listSearch().

Intset

Redis implements a sorted integer set in file intset.h and intset.c. Overall, the intset structure is just an array of integer elements stored compactly in sorted order. Lookup operations on the set use binary search to locate the position of the given search key. Insertion operations need to shift the elements backwards if the new key is to be inserted into the middle of the element array.

Simple as it is, there are, however, several implementation highlights. First, the intset object implements three different element sizes, namely, 16-bit, 32-bit, and 64-bit integers. At any given moment, all elements must be of the same size, hence necessitating upgrade conversions between types when an element is inserted and the element cannot be represented in the current type. There is no downgrade, though, as an intset will remain in the upgraded type even if all elements can be represented with shorter integers. Second, Redis performs endian conversion on both intset internal metadata and the set elements when they are read from and written into memory. The endian conversion maintains compatibility between little- and big-endian architectures when a database is dumped on one architecture and loaded back into memory on another architecture with different endianness. Fortunately, Redis internally adopts little-endian representation for all data and metadata, meaning that the endianness conversion on the x86 architecture is a no-op. To verify this claim, check out the endianness conversion macros and functions in endianconv.h and endianconv.c. Accordingly, the macros intrev32ifbe() and memrev16/32/64ifbe(), which are heavily used in the intset implementation, can be safely ignored as no-ops.

Layout

The intset object contains only two fields. Field encoding stores the current element size, the value of which can be one of INTSET_ENC_INT16, INTSET_ENC_INT32, and INTSET_ENC_INT64. Field length stores the current number of elements in the set. The element array follows the two fields and fills the rest of the object. Note that the intset object itself is also variable-sized due to having the element array at the end.

Operations

An intset object is created via intsetNew(), which initializes an object with zero elements. Lookup operations use intsetFind(), which calls into intsetSearch() to perform the binary search. Insert operations use intsetAdd(). This function first checks whether the newly inserted value can be represented with the intset’s current type. If not, the function calls intsetUpgradeAndAdd() to first upgrade the set and then insert the element. Function intsetUpgradeAndAdd() in turn calls intsetResize(), which uses realloc() to expand the memory block of the current intset object. The function then type casts all the existing elements in the element array to the upgraded size. Otherwise, the element can be inserted directly into the intset without any conversion. In this case, the insert function calls intsetSearch() to locate the insertion point via binary search, then calls intsetResize() to potentially expand the intset’s memory block (it is essentially abusing the malloc library’s allocation size feature), and finally inserts the element into the array after shifting the existing elements using intsetMoveTail() to make room for it. Deletion via intsetRemove() is just the reverse of insertion.
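The lookup, insertion, and upgrade logic can be condensed into a small model. This is an illustrative sketch, not the intset.c implementation: ToyIntset and required_width() are hypothetical names, the element array is a Python list rather than a packed memory block, and the set is assumed to start with the 16-bit encoding.

```python
import bisect


def required_width(v):
    """Smallest element width in bytes (2, 4, or 8) that can hold v."""
    if -(1 << 15) <= v < (1 << 15):
        return 2
    if -(1 << 31) <= v < (1 << 31):
        return 4
    return 8


class ToyIntset:
    def __init__(self):
        self.width = 2   # assumed initial encoding: 16-bit elements
        self.items = []  # always kept in sorted order

    def find(self, v):
        # Binary search for the position where v would be stored.
        i = bisect.bisect_left(self.items, v)
        return i < len(self.items) and self.items[i] == v

    def add(self, v):
        if required_width(v) > self.width:
            # Upgrade: every element is now stored with the wider type.
            self.width = required_width(v)
        i = bisect.bisect_left(self.items, v)
        if i < len(self.items) and self.items[i] == v:
            return False              # already a member
        self.items.insert(i, v)       # shifts the tail to make room
        return True

    def remove(self, v):
        i = bisect.bisect_left(self.items, v)
        if i < len(self.items) and self.items[i] == v:
            del self.items[i]         # width is never downgraded
            return True
        return False


s = ToyIntset()
for v in (5, 1, 3, 70000):
    s.add(v)
```

Adding 70000 forces the upgrade to 32-bit elements, and removing it afterwards leaves the width unchanged, mirroring the no-downgrade rule.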

The Set Object

Redis’s setType object is the user-visible set type that can be manipulated using commands SADD, SREM, SCARD, and so on. The set object has two implementations. The first is the intset object that only stores 16-bit, 32-bit, or 64-bit integers. The second is the dict object that can store arbitrary elements as long as they can be hashed and compared. The initial type of a set object is determined by the first element inserted into the set. If the initial element can be parsed as an integer, then Redis will initialize the set as an intset. However, if later inserted elements can no longer be represented as integers, the set is implicitly converted into the dict object, hence allowing the insertion to happen without error.

Set Creation

The set object can be created via SADD and SMOVE if the (destination) key does not yet exist in the current database. In this case, the command handler calls setTypeCreate() (file t_set.c). The function checks whether the first element can be parsed as a long integer using the object utility function isSdsRepresentableAsLongLong() (file object.c), which itself calls into string2ll() (file util.c). If true, then the set object is created using createIntsetObject() (file object.c), which initializes an intset object and wraps it with the robj type. Otherwise it is created using createSetObject() (file object.c), which is simply a dict object wrapped in robj. Note that Redis distinguishes these two representations via the robj object’s encoding field (OBJ_ENCODING_INTSET and OBJ_ENCODING_HT, respectively). Besides, dict type sets use setDictType (global data defined in file server.c) for keys and values. The setDictType type object defines key comparison, key destructor, and key hash functions while the rest are left blank.

Set Operations

The client can check whether an element is a member of the set using the SISMEMBER command. Internally, this command is implemented by function setTypeIsMember() (file t_set.c). The function simply multiplexes dictFind() (file dict.c) and intsetFind() (file intset.c) for dict and intset types, respectively. The function returns an integer value to indicate whether the element exists. The integer value is also returned to the client as the result of the query.

New elements can be added via commands SADD, SMOVE, etc. These commands are implemented with setTypeAdd() (file t_set.c). If the set object is a dict, it simply calls dictAddRaw() (file dict.c) to create an entry, and then sets the key of the entry to the element value.

For the intset type, however, whenever a new element is added, the function needs to check whether the new element can be parsed into an integer using isSdsRepresentableAsLongLong(). If it is not the case, then the existing set is converted into a dict set by calling setTypeConvert(). This function first creates a dict type set using setTypeCreate() and then iterates over the intset object and converts the integer elements into sds objects using sdsfromlonglong() (file sds.c). Next, the converted sds type keys are inserted into the newly created object. Finally, the old intset object is freed by calling zfree() on the robj’s ptr field, and the newly created dict object is assigned to the robj object. After conversion is completed, the new key is inserted into the set object by calling dictAddRaw().
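The dual-encoding behavior can be summarized with a toy class. This is a sketch of the idea only: ToySet and parse_int() are hypothetical names, a Python set stands in for both the intset and the dict, and parse_int() is a rough approximation of what an integer-representability check needs to do.

```python
def parse_int(s):
    """Return the integer value of s if it is a canonical 64-bit decimal."""
    try:
        v = int(s)
    except ValueError:
        return None
    # Reject forms like ' 5', '+5' or '05' that do not round-trip exactly.
    if str(v) != s or not -(1 << 63) <= v < (1 << 63):
        return None
    return v


class ToySet:
    def __init__(self, first):
        # The first element decides the initial encoding.
        is_int = parse_int(first) is not None
        self.encoding = "intset" if is_int else "hashtable"
        self.members = set()  # ints while "intset", strings afterwards

    def _convert(self):
        # Re-insert every integer as its string form, then switch encoding.
        self.members = {str(v) for v in self.members}
        self.encoding = "hashtable"

    def add(self, s):
        if self.encoding == "intset":
            v = parse_int(s)
            if v is not None:
                if v in self.members:
                    return False
                self.members.add(v)
                return True
            self._convert()  # a non-integer element forces the conversion
        if s in self.members:
            return False
        self.members.add(s)
        return True


s = ToySet("10")
s.add("10")
s.add("20")
encoding_before = s.encoding
s.add("abc")
```

After "abc" is added, the two earlier integers live on as the strings "10" and "20", so membership tests on the converted set keep working.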

One corner case of insertion is when the intset object grows to become overly large. In particular, when the size of the intset exceeds 1<<30 (1G entries), the intset object is force converted to a dict object to avoid allocating huge arrays from the system.

A set element is removed using command SREM, which is implemented by setTypeRemove() (file t_set.c). This function simply multiplexes between dictDelete() and intsetRemove() for dict and intset, respectively. Interestingly, this function also implements a global policy, which states that if the load factor of a dict type hash table (including the set object) drops below a compile-time constant HASHTABLE_MIN_FILL (which is 10%), then the hash table needs to be shrunk by calling dictResize() (file dict.c). The policy is implemented in function htNeedsResize(), which is defined in a seemingly unrelated place: server.c.

Lastly, the size of the set, also known as the “cardinality” of the set, can be obtained via command SCARD. This command is implemented by setTypeSize() (file t_set.c), which multiplexes between dictSize() and intsetLen().

The Listpack Object

Redis implements a listpack type to compactly represent lists of integers and strings. The listpack type is designed to maximize storage efficiency at the cost of lower read performance, especially random reads. The implementation is in file listpack.h and listpack.c.

Object Memory Layout

Overall, the listpack object is a single block of memory consisting of a header, a body, and an end mark. The header of the object consists of two fields. The first field is a 32-bit integer storing the total size of the object including all three parts. The second field is a 16-bit integer storing the number of list elements in the body.

The body of the listpack object consists of an array of variable-sized entries. Each entry begins with a 1-byte encoding field describing the encoding of the element (which can be a string or integer, but there are several forms of storage-efficient encoding for each type). The interpretation of the following bytes depends on the encoding field. In general, if the field indicates that the entry is a form of a string, then the next bytes will be the length of the string, followed by the string itself. On the other hand, if the field indicates that the entry is a form of an integer, then the next bytes will be the integer. Finally, there are also special string and integer encodings that “borrow” bits from the encoding field. In this case, the lower bits of the field will be used to store either the string length or the integer value.

At the end of the listpack object, there is an end mark of value 0xFF. The end mark can be thought of as a special encoding field that does not encode any data, but rather indicates the end of the list.

Entry Encoding

We next discuss the data layout of different encodings. If the encoding field is of value 2'b0xxx xxxx, then the entry is a 7-bit integer, and the integer value is stored in the lower 7 bits of the encoding field. In this case, the entry has no extra bytes as the value “borrows” 7 bits from the encoding field.

Similarly, if the field is of value 2'b10xx xxxx or 2'b110x xxxx, then the entry is a string with a 6-bit length field or a 13-bit integer, respectively. In the former case, the lower 6 bits of the field store the string length, and the string itself follows right after. In the latter case, the lower 5 bits and the next byte store the integer (and hence it has 5 + 8 = 13 bits).

If the encoding field is of value 2'b1110 xxxx, then the entry is a string with a 12-bit length field. In this case, the lower 4 bits of the field plus the next byte encodes the length of the string. The string value is stored compactly right after. Similarly, if the field is 2'b1111 0000, then the entry is a string with 32-bit length field. No bit is borrowed from the encoding field in this case, and the next 4 bytes encode the length of the string.

The remaining four cases for encoding, i.e., 2'b1111 0001, 2'b1111 0010, 2'b1111 0011, and 2'b1111 0100, represent 16-bit, 24-bit, 32-bit, and 64-bit integers, respectively. No bit is borrowed from the encoding field, and the integer value is stored after the field.

Helper Macros and Functions

The source code implementing the listpack type provides several helper macros and functions to aid programming and promote readability. Macro lpGetTotalBytes() takes a pointer to the header (first byte) of the object and returns the object size in total number of bytes. Macro lpGetNumElements() takes a pointer to the header (first byte) of the object and returns the number of elements. Similarly, macros lpSetTotalBytes() and lpSetNumElements() set the two header fields given a header pointer and the new value.

Macros whose names begin with LP_ENCODING_ facilitate encoding-related matters. The order in which these macros are laid out in the source file is also important, as it is also the order in which the encoding field should be tested due to the prefix-based encoding of the field.
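The prefix-based tests on the encoding byte can be written out as a small classifier. This sketch follows the bit patterns of the LP_ENCODING_ definitions in listpack.c as we understand them; the function names classify() and decode_small() are hypothetical, and only the two encodings that need no multi-byte arithmetic are actually decoded.

```python
# Encodings in the 0xF0..0xFF range are matched by exact value.
EXACT = {
    0xF0: ("str", "32-bit length"),
    0xF1: ("int", "16-bit"),
    0xF2: ("int", "24-bit"),
    0xF3: ("int", "32-bit"),
    0xF4: ("int", "64-bit"),
    0xFF: ("end", "end of listpack"),
}


def classify(b):
    """Map an encoding byte to a (kind, description) pair."""
    # The tests must run in this order: shorter prefixes first.
    if b & 0x80 == 0x00:
        return ("int", "7-bit unsigned")
    if b & 0xC0 == 0x80:
        return ("str", "6-bit length")
    if b & 0xE0 == 0xC0:
        return ("int", "13-bit")
    if b & 0xF0 == 0xE0:
        return ("str", "12-bit length")
    if b in EXACT:
        return EXACT[b]
    raise ValueError(f"unknown encoding byte 0x{b:02X}")


def decode_small(entry):
    """Decode an entry whose value or length lives in the encoding byte."""
    b = entry[0]
    if b & 0x80 == 0x00:
        return b & 0x7F              # the integer itself
    if b & 0xC0 == 0x80:
        n = b & 0x3F                 # the string length
        return entry[1:1 + n]
    raise ValueError("not a 7-bit integer or 6-bit-length string")
```

For example, the single byte 0x2A decodes to the integer 42, and the bytes 0x85 followed by "hello" decode to a five-byte string.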

Function lpCurrentEncodedSizeUnsafe(), given an entry pointer (pointing to the encoding field), returns the size of the entry including the encoding field and the data. A similar function, lpCurrentEncodedSizeBytes(), returns the size of the encoding field plus the length field if the entry stores a string. Otherwise, it always returns 1 for integers.

Redis Python Client Library

At the client side, a client library is needed to communicate with Redis server. The client library implements Redis’s RESP protocol, which encodes strings, arrays, and so on into a specific format that can be understood by Redis server. Many open-sourced implementations of the Redis client interface are available, and in this section, we go over the Python language implementation, redis-py.

Initializing the Client Object

The Redis Python interface can be imported into the source using import redis. After that, a new Redis object representing the client can be created by initializing redis.Redis with the host name or IP address of the Redis server and the port number. Class Redis is defined in file client.py of the source tree. In the most general case, the object constructor creates a ConnectionPool object and saves it to the connection_pool field of the Redis object. The ConnectionPool object (file connection.py) is a thin wrapping layer over the actual connection object, class Connection. The ConnectionPool is responsible for dynamically maintaining a pool of connection objects to maximize the reuse of allocated OS sockets. The class provides two main interface methods. The first is get_connection(), which either returns an existing connection object from its _available_connections list, or creates a new connection object by calling its own make_connection() if the list is empty. A new connection is created with the arguments passed into class Redis’s constructor. Either way, the connection is connected to the Redis server by calling connect() on the connection object, and finally returned to the caller. The second method is release(), which returns a connection to the pool object by inserting it back into the _available_connections list. Note that the pool object will keep connections alive and not disconnect them proactively from the client’s side. The pool object is both thread-safe and fork-safe such that connections will not be shared between threads and processes. The former is guarded by a thread lock such that concurrent usage of the Redis object will not cause data corruption. The latter is also necessary to prevent different processes from continuing to share connections after fork().
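The get/release cycle of the pool can be sketched without any network access. The classes below are hypothetical stand-ins (ToyPool and ToyConnection are not redis-py classes): connect() only flips a flag instead of opening a socket, fork safety is left out, and the in-use bookkeeping is an assumption made for the sake of the example.

```python
import threading


class ToyConnection:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.connected = False

    def connect(self):
        # A real connection would open a socket here.
        self.connected = True


class ToyPool:
    def __init__(self, **connection_kwargs):
        self.connection_kwargs = connection_kwargs
        self._lock = threading.Lock()
        self._available = []   # idle connections ready for reuse
        self._in_use = set()
        self.created = 0

    def make_connection(self):
        self.created += 1
        return ToyConnection(**self.connection_kwargs)

    def get_connection(self):
        with self._lock:
            if self._available:
                conn = self._available.pop()    # reuse an idle connection
            else:
                conn = self.make_connection()   # or create a fresh one
            self._in_use.add(conn)
        conn.connect()
        return conn

    def release(self, conn):
        with self._lock:
            self._in_use.remove(conn)
            self._available.append(conn)  # kept alive for later reuse


pool = ToyPool(host="localhost", port=6379)
a = pool.get_connection()
b = pool.get_connection()
pool.release(a)
c = pool.get_connection()
```

The third request is served by the connection that was just released, so only two connection objects are ever created.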

Interestingly, the class Redis constructor can also be instructed to use a Unix domain socket, which is an IPC mechanism provided by the OS kernel, when argument unix_socket_path is set to anything but None. Besides, argument single_connection_client, if set to True, will cause the class Redis object to open only a single connection and save it to field connection. In this case, the object should only be used by a single thread.

The Connection Object

The connection object, which will be initialized and managed by the connection pool, is defined in file connection.py as class Connection. The most important interface of this object is connect(), which requests a socket from the OS and connects it to the Redis server. In particular, this function first requests a socket from the OS by calling its own _connect(), which in turn invokes the Python library function socket() to access the system call. Then it invokes on_connect() to authenticate with the server if the credential is given, set the client’s name, and then select the database. All the operations in on_connect() are optional, and will only be performed when the information is given to the class Redis object (via the constructor) owning the connection object.

The default connection object assumes the TCP/IP protocol without TLS. Alternatively, users can instruct the Redis object to use Unix domain sockets, which are a form of IPC that binds to a local file system node rather than to a host name or IP address. The domain socket connection object is implemented as class UnixDomainSocketConnection (file connection.py), which inherits from the connection class discussed above. The domain socket class overrides the _connect() method such that it creates a domain socket object by calling Python library function socket() with the address family being AF_UNIX.

The connection type is selected by passing different class objects into class ConnectionPool’s constructor as the connection_class argument.
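As a rough sketch of this design, the following shows how a subclass can override only the socket-creation step, and how a caller can choose between the two by passing a class object. The names ToyTcpConnection, ToyUnixConnection, create_socket() and make_connection() are hypothetical, the default socket path is a placeholder, and the sockets are created but never connected. AF_UNIX is assumed to be available on the platform.

```python
import socket


class ToyTcpConnection:
    """Hypothetical default connection: TCP identified by host and port."""

    def __init__(self, host="localhost", port=6379):
        self.host = host
        self.port = port

    def create_socket(self):
        # Plays the role of _connect(), minus the actual connect() call.
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM)


class ToyUnixConnection(ToyTcpConnection):
    """Hypothetical variant that binds to a file system path instead."""

    def __init__(self, path="/tmp/redis.sock"):
        self.path = path

    def create_socket(self):
        # Only the address family differs from the TCP case.
        return socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)


def make_connection(connection_class=ToyTcpConnection, **kwargs):
    # The class object itself is the selector, as with connection_class.
    return connection_class(**kwargs)


tcp_conn = make_connection(host="localhost", port=6379)
unix_conn = make_connection(ToyUnixConnection, path="/tmp/redis.sock")
```

Because the variation is confined to one overridden method, the rest of the connection logic can be shared between the two transports.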

Executing Commands

After constructing the object and setting up the connection, the class Redis object is returned to the caller, which is ready to accept commands. Commonly used commands such as GET and SET are implemented in class BasicKeyCommands (file commands/core.py). For example, the GET command is implemented as a simple function get(), which does nothing else except calling execute_command() with the command string GET as the first argument and the key as the second argument.

Class BasicKeyCommands, together with a few other classes implementing commands, is inherited by the main Redis class. Therefore, calls to method execute_command() within get() (which the user invokes on the Redis object, so self refers to that object) will eventually land in class Redis’s execute_command() method. Function execute_command() (file client.py) accepts unnamed arguments in args and keyword arguments in options. This function first grabs a connection object either from its connection field (in the case of a single-connection Redis object), or by calling get_connection() of the pool object. Then it indirectly invokes _send_command_parse_response() to send the command and wait for the response. Function _send_command_parse_response() (file client.py) simply calls send_command() on the connection object, and then waits for and parses the response by calling parse_response(). The value from parse_response() is returned to the user.
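The dispatch path described above, from a command method in a mixin class to execute_command() in the client class, can be illustrated with a toy model. Everything here is hypothetical: ToyKeyCommands, ToyRedis and FakeConnection are invented names, and the fake connection answers from an in-memory dictionary rather than talking to a server.

```python
class FakeConnection:
    """Hypothetical in-memory stand-in for a server connection."""

    def __init__(self):
        self.data = {}
        self._reply = None

    def send_command(self, *args):
        command = args[0].upper()
        if command == "SET":
            self.data[args[1]] = args[2]
            self._reply = "OK"
        elif command == "GET":
            self._reply = self.data.get(args[1])
        else:
            self._reply = None

    def read_response(self):
        return self._reply


class ToyKeyCommands:
    """Command methods only forward to execute_command() on self."""

    def get(self, name):
        return self.execute_command("GET", name)

    def set(self, name, value):
        return self.execute_command("SET", name, value)


class ToyRedis(ToyKeyCommands):
    """Inherits the command methods and supplies execute_command()."""

    def __init__(self):
        self.connection = FakeConnection()

    def execute_command(self, *args, **options):
        conn = self.connection
        return self._send_command_parse_response(conn, *args)

    def _send_command_parse_response(self, conn, *args):
        conn.send_command(*args)
        return conn.read_response()


client = ToyRedis()
set_reply = client.set("key1", "value1")
get_reply = client.get("key1")
```

The mixin never needs to know how commands are transported. It only relies on self providing execute_command(), which is why the same command classes can be shared by different client classes.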

Generating the RESP Request Stream

Function send_command() (file connection.py) generates the request stream by calling method pack() on its _command_packer field. The field is of type class PythonRespSerializer and is assigned during class Connection object’s construction by calling method _construct_command_packer() (file connection.py).

The request stream of a command is generated by calling pack() on the class PythonRespSerializer object. The pack method takes a list argument args and implements Redis’s RESP protocol. RESP serializes strings, integers, lists, and so on, into a single stream that can be understood by the Redis server. For Redis requests, the RESP-compatible stream begins with a * symbol, followed by the number of arguments (including the command itself) and then \r\n. The command is encoded as a string object, which begins with $, followed by the length of the string, then followed by \r\n, followed by the string itself (which can contain \r, \n, or both), and finally a terminating \r\n. The rest of the stream consists of the arguments in their corresponding RESP encoding. In the simplest scenario, there is no argument, and the stream only contains the command. In common scenarios such as GET, there is only one argument, which is the key, and the key is encoded by RESP as a string object.

Method pack() generates the request stream in exactly the way described above. It first computes the size of the argument list, and generates the header of the stream by joining SYM_STAR, the length of the list, and SYM_CRLF together. Then, for every element in the argument list (including the command, which is the first element of the list), it generates a string object by joining SYM_DOLLAR, the length of the element, SYM_CRLF, the object content, and finally another SYM_CRLF. The final result can be split across one or more buffer objects to avoid copying large amounts of data. The buffer objects are inserted into a list and returned to the caller.
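The encoding steps above can be sketched as a short function. This is a simplified illustration rather than the library's implementation: it reuses the constant names SYM_STAR, SYM_DOLLAR and SYM_CRLF from the text, but it assumes every non-bytes argument can be converted with str(), and it returns one joined bytes object instead of a list of buffers.

```python
SYM_STAR = b"*"
SYM_DOLLAR = b"$"
SYM_CRLF = b"\r\n"


def pack(*args):
    """Encode a command and its arguments as a RESP array of strings."""
    encoded = []
    for arg in args:
        if isinstance(arg, bytes):
            encoded.append(arg)
        else:
            # Assumption: anything else is sent as its UTF-8 text form.
            encoded.append(str(arg).encode("utf-8"))

    # Array header: '*', the number of elements, CRLF.
    parts = [SYM_STAR, str(len(encoded)).encode(), SYM_CRLF]
    for item in encoded:
        # String object: '$', the byte length, CRLF, the content, CRLF.
        parts.extend(
            [SYM_DOLLAR, str(len(item)).encode(), SYM_CRLF, item, SYM_CRLF]
        )
    return b"".join(parts)


get_stream = pack("GET", "key1")
# get_stream == b"*2\r\n$3\r\nGET\r\n$4\r\nkey1\r\n"
```

Note that the length written after $ is the byte length of the encoded element, not the character count, which matters for non-ASCII keys and values.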

Sending The Stream

The returned list of buffer objects from pack() is then given to the send_packed_command() method as the first argument. Function send_packed_command() iterates over the list, and for each buffer object, invokes sendall() on the socket object of the connection to send it to the Redis server.
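A minimal sketch of this loop is shown below. To keep it runnable without a Redis server, it assumes a local socketpair() stands in for the client and server ends of the connection, and the function here is a free function taking the socket as a parameter rather than a method of the connection object.

```python
import socket


def send_packed_command(sock, buffers):
    """Send every buffer in order; sendall() retries until all bytes go out."""
    for chunk in buffers:
        sock.sendall(chunk)


# Two connected sockets stand in for the client and the server.
client, server = socket.socketpair()

# A GET request split across two buffers, as pack() might return it.
send_packed_command(client, [b"*2\r\n$3\r\nGET\r\n", b"$4\r\nkey1\r\n"])
client.close()

# Read until end-of-stream to see what the "server" side received.
received = b""
while True:
    data = server.recv(1024)
    if not data:
        break
    received += data
server.close()
```

The receiver sees one continuous byte stream, so how the request was split into buffers on the sending side does not affect how the server parses it.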

Waiting for the Result

After the command is sent, the control flow returns to function _send_command_parse_response() (file client.py), which then calls parse_response() of the Redis object to wait for the result. Function parse_response() (file client.py) takes the connection object as its first argument and calls into the read_response() method of the connection object. Function read_response() (file connection.py) is a wrapper over read_response() of the connection’s _parser field. The _parser field is assigned during construction of the connection object (by calling set_parser() on self) and is of type class PythonParser.

Function read_response() (file connection.py) of class PythonParser calls _read_response() of the same object. The latter in turn calls readline() on its _buffer field. The _buffer field is assigned when the connection is initialized, in the connection object’s on_connect() method (which calls the on_connect() method of the parser). The type of the _buffer field is class SocketBuffer. Its readline() method (file connection.py) calls _read_from_socket() in a loop, which in turn invokes recv() on the connection’s socket, receiving response data and appending it to the internal buffer. Function readline() returns on seeing a trailing SYM_CRLF in the response stream.

After the first line of the response is received, the control flow returns to the parser object’s _read_response() method. This method then inspects the first character of the response message and parses the rest based on that character. If the first character indicates that more data should be received, the method will further call the read() method of the _buffer to receive the remaining data. The response message is returned to the caller after it is fully received. The response message will then climb up the call chain through the parser object’s _read_response() and read_response(), the connection object’s read_response(), the Redis object’s parse_response(), _send_command_parse_response(), and execute_command(), the BasicKeyCommands object’s get(), and finally be returned to the user.
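The first-character dispatch described above can be sketched as follows. This is a simplified illustration, not the parser from the library: it assumes the whole response is already available in an io.BytesIO object instead of being received from a socket, and it handles only the +, -, :, $ and * prefixes.

```python
import io


def read_response(stream):
    """Parse one RESP value from a binary stream."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ValueError("incomplete response line")
    prefix, body = line[:1], line[1:-2]

    if prefix == b"+":            # status reply such as +OK
        return body.decode()
    if prefix == b"-":            # error reply
        raise RuntimeError(body.decode())
    if prefix == b":":            # integer
        return int(body)
    if prefix == b"$":            # string with an explicit length
        length = int(body)
        if length == -1:          # a missing value is encoded as $-1
            return None
        data = stream.read(length + 2)   # content plus trailing CRLF
        return data[:-2]
    if prefix == b"*":            # array: parse each element recursively
        count = int(body)
        if count == -1:
            return None
        return [read_response(stream) for _ in range(count)]
    raise ValueError("unknown prefix: %r" % prefix)


status = read_response(io.BytesIO(b"+OK\r\n"))
value = read_response(io.BytesIO(b"$6\r\nvalue1\r\n"))
```

Reading a string by its declared length rather than by scanning for a line ending is what allows values to contain \r and \n safely.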

Build, Compilation, and Usage

Connecting to Redis Server using Telnet

Instead of using a client that implements RESP, users can interact with Redis server using telnet by manually typing the command. For a Redis server instance started on the local host on the default port 6379, users can connect to it using the following telnet command:

telnet localhost 6379

After connecting to the server (there is no prompt), users can then send commands with space-separated arguments. For example, in order to set a key key1 to string value1, type the following command:

set key1 value1

and Redis will return +OK to indicate successful execution of the command. In order to retrieve the value set by the previous command, type

get key1

and Redis will return $6 followed by value1 on the next line, which is the RESP encoding of a six-byte string.

Note that common key combinations such as Ctrl+C, Ctrl+D, and Ctrl+Z do not work as intended in telnet. In order to terminate the session, users need to first press Ctrl+] to switch to the telnet console, and then type quit to close the connection.

Disabling Persistence

Redis has two independent persistence mechanisms: RDB and AOF. RDB uses copy-on-write (implemented in the OS kernel via fork()) to capture a consistent memory snapshot and save it to the disk. AOF (Append-Only File) is similar to write-ahead logging and it writes committed operations to a log file on the disk. Besides, when Redis exits via Ctrl-C, it will also save the dump of the database as dump.rdb in the current working directory.

In order to disable persistence entirely, pass the following command line options to redis-server:

./redis-server --save "" --appendonly no

The first --save option followed by an empty string disables RDB snapshotting. The second --appendonly option disables AOF.

Adding New Source Files

Redis has a rather clear and simple make system. In order to add a new source file for compilation, first create the file under the ./src directory. Then update the Makefile under ./src by adding the object file name to the list named REDIS_SERVER_OBJ (assuming the file is part of the server).