Redis failover with StackExchange / Sentinel from C#

I was able to spend some time last week with the Linux guys testing scenarios and working on the C# side of this implementation and am using the following approach:

  • Read the sentinel addresses from config and create a ConnectionMultiplexer to connect to them
  • Subscribe to the +switch-master channel
  • Ask each sentinel server in turn what they think the master redis and slaves are, compare them all to make sure they all agree
  • Create a new ConnectionMultiplexer with the redis server addresses read from sentinel and connect, add event handler to ConnectionFailed and ConnectionRestored.
  • When I receive the +switch-master message I call Configure() on the redis ConnectionMultiplexer
  • As a belt and braces approach I always call Configure() on the redis ConnectionMultiplexer 12 seconds after receiving a connectionFailed or connectionRestored event when the connection type is ConnectionType.Interactive.

I find that generally I am working and reconfigured after about 5 seconds of losing the redis master. During this time I can't write but I can read (since you can read off a slave). 5 seconds is ok for us since our data updates very quickly and becomes stale after a few seconds (and is subsequently overwritten).

One thing I wasn't sure about was whether or not I should remove the redis server from the redis ConnectionMultiplexer when an instance goes down, or let it continue to retry the connection. I decided to leave it retrying as it comes back into the mix as a slave as soon as it comes back up. I did some performance testing with and without a connection being retried and it seemed to make little difference. Maybe someone can clarify whether this is the correct approach.

Every now and then bringing back an instance that was previously a master did seem to cause some confusion - a few seconds after it came back up I would receive an exception from writing - "READONLY" suggesting I can't write to a slave. This was rare but I found that my "catch-all" approach of calling Configure() 12 seconds after a connection state change caught this problem. Calling Configure() seems very cheap and therefore calling it twice regardless of whether or not it's necessary seemed OK.

Now that I have slaves I have offloaded some of my data cleanup code which does key scans to the slaves, which makes me happy.

All in all I'm pretty satisfied, it's not perfect but for something that should very rarely happen it's more than good enough.


I am including our Redis wrapper, it has changed somewhat from the original answer, for various reasons:

  • We wanted to use pub/sub
  • Sentinel didn't always appear to give us the master changed message at the 'right' time (i.e. it we called Configure() and ended up thinking a slave was a master)
  • The connectionMultiplexer didn't always seem to restore connctions every time, affecting pub/sub

I rather suspect this is down to our sentinel/redis configuration more than anything else. Either way, it just wasn't perfectly reliable despite destructive testing. Added to which, the master changed message took a long time since we had to increase timeouts due to sentinel being "too sensitive" and calling failovers when there weren't any. I think running in a virtual environment also exacerbated the problem.

Instead of listening to subscriptions now we simply attempt a write test every 5 seconds, and also have a "last message received" check for pub/sub. If we encounter any problems we strip down completely the connections and rebuild them. It seems overkill but actually it's pretty fast and still faster than waiting for the master changed message from sentinel...

This won't compile without various extension methods and other classes etc, but you get the idea.

namespace Smartodds.Framework.Redis
{
    public class RedisClient : IDisposable
    {
        public RedisClient(RedisEnvironmentElement environment, Int32 databaseId)
        {
            m_ConnectTimeout = environment.ConnectTimeout;
            m_Timeout = environment.Timeout;
            m_DatabaseId = databaseId;
            m_ReconnectTime = environment.ReconnectTime;
            m_CheckSubscriptionsTime = environment.CheckSubscriptions;
            if (environment.TestWrite == true)
            {
                m_CheckWriteTime = environment.TestWriteTime;
            }

            environment.Password.ToCharArray().ForEach((c) => m_Password.AppendChar(c));

            foreach (var server in environment.Servers)
            {
                if (server.Type == ServerType.Redis)
                {
                    // will be ignored if sentinel servers are used
                    m_RedisServers.Add(new RedisConnection { Address = server.Host, Port = server.Port });
                }
                else
                {
                    m_SentinelServers.Add(new RedisConnection { Address = server.Host, Port = server.Port });
                }
            }
        }

        public bool IsSentinel { get { return m_SentinelServers.Count > 0; } }

        public IDatabase Database { get { return _Redis.GetDatabase(m_DatabaseId); } }

        private ConnectionMultiplexer _Redis
        {
            get
            {
                if (m_Connecting == true)
                {
                    throw new RedisConnectionNotReadyException();
                }

                ConnectionMultiplexer redis = m_Redis;
                if (redis == null)
                {
                    throw new RedisConnectionNotReadyException();
                }

                return redis;
            }
        }

        private ConnectionMultiplexer _Sentinel
        {
            get
            {
                if (m_Connecting == true)
                {
                    throw new RedisConnectionNotReadyException("Sentinel connection not ready");
                }

                ConnectionMultiplexer sentinel = m_Sentinel;
                if (sentinel == null)
                {
                    throw new RedisConnectionNotReadyException("Sentinel connection not ready");
                }

                return sentinel;
            }
        }

        public void RegisterSubscription(string channel, Action<RedisChannel, RedisValue> handler, Int32 maxNoReceiveSeconds)
        {
            m_Subscriptions.Add(channel, new RedisSubscription
            {
                Channel = channel,
                Handler = handler,
                MaxNoReceiveSeconds = maxNoReceiveSeconds,
                LastUsed = DateTime.UtcNow,
            });
        }

        public void Connect()
        {
            _Connect(true);
        }

        private void _Connect(object state)
        {
            bool throwException = (bool)state;

            // if a reconnect is already being attempted, don't hang around waiting
            if (Monitor.TryEnter(m_ConnectionLocker) == false)
            {
                return;
            }

            // we took the lock, notify everything we are connecting
            m_Connecting = true;

            try
            {
                Stopwatch sw = Stopwatch.StartNew();
                LoggerQueue.Debug(">>>>>> REDIS CONNECTING... >>>>>>");

                // if this is a reconnect, make absolutely sure everything is cleaned up first
                _KillTimers();
                _KillRedisClient();

                if (this.IsSentinel == true && m_Sentinel == null)
                {
                    LoggerQueue.Debug(">>>>>> CONNECTING TO SENTINEL >>>>>> - " + sw.Elapsed);

                    // we'll be getting the redis servers from sentinel
                    ConfigurationOptions sentinelConnection = _CreateRedisConfiguration(CommandMap.Sentinel, null, m_SentinelServers);
                    m_Sentinel = ConnectionMultiplexer.Connect(sentinelConnection);
                    LoggerQueue.Debug(">>>>>> CONNECTED TO SENTINEL >>>>>> - " + sw.Elapsed);

                    _OutputConfigurationFromSentinel();

                    // get all the redis servers from sentinel and ignore any set by caller
                    m_RedisServers.Clear();
                    m_RedisServers.AddRange(_GetAllRedisServersFromSentinel());

                    if (m_RedisServers.Count == 0)
                    {
                        throw new RedisException("Sentinel found no redis servers");
                    }
                }

                LoggerQueue.Debug(">>>>>> CONNECTING TO REDIS >>>>>> - " + sw.Elapsed);

                // try to connect to all redis servers
                ConfigurationOptions connection = _CreateRedisConfiguration(CommandMap.Default, _SecureStringToString(m_Password), m_RedisServers);
                m_Redis = ConnectionMultiplexer.Connect(connection);
                LoggerQueue.Debug(">>>>>> CONNECTED TO REDIS >>>>>> - " + sw.Elapsed);

                // register subscription channels
                m_Subscriptions.ForEach(s =>
                {
                    m_Redis.GetSubscriber().Subscribe(s.Key, (channel, value) => _SubscriptionHandler(channel, value));
                    s.Value.LastUsed = DateTime.UtcNow;
                });

                if (this.IsSentinel == true)
                {
                    // check subscriptions have been sending messages
                    if (m_Subscriptions.Count > 0)
                    {
                        m_CheckSubscriptionsTimer = new Timer(_CheckSubscriptions, null, 30000, m_CheckSubscriptionsTime);
                    }

                    if (m_CheckWriteTime != null)
                    {
                        // check that we can write to redis
                        m_CheckWriteTimer = new Timer(_CheckWrite, null, 32000, m_CheckWriteTime.Value);
                    }

                    // monitor for connection status change to any redis servers
                    m_Redis.ConnectionFailed += _ConnectionFailure;
                    m_Redis.ConnectionRestored += _ConnectionRestored;
                }

                LoggerQueue.Debug(string.Format(">>>>>> ALL REDIS CONNECTED ({0}) >>>>>>", sw.Elapsed));
            }
            catch (Exception ex)
            {
                LoggerQueue.Error(">>>>>> REDIS CONNECT FAILURE >>>>>>", ex);

                if (throwException == true)
                {
                    throw;
                }
                else
                {
                    // internal reconnect, the reconnect has failed so might as well clean everything and try again
                    _KillTimers();
                    _KillRedisClient();

                    // faster than usual reconnect if failure
                    _ReconnectTimer(1000);
                }
            }
            finally
            {
                // finished connection attempt, notify everything and remove lock
                m_Connecting = false;
                Monitor.Exit(m_ConnectionLocker);
            }
        }

        private ConfigurationOptions _CreateRedisConfiguration(CommandMap commandMap, string password, List<RedisConnection> connections)
        {
            ConfigurationOptions connection = new ConfigurationOptions
            {
                CommandMap = commandMap,
                AbortOnConnectFail = true,
                AllowAdmin = true,
                ConnectTimeout = m_ConnectTimeout,
                SyncTimeout = m_Timeout,
                ServiceName = "master",
                TieBreaker = string.Empty,
                Password = password,
            };

            connections.ForEach(s =>
            {
                connection.EndPoints.Add(s.Address, s.Port);
            });

            return connection;
        }

        private void _OutputConfigurationFromSentinel()
        {
            m_SentinelServers.ForEach(s =>
            {
                try
                {
                    IServer server = m_Sentinel.GetServer(s.Address, s.Port);
                    if (server.IsConnected == true)
                    {
                        try
                        {
                            IPEndPoint master = server.SentinelGetMasterAddressByName("master") as IPEndPoint;
                            var slaves = server.SentinelSlaves("master");

                            StringBuilder sb = new StringBuilder();
                            sb.Append(">>>>>> _OutputConfigurationFromSentinel Server ");
                            sb.Append(s.Address);
                            sb.Append(" thinks that master is ");
                            sb.Append(master);
                            sb.Append(" and slaves are ");

                            foreach (var slave in slaves)
                            {
                                string name = slave.Where(i => i.Key == "name").Single().Value;
                                bool up = slave.Where(i => i.Key == "flags").Single().Value.Contains("disconnected") == false;

                                sb.Append(name);
                                sb.Append("(");
                                sb.Append(up == true ? "connected" : "down");
                                sb.Append(") ");
                            }
                            sb.Append(">>>>>>");
                            LoggerQueue.Debug(sb.ToString());
                        }
                        catch (Exception ex)
                        {
                            LoggerQueue.Error(string.Format(">>>>>> _OutputConfigurationFromSentinel Could not get configuration from sentinel server ({0}) >>>>>>", s.Address), ex);
                        }
                    }
                    else
                    {
                        LoggerQueue.Error(string.Format(">>>>>> _OutputConfigurationFromSentinel Sentinel server {0} was not connected", s.Address));
                    }
                }
                catch (Exception ex)
                {
                    LoggerQueue.Error(string.Format(">>>>>> _OutputConfigurationFromSentinel Could not get IServer from sentinel ({0}) >>>>>>", s.Address), ex);
                }
            });
        }

        private RedisConnection[] _GetAllRedisServersFromSentinel()
        {
            // ask each sentinel server for its configuration
            List<RedisConnection> redisServers = new List<RedisConnection>();
            m_SentinelServers.ForEach(s =>
            {
                try
                {
                    IServer server = m_Sentinel.GetServer(s.Address, s.Port);
                    if (server.IsConnected == true)
                    {
                        try
                        {
                            // store master in list
                            IPEndPoint master = server.SentinelGetMasterAddressByName("master") as IPEndPoint;
                            redisServers.Add(new RedisConnection { Address = master.Address.ToString(), Port = master.Port });

                            var slaves = server.SentinelSlaves("master");
                            foreach (var slave in slaves)
                            {
                                string address = slave.Where(i => i.Key == "ip").Single().Value;
                                string port = slave.Where(i => i.Key == "port").Single().Value;

                                redisServers.Add(new RedisConnection { Address = address, Port = Convert.ToInt32(port) });
                            }
                        }
                        catch (Exception ex)
                        {
                            LoggerQueue.Error(string.Format(">>>>>> _GetAllRedisServersFromSentinel Could not get redis servers from sentinel server ({0}) >>>>>>", s.Address), ex);
                        }
                    }
                    else
                    {
                        LoggerQueue.Error(string.Format(">>>>>> _GetAllRedisServersFromSentinel Sentinel server {0} was not connected", s.Address));
                    }
                }
                catch (Exception ex)
                {
                    LoggerQueue.Error(string.Format(">>>>>> _GetAllRedisServersFromSentinel Could not get IServer from sentinel ({0}) >>>>>>", s.Address), ex);
                }
            });

            return redisServers.Distinct().ToArray();
        }

        private IServer _GetRedisMasterFromSentinel()
        {
            // ask each sentinel server for its configuration
            foreach (RedisConnection sentinel in m_SentinelServers)
            {
                IServer sentinelServer = _Sentinel.GetServer(sentinel.Address, sentinel.Port);
                if (sentinelServer.IsConnected == true)
                {
                    try
                    {
                        IPEndPoint master = sentinelServer.SentinelGetMasterAddressByName("master") as IPEndPoint;
                        return _Redis.GetServer(master);
                    }
                    catch (Exception ex)
                    {
                        LoggerQueue.Error(string.Format(">>>>>> Could not get redis master from sentinel server ({0}) >>>>>>", sentinel.Address), ex);
                    }
                }
            }

            throw new InvalidOperationException("No sentinel server available to get master");
        }

        private void _ReconnectTimer(Nullable<Int32> reconnectMilliseconds)
        {
            try
            {
                lock (m_ReconnectLocker)
                {
                    if (m_ReconnectTimer != null)
                    {
                        m_ReconnectTimer.Dispose();
                        m_ReconnectTimer = null;
                    }

                    // since a reconnect will definately occur we can stop the check timers for now until reconnect succeeds (where they are recreated)
                    _KillTimers();

                    LoggerQueue.Warn(">>>>>> REDIS STARTING RECONNECT TIMER >>>>>>");

                    m_ReconnectTimer = new Timer(_Connect, false, reconnectMilliseconds.GetValueOrDefault(m_ReconnectTime), Timeout.Infinite);
                }
            }
            catch (Exception ex)
            {
                LoggerQueue.Error("Error during _ReconnectTimer", ex);
            }
        }

        private void _CheckSubscriptions(object state)
        {
            if (Monitor.TryEnter(m_ConnectionLocker, TimeSpan.FromSeconds(1)) == false)
            {
                return;
            }

            try
            {
                DateTime now = DateTime.UtcNow;
                foreach (RedisSubscription subscription in m_Subscriptions.Values)
                {
                    if ((now - subscription.LastUsed) > TimeSpan.FromSeconds(subscription.MaxNoReceiveSeconds))
                    {
                        try
                        {
                            EndPoint endpoint = m_Redis.GetSubscriber().IdentifyEndpoint(subscription.Channel);
                            EndPoint subscribedEndpoint = m_Redis.GetSubscriber().SubscribedEndpoint(subscription.Channel);

                            LoggerQueue.Warn(string.Format(">>>>>> REDIS Channel '{0}' has not been used for longer than {1}s, IsConnected: {2}, IsConnectedChannel: {3}, EndPoint: {4}, SubscribedEndPoint: {5}, reconnecting...", subscription.Channel, subscription.MaxNoReceiveSeconds, m_Redis.GetSubscriber().IsConnected(), m_Redis.GetSubscriber().IsConnected(subscription.Channel), endpoint != null ? endpoint.ToString() : "null", subscribedEndpoint != null ? subscribedEndpoint.ToString() : "null"));
                        }
                        catch (Exception ex)
                        {
                            LoggerQueue.Error(string.Format(">>>>>> REDIS Error logging out details of Channel '{0}' reconnect", subscription.Channel), ex);
                        }

                        _ReconnectTimer(null);
                        return;
                    }
                }
            }
            catch (Exception ex)
            {
                LoggerQueue.Error(">>>>>> REDIS Exception ERROR during _CheckSubscriptions", ex);
            }
            finally
            {
                Monitor.Exit(m_ConnectionLocker);
            }
        }

        private void _CheckWrite(object state)
        {
            if (Monitor.TryEnter(m_ConnectionLocker, TimeSpan.FromSeconds(1)) == false)
            {
                return;
            }

            try
            {
                this.Database.HashSet(Environment.MachineName + "SmartoddsWriteCheck", m_CheckWriteGuid.ToString(), DateTime.UtcNow.Ticks);
            }
            catch (RedisConnectionNotReadyException)
            {
                LoggerQueue.Warn(">>>>>> REDIS RedisConnectionNotReadyException ERROR DURING _CheckWrite");
            }
            catch (RedisServerException ex)
            {
                LoggerQueue.Warn(">>>>>> REDIS RedisServerException ERROR DURING _CheckWrite, reconnecting... - " + ex.Message);
                _ReconnectTimer(null);
            }
            catch (RedisConnectionException ex)
            {
                LoggerQueue.Warn(">>>>>> REDIS RedisConnectionException ERROR DURING _CheckWrite, reconnecting... - " + ex.Message);
                _ReconnectTimer(null);
            }
            catch (TimeoutException ex)
            {
                LoggerQueue.Warn(">>>>>> REDIS TimeoutException ERROR DURING _CheckWrite - " + ex.Message);
            }
            catch (Exception ex)
            {
                LoggerQueue.Error(">>>>>> REDIS Exception ERROR during _CheckWrite", ex);
            }
            finally
            {
                Monitor.Exit(m_ConnectionLocker);
            }
        }

        private void _ConnectionFailure(object sender, ConnectionFailedEventArgs e)
        {
            LoggerQueue.Warn(string.Format(">>>>>> REDIS CONNECTION FAILURE, {0}, {1}, {2} >>>>>>", e.ConnectionType, e.EndPoint.ToString(), e.FailureType));
        }

        private void _ConnectionRestored(object sender, ConnectionFailedEventArgs e)
        {
            LoggerQueue.Warn(string.Format(">>>>>> REDIS CONNECTION RESTORED, {0}, {1}, {2} >>>>>>", e.ConnectionType, e.EndPoint.ToString(), e.FailureType));
        }

        private void _SubscriptionHandler(string channel, RedisValue value)
        {
            // get handler lookup
            RedisSubscription subscription = null;
            if (m_Subscriptions.TryGetValue(channel, out subscription) == false || subscription == null)
            {
                return;
            }

            // update last used
            subscription.LastUsed = DateTime.UtcNow;

            // call handler
            subscription.Handler(channel, value);
        }

        public Int64 Publish(string channel, RedisValue message)
        {
            try
            {
                return _Redis.GetSubscriber().Publish(channel, message);
            }
            catch (RedisConnectionNotReadyException)
            {
                LoggerQueue.Error("REDIS RedisConnectionNotReadyException ERROR DURING Publish");
                throw;
            }
            catch (RedisServerException ex)
            {
                LoggerQueue.Error("REDIS RedisServerException ERROR DURING Publish - " + ex.Message);
                throw;
            }
            catch (RedisConnectionException ex)
            {
                LoggerQueue.Error("REDIS RedisConnectionException ERROR DURING Publish - " + ex.Message);
                throw;
            }
            catch (TimeoutException ex)
            {
                LoggerQueue.Error("REDIS TimeoutException ERROR DURING Publish - " + ex.Message);
                throw;
            }
            catch (Exception ex)
            {
                LoggerQueue.Error("REDIS Exception ERROR DURING Publish", ex);
                throw;
            }
        }

        public bool LockTake(RedisKey key, RedisValue value, TimeSpan expiry)
        {
            return _Execute(() => this.Database.LockTake(key, value, expiry));
        }

        public bool LockExtend(RedisKey key, RedisValue value, TimeSpan extension)
        {
            return _Execute(() => this.Database.LockExtend(key, value, extension));
        }

        public bool LockRelease(RedisKey key, RedisValue value)
        {
            return _Execute(() => this.Database.LockRelease(key, value));
        }

        private void _Execute(Action action)
        {
            try
            {
                action.Invoke();
            }
            catch (RedisServerException ex)
            {
                LoggerQueue.Error("REDIS RedisServerException ERROR DURING _Execute - " + ex.Message);
                throw;
            }
            catch (RedisConnectionException ex)
            {
                LoggerQueue.Error("REDIS RedisConnectionException ERROR DURING _Execute - " + ex.Message);
                throw;
            }
            catch (TimeoutException ex)
            {
                LoggerQueue.Error("REDIS TimeoutException ERROR DURING _Execute - " + ex.Message);
                throw;
            }
            catch (Exception ex)
            {
                LoggerQueue.Error("REDIS Exception ERROR DURING _Execute", ex);
                throw;
            }
        }

        private TResult _Execute<TResult>(Func<TResult> function)
        {
            try
            {
                return function.Invoke();
            }
            catch (RedisServerException ex)
            {
                LoggerQueue.Error("REDIS RedisServerException ERROR DURING _Execute - " + ex.Message);
                throw;
            }
            catch (RedisConnectionException ex)
            {
                LoggerQueue.Error("REDIS RedisConnectionException ERROR DURING _Execute - " + ex.Message);
                throw;
            }
            catch (TimeoutException ex)
            {
                LoggerQueue.Error("REDIS TimeoutException ERROR DURING _Execute - " + ex.Message);
                throw;
            }
            catch (Exception ex)
            {
                LoggerQueue.Error("REDIS ERROR DURING _Execute", ex);
                throw;
            }
        }

        public string[] GetAllKeys(string pattern)
        {
            if (m_Sentinel != null)
            {
                return _GetAnyRedisSlaveFromSentinel().Keys(m_DatabaseId, pattern).Select(k => (string)k).ToArray();
            }
            else
            {
                return _Redis.GetServer(_Redis.GetEndPoints().First()).Keys(m_DatabaseId, pattern).Select(k => (string)k).ToArray();
            }
        }

        private void _KillSentinelClient()
        {
            try
            {
                if (m_Sentinel != null)
                {
                    LoggerQueue.Debug(">>>>>> KILLING SENTINEL CONNECTION >>>>>>");

                    ConnectionMultiplexer sentinel = m_Sentinel;
                    m_Sentinel = null;

                    sentinel.Close(false);
                    sentinel.Dispose();
                }
            }
            catch (Exception ex)
            {
                LoggerQueue.Error(">>>>>> Error during _KillSentinelClient", ex);
            }
        }

        private void _KillRedisClient()
        {
            try
            {
                if (m_Redis != null)
                {
                    Stopwatch sw = Stopwatch.StartNew();
                    LoggerQueue.Debug(">>>>>> KILLING REDIS CONNECTION >>>>>>");

                    ConnectionMultiplexer redis = m_Redis;
                    m_Redis = null;

                    if (this.IsSentinel == true)
                    {
                        redis.ConnectionFailed -= _ConnectionFailure;
                        redis.ConnectionRestored -= _ConnectionRestored;
                    }

                    redis.Close(false);
                    redis.Dispose();

                    LoggerQueue.Debug(">>>>>> KILLED REDIS CONNECTION >>>>>> " + sw.Elapsed);
                }
            }
            catch (Exception ex)
            {
                LoggerQueue.Error(">>>>>> Error during _KillRedisClient", ex);
            }
        }

        private void _KillClients()
        {
            lock (m_ConnectionLocker)
            {
                _KillSentinelClient();
                _KillRedisClient();
            }
        }

        private void _KillTimers()
        {
            if (m_CheckSubscriptionsTimer != null)
            {
                m_CheckSubscriptionsTimer.Dispose();
                m_CheckSubscriptionsTimer = null;
            }
            if (m_CheckWriteTimer != null)
            {
                m_CheckWriteTimer.Dispose();
                m_CheckWriteTimer = null;
            }
        }

        public void Dispose()
        {
            _KillClients();
            _KillTimers();
        }
    }
}

I just asked this question, and found a similar question to yours and mine which I believe answers the question of how does our code (the client) know now which is the new master server when the current master goes down?

How to tell a Client where the new Redis master is using Sentinel

Apparently you just have to subscribe and listen to events from the Sentinels. Makes sense.. I just figured there was a more streamlined way.

I read something about the Twemproxy for Linux which acts as a proxy and probably does this for you? But I was on redis for Windows and was trying to find a Windows option. We might just moved to Linux if that's the approved way it should be done.