Ever since Memcached, the first caching framework, was born, caching has been widely used in Internet applications. If your application's traffic is small, a cache needs little extra thought. But once traffic reaches the millions, you have to consider three deeper caching problems: cache penetration, cache breakdown, and cache avalanche.
Cache penetration
Cache penetration refers to querying data that definitely does not exist: because the data can never be cached, every request for it goes straight to the database.
For example, suppose we request user data with UserID -1. Because that user does not exist, every such request reads the database. If a malicious party exploits this gap and forges a large number of such requests, the database may well fail under traffic it cannot handle.
There are two kinds of solutions for cache penetration: prevention beforehand and mitigation afterwards.
Preventive measures in advance. In essence, this means validating the parameters of every request and blocking the vast majority of illegal requests at the outermost layer. In our example, we would validate the parameters and reject any request with a UserID less than 0. But even thorough parameter validation lets some fish slip through the net, and unexpected cases will still occur.
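As a sketch of that outermost gate, assuming a hypothetical `getUser` entry point and that valid UserIDs are positive, a check like the following rejects obviously invalid IDs before they ever touch the cache or the database:

```java
// Hypothetical validation sketch: valid UserIDs are assumed positive,
// so 0 and negatives (like the -1 in the example) are rejected up front,
// before any cache or database lookup happens.
public User getUser(long userId) {
    if (userId <= 0) {
        throw new IllegalArgumentException("invalid userId: " + userId);
    }
    return userService.findById(userId); // placeholder cache-backed lookup
}
```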
For example, suppose our UserIDs are assigned incrementally and the largest existing UserID is 10000. Someone can then request user information with a very large UserID such as 1000000. We cannot simply declare every UserID above 10000 illegal, because new users keep being created and the ceiling keeps rising, so such an ID will always pass parameter validation. Yet the user does not exist, and every request goes to the database.
The above is just one case I can think of; there are surely many we have not anticipated. For those, what we can do is fall back on after-the-fact measures.
Post hoc prevention measures. This means that when a query returns an empty result, we cache it anyway, but with a very short expiration time (say, one minute). Note that this does not fully block illegal requests; rather, it shifts their load onto Redis, which tolerates it far better, leaving the more fragile database protected.
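A minimal sketch of this idea in the same style as the article's pseudocode (`redis` and `db` are placeholders for your cache client and data access): when the database also returns nothing, cache a sentinel value with a short TTL so repeated lookups for the same missing key are absorbed by Redis:

```java
// Sketch: cache a "not found" sentinel with a short TTL (60s) so that
// repeated queries for a non-existent key hit Redis instead of the DB.
public String getWithNullCaching(String key) {
    String value = redis.get(key);
    if (value != null) {
        return "NULL".equals(value) ? null : value; // sentinel = known missing
    }
    value = db.get(key); // placeholder for the real database lookup
    if (value == null) {
        redis.setex(key, 60, "NULL"); // short expiry limits stale "missing" answers
        return null;
    }
    redis.setex(key, 3600, value); // normal expiry for real data
    return value;
}
```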
With these two techniques combined, cache penetration is largely solved: parameter validation catches roughly 80% of illegal requests up front, and Redis absorbs the risk for the remaining 20%.
Cache breakdown
If your application has some heavily accessed hot data, we usually put it in the cache to speed up access, and to keep it fresh we usually set an expiration time. But for these high-traffic keys we need to ask: when a hot key expires, will the flood of requests turn into a flood of database queries and bring the database down?
For example, suppose a business key receives 10000 concurrent requests. The moment that key expires, 10000 threads will hit the database to rebuild the cache, and without countermeasures the database is likely to crash.
This is the cache breakdown problem, and it strikes at the instant a cached key expires. Two solutions are commonly used: a mutex lock, or never expiring the key.
Mutex lock
With a mutex lock, when a cached key expires and needs updating, each thread first tries to acquire the lock; only the thread that acquires it is qualified to rebuild the key. The other threads sleep briefly and then re-read the cache. This way only one thread at a time ever reads the database, shielding it from the impact of massive request volumes.
For the lock itself, we can use the atomic operations the cache provides. With Redis, for example, the SETNX command does the job.
```java
public String get(String key) {
    String value = redis.get(key);
    if (value == null) { // cache expired
        // 60s expiry on the mutex prevents a permanent lock if del fails
        if (redis.setnx(key_mutex, 1, 1 * 60) == 1) { // we won the lock
            value = db.get(key);
            redis.set(key, value, expireTime);
            redis.del(key_mutex);
            return value;
        } else {
            // another thread is rebuilding the cache: sleep briefly and retry
            sleep(50);
            return get(key);
        }
    } else {
        return value;
    }
}
```
The key_mutex above is just an ordinary key-value pair whose value we set to 1 with setnx. If someone else is already updating the cached key, setnx returns 0, meaning the set failed.
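One design note: issuing SETNX and then setting an expiry as two separate commands leaves a window where the lock never expires if the process dies in between. Modern Redis lets you do both atomically with SET ... NX EX. A sketch with the real Jedis client (method names beyond Jedis itself are placeholders):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

// Acquire the mutex atomically: set only if absent (NX) with a
// 60-second expiry (EX), so a crashed holder cannot leak the lock.
boolean tryLock(Jedis jedis, String keyMutex) {
    String reply = jedis.set(keyMutex, "1", SetParams.setParams().nx().ex(60));
    return "OK".equals(reply); // a null reply means another thread holds the lock
}
```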
Never expires
From the cache's point of view, if a key is set to never expire, a stampede of database requests can never happen. In that case we usually start a separate thread that periodically pushes the data from the database into the cache; a more mature approach is to use scheduled tasks to synchronize the cache with the database.
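One possible shape for such a scheduled refresh, sketched with `ScheduledExecutorService` (`redis` and `db` remain placeholders for your own cache client and data access, and `"hot_key"` is an illustrative key name):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: refresh a never-expiring hot key on a fixed schedule so that
// readers never have to load it from the database themselves.
void startHotKeyRefresher() {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
        try {
            String latest = db.get("hot_key"); // placeholder database read
            redis.set("hot_key", latest);      // written without a TTL: never expires
        } catch (Exception e) {
            // catch so one failed refresh does not cancel the whole schedule;
            // the cache simply keeps serving the previous value
            e.printStackTrace();
        }
    }, 0, 1, TimeUnit.MINUTES);
}
```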
The trade-off is data staleness: readers may see data that is not the very latest. For most Internet features, though, a little delay is acceptable.
Cache avalanche
Cache avalanche refers to setting the same expiration time across many cached keys, so that the cache fails en masse at one moment, all requests are forwarded to the database, and the database collapses under the instantaneous pressure.
For example, suppose we have 1000 keys and each key only gets about 10 concurrent requests, which is nothing on its own. A cache avalanche is when all 1000 keys expire at the same moment and the database suddenly faces 1000 × 10 = 10000 queries.
Problems caused by a cache avalanche are generally hard to troubleshoot; without prevention, finding the root cause afterwards can take a lot of effort. The simplest solution is to add a random offset (say, 1-5 minutes) on top of the base expiration time. This lowers the chance that many keys share the same expiry, and with it the chance of an avalanche.
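A sketch of that TTL jitter: add a random 1-5 minutes on top of the base expiry so keys written at the same time do not all expire together (`redis.setex` again stands in for your cache client):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: spread expirations by adding 1-5 minutes of random jitter
// to the base TTL, so keys cached together do not expire together.
void setWithJitter(String key, String value, int baseTtlSeconds) {
    int jitterSeconds = ThreadLocalRandom.current().nextInt(60, 301); // 60..300s
    redis.setex(key, baseTtlSeconds + jitterSeconds, value);
}
```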


