WordPress - Should I use Transient API to store HTML String, or Object?

Should I use Transient API at all here?

No.

In a stock WordPress install transients are stored in the wp_options table, and only cleaned up during core upgrades. Suppose you have 50,000 posts; that's 50,000 additional rows in the options table. Obviously they're set to autoload=no, so it's not going to consume all your memory, but there's another caveat.

The autoload field in the options table does not have an index, which means that the call to wp_load_alloptions() is going to perform a full table scan. The more rows you have, the longer it will take. The more often you write to the options table, the less efficient MySQL's internal caches are.

If the cached data is directly related to a post, you're better off storing it in post meta. This will also save you a query every time you need to display the cached content, because post meta caches are (usually) primed during the post retrieval in WP_Query.

Your data structure for the meta value can vary: you can have a timestamp and perform your expensive query if the cached value is outdated, much like a transient would behave.
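A minimal sketch of what that could look like (the meta key and the expensive-query function are just placeholders for illustration):

function wpse_get_related_ids( $post_id ) {
    $cache = get_post_meta( $post_id, '_wpse_related_posts', true );

    // Re-run the expensive query if there is no cached value yet,
    // or if the cached value is older than a day.
    if ( empty( $cache ) || $cache['time'] < time() - DAY_IN_SECONDS ) {
        $cache = array(
            'time' => time(),
            'ids'  => wpse_run_expensive_related_query( $post_id ), // your "complex and large query"
        );
        update_post_meta( $post_id, '_wpse_related_posts', $cache );
    }

    return $cache['ids'];
}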

One other important thing to keep in mind is that WordPress transients can be volatile in environments with persistent object caching. This means that if you store your cached data for 24 hours in a transient, there's absolutely no guarantee it will be available in 23 hours, or 12, or even 5 minutes. The object cache backend for many installs is an in-memory key-value store such as Redis or Memcached, and if there's not enough allocated memory to fit newer objects, older items will be evicted. This is a huge win for the meta storage approach.

Invalidation can be smarter too: why are you invalidating related posts caches after X hours? Is it because some content has changed? A new post has been added? A new tag has been assigned? Depending on your "complex and large query" you may choose to invalidate ONLY if something happened that is going to alter the results of your query.
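For example (save_post and set_object_terms are real WordPress hooks, the meta key and callback name are just illustrative), you could clear the cached value when a post is saved or re-assigned terms, rather than on a timer:

// Invalidate the cached related posts only when something that can
// actually change the query results happens.
function wpse_flush_related_cache( $post_id ) {
    delete_post_meta( $post_id, '_wpse_related_posts' );
}
add_action( 'save_post', 'wpse_flush_related_cache' );
add_action( 'set_object_terms', 'wpse_flush_related_cache' );

Depending on the query, a new or changed post may also need to invalidate the cache on other posts, not just its own.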

Should I use Transient API to cache $related_posts array, or $html_output string? If I'll cache $html_output string, will it reach some max-size limit? Should I maybe gzip it, before saving?

It depends a lot on the size of your string, since that's the data that's going to be flowing between PHP, MySQL, etc. You'll need to try very hard to reach MySQL's limits, but Memcached's default per-object limit, for example, is only 1 MB.
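If you do end up caching a large HTML string, compressing it is straightforward. Here's a rough sketch, assuming $post_id and $html_output come from your own code (base64 keeps the binary gzip output safe for the options table, at the cost of some of the saved space):

// Compress the rendered HTML before caching it.
set_transient( 'wpse_related_html_' . $post_id, base64_encode( gzcompress( $html_output ) ), DAY_IN_SECONDS );

// ...and decompress on the way out.
$cached = get_transient( 'wpse_related_html_' . $post_id );
if ( false !== $cached ) {
    $html_output = gzuncompress( base64_decode( $cached ) );
}

Measure before committing to this, though; for small strings the compression overhead isn't worth it.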

How long does your "complex layout rendering logic" actually take? Run it through a profiler to find out. Chances are that it's very fast and will never become a bottleneck.

If that's the case, I would suggest caching the post IDs. Not the WP_Post objects, because those will contain the full post contents, but just an array of post IDs. Then just use a WP_Query with post__in, which will result in a very fast MySQL query by primary key.
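A sketch of that query, assuming $ids holds the cached array of post IDs (e.g. from the meta sketch above):

// $ids is the cached array of related post IDs - make sure it's not empty first.
$related = new WP_Query( array(
    'post__in'            => $ids,
    'posts_per_page'      => count( $ids ),
    'orderby'             => 'post__in',  // keep the cached ordering
    'no_found_rows'       => true,        // no pagination, skip the extra count query
    'ignore_sticky_posts' => true,
) );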

That said, if the data needed per item is fairly simple, perhaps title, thumbnail url and permalink, then you can store just those three, without the overhead of an extra round-trip to MySQL, and without the overhead of caching very long HTML strings.

Wow that's a lot of words, hope that helps.


Not All WP Code Is Public Code

If you are going to release something public, then all the things kovshenin said are perfectly valid.

Things are different if you are going to write private code for yourself or your company.

External Object Cache Is A Big Benefit, In Any Case

Setting up an external persistent object cache is highly recommended, whenever you can.

Everything said in kovshenin's answer about transients and MySQL is very true, and considering that WP itself and a bunch of plugins make use of the object cache... the performance improvement you get is absolutely worth the (small) effort of setting up a modern cache system like Redis or Memcached.

Cached Values May Not Be There: That's Fine

Moreover, yes, an external object cache is not reliable. You should never rely on the fact that a transient is there. You need to make sure your code works if cached values are not where they should be.

Cache is not storage, cache is cache.

Use Cache Selectively

See this example:

function my_get_some_value($key) {
   // by default no cache when debug and if no external object_cache
   $defUse = ! (defined('WP_DEBUG') && WP_DEBUG) && wp_using_ext_object_cache();
   // make the usage of cache filterable
   $useCache = apply_filters('my_use_cache', $defUse);
   // return cached value, if any (get_transient() returns false on a miss)
   if ($useCache && false !== ($cached = get_transient($key))) {
     return $cached;
   }
   // no cached value, make sure your code works with no cache
   $value = my_get_some_value_in_some_expensive_way();
   // set cache, if allowed
   $useCache and set_transient($key, $value, HOUR_IN_SECONDS);

   return $value;
}

Using code like this on your private site, performance can improve a lot, especially if you have a lot of users.

Note that:

  • By default the cache is not used when debug is on, so hopefully in your development environment. Believe me, cache can make debugging a hell
  • By default the cache is also not used when WP is not set to use an external object cache. This means that all the problems connected with MySQL do not exist, because no transients are used when they would hit MySQL. A probably easier alternative is to use the wp_cache_* functions: if no external cache is set up, the cache happens in memory, and the database is never involved (see the sketch after this list).
  • The usage of cache is filterable, to handle some edge cases you may encounter
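A rough equivalent of the function above using the wp_cache_* functions instead of transients (the cache group name is arbitrary):

function my_get_some_value_via_object_cache($key) {
   // with no external object cache this lives in memory for the current
   // request only; with Redis/Memcached it persists across requests
   $cached = wp_cache_get($key, 'my-group', false, $found);
   if ($found) {
     return $cached;
   }
   $value = my_get_some_value_in_some_expensive_way();
   wp_cache_set($key, $value, 'my-group', HOUR_IN_SECONDS);

   return $value;
}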

No Webscale If No Cache

You should not try to solve speed issues with cache. If you have speed issues, you should re-think your code.

But to scale a website to webscale, cache is pretty much required.

And a lot of times (but not always) fragment, context-aware caching is much more flexible and suitable than aggressive full-page caching.

Your Questions:

Should I use Transient API at all here?

It depends.

Is your code consuming a lot of resources? If not, maybe there's no need for cache. As said, it's not just a matter of speed. If your code runs fast but requires a bunch of CPU and memory for a couple of users... what happens when you have 100 or 1,000 concurrent users?

If you realize cache would be a good idea...

...and it's public code: probably not. You can consider caching selectively, like in my example above, even in public code, but usually it's better if you leave such decisions to implementers.

...and it's private code: very probably yes. But even for private code, caching selectively is still a good thing, for example for debugging.

Remember, anyway, that the wp_cache_* functions can give you access to cache without the risk of polluting the database.

Should I use Transient API to cache $related_posts array, or $html_output string?

It depends on a lot of things. How big is the string? Which external cache are you using? If you are going to cache posts, storing IDs as an array can be a good idea; querying a decent number of posts by their IDs is quite fast.

Final Notes

The Transient API is probably one of the best things about WordPress. Thanks to the plugins you can find for any kind of cache system, it becomes a stupidly simple API on top of a great number of caching backends that can work under the hood.

Outside WordPress, such an abstraction that works out of the box with a bunch of different caching systems, and allows you to switch from one system to another with no effort, is very hard to find.

You will rarely hear me say that WordPress is better than other modern things, but the Transient API is one of the few things I miss when I don't work with WordPress.

Surely cache is hard, it does not solve code issues and is not a silver bullet, but it is something you need to build a high-traffic site that works.

The WordPress idea of using an under-optimized MySQL table for caching is quite insane, but that's no reason to keep yourself away from caching just because WordPress does it that way by default.

You just need to understand how things work, then make your choice.


The previous answers have already highlighted the obligatory "It depends.", with which I fully agree.

I would like to add a recommendation though, based on how I "suppose" this would be best done in the scenario you are describing above.

I would not use Transients in that case, but rather Post Meta, because of one advantage that the latter has: Control.

As you need to cache data on a per-post basis, the amount of data you will cache depends on the number of posts, and will grow over time. Once you surpass a certain number of posts, you might hit the limit of memory your object cache is allowed to use, and it will start to evict previously cached data from memory before it has expired. This could lead to a situation where, during a large influx of visitors, each visitor triggers the "overly complex SQL" on each page request, and your site gets completely bogged down.

If you cache the data in your Post Meta, you can not only control how it is stored and retrieved, but you can also exactly control how it is updated. You would add a cron job for this that runs only at time periods where there is little to no traffic to the site. So, the "slow query" is never encountered by real users of the site, and you can even pre-load it, so that the work is already done when the first visitor hits.
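A rough sketch of that approach using WP-Cron (the event name, schedule, meta key, and refresh function are all illustrative; note that WP-Cron only fires on page loads unless you wire it up to a real system cron):

// Schedule a nightly refresh of the cached related-posts meta.
add_action( 'my_refresh_related_posts', 'my_refresh_related_posts_meta' );

if ( ! wp_next_scheduled( 'my_refresh_related_posts' ) ) {
    // run once a day, starting tomorrow around 3 AM (server time) as a low-traffic window
    wp_schedule_event( strtotime( 'tomorrow 3am' ), 'daily', 'my_refresh_related_posts' );
}

function my_refresh_related_posts_meta() {
    $ids = get_posts( array( 'fields' => 'ids', 'posts_per_page' => -1 ) );
    foreach ( $ids as $post_id ) {
        // my_run_expensive_related_query() stands in for the "overly complex SQL"
        update_post_meta( $post_id, '_my_related_posts', my_run_expensive_related_query( $post_id ) );
    }
}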

Keep in mind that all caching is a trade-off! That's why the usual answer is "It depends." and why there is no "holy caching grail".