Wordpress - How to Optimize WP site for millions of posts

1. Set the query before WP_Query is run

This seems to be the most important thing to keep in mind when trying to keep database queries to a minimum since the only opportunity to alter the query is, of course, before it is run in the SQL database.

Normal Queries
For a normal query, WordPress uses the wp() function, which in turn calls $wp->main( $query_vars ). The "is_ variables" from conditional tags are set before passing them to WP_Query->get_posts(), which converts it into a MySQL database query and finally stores them in the $wp_query object. It is possible to filter the query before it is actually run in the SQL database.

The pre_get_posts action hooks into this process, allowing you to change the query before it is passed to WP_Query->get_posts().

For example, if you want to filter the query for posts in the "featured" category, you would use add_action( 'pre_get_posts', 'your_function_name' ); and include the in_category conditional tag within your_function_name.

function your_function_name( $query ) {
    if ( $query->in_category( 'featured' ) && $query->is_main_query() ) {
        // Replace 123 with the category ID of the featured category.
        $query->set( 'cat', '123' );
    }
}
add_action( 'pre_get_posts', 'your_function_name' );

See Plugin API/Action Reference/pre get posts « WordPress Codex

Page Requests
As for page templates, such as the archive page for the "featured" category, conditional tags won't work from the pre_get_posts filter. For example, you can't use is_category to check for the archive page because WP_Query has not run.

Instead, you would have to alter the main query for page requests with a new WP_Query which would look something like $query = new WP_Query( 'cat=123' );. This runs the query with the appropriate argument set from the start.

See Class Reference/WP Query « WordPress Codex

2. Saving to the database

You can use the filter wp_insert_post_data ensuring that only the $data relevant to your custom post type are returned to wp_insert_post. Be sure to include a conditional statement to check for your custom post type.
Plugin API/Filter Reference/wp insert post data « WordPress Codex

This hook is called by the wp_insert_post function, which is called by wp_update_post when you update your custom post type, usually by saving a draft or publishing the post.

You'll have to benchmark it yourself though as I can't personally speak as to the optimization significance of reducing the data that is updated in the database.

3. Do custom post types affect performance?

In my experience, custom post types are a powerful tool for managing content. I don't know of any other way to manage posts in all the ways that it allows in a manner that would use less resources. I would personally focus on finding ways to reduce the number of queries made wherever possible.

There used to be a performance issue related to permalink structure causing it to take a hit when it begins with text instead of a number.3 This was particularly troublesome for sites hosting a large number of pages, but has been solved since WordPress version 3.3.

I'm only bringing up permalinks here because the slug is usually the first part of the permalink structure which may or may not have affected performance prior to version 3.3. Aside from that I'm not aware of any performance issues that arise from the use of custom post types.

Other Performance Options

Transients
This isn't a replacement for keeping queries to a minimum in your code, but you can use set_transient to store the queries for some time so that new queries are not necessary. Here is the example used in Dave Clements' post. Also, note that he recommends adding a save_post action to delete the transient any time a given post type is updated.

<?php // IN THE SPOTLIGHT QUERY
if( false === ( $its_query = get_transient( 'its_query' ) ) ) {
    $pttimestamp = time() + get_option('gmt_offset') * 60*60;
    $its_query = new WP_Query( array(
        'post_type' => 'spotlight',
        'posts_per_page' => 1,
            'post__not_in' => $do_not_duplicate,
        'meta_query' => array(
            array(
                'key' => '_hpc_spotlight_end_time',
                'value' => $pttimestamp,
                'compare' => '>'
            )
        )
    ) );
    set_transient( 'its_query', $its_query, 60*60*4 );
}
if( have_posts() ) { // HIDE SECTION IF NO CURRENT ITS FEATURE ?>
    // LOOP GOES HERE: NOT IMPORTANT TO EXAMPLE
<?php } ?>

More query optimization
Thomas Griffin has a few good tips in his Optimize WordPress Queries tutorial. Here's a brief list of his suggestions:

  • Set 'cache_results' => false in one-off queries if your server is not using persistent caching such as Memcached. One-off queries are described as "queries that are used to show small amounts of data. It could be that you just want to display linked post titles related to the current post, or you may want to display a dropdown of posts to select for a particular option setting."

    His example: $query = get_posts( array( 'posts_per_page' => 1, 'cache_results' => false ) );

  • Set 'no_found_rows' => true where pagination is not needed. This will "bypass MySQL counting the results to see if we need pagination or not."

    His example: $query = new WP_Query( array( 'posts_per_page' => 1, 'no_found_rows' => true ) );

  • Query for post IDs only if this is all you need 'fields' => 'ids' in get_posts. This should significantly reduce the amount of data being returned, which is quite a lot per post if you look at Database Description « WordPress Codex

    His example: $query = get_posts( array( 'posts_per_page' => 1, 'fields' => 'ids' ) );

In addition to that last tip, the same reasoning can be applied when you only need one or a few post fields by using get_post_field.

Having a solid understanding of how the query works is essential. The more specific you can be with your queries, the less work you will be demanding from your SQL database. This means there are a vast number of possibilities for managing database queries. Be careful with custom queries as far as where they run (is it an admin page?), use proper sanitization in direct queries and try to use native WordPress functions where it allows you to achieve the same performance.


I also add:

    'no_found_rows'          => true,
    'update_post_term_cache' => false,
    'update_post_meta_cache' => false,
    'cache_results'          => false
  • no_found_rows (boolean) – make it true when you don’t need any pagination and don’t need the count for total number of posts found.
  • cache_results (boolean) – Post information cache.
  • update_post_meta_cache (boolean) – Post meta information cache.
  • update_post_term_cache (boolean) – Post term information cache.

By using these parameters and passing values as FALSE, we can make the query faster by stopping some extra database queries being executed.

Note: We should not use these parameters always as adding things to the cache is the right thing to do, however these may be useful in specific circumstances and you should consider using when you know what you are doing.

Please visit: https://drujoopress.wordpress.com/2013/06/27/how-to-optimize-wordpress-query-to-get-results-faster/#more-184