Exclude certain products from Magento sitemap.xml generation

Since Magento 1.9.0 you can do this without touching any core files.

There are two new events you can observe:

  • sitemap_categories_generating_before
  • sitemap_products_generating_before

To exclude products based on an attribute, you can do this:

  1. add an observer to sitemap_products_generating_before

    app\code\community\My\Module\etc\config.xml

        <events>
            <sitemap_products_generating_before>
                <observers>
                    <my_module>
                        <class>my_module/observer</class>
                        <method>excludeProductsFromSitemap</method>
                    </my_module>
                </observers>
            </sitemap_products_generating_before>
        </events>
    

    app\code\community\My\Module\Model\Observer.php

    public function excludeProductsFromSitemap(Varien_Event_Observer $observer)
    {
        $collection = $observer->getCollection();
        $items = $collection->getItems();
    
        $excludeIds = Mage::getModel('catalog/product')
            ->getCollection()
            ->setStoreId($observer->getStoreId()) // requires Magento 1.9.3.0
            ->addAttributeToFilter('use_in_sitemap', 0)
            ->getAllIds();
    
        foreach ($excludeIds as $id) {
            unset($items[$id]);
        }
    
        $collection->setItems($items);
    }
    
  2. add a yes/no product attribute named use_in_sitemap (preferably with the default value "yes")
  3. add this attribute to all attribute sets
  4. set the attribute to "no" on the products you want to exclude
  5. generate your sitemap

Note: before Magento 1.9.3.0 the store-specific filter shown above is not available, so the attribute should be set to global scope.


Out of the box, no, there's no way to exclude certain products from the sitemap generated by Magento's Catalog -> Google Sitemap feature.

If I were going to do this programmatically, I'd start from the fact that modern versions of Magento (checked in the 1.7.x branch; this might also be around in earlier/EE versions) use the following resource model class

Mage_Sitemap_Model_Resource_Catalog_Product

to fetch a list of products.

    #File: app/code/core/Mage/Sitemap/Model/Sitemap.php
    $collection = Mage::getResourceModel('sitemap/catalog_product')->getCollection($storeId);

This is not a standard Magento CRUD model, and getCollection does not return a collection object. Instead, getCollection manually queries the database for these products.

If I were going to implement functionality that prevented certain products from showing up in the sitemap, I'd try one of the following:

  1. A class rewrite of the getCollection method that calls parent::getCollection and then manually filters the unwanted products out of the returned array.

  2. A class rewrite of _addFilter that calls parent::_addFilter and then adds one or more additional WHERE clauses to the _select to exclude the specific product(s). It's sort of a hack, but it's the only method where you have access to the _select object used to query the database. Ideally you'd want some sort of global/static flag so that you only add your new WHERE clause(s) once.
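
A minimal sketch of the filtering step from the first option, assuming (as in the 1.7.x code) that parent::getCollection returns a plain array of items keyed by product ID. The function name filterSitemapProducts and the sample data are hypothetical. The Magento-specific call appears only in a comment so the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper: remove excluded product IDs from an ID-keyed array,
 * such as the one assumed to come back from parent::getCollection($storeId).
 */
function filterSitemapProducts(array $products, array $excludeIds)
{
    // array_flip turns the ID list into array keys so that
    // array_diff_key can drop every entry whose key is an excluded ID.
    return array_diff_key($products, array_flip($excludeIds));
}

// Inside a rewritten resource model the call might look roughly like this
// (sketch only; check that the parent actually returned an array first):
//
//     $products = parent::getCollection($storeId);
//     return is_array($products)
//         ? filterSitemapProducts($products, $excludeIds)
//         : $products;

// Stand-alone demonstration with made-up data.
$products  = array(10 => 'product-10', 11 => 'product-11', 12 => 'product-12');
$remaining = filterSitemapProducts($products, array(11, 99));
print_r(array_keys($remaining)); // keys 10 and 12 remain
```

How you build the list of excluded IDs (a product attribute, a config value, a hard-coded list) is up to you. IDs in the exclusion list that are not in the array are simply ignored.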


To achieve this, you can do the following:

  1. Create a product attribute, e.g. exclude_from_sitemap (Yes/No)

  2. Override the Mage_Sitemap_Model_Resource_Catalog_Product class and modify its getCollection function to filter on your new attribute (exclude_from_sitemap)

If you are not a developer, the following module can help you achieve the above, but note that it is a paid extension:

http://www.scommerce-mage.co.uk/magento-extensions/magento-google-site-map-exclusion.html