Magento core_url_rewrite table excessively large

I've managed to stabalize the issue as follows:

Step 1: Rewrite the Catalog URL model (Using your own module: How To)

Note: If you overwrite the core file without using a rewrite this will render your instance of Magento incapable of future upgrades.

As per Jahnni's solution on the MagentoCommerce boards (no longer active with new board), app/code/core/Mage/Catalog/Model/Url.php [ around line 807 Mage_Catalog_Model_Url::getProductRequestPath() ]

From:

if ($product->getUrlKey() == '' && !empty($requestPath)
   && strpos($existingRequestPath, $requestPath) === 0
) 

To:

if (!empty($requestPath)
           && strpos($existingRequestPath, $requestPath) === 0
) 

Step 2: Truncate

Truncate the core_url_rewrite table

Step 3: Reindex & Flush Caches

Initiate the re-indexing process on Core URL Rewrites. Thereafter, you'll want to flush the Magento cache & storage cache.

SystemCache ManagementFlush Magento Cache

SystemCache ManagementFlush Cache Storage

Voila, you're all set. You'll notice if you re-run the indexer, the table should stay constant in size (unless you've added more products inbetween or if you have duplicate category names).


While I hope someone here comes up with an answer, I don't know that you'll find one. This table gets bulky for a lot of different reasons. Bugs in earlier (and possibly current) versions of Magento is one. Another is there's logic in this table that tries to track changes to the URL key value so that 301/302 rewrites are setup for old products. Because of this, and complicating things, truncating the table and regenerating may make existing URL rewrites go away, and this will have an unknown effect on your search engine listing (not necessity bad, just hard to predict).

My general advice to clients who ask is

  1. Leave the giant growing table as is if you don't have a good handle on your URL/SEO situation

  2. Until the table size starts being a problem (generating site maps, for example). When that happens, get a handle on your URL/SEO situation.

  3. Once you have a handle on your URL/SEO situation, backup the table, then truncate the table and regenerate. Address any URL/SEO problems caused by the truncating.

  4. Automate step 3

Trying to fix this on the Magento code level is admirable, but you'll be swimming upstream. Sometimes it's better to accept that "That's just Magento being Magento", and to solve the problem with and external process.


I would like to add a fix for this url rewrite indexer bug which has been developed at the bugathon in March 2013 and which has been further improved afterwards. It should solve this issue. As a reference, here is the patch file from the link:

diff -rupN mage_org/app/code/core/Mage/Catalog/Model/Url.php src_shop/app/code/core/Mage/Catalog/Model/Url.php
--- mage_org/app/code/core/Mage/Catalog/Model/Url.php   2013-11-19 00:48:25.679009391 +0100
+++ src_shop/app/code/core/Mage/Catalog/Model/Url.php   2013-11-19 00:49:24.188005601 +0100
@@ -643,13 +643,24 @@ class Mage_Catalog_Model_Url
                 $this->_rewrite = $rewrite;
                 return $requestPath;
             }
+
+            // avoid unnecessary creation of new url_keys for duplicate url keys
+            $noSuffixPath = substr($requestPath, 0, -(strlen($suffix)));
+            $regEx = '#^('.preg_quote($noSuffixPath).')(-([0-9]+))?('.preg_quote($suffix).')#i';
+            $currentRewrite = $this->getResource()->getRewriteByIdPath($idPath, $storeId);
+            if ($currentRewrite && preg_match($regEx, $currentRewrite->getRequestPath(), $match)) {
+                $this->_rewrite = $currentRewrite;
+                return $currentRewrite->getRequestPath();
+            }
+
             // match request_url abcdef1234(-12)(.html) pattern
             $match = array();
             $regularExpression = '#^([0-9a-z/-]+?)(-([0-9]+))?('.preg_quote($suffix).')?$#i';
             if (!preg_match($regularExpression, $requestPath, $match)) {
                 return $this->getUnusedPath($storeId, '-', $idPath);
             }
-            $match[1] = $match[1] . '-';
+            $match[1] = $noSuffixPath . '-'; // always use full prefix of url_key
+            unset($match[3]); // don't start counting with a possible number in the url_key
             $match[4] = isset($match[4]) ? $match[4] : '';

             $lastRequestPath = $this->getResource()


Additionally, I would like to add the EE patch PATCH_SUPEE-389_EE_1.12.0.2_v2.sh, which is now available on GitHub:

#!/bin/bash
# Patch apllying tool template
# v0.1.2
# (c) Copyright 2013. Magento Inc.
#
# DO NOT CHANGE ANY LINE IN THIS FILE.

# 1. Check required system tools
_check_installed_tools() {
    local missed=""

    until [ -z "$1" ]; do
        type -t $1 >/dev/null 2>/dev/null
        if (( $? != 0 )); then
            missed="$missed $1"
        fi
        shift
    done

    echo $missed
}

REQUIRED_UTILS='sed patch'
MISSED_REQUIRED_TOOLS=`_check_installed_tools $REQUIRED_UTILS`
if (( `echo $MISSED_REQUIRED_TOOLS | wc -w` > 0 ));
then
    echo -e "Error! Some required system tools, that are utilized in this sh script, are not installed:\nTool(s) \"$MISSED_REQUIRED_TOOLS\" is(are) missed, please install it(them)."
    exit 1
fi

# 2. Determine bin path for system tools
CAT_BIN=`which cat`
PATCH_BIN=`which patch`
SED_BIN=`which sed`
PWD_BIN=`which pwd`
BASENAME_BIN=`which basename`

BASE_NAME=`$BASENAME_BIN "$0"`

# 3. Help menu
if [ "$1" = "-?" -o "$1" = "-h" -o "$1" = "--help" ]
then
    $CAT_BIN << EOFH
Usage: sh $BASE_NAME [--help] [-R|--revert] [--list]
Apply embedded patch.

-R, --revert    Revert previously applied embedded patch
--list          Show list of applied patches
--help          Show this help message
EOFH
    exit 0
fi

# 4. Get "revert" flag and "list applied patches" flag
REVERT_FLAG=
SHOW_APPLIED_LIST=0
if [ "$1" = "-R" -o "$1" = "--revert" ]
then
    REVERT_FLAG=-R
fi
if [ "$1" = "--list" ]
then
    SHOW_APPLIED_LIST=1
fi

# 5. File pathes
CURRENT_DIR=`$PWD_BIN`/
APP_ETC_DIR=`echo "$CURRENT_DIR""app/etc/"`
APPLIED_PATCHES_LIST_FILE=`echo "$APP_ETC_DIR""applied.patches.list"`

# 6. Show applied patches list if requested
if [ "$SHOW_APPLIED_LIST" -eq 1 ] ; then
    echo -e "Applied/reverted patches list:"
    if [ -e "$APPLIED_PATCHES_LIST_FILE" ]
    then
        if [ ! -r "$APPLIED_PATCHES_LIST_FILE" ]
        then
            echo "ERROR: \"$APPLIED_PATCHES_LIST_FILE\" must be readable so applied patches list can be shown."
            exit 1
        else
            $SED_BIN -n "/SUP-\|SUPEE-/p" $APPLIED_PATCHES_LIST_FILE
        fi
    else
        echo "<empty>"
    fi
    exit 0
fi

# 7. Check applied patches track file and its directory
_check_files() {
    if [ ! -e "$APP_ETC_DIR" ]
    then
        echo "ERROR: \"$APP_ETC_DIR\" must exist for proper tool work."
        exit 1
    fi

    if [ ! -w "$APP_ETC_DIR" ]
    then
        echo "ERROR: \"$APP_ETC_DIR\" must be writeable for proper tool work."
        exit 1
    fi

    if [ -e "$APPLIED_PATCHES_LIST_FILE" ]
    then
        if [ ! -w "$APPLIED_PATCHES_LIST_FILE" ]
        then
            echo "ERROR: \"$APPLIED_PATCHES_LIST_FILE\" must be writeable for proper tool work."
            exit 1
        fi
    fi
}

_check_files

# 8. Apply/revert patch
# Note: there is no need to check files permissions for files to be patched.
# "patch" tool will not modify any file if there is not enough permissions for all files to be modified.
# Get start points for additional information and patch data
SKIP_LINES=$((`$SED_BIN -n "/^__PATCHFILE_FOLLOWS__$/=" "$CURRENT_DIR""$BASE_NAME"` + 1))
ADDITIONAL_INFO_LINE=$(($SKIP_LINES - 3))p

_apply_revert_patch() {
    DRY_RUN_FLAG=
    if [ "$1" = "dry-run" ]
    then
        DRY_RUN_FLAG=" --dry-run"
        echo "Checking if patch can be applied/reverted successfully..."
    fi
    PATCH_APPLY_REVERT_RESULT=`$SED_BIN -e '1,/^__PATCHFILE_FOLLOWS__$/d' "$CURRENT_DIR""$BASE_NAME" | $PATCH_BIN $DRY_RUN_FLAG $REVERT_FLAG -p0`
    PATCH_APPLY_REVERT_STATUS=$?
    if [ $PATCH_APPLY_REVERT_STATUS -eq 1 ] ; then
        echo -e "ERROR: Patch can't be applied/reverted successfully.\n\n$PATCH_APPLY_REVERT_RESULT"
        exit 1
    fi
    if [ $PATCH_APPLY_REVERT_STATUS -eq 2 ] ; then
        echo -e "ERROR: Patch can't be applied/reverted successfully."
        exit 2
    fi
}

REVERTED_PATCH_MARK=
if [ -n "$REVERT_FLAG" ]
then
    REVERTED_PATCH_MARK=" | REVERTED"
fi

_apply_revert_patch dry-run
_apply_revert_patch

# 9. Track patch applying result
echo "Patch was applied/reverted successfully."
ADDITIONAL_INFO=`$SED_BIN -n ""$ADDITIONAL_INFO_LINE"" "$CURRENT_DIR""$BASE_NAME"`
APPLIED_REVERTED_ON_DATE=`date -u +"%F %T UTC"`
APPLIED_REVERTED_PATCH_INFO=`echo -n "$APPLIED_REVERTED_ON_DATE"" | ""$ADDITIONAL_INFO""$REVERTED_PATCH_MARK"`
echo -e "$APPLIED_REVERTED_PATCH_INFO\n$PATCH_APPLY_REVERT_RESULT\n\n" >> "$APPLIED_PATCHES_LIST_FILE"

exit 0


SUPEE-389 | EE_1.12.0.2 | v1 | 53c8ca52583358953b143aaa1a78cf409e8dd846 | Thu Jun 20 10:36:39 2013 +0300 | v1.12.0.2..HEAD

__PATCHFILE_FOLLOWS__
diff --git app/code/core/Mage/Catalog/Model/Url.php app/code/core/Mage/Catalog/Model/Url.php
index fa55fc5..a755b46 100644
--- app/code/core/Mage/Catalog/Model/Url.php
+++ app/code/core/Mage/Catalog/Model/Url.php
@@ -609,6 +609,23 @@ class Mage_Catalog_Model_Url
      */
     public function getUnusedPath($storeId, $requestPath, $idPath)
     {
+        $urlKey = '';
+        return $this->getUnusedPathByUrlkey($storeId, $requestPath, $idPath, $urlKey);
+    }
+
+    /**
+     * Get requestPath that was not used yet.
+     *
+     * Will try to get unique path by adding -1 -2 etc. between url_key and optional url_suffix
+     *
+     * @param int $storeId
+     * @param string $requestPath
+     * @param string $idPath
+     * @param string $urlKey
+     * @return string
+     */
+    public function getUnusedPathByUrlkey($storeId, $requestPath, $idPath, $urlKey = '')
+    {
         if (strpos($idPath, 'product') !== false) {
             $suffix = $this->getProductUrlSuffix($storeId);
         } else {
@@ -645,21 +662,22 @@ class Mage_Catalog_Model_Url
             }
             // match request_url abcdef1234(-12)(.html) pattern
             $match = array();
-            $regularExpression = '#^([0-9a-z/-]+?)(-([0-9]+))?('.preg_quote($suffix).')?$#i';
+            $regularExpression = '#(?P<prefix>(.*/)?' . preg_quote($urlKey) . ')(-(?P<increment>[0-9]+))?(?P<suffix>'
+                . preg_quote($suffix) . ')?$#i';
             if (!preg_match($regularExpression, $requestPath, $match)) {
-                return $this->getUnusedPath($storeId, '-', $idPath);
+                return $this->getUnusedPathByUrlkey($storeId, '-', $idPath, $urlKey);
             }
-            $match[1] = $match[1] . '-';
-            $match[4] = isset($match[4]) ? $match[4] : '';
+            $match['prefix'] = $match['prefix'] . '-';
+            $match['suffix'] = isset($match['suffix']) ? $match['suffix'] : '';

             $lastRequestPath = $this->getResource()
-                ->getLastUsedRewriteRequestIncrement($match[1], $match[4], $storeId);
+                ->getLastUsedRewriteRequestIncrement($match['prefix'], $match['suffix'], $storeId);
             if ($lastRequestPath) {
-                $match[3] = $lastRequestPath;
+                $match['increment'] = $lastRequestPath;
             }
-            return $match[1]
-                . (isset($match[3]) ? ($match[3]+1) : '1')
-                . $match[4];
+            return $match['prefix']
+                . (isset($match['increment']) ? ($match['increment']+1) : '1')
+                . $match['suffix'];
         }
         else {
             return $requestPath;
@@ -699,7 +717,7 @@ class Mage_Catalog_Model_Url
     {
         $storeId = $category->getStoreId();
         $idPath  = $this->generatePath('id', null, $category);
-        $suffix  = $this->getCategoryUrlSuffix($storeId);
+        $categoryUrlSuffix = $this->getCategoryUrlSuffix($storeId);

         if (isset($this->_rewrites[$idPath])) {
             $this->_rewrite = $this->_rewrites[$idPath];
@@ -713,27 +731,27 @@ class Mage_Catalog_Model_Url
             $urlKey = $this->getCategoryModel()->formatUrlKey($category->getUrlKey());
         }

-        $categoryUrlSuffix = $this->getCategoryUrlSuffix($category->getStoreId());
         if (null === $parentPath) {
             $parentPath = $this->getResource()->getCategoryParentPath($category);
         }
         elseif ($parentPath == '/') {
             $parentPath = '';
         }
-        $parentPath = Mage::helper('catalog/category')->getCategoryUrlPath($parentPath,
-                                                                           true, $category->getStoreId());
+        $parentPath = Mage::helper('catalog/category')->getCategoryUrlPath($parentPath, true, $storeId);

-        $requestPath = $parentPath . $urlKey . $categoryUrlSuffix;
-        if (isset($existingRequestPath) && $existingRequestPath == $requestPath . $suffix) {
+        $requestPath = $parentPath . $urlKey;
+        $regexp = '/^' . preg_quote($requestPath, '/') . '(\-[0-9]+)?' . preg_quote($categoryUrlSuffix, '/') . '$/i';
+        if (isset($existingRequestPath) && preg_match($regexp, $existingRequestPath)) {
             return $existingRequestPath;
         }

-        if ($this->_deleteOldTargetPath($requestPath, $idPath, $storeId)) {
+        $fullPath = $requestPath . $categoryUrlSuffix;
+        if ($this->_deleteOldTargetPath($fullPath, $idPath, $storeId)) {
             return $requestPath;
         }

-        return $this->getUnusedPath($category->getStoreId(), $requestPath,
-                                    $this->generatePath('id', null, $category)
+        return $this->getUnusedPathByUrlkey($storeId, $fullPath,
+            $this->generatePath('id', null, $category), $urlKey
         );
     }

@@ -798,7 +816,8 @@ class Mage_Catalog_Model_Url
             $this->_rewrite = $this->_rewrites[$idPath];
             $existingRequestPath = $this->_rewrites[$idPath]->getRequestPath();

-            if ($existingRequestPath == $requestPath . $suffix) {
+            $regexp = '/^' . preg_quote($requestPath, '/') . '(\-[0-9]+)?' . preg_quote($suffix, '/') . '$/i';
+            if (preg_match($regexp, $existingRequestPath)) {
                 return $existingRequestPath;
             }

@@ -836,7 +855,7 @@ class Mage_Catalog_Model_Url
         /**
          * Use unique path generator
          */
-        return $this->getUnusedPath($storeId, $requestPath.$suffix, $idPath);
+        return $this->getUnusedPathByUrlkey($storeId, $requestPath.$suffix, $idPath, $urlKey);
     }

     /**
@@ -891,8 +910,8 @@ class Mage_Catalog_Model_Url
                 $parentPath = Mage::helper('catalog/category')->getCategoryUrlPath($parentPath,
                     true, $category->getStoreId());

-                return $this->getUnusedPath($category->getStoreId(), $parentPath . $urlKey . $categoryUrlSuffix,
-                    $this->generatePath('id', null, $category)
+                return $this->getUnusedPathByUrlkey($category->getStoreId(), $parentPath . $urlKey . $categoryUrlSuffix,
+                    $this->generatePath('id', null, $category), $urlKey
                 );
             }

@@ -913,14 +932,14 @@ class Mage_Catalog_Model_Url
                 $this->_addCategoryUrlPath($category);
                 $categoryUrl = Mage::helper('catalog/category')->getCategoryUrlPath($category->getUrlPath(),
                     false, $category->getStoreId());
-                return $this->getUnusedPath($category->getStoreId(), $categoryUrl . '/' . $urlKey . $productUrlSuffix,
-                    $this->generatePath('id', $product, $category)
+                return $this->getUnusedPathByUrlkey($category->getStoreId(), $categoryUrl . '/' . $urlKey . $productUrlSuffix,
+                    $this->generatePath('id', $product, $category), $urlKey
                 );
             }

             // for product only
-            return $this->getUnusedPath($category->getStoreId(), $urlKey . $productUrlSuffix,
-                $this->generatePath('id', $product)
+            return $this->getUnusedPathByUrlkey($category->getStoreId(), $urlKey . $productUrlSuffix,
+                $this->generatePath('id', $product), $urlKey
             );
         }

If you want to use this patch with CE, make sure to test it properly, because it has been developed for EE.