Ignore case with difflib.get_close_matches()

After a lot of searching around I am sadly surprised to see no simple pre-canned answer to this obvious use case.

The only alternative seems to be the "FuzzyWuzzy" library. Yet it does not bring a fundamentally different approach: unless the optional python-Levenshtein package is installed, it falls back on the same difflib.SequenceMatcher, and its API is not production quality. Its more obscure methods are indeed case-insensitive, but it provides no direct or simple replacement for get_close_matches.

So here is the simplest implementation I can think of:

import difflib

def get_close_matches_icase(word, possibilities, *args, **kwargs):
    """ Case-insensitive version of difflib.get_close_matches """
    lword = word.lower()
    lpos = {p.lower(): p for p in possibilities}
    lmatches = difflib.get_close_matches(lword, lpos.keys(), *args, **kwargs)
    return [lpos[m] for m in lmatches]

I don't see any quick way to make difflib do case-insensitive comparison.

The quick-and-dirty solution seems to be

  • make a function that converts the string to some canonical form (for example: upper case, single spaced, no punctuation)

  • use that function to make a dict of {canonical string: original string} and a list of [canonical string]

  • run .get_close_matches against the canonical-string list, then plug the results through the dict to get the original strings back
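The three steps above can be sketched roughly as follows. The names canonical and get_close_matches_canonical are made up for this illustration, and the canonical form chosen here (upper case, punctuation stripped, whitespace collapsed) is only one possible choice:

```python
import difflib
import re
import string

def canonical(text):
    """Hypothetical canonical form: upper case, no punctuation, single spaced."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", no_punct).strip().upper()

def get_close_matches_canonical(word, possibilities, n=3, cutoff=0.6):
    # {canonical string: original string}; if two originals share a
    # canonical form, the later one overwrites the earlier one.
    originals = {canonical(p): p for p in possibilities}
    # Match against the canonical strings only.
    matches = difflib.get_close_matches(canonical(word), list(originals), n=n, cutoff=cutoff)
    # Plug the results back through the dict to recover the originals.
    return [originals[m] for m in matches]

names = ["New  York!", "Newark", "Boston"]
result = get_close_matches_canonical("new york", names)
print(result)  # the best match, "New  York!", comes first
```

Note that the dict keeps only one original per canonical string, so inputs that differ only in case, spacing or punctuation collapse into a single result.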


@gatopeich had the right idea, but the problem is that there may be many strings which differ only in capitalization. We surely want them all in our results, not just one of them!

The following adaptation manages to do this:

import difflib
import itertools

def get_close_matches_icase(word, possibilities, *args, **kwargs):
    """ Case-insensitive version of difflib.get_close_matches """
    lword = word.lower()
    # Group the original strings by their lower-case form
    lpos = {}
    for p in possibilities:
        lpos.setdefault(p.lower(), []).append(p)
    lmatches = difflib.get_close_matches(lword, lpos.keys(), *args, **kwargs)
    # Expand each lower-case match back into all of its original spellings
    ret = [lpos[m] for m in lmatches]
    ret = itertools.chain.from_iterable(ret)
    return set(ret)
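One thing to be aware of: returning a set discards the best-match-first ordering that difflib.get_close_matches produces. If that ordering matters, a possible variant (the name get_close_matches_icase_ordered is hypothetical, and this is only a sketch) returns a list instead:

```python
import difflib
from collections import defaultdict

def get_close_matches_icase_ordered(word, possibilities, *args, **kwargs):
    """Case-insensitive get_close_matches that keeps the similarity ordering."""
    # Group the original strings by lower-case form, skipping exact duplicates.
    groups = defaultdict(list)
    for p in possibilities:
        if p not in groups[p.lower()]:
            groups[p.lower()].append(p)
    lmatches = difflib.get_close_matches(word.lower(), list(groups), *args, **kwargs)
    # Flatten in match order: best lower-case match first, then its spellings.
    return [orig for m in lmatches for orig in groups[m]]

matches = get_close_matches_icase_ordered("apple", ["APPLE", "Apple", "ape", "banana"])
print(matches)  # ['APPLE', 'Apple', 'ape']
```

In both versions the n argument limits the number of distinct lower-case matches, not the number of strings returned, so the result can be longer than n.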
