How to do case insensitive sorting of Norwegian characters (Æ, Ø, and Å) using Hibernate Lucene Search?

I must admit this is not a common requirement. As far as I can see, there is a Lucene module that uses ICU for locale-dependent sorting.

See the lucene-icu artifact, especially ICUCollationKeyFilter and ICUCollationKeyAnalyzer (the analyzer is just a KeywordTokenizer followed by the filter). You will need to write the factory required to use it with Hibernate Search, but that should be quite easy.

I can't promise it will work, but it's probably your best bet.


You can use the org.apache.lucene.collation.CollationKeyFilter class in Hibernate Search 4.3.0.Final. Create your own collation filter factory:

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.collation.CollationKeyFilter;
import org.apache.solr.analysis.BaseTokenFilterFactory;

import java.text.Collator;
import java.util.Locale;

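/**
 * Token filter factory that replaces each token with its Norwegian
 * collation key, so the indexed value sorts in Norwegian alphabetical order.
 */
public final class NorwegianCollationFactory extends BaseTokenFilterFactory {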

    @Override
    public TokenStream create(TokenStream input) {
        Collator norwegianCollator = Collator.getInstance(new Locale("no", "NO"));
        return new CollationKeyFilter(input, norwegianCollator);
    }

}
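
To see what the collator inside this factory does on its own, here is a minimal sketch that uses only the JDK (no Lucene or Hibernate Search involved). The class name NorwegianCollatorDemo, its helper methods, and the sample names are made up for illustration. Setting the strength to SECONDARY is an assumption about how one might get case-insensitive comparison from the collator itself; the factory above does not set it. The exact ordering depends on the locale data shipped with your JDK, so treat the result as something to check rather than a guarantee.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of Hibernate Search or Lucene.
final class NorwegianCollatorDemo {

    // Builds a collator for the same locale the factory above uses.
    public static Collator norwegianCollator() {
        Collator collator = Collator.getInstance(new Locale("no", "NO"));
        // Assumption: SECONDARY strength ignores case differences
        // while still treating different letters as different.
        collator.setStrength(Collator.SECONDARY);
        return collator;
    }

    // Returns a sorted copy of the input, leaving the original list untouched.
    public static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(norwegianCollator());
        return copy;
    }

    public static void main(String[] args) {
        List<String> names =
                Arrays.asList("Øystein", "anne", "Åse", "Zara", "Ærlig", "Bjørn");
        System.out.println(sorted(names));
    }
}
```

If the JDK's Norwegian rules are available, Æ, Ø, and Å should come out after Zara rather than next to A and O, and "anne" should sort before "Bjørn" despite the lowercase initial.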

Then use this collation factory in your AnalyzerDef:

@AnalyzerDef(name = "myOwnAnalyzer",
tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
filters = {
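    // Note: ASCIIFoldingFilter folds Æ, Ø and Å to AE, O and A before collation;
    // remove it if these letters must keep their Norwegian sort positions.
    @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),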
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "['\\-&.,()]"),
        @Parameter(name = "replacement", value = " "),
        @Parameter(name = "replace", value = "all")
    }),
    @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
        @Parameter(name = "replacement", value = ""),
        @Parameter(name = "replace", value = "all")
    }),
    @TokenFilterDef(factory = TrimFilterFactory.class),
    @TokenFilterDef(factory = NorwegianCollationFactory.class)
}
)
public class KikaPaya implements Serializable {

For more information about using this collation filter with Hibernate Search 5, see https://stackoverflow.com/a/60738067/7179509