How to determine a string is english or arabic?

A minor change to cover all arabic characters and symbols range

private boolean isArabic(String text){
        String textWithoutSpace = text.trim().replaceAll(" ",""); //to ignore whitepace
        for (int i = 0; i < textWithoutSpace.length();) {
            int c = textWithoutSpace.codePointAt(i);
          //range of arabic chars/symbols is from 0x0600 to 0x06ff
            //the arabic letter 'لا' is special case having the range from 0xFE70 to 0xFEFF
            if (c >= 0x0600 && c <=0x06FF || (c >= 0xFE70 && c<=0xFEFF)) 
                i += Character.charCount(c);   
            else                
                return false;

        } 
        return true;
      }

Here is a simple logic that I just tried:

  public static boolean isProbablyArabic(String s) {
    for (int i = 0; i < s.length();) {
        int c = s.codePointAt(i);
        if (c >= 0x0600 && c <= 0x06E0)
            return true;
        i += Character.charCount(c);            
    }
    return false;
  }

It declares the text as arabic if and only if an arabic unicode code point is found in the text. You can enhance this logic to be more suitable for your needs.

The range 0600 - 06E0 is the code point range of Arabic characters and symbols (See Unicode tables)


Java in itself supports various language checks by unicode, Arabic is also supported. Much simpler and smallest way to do the same is by UnicodeBlock

public static boolean textContainsArabic(String text) {
    for (char charac : text.toCharArray()) {
        if (Character.UnicodeBlock.of(charac) == Character.UnicodeBlock.ARABIC) {
            return true;
        }
    }
    return false;
}

Tags:

Java