Difference between trim { it <= ' ' } and trim() in Kotlin?

According to the docs: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim.html

fun String.trim(): String
Returns a string having leading and trailing whitespace removed.

The it <= ' ' predicate would remove all the "non-printable" characters whose ASCII code is less than or equal to that of the space character (decimal 32), such as carriage return, line feed, and so on.
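To see which characters the predicate it <= ' ' actually covers, the sketch below enumerates U+0000 through U+0020 and compares that range with Char.isWhitespace(). The helper name charsTrimmedOnlyByPredicate is hypothetical and not part of the standard library.

```kotlin
// Hypothetical helper (not in the stdlib): code points in U+0000..U+0020
// that trim { it <= ' ' } would strip but Char.isWhitespace() rejects.
fun charsTrimmedOnlyByPredicate(): List<Int> =
    ('\u0000'..' ').filterNot { it.isWhitespace() }.map { it.code }

fun main() {
    // Every character in this range satisfies it <= ' '
    val lowChars = ('\u0000'..' ').toList()
    println("characters <= ' ': ${lowChars.size}")

    // Only some of them count as whitespace for Kotlin
    println("isWhitespace(): ${lowChars.filter { it.isWhitespace() }.map { it.code }}")
    println("not whitespace: ${charsTrimmedOnlyByPredicate()}")
}
```

This should report 33 characters, of which only tab, line feed, vertical tab, form feed, carriage return, U+001C to U+001F and the space itself are whitespace; the remaining control characters are removed only by the predicate.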

I've just tested with several of these characters:

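fun main() {
    val kotlin = "\t\t"
    println(kotlin)

    // Kotlin's trim(): strips leading/trailing chars for which isWhitespace() is true
    val kotlin2 = "\t\t".trim()
    println(kotlin2)

    // Java-style trim: strips leading/trailing chars with code <= ' ' (U+0020)
    val kotlin3 = "\t\t".trim { it <= ' ' }
    println(kotlin3)
}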

This prints a line containing only the two tab characters, followed by two empty lines.
Both variants remove these characters. However, as @AlexeyRomanov points out, Kotlin's trim() treats as whitespace exactly the characters for which Char.isWhitespace() returns true. So the purpose of { it <= ' ' } is to trim only the same characters that Java's String.trim() does, and not the other characters that count as whitespace in the Unicode standard.
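On the JVM, one way to confirm that trim { it <= ' ' } reproduces Java's behavior is to call java.lang.String.trim() directly and compare the results. This is a sketch that assumes the JVM target; javaTrim and the sample strings are illustrative choices, not something from the original post.

```kotlin
// Assumes the JVM target, where kotlin.String is java.lang.String at runtime.
// javaTrim is a hypothetical helper that calls Java's own String.trim().
@Suppress("PLATFORM_CLASS_MAPPED_TO_KOTLIN")
fun javaTrim(s: String): String = (s as java.lang.String).trim()

// Illustrative inputs: tabs, a control char, a no-break space, an Ogham space mark
val trimSamples = listOf("\t\tx\t\t", "\u0001x\u0001", "\u00A0x\u00A0", "\u1680x", " x ")

fun main() {
    for (s in trimSamples) {
        val predicateMatchesJava = s.trim { it <= ' ' } == javaTrim(s)
        val kotlinTrimMatchesJava = s.trim() == javaTrim(s)
        // Print code points, since several of these characters are invisible
        println("${s.map { it.code }}: predicate=$predicateMatchesJava, trim()=$kotlinTrimMatchesJava")
    }
}
```

The predicate column should be true for every sample, while trim() disagrees with Java for the inputs containing \u0001, \u00A0 and \u1680.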

For example, if we test the non-breaking space character \u00A0:

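fun main() {
    // \u00A0 is a no-break space: Char.isWhitespace() returns true for it
    val kotlin4 = "\u00A0\u00A0".trim()
    println(kotlin4)

    // '\u00A0' > ' ', so the Java-style predicate leaves it in place
    val kotlin5 = "\u00A0\u00A0".trim { it <= ' ' }
    println(kotlin5)
}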

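we can see the difference in the output: the first println prints an empty line, while the second prints the two non-breaking spaces unchanged.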
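Because both \t and \u00A0 are invisible when printed, comparing string lengths is a less ambiguous way to see the results. A small sketch using the same two inputs:

```kotlin
val tabs = "\t\t"
val nbsp = "\u00A0\u00A0" // two no-break spaces

fun main() {
    // Lengths make the otherwise invisible results visible.
    // Prints: tabs: trim()=0, trim{<=' '}=0
    println("tabs: trim()=${tabs.trim().length}, trim{<=' '}=${tabs.trim { it <= ' ' }.length}")
    // Prints: nbsp: trim()=0, trim{<=' '}=2
    println("nbsp: trim()=${nbsp.trim().length}, trim{<=' '}=${nbsp.trim { it <= ' ' }.length}")
}
```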

You can test it in the Kotlin Playground.


Java's String.trim() documentation says:

Otherwise, if there is no character with a code greater than '\u0020' in the string, then a String object representing an empty string is returned.

Otherwise, let k be the index of the first character in the string whose code is greater than '\u0020', and let m be the index of the last character in the string whose code is greater than '\u0020'. A String object is returned, representing the substring of this string that begins with the character at index k and ends with the character at index m, that is, the result of this.substring(k, m + 1).

So Java's condition for a character to be trimmed is exactly { it <= ' ' }, where it is the character being tested.
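The quoted algorithm can be transcribed almost literally and checked against the predicate form. This is a sketch; javaDocTrim is a hypothetical name used only for the comparison, not a library function.

```kotlin
// Literal transcription of the algorithm quoted from Java's String.trim() docs.
fun javaDocTrim(s: String): String {
    // k: index of the first character whose code is greater than '\u0020'
    val k = s.indexOfFirst { it > '\u0020' }
    // No such character: the result is the empty string
    if (k == -1) return ""
    // m: index of the last character whose code is greater than '\u0020'
    val m = s.indexOfLast { it > '\u0020' }
    return s.substring(k, m + 1)
}

fun main() {
    val inputs = listOf("", "   ", "\t a b \r\n", "\u0001x\u0002", "\u00A0x\u00A0")
    for (s in inputs) {
        // Expected to print true for every input
        println(javaDocTrim(s) == s.trim { it <= ' ' })
    }
}
```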

Kotlin instead uses

public fun CharSequence.trim(): CharSequence = trim(Char::isWhitespace)

Char.isWhitespace() is true, for example, for the non-breaking space \u00A0 and the Ogham space mark \u1680, but false for some characters below ' ', such as \u0001.
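The characters named above can be checked directly. This sketch prints Char.isWhitespace() for each of them and then trims a sample string, chosen here purely for illustration because it contains both kinds of mismatch, with both variants:

```kotlin
fun main() {
    // Char.isWhitespace() versus the Java-style test for each character
    for (c in listOf('\u00A0', '\u1680', '\u0001', ' ')) {
        val hex = c.code.toString(16).padStart(4, '0').uppercase()
        println("U+$hex isWhitespace=${c.isWhitespace()} <= ' ': ${c <= ' '}")
    }

    val s = "\u1680 x \u0001"
    // Kotlin's trim() strips the leading Ogham space mark and space,
    // but stops at the trailing \u0001 because it is not whitespace
    println(s.trim() == "x \u0001")
    // The Java-style predicate keeps the leading \u1680 (it is > ' '),
    // but strips the trailing \u0001 and the space before it
    println(s.trim { it <= ' ' } == "\u1680 x")
    // trim() behaves the same as trim(Char::isWhitespace)
    println(s.trim() == s.trim(Char::isWhitespace))
}
```

Each of the last three println calls should print true.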