Java Scanner Class bad character "®"

By default, Scanner uses the platform's default character encoding, which might not match the character encoding of the file. The JavaDoc for Scanner(File) states:

Constructs a new Scanner that produces values scanned from the specified file. Bytes from the file are converted into characters using the underlying platform's default charset.

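First, determine which character encoding your file uses. On Linux, the command-line utility file -i reports a best guess. Then pass the correct encoding to the Scanner. Since Java 7, java.nio.charset.StandardCharsets provides predefined constants for some well-known character sets.

Scanner file = new Scanner(new File(fileName), StandardCharsets.UTF_8);

Note that the Scanner(File, Charset) constructor used here was added in Java 10. On Java 7 to 9, pass the charset name instead, for example StandardCharsets.UTF_8.name().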

Specify the encoding when you create the Scanner:

Scanner file = new Scanner(new File(fileName), "utf-8");
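To see why the charset argument matters, here is a minimal sketch (not taken from the original question) that writes "®" to a temporary file as UTF-8 and reads it back twice, once with the matching charset and once with ISO-8859-1. The class name ScannerEncodingDemo and the helper firstLine are hypothetical names chosen for illustration. Non-ASCII characters are written as \u escapes so the result does not depend on the source file's own encoding.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerEncodingDemo {

    // Hypothetical helper: returns the first line of the file, decoding
    // its bytes with the named charset (empty string if the file is empty).
    static String firstLine(File file, String charsetName) throws IOException {
        // Scanner(File, String) is available on every Java version since 5.
        try (Scanner scanner = new Scanner(file, charsetName)) {
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("scanner-demo", ".txt");
        tmp.deleteOnExit();

        // "Brand®" stored as UTF-8: the ® sign (U+00AE) becomes bytes C2 AE.
        Files.write(tmp.toPath(), "Brand\u00AE".getBytes(StandardCharsets.UTF_8));

        // Matching charset: the text round-trips unchanged.
        String correct = firstLine(tmp, "UTF-8");
        System.out.println(correct.equals("Brand\u00AE"));

        // Mismatched charset: each byte is decoded as its own character,
        // so an extra U+00C2 appears in front of the ® sign.
        String garbled = firstLine(tmp, "ISO-8859-1");
        System.out.println(garbled.equals("Brand\u00C2\u00AE"));
    }
}
```

Both comparisons should print true. The second one shows the typical symptom of a charset mismatch: a stray character appearing in front of "®".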

Tags:

Java

Unicode