How to Parse Date Strings with Japanese Numbers in Java DateTime API

For anyone reading along: your example date string consists of an era designator (明治, Meiji), a year of era of 23 (corresponding to 1890 CE in the Gregorian calendar), month 11 and day of month 29. Months and days are the same as in the Gregorian calendar.
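The era arithmetic can be checked with java.time directly, without any parsing. Meiji 1 corresponds to 1868, so year of era 23 is 1867 + 23 = 1890. A minimal sketch; the class and method names (`MeijiExample`, `meijiToIso`) are made up for illustration:

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;

public class MeijiExample {

    // Hypothetical helper: converts a Meiji-era date to an ISO (Gregorian) LocalDate.
    // java.time's Japanese calendar only supports dates from Meiji 6 (1873) onward.
    static LocalDate meijiToIso(int yearOfEra, int month, int dayOfMonth) {
        JapaneseDate japaneseDate = JapaneseDate.of(JapaneseEra.MEIJI, yearOfEra, month, dayOfMonth);
        return LocalDate.from(japaneseDate);
    }

    public static void main(String[] args) {
        System.out.println(meijiToIso(23, 11, 29)); // 1890-11-29
    }
}
```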

Japanese numbers are not entirely positional (unlike Arabic numerals, for example), so a DateTimeFormatter doesn’t parse them on its own. We help it by supplying how the numbers look in Japanese (and Chinese). DateTimeFormatterBuilder has an overloaded appendText method that accepts a map from each numeric value to its text. My code example is not complete, but should get you started.

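    import java.time.LocalDate;
    import java.time.chrono.JapaneseChronology;
    import java.time.format.DateTimeFormatter;
    import java.time.format.DateTimeFormatterBuilder;
    import java.time.temporal.ChronoField;
    import java.util.Locale;
    import java.util.Map;

    Locale japaneseJapan = Locale.forLanguageTag("ja-JP");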

    Map<Long, String> numbers = Map.ofEntries(
            Map.entry(1L, "\u4e00"),
            Map.entry(2L, "\u4e8c"),
            Map.entry(3L, "\u4e09"),
            Map.entry(4L, "\u56db"),
            Map.entry(5L, "\u4e94"),
            Map.entry(6L, "\u516d"),
            Map.entry(7L, "\u4e03"),
            Map.entry(8L, "\u516b"),
            Map.entry(9L, "\u4e5d"),
            Map.entry(10L, "\u5341"),
            Map.entry(11L, "\u5341\u4e00"),
            Map.entry(12L, "\u5341\u4e8c"),
            Map.entry(13L, "\u5341\u4e09"),
            Map.entry(14L, "\u5341\u56db"),
            Map.entry(15L, "\u5341\u4e94"),
            Map.entry(16L, "\u5341\u516d"),
            Map.entry(17L, "\u5341\u4e03"),
            Map.entry(18L, "\u5341\u516b"),
            Map.entry(19L, "\u5341\u4e5d"),
            Map.entry(20L, "\u4e8c\u5341"),
            Map.entry(21L, "\u4e8c\u5341\u4e00"),
            Map.entry(22L, "\u4e8c\u5341\u4e8c"),
            Map.entry(23L, "\u4e8c\u5341\u4e09"),
            Map.entry(24L, "\u4e8c\u5341\u56db"),
            Map.entry(25L, "\u4e8c\u5341\u4e94"),
            Map.entry(26L, "\u4e8c\u5341\u516d"),
            Map.entry(27L, "\u4e8c\u5341\u4e03"),
            Map.entry(28L, "\u4e8c\u5341\u516b"),
            Map.entry(29L, "\u4e8c\u5341\u4e5d"),
            Map.entry(30L, "\u4e09\u5341"),
            Map.entry(31L, "\u4e09\u5341\u4e00")); // days of month go up to 31

    DateTimeFormatter japaneseformatter = new DateTimeFormatterBuilder()
            .appendPattern("GGGG")
            .appendText(ChronoField.YEAR_OF_ERA, numbers)
            .appendLiteral('\u5e74')
            .appendText(ChronoField.MONTH_OF_YEAR, numbers)
            .appendLiteral('\u6708')
            .appendText(ChronoField.DAY_OF_MONTH, numbers)
            .appendLiteral('\u65e5')
            .toFormatter(japaneseJapan)
            .withChronology(JapaneseChronology.INSTANCE);

    String dateString = "明治二十三年十一月二十九日";
    System.out.println(dateString + " is parsed into " + LocalDate.parse(dateString, japaneseformatter));

The output from this example is:

    明治二十三年十一月二十九日 is parsed into 1890-11-29

Since an era can last longer than 30 years (Shōwa, for example, ran to year 64), you need to supply more numbers to the map. You can do that a lot better than I can (and can also check my numbers for bugs). It’s probably best (less error-prone) to use a couple of nested loops for filling the map, but I wasn’t sure I could do it correctly, so I am leaving that part to you.
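One possible way to fill the map with nested loops is sketched below. It assumes the plain 十-based spelling for 1 through 99 (十, 十一, 二十, 二十一 and so on) and does not cover 元年 (gannen) for the first year of an era. The class and method names (`JapaneseNumeralMap`, `numeralsUpTo99`) are made up for illustration, and the Shōwa date string is just a sample input:

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseChronology;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class JapaneseNumeralMap {

    // Digits 1..9; index 0 is empty so that round tens get no trailing digit.
    private static final String[] DIGITS = {
        "", "\u4e00", "\u4e8c", "\u4e09", "\u56db", "\u4e94",
        "\u516d", "\u4e03", "\u516b", "\u4e5d"
    };
    private static final String TEN = "\u5341";

    // Hypothetical helper: builds the numerals for 1 through 99.
    static Map<Long, String> numeralsUpTo99() {
        Map<Long, String> numbers = new HashMap<>();
        for (int tens = 0; tens <= 9; tens++) {
            for (int ones = 0; ones <= 9; ones++) {
                int value = tens * 10 + ones;
                if (value == 0) {
                    continue; // zero does not occur in these date fields
                }
                StringBuilder text = new StringBuilder();
                if (tens >= 2) {
                    text.append(DIGITS[tens]); // 20 gets a leading digit, 10 does not
                }
                if (tens >= 1) {
                    text.append(TEN);
                }
                text.append(DIGITS[ones]);
                numbers.put((long) value, text.toString());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        Map<Long, String> numbers = numeralsUpTo99();
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendPattern("GGGG")
                .appendText(ChronoField.YEAR_OF_ERA, numbers)
                .appendLiteral('\u5e74')
                .appendText(ChronoField.MONTH_OF_YEAR, numbers)
                .appendLiteral('\u6708')
                .appendText(ChronoField.DAY_OF_MONTH, numbers)
                .appendLiteral('\u65e5')
                .toFormatter(Locale.forLanguageTag("ja-JP"))
                .withChronology(JapaneseChronology.INSTANCE);

        // A year of era above 30, which the hand-written map could not handle.
        System.out.println(LocalDate.parse("昭和六十四年一月七日", formatter));
    }
}
```

Using one map for all three fields keeps the sketch short. The formatter still rejects out-of-range values such as month 13 when it resolves the parsed fields.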

Today I learned something about Japanese numerals.

Some links I used

  • Japanese numerals
  • Unicode characters for Chinese and Japanese numbers

Late answer, but the accepted answer is somewhat lengthy and not so easy to complete, so I think my proposal is a good and powerful alternative.

Use my lib Time4J which supports Japanese numerals out of the box and then use the embedded Japanese calendar:

    String input = "明治二十三年十一月二十九日";
    ChronoFormatter<JapaneseCalendar> f =
        ChronoFormatter.ofPattern(
            "GGGGy年M月d日",
            PatternType.CLDR,
            Locale.JAPANESE,
            JapaneseCalendar.axis()
        ).with(Attributes.NUMBER_SYSTEM, NumberSystem.JAPANESE);
    JapaneseCalendar jcal = f.parse(input);
    LocalDate gregorian = jcal.transform(PlainDate.axis()).toTemporalAccessor();
    System.out.println(gregorian); // 1890-11-29

This solution is not just shorter, it also works for historic Japanese dates before Meiji 6, which were based on the old lunisolar calendar. Furthermore, the gannen notation for the first year of an era (we actually have such a year) is much better supported than in standard Java, where you again have to apply a lengthy workaround using a customized map.