ValiDate ISO 8601 by RX

PCRE: 603 940 947 949 956 bytes

^\s*[+-]?(\d{4,10}-((00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5])|(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31|W(0[1-9]|[1-4]\d|5[0-2])-[1-7]))|((\d{2,8}([13579][26]|[2468][048]|0[48])|(\d{0,6}([13579][26]|[02468][048])00))-(366|02-29))|(\+?\d{0,6}(([02468][048]|[13579][26])([26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9)|([02468][159]|[13579][37])(50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9)|([02468][26]|[13579][048])([48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29)|([02468][37]|[13579][159])([27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9))|-\d{0,6}(([02468][048]|[13579][26])(0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27])|([02468][159]|[13579][37])(0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39])|([02468][26]|[13579][048])(0[51]|16|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95)|([02468][37]|[13579][159])(0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16])))-W53-[1-7]\s*$

Note: Some pairs of parentheses could possibly be dropped.

  • Test valid dates and formats
  • Test invalid dates
  • Test invalid formats

Divisibility by 4

The multiples of 4 repeat in a simple pattern:

  • 00, 04, 08, 12, 16,
    20, 24, 28, 32, 36,
    40, 44, 48, 52, 56,
    60, 64, 68, 72, 76,
    80, 84, 88, 92, 96, …

This, or the inverse, could be matched by a likewise simple regular expression for all two-digit numbers with leading zero:

(?<divisible-by-four>[13579][26]|[02468][048])
(?<not-divisible-by-four>[13579][048]|[02468][26]|\d[13579])

It could save some bytes if there were character classes for odd and even digits (like \o and \e), but there are not as far as I’m aware.

Years

That expression would suffice for the Julian calendar, but the Gregorian leap year detection needs to special-case trailing 00 with century divisibility by 4:

(?<leap-year>[+-]?(\d{2,8}([13579][26]|[2468][048]|0[48])|(\d{0,6}([13579][26]|[02468][048])00))
(?<year>[+-]?\d{4,10})

This would need some changes to outlaw -0000-… (along with -00000-… etc.) or to enforce the plus sign for positive year numbers with more than 4 digits. The latter would be rather simple, but is not required:

(?<leap-year>([+-]?(\d\d([13579][26]|[2468][048]|0[48])|(([13579][26]|[02468][048])00)))|([+-](\d{3,8}([13579][26]|[2468][048]|0[48])|(\d{1,6}([13579][26]|[02468][048])00))))
(?<year>([+-]?\d{4})|([+-]\d{5,10}))

Day of year

Three-digit ordinal dates are rather simple, we just have to restrict -366 to leap years (and disallow -000).

(?<ordinal-day>-(00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5]))
(?<ordinal-leap-day>-366)

Day of month of year

The seven months with 31 days are 01 January, 03 March, 05 May, 07 July, 08 August, 10 October and 12 December. Just four month have exactly 30 days, 04 April, 06 June, 09 September and 11 November. Finally, 02 February has 28 days in common years and 29 in leap years. We can first construct a regular expression for the always valid days 01 through 28 and then add special cases.

(?<month-day>-(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8]))
(?<short-month-day>-(0[13-9]|1[0-2])-(29|30))
(?<long-month-day>-(0[13578]|1[02])-31)
(?<month-leap-day>-02-29)

Neither month nor day must be 00 which was not covered by an earlier version.

Day of week of year

All years include 52 weeks

(?<week-day>-W(0[1-9]|[1-4]\d|5[0-2])-[1-7])

Long years that include -W53 repeat in a 400-year cycle, e.g. add 2000 for the current cycle and find the current year in the third entry:

  • 004, 009, 015, 020, 026, 032, 037, 043, 048, 054, 060, 065, 071, 076, 082, 088, 093, 099,
  • 105, 111, 116, 122, 128, 133, 139, 144, 150, 156, 161, 167, 172, 178, 184, 189, 195,
  • 201, 207, 212, 218, 224, 229, 235, 240, 246, 252, 257, 263, 268, 274, 280, 285, 291, 296,
  • 303, 308, 314, 320, 325, 331, 336, 342, 348, 353, 359, 364, 370, 376, 381, 387, 392, 398.

Each of the four centuries has a unique pattern. There is probably not much room for optimization.

  1. 04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99
  2. 05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95
  3. 01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96
  4. 03|08|14|20|25|31|36|42|48|53|59|64|70|76|81|87|92|98

We can group by either digit to find out that we can save two bytes or so:

  • Grouped by 1st digit.
    1. 0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39]
    2. 05|1[16]|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95
    3. 0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]
    4. 0[38]|14|2[05]|3[16]|4[28]|5[39]|64|7[06]|8[17]|9[28]
  • Grouped by 2nd digit.
    1. [26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9
    2. 50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9
    3. [48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29
    4. [27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9

The century number is easily matched again by a variation of the divisibility expression.

  • 1st century: [02468][048]|[13579][26]
  • 2nd century: [02468][159]|[13579][37]
  • 3rd century: [02468][26]|[13579][048]
  • 4th century: [02468][37]|[13579][159]

So far, this does only work for positive years, including year zero. For negative years, we have to subtract the values from the list above from 400 and do the rest again, because the pattern is not symmetric.

  1. 02|08|13|19|24|30|36|41|47|52|58|64|69|75|80|86|92|97
  2. 04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99
  3. 05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95
  4. 01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96

or

  1. 0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27]
  2. 0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39]
  3. 0[51]|16|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95
  4. 0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]

Putting it all together

Any year

[+-]?\d{4,10}-((00[1-9]|0[1-9]\d|[12]\d\d|3[0-5]\d|36[0-5])|(0[1-9]|1[0-2])-(0[1-9]|1\d|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31|W(0[1-9]|[1-4]\d|5[0-2])-[1-7])

Leap-day year additions

[+-]?(\d{2,8}([13579][26]|[2468][048]|0[48])|(\d{0,6}([13579][26]|[02468][048])00))-(366|02-29)

Leap-week year additions

+?\d{0,6}(([02468][048]|[13579][26])([26]0|71|[38]2|[49]3|[05]4|15|[27]6|37|[48]8|[09]9)|([02468][159]|[13579][37])(50|[16]1|[27]2|33|[48]4|[09]5|[15]6|67|[27]8|[38]9)|([02468][26]|[13579][048])([48]0|[09]1|[15]2|63|[27]4|[38]5|[49]6|[05]7|[16]8|29)|([02468][37]|[13579][159])([27]0|[38]1|[49]2|[05]3|[16]4|25|[37]6|87|[049]8|[5]9))-W53-[1-7]
-\d{0,6}(([02468][048]|[13579][26])(0[28]|1[39]|24|3[06]|4[17]|5[28]|6[49]|75|8[06]|9[27])|([02468][159]|[13579][37])(0[49]|15|2[06]|3[27]|4[38]|54|6[05]|7[16]|8[28]|9[39])|([02468][26]|[13579][048])(0[51]|16|2[28]|3[39]|44|5[06]|6[17]|7[28]|8[49]|95)|([02468][37]|[13579][159])(0[17]|1[28]|2[49]|35|4[06]|5[27]|6[38]|74|8[05]|9[16]))-W53-[1-7]

PCRE (also Perl), 778 bytes

/^([+-]?\d*((([02468][048]|[13579][26]|\d\d(?!00))([02468][048]|[13579][26]))|\d{4}(?!-02-29|-366))-((?!02-3|(0[469]|11)-31|000)((0[1-9]|1[012])-(0[1-9]|[12]\d|30|31)|([012]\d\d|3([0-5]\d|6[0-6])))|(W(?!00)([0-4]\d|51|52)-[1-7]))|((\+?\d*([02468][048]|[13579][26])|-\d*([02468][159]|[13579][37]))(04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99)|(\+?\d*([02468][159]|[13579][37])|-\d*([02468][26]|[13579][048]))(05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95)|(\+?\d*([02468][26]|[13579][048])|-\d*([02468][37]|[13579][159]))(01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96)|\+?\d*(([02468][37]|[13579][159])(03|14|20|25|31|36|42|53|59|64|70|76|81|87|92|[049]8))|-\d*(([02468][048]|[13579][26])([059]2|08|13|19|24|30|36|41|47|58|64|69|75|80|86|97)))-W53-[1-7])$/

I have included the delimiters in the byte count to show that it doesn't rely on any flags.

It does not match valid dates within other strings, such as 1234-56-89 2016-02-29 9876-54-32. The regex is shorter by not checking for a maximum of 10 digits for the year.

Extended with comments:

/^  # Start of pattern (no leading space)
  (
    # YEAR
    # Optional sign and digits if more than 4 in year
    [+-]?\d*(
      # Years 00??, 04??, 08?? ... 92??, 96?? OR dd not followed by 00
      # followed by 00, 04, 08 ... 92, 96 OR
      (([02468][048]|[13579][26]|\d\d(?!00))([02468][048]|[13579][26])) |
      # any year not followed by 29 February or day 366
      \d{4}(?!-02-29|-366)
    # dash
    ) -
    # MONTH AND DAY, or DAY OF YEAR, or WEEK OF YEAR AND DAY if less than 53 weeks
    (
      # Not (30 or 31 February OR 31 April, June, September or December OR day 0)
      (?!02-3|(0[469]|11)-31|000)
      (
        # Month         dash         day         OR
        (0[1-9]|1[012]) - (0[1-9]|[12]\d|30|31) |
        # 001-299 OR 300-359 OR 360-366
        ([012]\d\d | 3([0-5]\d | 6[0-6]))
      # OR
      ) |
      (
        # W    01-52    dash    1-7
        W(?!00)([0-4]\d|51|52)-[1-7]
      )
    # OR
    ) |
    # WEEK OF YEAR AND DAY only if week is 53
    (
      # Optional plus and extra year digits
      \+?\d*(
        # Years +0303 - +9998
        ([02468][37]|[13579][159])(03|14|20|25|31|36|42|53|59|64|70|76|81|87|92|[049]8)
      ) |
      # Minus and extra year digits
      -\d*(
        # Years -0002 - -9697
        ([02468][048]|[13579][26])([059]2|08|13|19|24|30|36|41|47|58|64|69|75|80|86|97)
      ) |
      # Years +0004 - +9699, -0104 - -9799
      (\+?\d*([02468][048]|[13579][26])|-\d*([02468][159]|[13579][37]))
          (04|09|15|20|26|32|37|43|48|54|60|65|71|76|82|88|93|99) |
      # Years +0105 - +9795, -0205 - -9895
      (\+?\d*([02468][159]|[13579][37])|-\d*([02468][26]|[13579][048]))
          (05|11|16|22|28|33|39|44|50|56|61|67|72|78|84|89|95) |
      # Years +0201 - +9896, -0301 - -9996
      (\+?\d*([02468][26]|[13579][048])|-\d*([02468][37]|[13579][159]))
          (01|07|12|18|24|29|35|40|46|52|57|63|68|74|80|85|91|96)
    # dash W 53 dash 1-7
    )-W53-[1-7]
  # End of pattern (no trailing space)
  )$/x