Cops and Robbers: Reverse Regex Golf

Basic Regex, 656813 bytes [safe!]

The regex to end all regexes. One final hurrah into the night.

Testable under PCRE, Perl, Python and many others.

bzip2'd and base64-encoded version on Pastebin: (Pastebin didn't want the raw version because it was too big).

To make sure you get the right regex, you can verify that its MD5 hash is


or check that it begins with


and ends with


The key is still a nice comfortable 256 bytes.

I tested this regex with Python, but note that this regex doesn't use any special features of Python. Indeed, with the exception of (?:) (as a grouping mechanism), it actually uses no special features of any regex engine at all: just basic character classes, repetitions, and anchoring. Thus, it should be testable in a great number of regular expression engines.

Well, actually, I can still crank the difficulty up, assuming someone doesn't just instantly solve the smaller problems...but I wager people will have trouble with a 1GB regex...

After 72 hours, this submission remains uncracked! Thus, I am now revealing the key to make the submission safe. This is the first safe submission, after over 30 submissions were cracked in a row by persistent robbers.

Match: Massive Regex Problem Survives The Night!
Non-match: rae4q9N4gMXG3QkjV1lvbfN!wI4unaqJtMXG9sqt2Tb!0eonbKx9yUt3xcZlUo5ZDilQO6Wfh25vixRzgWUDdiYgw7@J8LgYINiUzEsIjc1GPV1jpXqGcbS7JETMBAqGSlFC3ZOuCJroqcBeYQtOiEHRpmCM1ZPyRQg26F5Cf!5xthgWNiK!8q0mS7093XlRo7YJTgZUXHEN!tXXhER!Kenf8jRFGaWu6AoQpj!juLyMuUO5i0V5cz7knpDX0nsL

Regex explanation:

The regex was generated from a "hard" 3SAT problem with a deliberately-introduced random solution. This problem was generated using the algorithm from [Jia, Moore & Strain, 2007]: "Generating Hard Satisfiable Formulas by Hiding Solutions Deceptively". Six boolean variables are packed into each byte of the key, for a total of 1536 variables.
The regex itself is quite simple: it expresses each of the 7680 3SAT clauses as an inverted condition (by De Morgan's laws), and matches any string that fails to satisfy at least one of the clauses. Therefore, the key is a string which does not match the regex, i.e. one that satisfies every one of the clauses.
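
To make the construction concrete, here is a minimal Python sketch of how a 3SAT formula could be turned into a regex of this shape. The 64-symbol alphabet, the bit order inside each symbol, and every function name are assumptions made for the example; the real regex may pack its variables differently. The sketch also assumes no clause contains a variable together with its own negation.

```python
import re
import string

# Hypothetical alphabet: the symbol at index i carries the six bits of i.
# None of these characters is special inside a character class.
ALPHABET = string.ascii_letters + string.digits + "!@"
BITS = 6


def violating_class(constraints):
    """Character class of symbols whose bits take the required values.

    constraints maps a bit index (0..5) to the value that bit must have.
    """
    chars = [c for i, c in enumerate(ALPHABET)
             if all((i >> b) & 1 == v for b, v in constraints.items())]
    return "[" + "".join(chars) + "]"


def clause_to_alternative(clause):
    """Pattern matching keys in which every literal of the clause is false.

    clause is a list of (variable_index, is_positive) pairs.
    """
    per_byte = {}
    for var, positive in clause:
        byte, bit = divmod(var, BITS)
        # A positive literal is false when the bit is 0, a negated one when it is 1.
        per_byte.setdefault(byte, {})[bit] = 0 if positive else 1
    parts, pos = [], 0
    for byte in sorted(per_byte):
        if byte > pos:
            parts.append(".{%d}" % (byte - pos))  # skip unconstrained bytes
        parts.append(violating_class(per_byte[byte]))
        pos = byte + 1
    return "".join(parts)


def formula_to_regex(clauses):
    """Regex that matches a key if and only if it violates some clause."""
    return "^(?:" + "|".join(clause_to_alternative(c) for c in clauses) + ")"


def satisfies(key, clauses):
    """Direct check of the formula, used to cross-check the regex."""
    def value(var):
        return (ALPHABET.index(key[var // BITS]) >> (var % BITS)) & 1
    return all(any(value(v) == (1 if pos else 0) for v, pos in c)
               for c in clauses)
```

With this encoding, a key satisfies the formula exactly when `re.match(formula_to_regex(clauses), key)` returns `None`, which mirrors the "key is a non-match" rule above.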

.NET regex, 841 bytes [Safe!]

Now that I've got a safe entry, let's see how small I can make the regex!



    ((?<-G>){997}|){9}      ((?<-G>)(?<g>)|){997}



  • Short, 841 bytes
  • Golfed and written by hand
  • Not known to encode an NP-hard problem
  • Times out on most invalid input :)
  • Tested on, takes ~5 seconds for the valid input

Thanks to Sp3000 and user23013 for cluing me in to .NET regex.

After 72 hours, I am revealing the key to make this submission safe.



Non-match: Aren'tHashFunctionsFun?


This regular expression implements a very simple and rather stupid hash function. The hash function computes a single integer x as output. x starts off equal to 53. It is adjusted based on each character encountered: if it sees a 0, it will set x = 7x + 5, and if it sees a 1, it will set x = 3x + 1. x is then reduced mod 997^7. The final result is checked against a predefined constant; the regex fails to match if the hash value is not equal.
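
For reference, the update rule described above can be written out as a short Python sketch. It assumes the modulus is 997^7 (seven base-997 digits) and takes the input as a plain sequence of 0/1 symbols; how the key's characters map to those symbols is not covered here, and the function name is made up for the example.

```python
MODULUS = 997 ** 7  # assumption: seven base-997 digits


def toy_hash(symbols, x=53):
    """Fold a sequence of 0/1 symbols into x using the two update rules."""
    for s in symbols:
        x = 7 * x + 5 if s == 0 else 3 * x + 1
        x %= MODULUS
    return x
```

For example, `toy_hash([0, 1])` first computes 7*53 + 5 = 376 and then 3*376 + 1 = 1129.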

Seven capture groups (a-g) are used to store the base-997 digits of x, with seven more capture groups (A-G) serving as temporary storage. I use the "balancing capture groups" extension of .NET regex to store integers in capture groups. Technically, the integer associated with each capture group is the number of unbalanced matches captured by that group; "capturing" an empty string using (?<X>) increments the number of captures, and "balancing" the group using (?<-X>) decrements the number of captures (which will cause a match failure if the group has no captures). Both can be repeated to add and subtract fixed constants.
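
As an arithmetic model only (it does not reproduce how the regex itself shuffles captures between groups), the following Python sketch keeps x as seven base-997 digits and applies a multiply-and-add step digit by digit. The helper names are hypothetical. Dropping the carry out of the last digit is what reduces the result mod 997^7.

```python
BASE = 997
NDIGITS = 7


def to_digits(x):
    """Little-endian base-997 digits of x mod 997**7."""
    digits = []
    for _ in range(NDIGITS):
        x, d = divmod(x, BASE)
        digits.append(d)
    return digits


def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * BASE ** i for i, d in enumerate(digits))


def mul_add(digits, mul, add):
    """Compute x*mul + add on the digit list, propagating carries upward."""
    out, carry = [], add
    for d in digits:
        carry, low = divmod(d * mul + carry, BASE)
        out.append(low)
    return out  # the carry left after the top digit is discarded
```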

This hash algorithm is just one I cooked up in a hurry, and is the smallest hash algorithm I could come up with that seemed reasonably secure using only additions and multiplications. It's definitely not crypto-quality, and there are likely to be weaknesses which make it possible to find a collision in fewer than 997^7/2 hash evaluations.

ECMAScript (10602 bytes)

(Language note: I see a lot of posts labeled ruby, or python, or whatever, when they really don't use any language-specific features. This one only requires (?!...) and (?=...) on top of POSIX ERE with backreferences. Those features are probably in your favorite language's regexp engine, so don't be discouraged from trying the challenge because I chose to use the javascript online tester.)

Just a little bit of fun, not as computationally difficult as some of the others.


Test here:

(crickets chirping)

No takers? It's oddly disappointing to think of posting the spoiler with no evidence that anyone looked at it long enough to understand what type of problem it is.

I'm writing a complete explanation to post later but I think I'd be happier if someone beat me.

When I said it was not "computationally difficult"... it is an instance of an NP-complete problem, but not a big instance.

Hint: it's a type of pencil-and-paper puzzle. But I'd be quite impressed if you can solve this one with pencil and paper alone (after decoding the regexp into a form suitable for printing).

Spoiler time

There are multiple levels of spoilers here. If you didn't solve the regexp yet, you might want to try again after reading just the first spoiler block. The actual key that matches the regexp is after the last spoiler block.

This regexp encodes a Slitherlink puzzle.

Once you figure out what's going on and convert the regexp into a Slitherlink grid, you'll quickly discover that it's harder than the average Slitherlink. It's on a 16x16 square grid, larger than the usual 10x10. It is also slightly unusual in having no 0 clues and a relative shortage of 3's. 0's and 3's are the easiest clues to work with, so I didn't want to give you a lot of them.

[Image: Slitherlink puzzle]

Second layer of spoilage:

When you're solving the Slitherlink puzzle, an extra surprise kicks in: this Slitherlink has more than one solution. If you're a regular Slitherlink solver, and you have a habit of making deductions based on the assumption of a unique solution, you might have been confused by that. If so, you're a cheater and this is your punishment! Part of the job of a puzzle solver is to find out how many solutions there are.

Final layer of spoilage:

The final twist: the 2 solutions to the Slitherlink are mostly identical, but one is slightly longer than the other. You need to find the short one. If you only found the long one and encoded it as a string to match the regexp, the string would be 257 characters long. The path goes through 256 nodes, but you have to repeat the first node at the end to close the loop. And if you got that far, you might have thought I made a mistake and forgot to count that extra character. Nope! and/or Gotcha! (and/or Boosh! and/or Kakow!)

The short solution is 254 segments long and encodes to a string of 255 characters which is the key. Since you can start at any node on the loop and proceed clockwise or counterclockwise, there are 254*2=508 possible answers.
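
The count of 508 can be checked with a small Python sketch that enumerates every way of writing down a closed loop: each starting node, in each direction, with the first node repeated at the end. The node labels here are placeholders, not the characters used by the regexp.

```python
def loop_encodings(loop):
    """All strings tracing a closed loop given as a string of distinct labels."""
    n = len(loop)
    results = set()
    for direction in (loop, loop[::-1]):       # clockwise and counterclockwise
        for start in range(n):                 # any node can come first
            walk = direction[start:] + direction[:start]
            results.add(walk + walk[0])        # repeat first node to close the loop
    return results
```

A loop through 254 distinct nodes yields 508 strings of 255 characters each, matching the numbers above.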

[Image: Slitherlink solution]

Non-match: bananabananabanana
Match: ƜpRԱԺեþɋэʌkȿՌOfɄCҷɐխɷլԿѪɮȹÞӿѤNɹЦʞӶdѯχԎԷӺջՒϻЉAɔbУƾձҴԉҨʬHѺӄӾԏxчɎֆFƈɊΞζџiփΨӃϣɂƱϬɣɿqϚɰƐդΦժʮgBƕȴւҵɺҏϯƋՐѠɴҔŋԀɤȻɸaЊѬҥѾҸшɉҶjnMʙƸՊʡEɟμƩςʢϪʊLՅȾɝUʝՉϥҁѧЩӷƆԋҳϙѭϢմԂɥȸhΔԓƛѿբՑҩSDȽԅҠGeωƪՈɅϛɃwҀҤՂΩßɜȶʟɀҹԄҡλѥՃȵҜҎɞԲЭщɌИдϠʄԻʫҝyϼӻҺЋϗѩͽɒʈէϞՀթЪΠƏƣoտʓюrԾϟϤƺϫճлљIնǂƎԳuȺԃQϧԶʁWըիcYЏʘƜ