Regular expression to remove one parameter from query string

/(?<=&|\?)foo(=[^&]*)?(&|$)/

Uses lookbehind and the last group to "anchor" the match, and allows a missing value. Change the \? to ^ if you've already stripped off the question mark from the query string.

Regex is still not a substitute for a real parser of the query string, however.

Update: Test script: (run it at codepad.org)

import re

regex = r"(^|(?<=&))foo(=[^&]*)?(&|$)"

cases = {
  "foo=123": "",
  "foo=123&bar=456": "bar=456",
  "bar=456&foo=123": "bar=456",
  "abc=789&foo=123&bar=456": "abc=789&bar=456",

  "oopsfoo=123": "oopsfoo=123",
  "oopsfoo=123&bar=456": "oopsfoo=123&bar=456",
  "bar=456&oopsfoo=123": "bar=456&oopsfoo=123",
  "abc=789&oopsfoo=123&bar=456": "abc=789&oopsfoo=123&bar=456",

  "foo": "",
  "foo&bar=456": "bar=456",
  "bar=456&foo": "bar=456",
  "abc=789&foo&bar=456": "abc=789&bar=456",

  "foo=": "",
  "foo=&bar=456": "bar=456",
  "bar=456&foo=": "bar=456",
  "abc=789&foo=&bar=456": "abc=789&bar=456",
}

failures = 0
for input, expected in cases.items():
  got = re.sub(regex, "", input)
  if got != expected:
    print "failed: input=%r expected=%r got=%r" % (input, expected, got)
    failures += 1
if not failures:
  print "Success"

It shows where my approach failed, Mark has the right of it—which should show why you shouldn't do this with regex.. :P


The problem is associating the query parameter with exactly one ampersand, and—if you must use regex (if you haven't picked up on it :P, I'd use a separate parser, which might use regex inside it, but still actually understand the format)—one solution would be to make sure there's exactly one ampersand per parameter: replace the leading ? with a &.

This gives /&foo(=[^&]*)?(?=&|$)/, which is very straight forward and the best you're going to get. Remove the leading & in the final result (or change it back into a ?, etc.). Modifying the test case to do this uses the same cases as above, and changes the loop to:

failures = 0
for input, expected in cases.items():
  input = "&" + input
  got = re.sub(regex, "", input)
  if got[:1] == "&":
    got = got[1:]
  if got != expected:
    print "failed: input=%r expected=%r got=%r" % (input, expected, got)
    failures += 1
if not failures:
  print "Success"

If you want to do this in just one regular expression, you could do this:

/&foo(=[^&]*)?|^foo(=[^&]*)?&?/

This is because you need to match either an ampersand before the foo=..., or one after, or neither, but not both.

To be honest, I think it's better the way you did it: removing the trailing ampersand in a separate step.