Split a string by spaces -- preserving quoted substrings -- in Python

Have a look at the shlex module, particularly shlex.split.

>>> import shlex
>>> shlex.split('This is "a test"')
['This', 'is', 'a test']

Depending on your use case, you may also want to check out the csv module:

import csv
lines = ['this is "a string"', 'and more "stuff"']
for row in csv.reader(lines, delimiter=" "):
    print(row)

Output:

['this', 'is', 'a string']
['and', 'more', 'stuff']

You want split, from the built-in shlex module.

>>> import shlex
>>> shlex.split('this is "a test"')
['this', 'is', 'a test']

This should do exactly what you want.

If you want to preserve the quotation marks, then you can pass the posix=False kwarg.

>>> shlex.split('this is "a test"', posix=False)
['this', 'is', '"a test"']

I see regex approaches here that look complex and/or wrong. This surprises me, because regex syntax can easily describe "whitespace or thing-surrounded-by-quotes", and most regex engines (including Python's) can split on a regex. So if you're going to use regexes, why not just say exactly what you mean?:

test = 'this is "a test"'  # or "this is 'a test'"
# pieces = [p for p in re.split("( |[\\\"'].*[\\\"'])", test) if p.strip()]
# From comments, use this:
pieces = [p for p in re.split("( |\\\".*?\\\"|'.*?')", test) if p.strip()]

Explanation:

[\\\"'] = double-quote or single-quote
.* = anything
( |X) = space or X
.strip() = remove space and empty-string separators

shlex probably provides more features, though.

Tags:

Python

Regex