How can I split a string of a mathematical expressions in python?

Tree with ast

You could use ast to get a tree of the expression :

import ast

source = '((81 * 6) /42+ (3-1))'
node = ast.parse(source) 

def show_children(node, level=0):
    if isinstance(node, ast.Num):
        print(' ' * level + str(node.n))
    else:
        print(' ' * level + str(node))
    for child in ast.iter_child_nodes(node):
        show_children(child, level+1)

show_children(node)

It outputs :

<_ast.Module object at 0x7f56abbc5490>
 <_ast.Expr object at 0x7f56abbc5350>
  <_ast.BinOp object at 0x7f56abbc5450>
   <_ast.BinOp object at 0x7f56abbc5390>
    <_ast.BinOp object at 0x7f56abb57cd0>
     81
     <_ast.Mult object at 0x7f56abbd0dd0>
     6
    <_ast.Div object at 0x7f56abbd0e50>
    42
   <_ast.Add object at 0x7f56abbd0cd0>
   <_ast.BinOp object at 0x7f56abb57dd0>
    3
    <_ast.Sub object at 0x7f56abbd0d50>
    1

As @user2357112 wrote in the comments : ast.parse interprets Python syntax, not mathematical expressions. (1+2)(3+4) would be parsed as a function call and list comprehensions would be accepted even though they probably shouldn't be considered a valid mathematical expression.

List with a regex

If you want a flat structure, a regex could work :

import re

number_or_symbol = re.compile('(\d+|[^ 0-9])')
print(re.findall(number_or_symbol, source))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

It looks for either :

  • multiple digits
  • or any character which isn't a digit or a space

Once you have a list of elements, you could check if the syntax is correct, for example with a stack to check if parentheses are matching, or if every element is a known one.


You need to implement a very simple tokenizer for your input. You have the following types of tokens:

  • (
  • )
  • +
  • -
  • *
  • /
  • \d+

You can find them in your input string separated by all sorts of white space.

So a first step is to process the string from start to finish, and extract these tokens, and then do your parsing on the tokens, rather than on the string itself.

A nifty way to do this is to use the following regular expression: '\s*([()+*/-]|\d+)'. You can then:

import re

the_input='(3+(2*5))'
tokens = []
tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
current_pos = 0
while current_pos < len(the_input):
  match = tokenizer.match(the_input, current_pos)
  if match is None:
     raise Error('Syntax error')
  tokens.append(match.group(1))
  current_pos = match.end()
print(tokens)

This will print ['(', '3', '+', '(', '2', '*', '5', ')', ')']

You could also use re.findall or re.finditer, but then you'd be skipping non-matches, which are syntax errors in this case.


If you don't want to use re module, you can try this:

s="((81 * 6) /42+ (3-1))"

r=[""]

for i in s.replace(" ",""):
    if i.isdigit() and r[-1].isdigit():
        r[-1]=r[-1]+i
    else:
        r.append(i)
print(r[1:])

Output:

['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']