Select rows by partial string with query with pandas

The petobens solution now works with Query without engine spec, what increases the speed, acconding the manual.

Uses contains in query spec it's a powerful feature to handling string content because allow use regex.

import numpy as np
import pandas as pd
A = np.array(["Paulo", "Lucas", "Luana", "Larra", "BaLu","Bela"])
B = np.array([111, 222, 222, 333, 333, 777])
C = np.random.randint(10, 99, 6)
dt = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
dt.set_index(['A', 'B'], inplace=True)
print(dt)
print("=============")
print(dt.query('A.str.contains("Lu")'))
print("=============")
print(dt.query('A.str.contains("L(a|u)", regex=True)'))
print("=============")
print(dt.query('A.str.contains("^L", regex=True)'))  # starts with L

The result is

A   B
1.1 Paulo  57
    Lucas  49
3.3 Luana  38
    Larra  82
5.5 BaLu   37
6.6 Bela   14
=============
            C
A   B
1.1 Lucas  49
3.3 Luana  38
5.5 BaLu   37
=============
            C
A   B
1.1 Lucas  49
3.3 Luana  38
    Larra  82
5.5 BaLu   37
=============
            C
A   B
1.1 Lucas  49
3.3 Luana  38
    Larra  82

This answer is out of date. Please check @petobens' answer.

As of version 0.20.2, query doesn't support partial string matching. There is an open future request about it and one of the core developers seems to agree that it would be a nice addition.


The issue that @ayhan refers to now shows how this can be achieved by using query's python engine:

print(df.query('name.str.contains("lu")', engine='python').head())

should work.