Search for String in all Pandas DataFrame columns and filter

The Series.str.contains method expects a regex pattern (by default), not a literal string. Therefore str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. Instead use str.contains("\^") to match the literal ^ character.

To check every column, you could use for col in df to iterate through the column names, and then call str.contains on each column:

mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]

Alternatively, you could pass regex=False to str.contains to make the test use the Python in operator; but (in general) using regex is faster.


posting my findings in case someone would need.

i had a Dataframe (360 000 rows), needed to search across the whole dataframe to find the rows (just a few) that contained word 'TOTAL' (any variation eg 'TOTAL PRICE', 'TOTAL STEMS' etc) and delete those rows.

i finally processed the dataframe in two-steps:

FIND COLUMNS THAT CONTAIN THE WORD:

for i in df.columns:
df[i].astype('str').apply(lambda x: print(df[i].name) if x.startswith('TOTAL') else 'pass')

DELETE THE ROWS:

df[df['LENGTH/ CMS'].str.contains('TOTAL') != True]

Here's a function to solve the problem of doing text search in all column of a dataframe df:

def search(regex: str, df, case=False):
    """Search all the text columns of `df`, return rows with any matches."""
    textlikes = df.select_dtypes(include=[object, "string"])
    return df[
        textlikes.apply(
            lambda column: column.str.contains(regex, regex=True, case=case, na=False)
        ).any(axis=1)
    ]

It differs from the existing answers by both staying in the pandas API and embracing that pandas is more efficient in column processing than row processing. Also, this is packed as a pure function :-)

Relevant docs:

  • DataFrame.apply
  • The .str accessor
  • DataFrame.any

Try with :

df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)

Tags:

Python

Pandas