Using str in split in pandas

This is explained in the pandas documentation under "Indexing with .str".

The .str[index] notation indexes into each string by position, whereas [index] selects an element of the Series by its index label.

Using the example

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,'CABA', 'dog', 'cat'])

s.str[3]

returns the character at position 3 of each string, or NaN where the string is too short or the value is missing:

0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7    NaN
8    NaN

Whereas

s[3]

returns

'Aaba'
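The two lookups above can be run side by side. This is a small self-contained sketch using the same Series; the names by_position and by_label are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# .str[3] looks inside every string and takes the character at position 3;
# strings shorter than 4 characters and missing values give NaN.
by_position = s.str[3]

# s[3] selects the element of the Series whose index label is 3.
by_label = s[3]

print(by_position)
print(by_label)
```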

  • chess_data is a dataframe
  • chess_data.winner is a series
  • chess_data.winner.str is an accessor for string-specific methods that are optimized (to a degree)
  • chess_data.winner.str.split is one such method
  • chess_data.winner.map is a different method that takes a dictionary or a callable. It either calls the callable with each element of the series or calls the dictionary's get method on each element of the series.

When you use chess_data.winner.str.split, pandas still loops over the elements internally and applies a kind of str.split to each one. map is a cruder way of doing the same thing.
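As a rough illustration of the two forms of map described above, here is a minimal sketch. The Series winner and the dictionary lookup are made up for this example and are not part of the question's data.

```python
import pandas as pd

winner = pd.Series(['A:1', 'A:2', 'B:1'], name='winner')

# Callable: map calls the function once per element.
first_part = winner.map(lambda x: x.split(':')[0])

# Dictionary: map looks each element up; elements with no matching key become NaN.
lookup = {'A:1': 'first A', 'B:1': 'first B'}  # hypothetical mapping
looked_up = winner.map(lookup)

print(first_part.tolist())
print(looked_up.tolist())
```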


With your data:

chess_data.winner.str.split(':')

0    [A, 1]
1    [A, 2]
2    [A, 3]
3    [A, 4]
4    [B, 1]
5    [B, 2]
Name: winner, dtype: object

To get the first element of each list, use the string accessor again:

chess_data.winner.str.split(':').str[0]

0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

This is equivalent to what you did with map:

chess_data.winner.map(lambda x: x.split(':')[0])

You could also have used a list comprehension:

chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])

  winner new_col
0    A:1       A
1    A:2       A
2    A:3       A
3    A:4       A
4    B:1       B
5    B:2       B
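To check that the three approaches agree, something like the following should work. It is a sketch that rebuilds chess_data from the values visible in the output above (the real frame presumably has more columns); via_str, via_map and via_comp are illustrative names.

```python
import pandas as pd

# Sample frame reconstructed from the output shown above (an assumption).
chess_data = pd.DataFrame({'winner': ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

# 1. String accessor twice: split, then take item 0 of each list.
via_str = chess_data.winner.str.split(':').str[0]

# 2. map with a plain Python function.
via_map = chess_data.winner.map(lambda x: x.split(':')[0])

# 3. List comprehension, attached as a new column with assign.
via_comp = chess_data.assign(new_col=[x.split(':')[0] for x in chess_data.winner])

print(via_str.tolist())
print(via_map.tolist())
print(via_comp.new_col.tolist())
```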

Your code,

chess_data['winner'].str.split(':')[0]
['A', '1']

is the same as,

chess_data['winner'].str.split(':').loc[0]
['A', '1']
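The difference between indexing the split result directly and going through .str can be seen in a short sketch. It assumes the same six-row sample data and a default integer index; parts, row_zero, row_zero_loc and firsts are illustrative names.

```python
import pandas as pd

# Sample data assumed from the outputs shown in this thread.
chess_data = pd.DataFrame({'winner': ['A:1', 'A:2', 'A:3', 'A:4', 'B:1', 'B:2']})

parts = chess_data['winner'].str.split(':')  # a Series whose values are lists

row_zero = parts[0]          # the list stored under index label 0
row_zero_loc = parts.loc[0]  # the same lookup, written explicitly with .loc
firsts = parts.str[0]        # item 0 of every list, as a new Series

print(row_zero)
print(firsts.tolist())
```

So [0] on the Series picks one row, while .str[0] reaches into every row.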

And,

chess_data['winner'].map(lambda n: n.split(':')[0])
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

is the same as,

chess_data.winner.str.split(':').str[0]
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object

which is also the same as,

pd.Series([x.split(':')[0] for x in chess_data['winner']], name='winner')
0    A
1    A
2    A
3    A
4    B
5    B
Name: winner, dtype: object