How to create a DataFrame while preserving order of the columns?

To preserve column order, pass your NumPy arrays as a list of (name, array) tuples to DataFrame.from_items:

>>> df = pd.DataFrame.from_items([('foo', foo), ('bar', bar)])

   foo  bar
0    1    4
1    2    5
2    3    6

Update

DataFrame.from_items was deprecated in pandas 0.23 and removed in pandas 1.0, so use from_dict instead. from_dict takes a dictionary, so wrap the items in an OrderedDict to keep their order:

>>> from collections import OrderedDict
>>> df = pd.DataFrame.from_dict(OrderedDict(zip(['foo', 'bar'], [foo, bar])))

From Python 3.7 you can depend on dict insertion order being preserved (see https://mail.python.org/pipermail/python-dev/2017-December/151283.html), so a plain dict works:

>>> df = pd.DataFrame.from_dict(dict(zip(['foo', 'bar'], [foo, bar])))

or simply:

>>> df = pd.DataFrame(dict(zip(['foo', 'bar'], [foo, bar])))
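As a quick check of the plain-dict approach, here is a minimal sketch. It assumes Python 3.7+ and pandas 0.23 or later (older combinations may sort the keys instead). The names and arrays are made up for illustration and are deliberately not in alphabetical order, so a sorted result would be easy to spot.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names are intentionally not alphabetical.
names = ['foo', 'bar', 'baz']
arrays = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]

# A plain dict keeps insertion order on Python 3.7+,
# and the DataFrame constructor follows that order.
df = pd.DataFrame(dict(zip(names, arrays)))

column_order = list(df.columns)
print(column_order)
```

If the printed list matches names, the insertion order survived the round trip.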

Use the columns keyword when creating the DataFrame:

pd.DataFrame({'foo': foo, 'bar': bar}, columns=['foo', 'bar'])

Also, note that you don't need to wrap the arrays in pd.Series first; the NumPy arrays can be passed directly.
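To illustrate, here is a small sketch in which the columns keyword sets the order independently of the dict. The requested order below is chosen to differ from the dict's own order; the variable names follow the question.

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# columns= selects the dict keys and fixes their order,
# whatever order the dict itself happens to have.
df = pd.DataFrame({'foo': foo, 'bar': bar}, columns=['bar', 'foo'])

print(list(df.columns))
```

This works on old and new Python versions alike, since the order comes from the list rather than from the dict.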


Original Solution: Incorrect Usage of collections.OrderedDict

In my original solution, I proposed using OrderedDict from the collections module in Python's standard library:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )

   foo  bar
0    1    4
1    2    5
2    3    6

Right Solution: Passing Key-Value Tuple Pairs for Order Preservation

However, as noted, passing a normal dictionary to OrderedDict may still not preserve the order: the dict literal is constructed first, and before Python 3.7 its key order was arbitrary, so OrderedDict only records whatever order the dict happened to have. A workaround is to build the OrderedDict from a sequence of key-value tuples, as suggested in this SO post:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )

   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6

>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )

   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
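The same idea can be sketched without wrapping each array in pd.Series. The column order below (c, a, b) is an arbitrary choice for illustration, picked so that neither alphabetical sorting nor the order of definition would produce it by accident.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# The list of pairs carries the order; no intermediate dict is built,
# so there is nothing that could reorder the keys.
pairs = [('c', c), ('a', a), ('b', b)]
df = pd.DataFrame(OrderedDict(pairs))

print(list(df.columns))
```

Because the order lives in the list, this should behave the same on Python versions where plain dicts are unordered.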
