Retrieving Data from MySQL in batches via Python

To expand on akalikin's answer, you can use a stepped iteration to split the query into chunks, and then use LIMIT and OFFSET to execute the query.

cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT COUNT(*) FROM sales")

for i in range(0,cur.fetchall(),5):
    cur2 = con.cursor(mdb.cursors.DictCursor)
    cur2.execute("SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s" %(5,i))
    rows = cur2.fetchall()
    print rows

You can use

SELECT id, date, product_id, sales FROM sales LIMIT X OFFSET Y;

where X is the size of the batch you need and Y is current offset (X times number of current iterations for example)


First point: a python db-api.cursor is an iterator, so unless you really need to load a whole batch in memory at once, you can just start with using this feature, ie instead of:

cursor.execute("SELECT * FROM mytable")
rows = cursor.fetchall()
for row in rows:
   do_something_with(row)

you could just:

cursor.execute("SELECT * FROM mytable")
for row in cursor:
   do_something_with(row)

Then if your db connector's implementation still doesn't make proper use of this feature, it will be time to add LIMIT and OFFSET to the mix:

# py2 / py3 compat
try:
    # xrange is defined in py2 only
    xrange
except NameError:
    # py3 range is actually p2 xrange
    xrange = range

cursor.execute("SELECT count(*) FROM mytable")
count = cursor.fetchone()[0]
batch_size = 42 # whatever

for offset in xrange(0, count, batch_size):
    cursor.execute(
        "SELECT * FROM mytable LIMIT %s OFFSET %s", 
        (batch_size, offset))
   for row in cursor:
       do_something_with(row)

Tags:

Python

Mysql