to_sql pyodbc count field incorrect or syntax error
For me, the solution was NOT to use:
engine = create_engine(connection_uri, fast_executemany=True)
Instead, I just experimented with the chunksize in:
df.to_sql('tablename', engine, index=False, if_exists='replace', method='multi', chunksize=100)
Replacing chunksize=100 with chunksize=90 made it work, presumably because the previous table had fewer columns: the more columns there are, the smaller this number needs to be. Experiment with it if you would rather not rely on calculations, which might be wrong for various reasons.
I made a few modifications based on Gord Thompson's answer. This auto-calculates the chunksize, rounding down to the largest whole number of rows that fits within the 2100-parameter limit:
import math
df_num_of_cols = len(df.columns)
# Use 2097 rather than 2100 to stay slightly under the limit, and cap at
# 1000 rows per INSERT, as in Gord Thompson's calculation.
chunknum = min(1000, math.floor(2097 / df_num_of_cols))
df.to_sql('MY_TABLE', con=engine, schema='myschema', chunksize=chunknum, if_exists='append', method='multi', index=False)
At the time this question was asked, pandas 0.23.0 had just been released. That version changed the default behaviour of .to_sql() from calling the DBAPI .executemany() method to constructing a table-value constructor (TVC) that would improve upload speed by inserting multiple rows with a single .execute() call of an INSERT statement. Unfortunately that approach often exceeded T-SQL's limit of 2100 parameter values for a stored procedure, leading to the error cited in the question.
Shortly thereafter, a subsequent release of pandas added a method= argument to .to_sql(). The default (method=None) restored the previous behaviour of using .executemany(), while specifying method="multi" would tell .to_sql() to use the newer TVC approach.
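To make the difference between the two strategies concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of pandas and SQL Server (an assumption made purely so the example is self-contained; the table names demo_a and demo_b and both function names are made up). The first function mirrors the .executemany() style, the second builds a single multi-row INSERT, which is roughly what method="multi" asks for.

```python
import sqlite3

rows = [(0, "row000"), (1, "row001"), (2, "row002")]


def insert_with_executemany(conn, rows):
    # One single-row INSERT statement, executed once per parameter tuple.
    conn.executemany("INSERT INTO demo_a (id, txt) VALUES (?, ?)", rows)


def insert_with_multirow_values(conn, rows):
    # One INSERT whose VALUES clause lists every row. The parameter count is
    # len(rows) * number_of_columns, which is what can hit a parameter limit.
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    flat_params = [value for row in rows for value in row]
    conn.execute(f"INSERT INTO demo_b (id, txt) VALUES {placeholders}", flat_params)
    return len(flat_params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_a (id INTEGER, txt TEXT)")
conn.execute("CREATE TABLE demo_b (id INTEGER, txt TEXT)")

insert_with_executemany(conn, rows)
param_count = insert_with_multirow_values(conn, rows)

rows_a = conn.execute("SELECT id, txt FROM demo_a ORDER BY id").fetchall()
rows_b = conn.execute("SELECT id, txt FROM demo_b ORDER BY id").fetchall()
```

Both tables end up with the same rows; the only difference is that the multi-row form sends every value as a parameter of one statement.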
Around the same time, SQLAlchemy 1.3 was released and it added a fast_executemany=True argument to create_engine() which greatly improved upload speed using Microsoft's ODBC drivers for SQL Server. With that enhancement, method=None proved to be at least as fast as method="multi" while avoiding the 2100-parameter limit.
So with current versions of pandas, SQLAlchemy, and pyodbc, the best approach for using .to_sql() with Microsoft's ODBC drivers for SQL Server is to use fast_executemany=True and the default behaviour of .to_sql() (method=None), i.e.:
from sqlalchemy import create_engine
# df is an existing DataFrame
connection_uri = (
    "mssql+pyodbc://scott:tiger^[email protected]/db_name"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
engine = create_engine(connection_uri, fast_executemany=True)
df.to_sql("table_name", engine, index=False, if_exists="append")
This is the recommended approach for apps running on Windows, macOS, and the Linux variants that Microsoft supports for its ODBC driver. If you need to use FreeTDS ODBC, then .to_sql() can be called with chunksize= as described below.
Prior to pandas version 0.23.0, to_sql would generate a separate INSERT for each row in the DataFrame:
exec sp_prepexec @p1 output,N'@P1 int,@P2 nvarchar(6)',
    N'INSERT INTO df_to_sql_test (id, txt) VALUES (@P1, @P2)',
    0,N'row000'
exec sp_prepexec @p1 output,N'@P1 int,@P2 nvarchar(6)',
    N'INSERT INTO df_to_sql_test (id, txt) VALUES (@P1, @P2)',
    1,N'row001'
exec sp_prepexec @p1 output,N'@P1 int,@P2 nvarchar(6)',
    N'INSERT INTO df_to_sql_test (id, txt) VALUES (@P1, @P2)',
    2,N'row002'
Presumably to improve performance, pandas 0.23.0 now generates a table-value constructor to insert multiple rows per call:
exec sp_prepexec @p1 output,N'@P1 int,@P2 nvarchar(6),@P3 int,@P4 nvarchar(6),@P5 int,@P6 nvarchar(6)',
    N'INSERT INTO df_to_sql_test (id, txt) VALUES (@P1, @P2), (@P3, @P4), (@P5, @P6)',
    0,N'row000',1,N'row001',2,N'row002'
The problem is that SQL Server stored procedures (including system stored procedures like sp_prepexec) are limited to 2100 parameters, so if the DataFrame has 100 columns then to_sql can only insert about 20 rows at a time.
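The arithmetic can be checked with a small sketch that builds the text of such a statement and counts its parameters. The helper name build_tvc_insert and the ? placeholder style are illustrative assumptions, not what pandas or SQLAlchemy emit verbatim.

```python
def build_tvc_insert(table, columns, num_rows):
    """Return (sql, parameter_count) for a multi-row INSERT with '?' placeholders."""
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    values = ", ".join(one_row for _ in range(num_rows))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"
    # Every value of every row becomes one parameter of the statement.
    return sql, len(columns) * num_rows


# A hypothetical 100-column table, as in the example above.
columns = [f"col{i}" for i in range(100)]

_, params_for_20_rows = build_tvc_insert("df_to_sql_test", columns, 20)
_, params_for_21_rows = build_tvc_insert("df_to_sql_test", columns, 21)

# The two-column, three-row case from the trace shown earlier.
small_sql, small_params = build_tvc_insert("df_to_sql_test", ["id", "txt"], 3)
```

With 100 columns, 20 rows need 2000 parameters while 21 rows already need 2100, which is why about 20 rows per call is the practical ceiling in that case.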
We can calculate the required chunksize using:
# df is an existing DataFrame
#
# limit based on sp_prepexec parameter count
tsql_chunksize = 2097 // len(df.columns)
# cap at 1000 (limit for number of rows inserted by table-value constructor)
tsql_chunksize = 1000 if tsql_chunksize > 1000 else tsql_chunksize
#
df.to_sql('tablename', engine, index=False, if_exists='replace', method='multi', chunksize=tsql_chunksize)
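For reuse, the same calculation can be wrapped in a small function. This is only a sketch: the name tsql_chunksize_for and the constant names are made up here, and the 2097 and 1000 figures are simply taken from the snippet above.

```python
MAX_PARAMS = 2097    # parameter budget used in the calculation above
MAX_TVC_ROWS = 1000  # row cap for a table-value constructor


def tsql_chunksize_for(num_columns):
    """Largest chunksize that respects both the parameter and the row limits."""
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    return min(MAX_TVC_ROWS, MAX_PARAMS // num_columns)


chunk_100_cols = tsql_chunksize_for(100)  # limited by the parameter budget
chunk_30_cols = tsql_chunksize_for(30)    # limited by the parameter budget
chunk_2_cols = tsql_chunksize_for(2)      # limited by the 1000-row cap
```

It would then be passed as chunksize=tsql_chunksize_for(len(df.columns)) together with method='multi'. If the index is written as well (index=True), the index column(s) should be included in the column count.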
However, the fastest approach is still likely to be:
dump the DataFrame to a CSV file (or similar), and then
have Python call the SQL Server bcp utility to upload that file into the table.
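A rough sketch of that two-step approach follows. It assumes the bcp executable is on the PATH, that a trusted connection (-T) is acceptable, and that the data contains no commas or line breaks, since plain character-mode bcp is not assumed here to handle quoted CSV fields. The function names and the table, server, and database names are placeholders.

```python
import subprocess


def build_bcp_command(table, csv_path, server, database):
    """Assemble the argument list for a bcp bulk import of a comma-delimited file."""
    return [
        "bcp", table, "in", csv_path,
        "-S", server,    # server (or server\instance) to connect to
        "-d", database,  # target database
        "-T",            # trusted connection; -U/-P would be used for SQL logins
        "-c",            # character (text) data
        "-t,",           # field terminator: comma
    ]


def upload_via_bcp(df, table, csv_path, server, database):
    # Write plain rows only: no header line and no index column.
    df.to_csv(csv_path, index=False, header=False)
    # check=True raises CalledProcessError if bcp exits with a non-zero status.
    subprocess.run(build_bcp_command(table, csv_path, server, database), check=True)


# Hypothetical names, shown only to illustrate the resulting command line.
example_command = build_bcp_command("dbo.my_table", "data.csv", "myserver", "mydb")
```

The target table has to exist already, with columns in the same order as the DataFrame, because this sketch does not use a format file.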