"ERROR: extra data after last expected column" when using PostgreSQL COPY

An empty table won't do. You need a table that matches the structure of the input data. Something like:

CREATE TABLE raw_data (
  col1 int
, col2 int
  -- ... one column per field in the input file
);

You don't need to declare tab as DELIMITER since that's the default:

COPY raw_data FROM '/home/Projects/TestData/raw_data.txt';
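To see where the error message comes from, here is a minimal reproduction, sketched for a psql session. The table name demo and the comma delimiter are assumptions for illustration only; neither is taken from the question.

```sql
-- Hypothetical two-column table, used only for this demonstration
CREATE TEMP TABLE demo (col1 int, col2 int);

-- Feed one line with three fields (comma-delimited to keep the example readable)
COPY demo FROM STDIN WITH (DELIMITER ',');
1,2,3
\.
-- ERROR:  extra data after last expected column
```

The input line carries more fields than the table has columns, which is exactly the situation the error describes.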

800 columns you say? That many columns would typically indicate a problem with your design. Anyway, there are ways to half-automate the CREATE TABLE script.


Assuming simplified raw data:

1   2   3   4  -- first row contains "column names"
1   1   0   1  -- tab separated
1   0   0   1
1   0   1   1

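Define a different DELIMITER (one that does not occur in the import data at all), and import to a temporary staging table with a single text column. Note that COPY only accepts a single one-byte character as delimiter, so '§' works in a single-byte server encoding such as LATIN1 but not in a UTF8 database; there, pick an ASCII character that is guaranteed not to occur in the data: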

CREATE TEMP TABLE tmp_data (raw text);

COPY tmp_data FROM '/home/Projects/TestData/raw_data.txt' WITH (DELIMITER '§');
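Before generating any DDL, it may be worth checking that the staging import did what you expect: one row per input line, and the same number of tab-separated fields in every row. A minimal sketch, assuming the tmp_data table created above (the output column names are made up for this example):

```sql
-- Row count plus smallest and largest number of tab-separated fields per line
SELECT count(*)                                       AS rows_loaded
     , min(cardinality(string_to_array(raw, E'\t')))  AS min_fields
     , max(cardinality(string_to_array(raw, E'\t')))  AS max_fields
FROM   tmp_data;
```

If min_fields and max_fields differ, some lines are ragged, and a later cast of those lines to a row type with a fixed number of columns would fail.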

This query creates the CREATE TABLE script:

SELECT 'CREATE TABLE tbl (col' || replace(raw, E'\t', ' bool, col') || ' bool)'
FROM   (SELECT raw FROM tmp_data LIMIT 1) t;

A more generic & safer query:

SELECT 'CREATE TABLE tbl ('
    ||  string_agg(quote_ident('col' || col), ' bool, ' ORDER BY ord)
    || ' bool);'
FROM  (SELECT raw FROM tmp_data LIMIT 1) t
     , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);

Returns:

CREATE TABLE tbl (col1 bool, col2 bool, col3 bool, col4 bool);
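The variant with quote_ident() is safer because the function double-quotes anything that is not a plain lower-case identifier, so a malformed or hostile header field cannot break out of the column list. A quick illustration with made-up header values (not from the question's data):

```sql
SELECT quote_ident('col' || '1')                          AS plain       -- col1
     , quote_ident('col' || 'a b')                        AS with_space  -- "cola b"
     , quote_ident('col' || '1 int); DROP TABLE tbl; --') AS hostile     -- "col1 int); DROP TABLE tbl; --"
;
```

In the last case the whole string ends up as one (odd) column name instead of being executed as SQL.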

Execute after verifying validity - or execute dynamically if you trust the result:

DO $$BEGIN EXECUTE (
   SELECT 'CREATE TABLE tbl (col' || replace(raw, E'\t', ' bool, col') || ' bool)'
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
); END$$;

Then INSERT the data with this query:

INSERT INTO tbl
SELECT (('(' || replace(replace(replace(
                  raw
                , '1',   't')
                , '0',   'f')
                , E'\t', ',')
             || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

Or simpler with translate():

INSERT INTO tbl
SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::tbl).*
FROM   (SELECT raw FROM tmp_data OFFSET 1) t;

The string is converted into a row literal, cast to the newly created table row type and decomposed with (row).*.
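To make the mechanics concrete, here is a small sketch of the two steps in isolation. It assumes tbl was created with four bool columns as in the example; the input string is a hand-written stand-in for one staged line.

```sql
-- Step 1: turn a tab-separated line of 1/0 into the inside of a row literal
SELECT translate(E'1\t0\t0\t1', E'10\t', 'tf,');   -- t,f,f,t

-- Step 2: wrap in parentheses, cast to the table's row type, expand into columns
SELECT ('(t,f,f,t)'::tbl).*;
```

The second statement returns a single row with four boolean columns, which is the shape INSERT expects.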

All done.

You could put all of that into a plpgsql function, but you'd need to safeguard against SQL injection. (There are a number of related solutions here on SO. Try a search.)
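As a rough sketch of what such a function might look like: the function name f_import_bool_table and its parameter _tbl are hypothetical, and the function assumes tmp_data has already been filled as shown above. Identifiers go through format() with %I, which quotes them the way quote_ident() does, so neither the table name nor the header fields are concatenated into a statement unquoted.

```sql
CREATE OR REPLACE FUNCTION f_import_bool_table(_tbl text)  -- hypothetical name
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _cols text;
BEGIN
   -- Build the column list from the header row, quoting every identifier
   SELECT string_agg(format('%I bool', 'col' || col), ', ' ORDER BY ord)
   INTO   _cols
   FROM  (SELECT raw FROM tmp_data LIMIT 1) t
       , unnest(string_to_array(t.raw, E'\t')) WITH ORDINALITY c(col, ord);

   IF _cols IS NULL THEN
      RAISE EXCEPTION 'tmp_data is empty, nothing to import';
   END IF;

   -- Create the target table; %I quotes the table name if necessary
   EXECUTE format('CREATE TABLE %I (%s)', _tbl, _cols);

   -- Convert every data line to a row literal and insert it
   EXECUTE format(
      $q$INSERT INTO %I
         SELECT (('(' || translate(raw, E'10\t', 'tf,') || ')')::%I).*
         FROM  (SELECT raw FROM tmp_data OFFSET 1) t$q$
    , _tbl, _tbl);
END
$func$;
```

It would be called like this, after the COPY into tmp_data:

```sql
SELECT f_import_bool_table('tbl');
```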
