how can I create new XML file from existing database in PostgreSQL database using java

I have a working implementation where I do everything inside PostgreSQL without additional libraries.

Auxiliary parsing function

CREATE OR REPLACE FUNCTION f_xml_extract_val(text, xml)
  RETURNS text AS
$func$
SELECT CASE
        WHEN $1 ~ '@[[:alnum:]_]+$' THEN
           (xpath($1, $2))[1]
        WHEN $1 ~* '/text()$' THEN
           (xpath($1, $2))[1]
        WHEN $1 LIKE '%/' THEN
           (xpath($1 || 'text()', $2))[1]
        ELSE
           (xpath($1 || '/text()', $2))[1]
       END;
$func$  LANGUAGE sql IMMUTABLE;

Handle multiple values

The above implementation doesn't handle multiple attributes at one xpath. Here is an overloaded version of f_xml_extract_val() for that. With the 3rd parameter you can pick one (the first), all or dist (distinct) values. Multiple values are aggregated to a comma-separated string.

CREATE OR REPLACE FUNCTION f_xml_extract_val(_path text, _node xml, _mode text)
  RETURNS text AS
$func$
DECLARE
   _xpath text := CASE
                   WHEN $1 ~~ '%/'              THEN $1 || 'text()'
                   WHEN lower($1) ~~ '%/text()' THEN $1
                   WHEN $1 ~ '@\w+$'            THEN $1
                   ELSE                              $1 || '/text()'
                  END;
BEGIN
   -- fetch one, all or distinct values
   CASE $3
       WHEN 'one'  THEN RETURN (xpath(_xpath, $2))[1]::text;
       WHEN 'all'  THEN RETURN array_to_string(xpath(_xpath, $2), ', ');
       WHEN 'dist' THEN RETURN array_to_string(ARRAY(
            SELECT DISTINCT unnest(xpath(_xpath, $2))::text ORDER BY 1), ', ');
       ELSE RAISE EXCEPTION
          'Invalid $3: >>%<<', $3;
   END CASE;
END
$func$  LANGUAGE plpgsql;

COMMENT ON FUNCTION f_xml_extract_val(text, xml, text) IS '
Extract element of an xpath from XML document
Overloaded function to f_xml_extract_val(..)
$3 .. mode is one of: one | all | dist'

Call:

SELECT f_xml_extract_val('//city', x, 'dist');

Main part

Name of target table: tbl; prim. key: id:

CREATE OR REPLACE FUNCTION f_sync_from_xml()
  RETURNS boolean AS
$func$
DECLARE
   datafile text := 'path/to/my_file.xml';  -- only relative path in db dir
   myxml    xml  := pg_read_file(datafile, 0, 100000000); -- arbitrary 100 MB
BEGIN
   -- demonstrating 4 variants of how to fetch values for educational purposes
   CREATE TEMP TABLE tmp ON COMMIT DROP AS
   SELECT (xpath('//some_id/text()', x))[1]::text AS id   -- id is unique  
        , f_xml_extract_val('//col1', x)          AS col1 -- one value
        , f_xml_extract_val('//col2/', x, 'all')  AS col2 -- all values incl. dupes
        , f_xml_extract_val('//col3/', x, 'dist') AS col3 -- distinct values
   FROM   unnest(xpath('/xml/path/to/datum', myxml)) x;

   -- 1.) DELETE?

   -- 2.) UPDATE
   UPDATE tbl t
   SET   (  col_1,   col2,   col3) =
         (i.col_1, i.col2, i.col3)
   FROM   tmp i
   WHERE  t.id = i.id
   AND   (t.col_1, t.col2, t.col3) IS DISTINCT FROM
         (i.col_1, i.col2, i.col3);

   -- 3.) INSERT NEW
   INSERT INTO tbl
   SELECT i.*
   FROM   tmp i
   WHERE  NOT EXISTS (SELECT 1 FROM tbl WHERE id = i.id);
END
$func$  LANGUAGE plpgsql;

Important notes

  • This implementation checks on a primary key if the inserted row exists already and updates in this case. Only new rows are inserted.

  • I use a temporary staging table to speed up the procedure.

  • Tested with Postgres 8.4, 9.0 and 9.1.

  • XML has to be well-formed.

  • pg_read_file() has restrictions to it. The manual:

    Use of these functions is restricted to superusers.

    And:

    Only files within the database cluster directory and the log_directory can be accessed.

So you have to put your source file there - or create a symbolic link to your actual file/directory.

Or you can provide the file via Java in your case (I did it all inside Postgres).

Or you can import the data into 1 column of 1 row of a temporary table and take it from there.

Or you can use lo_import like demonstrated in this related answer on dba.SE.

This blog post by Scott Bailey helped me.


Postgres has (thanks to Daniel Lyons for pointing it out) native XML support which you can use to store your table. If you however want to shred your XML data manually, there are different possibilities to represent XML data in a database. The first question should be, if you want a very generic solution, that will be able to store any XML document or one that is specific to your domain (i.e. only allows XML documents of a certain structure). Depending on that, you will have a very flexible, universal representation which is however harder to query (the SQL needed will be quite complicated). If you have a more specific approach, the queries will be simpler, but you will need to create new tables or add new attributes to existing talbes every time you want to store another type of document or add a field to an existing document; so changing the schema will be harder (which is one major advantage of XML). This presentation should give you some ideas what are the different possibilities.

Also, you might consider to switch to some DB that supports Xquery, like DB2. The ability to natively query using XQuery, a language targeted at processing XML, will simplify things a lot.

UPDATE: Given your comment, your XML data (that you linked to) is perfectly relational. It can be mapped 1:1 to the following table:

CREATE TABLE mynt (
    ID          SERIAL     ,
    myntnafn    CHAR(3)    ,
    myntheiti   Varchar(255) ,
    kaupgengi   Decimal(15,2) ,
    midgengi    Decimal(15,2) ,
    solugengi   Decimal(15,2) ,
    dagsetning  TimeStamp      
)

So any mynt tag would be a record in the table and the corresponding sub-tags the attributes. The data types I gathered from your data, they might be wrong. The main problem is, IMO, that there is no natural primary key, so I added an autogenerated one.


PostgreSQL has an XML datatype. There are lots of XML specific functions you can use to query and modify the data, such as with xpath.

From the Java side, you can pretend you're just working with strings, but know that the data is well-formed on the way out and it won't let you store non-well-formed data.