How do I split a large XML file?

First, download the foxe XML editor from this link: http://www.firstobject.com/foxe242.zip

Watch this video: http://www.firstobject.com/xml-splitter-script-video.htm. It explains how the split code works.

There is a script on that page (it starts with split()). Copy the code, and in the XML editor choose "New Program" from the "File" menu. Paste the code and save it. The code is:

split()
{
  CMarkup xmlInput, xmlOutput;
  xmlInput.Open( "**50MB.xml**", MDF_READFILE );
  int nObjectCount = 0, nFileCount = 0;
  while ( xmlInput.FindElem("//**ACT**") )
  {
    if ( nObjectCount == 0 )
    {
      ++nFileCount;
      xmlOutput.Open( "**piece**" + nFileCount + ".xml", MDF_WRITEFILE );
      xmlOutput.AddElem( "**root**" );
      xmlOutput.IntoElem();
    }
    xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
    ++nObjectCount;
    if ( nObjectCount == **5** )
    {
      xmlOutput.Close();
      nObjectCount = 0;
    }
  }
  if ( nObjectCount )
    xmlOutput.Close();
  xmlInput.Close();
  return nFileCount;
}

Change the fields marked in bold (shown here between ** **) to suit your needs, and remove the ** markers when you do. (This is also explained on the video page.)

In the XML editor window, right-click and click Run (or simply press F9). The output bar in the window shows the number of files generated.

Note: the input file name can be a full path such as "C:\\Users\\AUser\\Desktop\\a_xml_file.xml" (note the doubled backslashes), and the output file name can be "C:\\Users\\AUser\\Desktop\\anoutputfolder\\piece" + nFileCount + ".xml".
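If you can't run foxe, the same count-based logic (take every matching element, write a fixed number of them per output file under a new root) can be sketched with Python's standard-library ElementTree. This is a swapped-in technique, not the foxe/CMarkup script itself. Assumptions: the split element has no namespace, matching elements are not nested inside each other, and the function and parameter names are hypothetical stand-ins for the bold fields of the script.

```python
import xml.etree.ElementTree as ET


def split_by_count(src, tag, per_file, out_prefix, root_tag="root"):
    """Copy each <tag> element of src into numbered files of at most per_file elements.

    Hypothetical parameters mirroring the bold fields of the foxe script:
    input file, element to split on, output file prefix, root element name,
    elements per file. Returns the number of files written.
    """
    context = ET.iterparse(src, events=("start", "end"))
    _, input_root = next(context)  # first event: start of the document root
    file_count = 0
    batch = []

    def flush():
        nonlocal file_count
        file_count += 1
        out_root = ET.Element(root_tag)
        out_root.extend(batch)
        ET.ElementTree(out_root).write(
            f"{out_prefix}{file_count}.xml", encoding="utf-8", xml_declaration=True
        )
        batch.clear()

    for event, elem in context:
        if event == "end" and elem.tag == tag:
            batch.append(elem)
            if len(batch) == per_file:
                flush()
                input_root.clear()  # drop subtrees that are already written
    if batch:  # write the final, partially filled file
        flush()
    return file_count
```

For example, split_by_count("50MB.xml", "ACT", 5, "piece") would write piece1.xml, piece2.xml, and so on. Memory stays bounded when the split elements are direct children of the root; deeper-nested ones are only released once their top-level ancestor closes.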


As already mentioned, xml_split from the Perl package XML::Twig does a great job.

Usage

xml_split < bigFile.xml

# or, if the file is compressed, e.g.:
bzcat bigFile.xml.bz2 | xml_split

Without any arguments, xml_split creates one file per top-level child node.

There are options to specify the number of elements you want per file (-g) or an approximate file size (-s <Kb|Mb|Gb>).
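To illustrate what splitting by approximate size involves, here is a minimal Python sketch using the standard-library ElementTree. It is not xml_split's implementation: it groups the root's top-level children into files, starting a new file when the next child would push the running total past a byte limit. Sizes are estimated by re-serializing each child (including its tail text), so they are approximate; the function name is hypothetical, and the root's attributes are not copied.

```python
import xml.etree.ElementTree as ET


def split_by_size(src, max_bytes, out_prefix):
    """Group the root's top-level children into files of roughly max_bytes each.

    Returns the number of files written.
    """
    context = ET.iterparse(src, events=("start", "end"))
    _, root = next(context)  # first event: start of the document root
    file_count, batch, batch_bytes = 0, [], 0

    def flush():
        nonlocal file_count, batch_bytes
        file_count += 1
        out_root = ET.Element(root.tag)  # reuse the original root's tag name
        out_root.extend(batch)
        ET.ElementTree(out_root).write(
            f"{out_prefix}{file_count}.xml", encoding="utf-8", xml_declaration=True
        )
        batch.clear()
        batch_bytes = 0

    depth = 0  # nesting level below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 0:
            continue  # a deeper descendant, or the root itself (depth -1)
        size = len(ET.tostring(elem))  # includes the element's tail text
        if batch and batch_bytes + size > max_bytes:
            flush()
            root.clear()  # release children that were already written
        batch.append(elem)
        batch_bytes += size
    if batch:
        flush()
    return file_count
```

With max_bytes=0, every top-level child ends up in its own file, similar to xml_split's no-argument behavior described above.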

Installation

Windows

Look here

Linux

sudo apt-get install xml-twig-tools


There's no general-purpose solution to this, because there are so many different ways your source XML could be structured.

It's reasonably straightforward to build an XSLT transform that will output a slice of an XML document. For instance, given this XML:

<header>
  <data rec="1"/>
  <data rec="2"/>
  <data rec="3"/>
  <data rec="4"/>
  <data rec="5"/>
  <data rec="6"/>
</header>

you can output a copy of the file containing only data elements within a certain range with this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:param name="startPosition"/>
  <xsl:param name="endPosition"/>

  <xsl:template match="@* | node()">
      <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
      </xsl:copy> 
  </xsl:template>

  <xsl:template match="header">
    <xsl:copy>
      <xsl:apply-templates select="data"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="data">
    <xsl:if test="position() &gt;= $startPosition and position() &lt;= $endPosition">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

(Note, by the way, that because this is based on the identity transform, it works even if header isn't the top-level element.)

You still need to count the data elements in the source XML and run the transform repeatedly, with values of $startPosition and $endPosition appropriate for each slice.
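The counting and repeated runs can be driven by a small script. The Python sketch below streams through the source to count data elements, computes 1-based (startPosition, endPosition) pairs, and builds one command per slice. It assumes xsltproc (libxslt) as the XSLT 1.0 processor, invoked with its --param and -o options; the function names, stylesheet file name, and output prefix are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_elements(src, tag):
    """Stream through src and count <tag> elements."""
    count = 0
    for _, elem in ET.iterparse(src, events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # discard contents of elements already seen
    return count


def slice_ranges(total, per_file):
    """Return 1-based (startPosition, endPosition) pairs covering 1..total."""
    return [(start, min(start + per_file - 1, total))
            for start in range(1, total + 1, per_file)]


def xsltproc_commands(stylesheet, src, total, per_file, out_prefix):
    """Build one xsltproc argument list per slice (assumes xsltproc is installed)."""
    cmds = []
    for i, (start, end) in enumerate(slice_ranges(total, per_file), 1):
        cmds.append(["xsltproc",
                     "--param", "startPosition", str(start),
                     "--param", "endPosition", str(end),
                     "-o", f"{out_prefix}{i}.xml", stylesheet, src])
    return cmds
```

Each command list can be passed to subprocess.run(cmd, check=True). Keep in mind that every run re-reads the whole source file, so this costs one full parse per output file.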

Tags:

Windows

Xml