

Parsing Large XML files with PHP

As part of my ongoing work helping to migrate an e-commerce site to Magento, I’ve discovered that I need to parse a very large (>350MB) XML file. My first instinct was to use the SimpleXML parser. After a short test in which loading the file took 20 minutes and used up all of my computer’s memory, I looked for another solution. XMLReader (bundled with PHP since 5.1) is the answer. It’s not as “pretty” as SimpleXML, but it’s a stream (pull) parser, so it doesn’t have to load the entire file into memory. Instead, it moves through the document one node at a time, which means you can process the file more or less record by record. Here are a few links that show how to use it:

http://www.ibm.com/developerworks/xml/library/x-xmlphp2.html
http://www.ibm.com/developerworks/library/x-pullparsingphp.html
http://blog.liip.ch/archive/2004/05/10/processing_large_xml_documents_with_php.html
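A minimal sketch of that record-by-record loop follows. It assumes each record is a `<product>` element, as in the ONIX feed discussed in the comments. The function name `count_products` is hypothetical. The function only counts records, but you would use the same loop to process them:

```php
<?php
// Sketch: stream through a large XML file and count <product> records
// without loading the whole document into memory.
// Assumes each record is a <product> element; the function name is hypothetical.
function count_products($path)
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // read() advances one node at a time, so only the current node is held.
    while ($reader->read()) {
        // Match opening <product> tags only (closing tags have a different nodeType).
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'product') {
            $count++;
        }
    }

    $reader->close();
    return $count;
}
```

Because the reader never builds the full tree, memory use should stay roughly flat however large the file is. You would call it with your own file path, e.g. `echo count_products('your-feed.xml');`.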

Posted in php.



4 Responses


  1. Scott says

    Your article spurred me to look into this myself. We are using a script to update and create new products in Magento, and our current PHP XML parser (http://keithdevens.com/software/phpxml) is choking on large files. I was able to parse, call, and print different fields from our XML file using a simple script, but I’m having some issues integrating it into our file that creates products. I was wondering if you could post some examples. Thanks for the useful info.

  2. juice says

    Here’s a quick example:

    <?php
    /*  <product>
        <a001>1405112204</a001>
        <a002>03</a002>
        <productidentifier>
          <b221>15</b221>
          <b244>9781405112208</b244>
        </productidentifier>
        <b246>11</b246>
        <b012>BB</b012>
        <series>
          <seriesidentifier>
            <b273>01</b273>
            <b244>2252</b244>
          </seriesidentifier>
          <b018>Challenges in Contemporary Theology</b018>
          <b019>15</b019>
        </series>
        <title textcase="01">
          <b202>01</b202>
          <b203>Rewritten Theology</b203>
          <b029>Aquinas After His Readers</b029>
        </title>
        <contributor>
          <b034>1</b034>
          <b035>A01</b035>
          <b036>Mark D. Jordan</b036>
          <b037>Jordan, Mark D.</b037>
          <b039>Mark D.</b039>
          <b040>Jordan</b040>
          <professionalaffiliation>
            <b046>Emory University</b046>
          </professionalaffiliation>
        </contributor>
        <language>
          <b253>01</b253>
          <b252>eng</b252>
        </language>
        <b061>224</b061>
        <b064>REL067070</b064>
        <b200>2008</b200>
        <b073>06</b073>
        <othertext>
          <d102>01</d102>
          <d103>02</d103>
          <d104>Responding to the recent upsurge of interest in Thomas Aquinas, this book goes straight
            to the heart of the contemporary debates about Thomism.
            &lt;br&gt;&lt;li&gt;Focuses on the concept of authority, both in terms of
            Aquinas&amp;#8217;s own attitude to authority, and how the Church authorities have used
            Aquinas&amp;#8217;s texts. &lt;br&gt;&lt;li&gt;Engages with
            appropriations of Aquinas&amp;#8217;s work by a range of theologians, from liberal
            Catholics to the creators of radical orthodoxy. &lt;br&gt;&lt;li&gt;Argues
            for future readings of Aquinas which are substantially different from those which have gone
            before.</d104>
        </othertext>
        <othertext>
          <d102>04</d102>
          <d103>02</d103>
          <d104>Preface.&lt;p&gt;Abbreviations and Editions.&lt;p&gt;1 St. Thomas and
            the Police.&lt;p&gt;2 The Competition of Authoritative Languages.&lt;p&gt;3
            Imaginary Thomistic Sciences.&lt;p&gt;4 Thomas&amp;#8217;s Alleged
            Aristotelianism &lt;i&gt;or &lt;/i&gt;Aristotle Among the
            Authorities.&lt;p&gt;5 The Protreptic of &lt;i&gt;Against the
            Gentiles.&lt;/i&gt;&lt;p&gt;6 The &lt;i&gt;Summa of
            Theology&lt;/i&gt; as Moral Formation.&lt;p&gt;7 What the
            &lt;i&gt;Summa of Theology&lt;/i&gt; Teaches.&lt;p&gt;8 Philosophy
            in a &lt;i&gt;Summa of Theology.&lt;/i&gt;&lt;p&gt;9 Writing Secrets
            in a &lt;i&gt;Summa of Theology.&lt;/i&gt;&lt;p&gt;Conclusion:
            Writing Theology after Thomas -- and His Readers.&lt;p&gt;Index.</d104>
        </othertext>
        <othertext>
          <d102>07</d102>
          <d103>02</d103>
          <d104>&amp;#8220;Mark Jordan's recent book, &lt;i&gt;Rewritten
            Theology,&lt;/i&gt; challenges the way in which the achievement of Thomas Aquinas
            has been both received and reformulated, often in order to serve particular theological and
            philosophical ends.&amp;#8221; (&lt;i&gt;American Catholic Philosophical
            Quarterly&lt;/i&gt;, May 2009) </d104>
        </othertext>
        <othertext>
          <d102>13</d102>
          <d103>02</d103>
          <d104>&lt;b&gt;Mark D. Jordan&lt;/b&gt; is the Asa Griggs Candler Professor in
            the Department of Religion at Emory University. He is the author of
            &lt;i&gt;Ordering Wisdom: The Hierarchy of Philosophical Discourse in
            Aquinas&lt;/i&gt; (1986), &lt;i&gt;The Invention of Sodomy in Christian
            Theology&lt;/i&gt; (1997), &lt;i&gt;The Silence of Sodom: Homosexuality in
            Modern Catholicism&lt;/i&gt; (2000) and &lt;i&gt;The Ethics of
            Sex&lt;/i&gt; (Blackwell, 2001).</d104>
        </othertext>
        <othertext>
          <d102>18</d102>
          <d103>02</d103>
          <d104>Recent years have seen numerous appropriations of Thomas Aquinas&amp;#8217;s work by
            a range of theologians, from liberal Catholics to the creators of radical orthodoxy.
            Responding to this upsurge of interest, this book goes straight to the heart of the
            contemporary debates about Thomism. &lt;br&gt;&lt;p&gt;Author Mark Jordan
            focuses on the concept of authority, both in terms of Aquinas&amp;#8217;s own attitude
            to authority and how the Church authorities have used Aquinas to shore up their own
            position. He shows how to read Aquinas from, into and against theological authorities, and
            argues for future readings of Thomas which are substantially different from those which have
            gone before.</d104>
        </othertext>
        <mediafile>
          <f114>04</f114>
          <f115>03</f115>
          <f116>01</f116>
          <f117>http://catalogimages.wiley.com/images/db/jimages/9781405112208.jpg</f117>
        </mediafile>
        <productwebsite>
          <b367>02</b367>
          <f123>http://www.wiley.com/remtitle.cgi?isbn=1405112204</f123>
        </productwebsite>
        <imprint>
          <b241>02</b241>
          <b242>Wiley Imprint Code List</b242>
          <b243>WB</b243>
          <b079>Wiley-Blackwell</b079>
        </imprint>
        <publisher>
          <b291>01</b291>
          <b081>John Wiley &amp; Sons</b081>
          <website>
            <b367>18</b367>
            <b295>http://www.wiley.com</b295>
          </website>
        </publisher>
        <b083>GB</b083>
        <b394>04</b394>
        <b003>20051227</b003>
        <b087>2005</b087>
        <salesrights>
          <b089>01</b089>
          <b388>WORLD</b388>
        </salesrights>
        <measure>
          <c093>01</c093>
          <c094>236.20</c094>
          <c095>mm</c095>
        </measure>
        <measure>
          <c093>02</c093>
          <c094>157.4</c094>
          <c095>mm</c095>
        </measure>
        <measure>
          <c093>03</c093>
          <c094>17.2</c094>
          <c095>mm</c095>
        </measure>
        <measure>
          <c093>08</c093>
          <c094>16</c094>
          <c095>oz</c095>
        </measure>
        <relatedproduct>
          <h208>27</h208>
          <productidentifier>
            <b221>15</b221>
            <b244>9780470775387</b244>
          </productidentifier>
          <b012>DG</b012>
          <b211>004</b211>
          <b213>Acrobat Ebook Reader</b213>
          <b214>02</b214>
        </relatedproduct>
        <relatedproduct>
          <h208>27</h208>
          <productidentifier>
            <b221>15</b221>
            <b244>9780470797020</b244>
          </productidentifier>
          <b012>DG</b012>
          <b211>022</b211>
          <b213>Mobi</b213>
          <b214>11</b214>
        </relatedproduct>
        <relatedproduct>
          <h208>27</h208>
          <productidentifier>
            <b221>15</b221>
            <b244>9780470773796</b244>
          </productidentifier>
          <b012>DH</b012>
          <b014>Interscience Book online</b014>
        </relatedproduct>
        <supplydetail>
          <j136>2002272</j136>
          <j137>John Wiley &amp; Sons</j137>
          <j270>800-225-5945</j270>
          <j271>732-302-2370</j271>
          <j272>custserv@wiley.com</j272>
          <j268>02</j268>
          <j269>C</j269>
          <j141>IP</j141>
          <j396>21</j396>
          <j145>20</j145>
          <price>
            <j148>01</j148>
            <discountcoded>
              <j363>02</j363>
              <j364>P</j364>
            </discountcoded>
            <j151>99.95</j151>
            <j152>USD</j152>
            <b251>US</b251>
          </price>
        </supplydetail>
      </product>
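    */

    // Path to your Magento installation
    define('MAGENTO', realpath('/var/www/magento'));
    ini_set('memory_limit', '128M');

    require_once MAGENTO . '/app/Mage.php';

    Mage::app();

    $reader = new XMLReader();
    if (!$reader->open('wiley_incremental_jun24-jul01_2009.xml')) {
        die("Could not open the XML file\n");
    }

    while ($reader->read()) {
        // Only act on opening <product> tags; every other node is streamed past
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->localName !== 'product') {
            continue;
        }

        // Expand just this <product> subtree and hand it to SimpleXML,
        // so only one record is held in memory at a time
        $node = $reader->expand();
        $dom  = new DOMDocument();
        $n    = $dom->importNode($node, true);
        $dom->appendChild($n);
        $sxe  = simplexml_import_dom($n);

        // SimpleXML returns objects; cast to scalars before using them
        $sku    = (string) $sxe->productidentifier->b244;
        $price  = (float) $sxe->supplydetail->price->j151;
        $name   = (string) $sxe->title->b203;
        $author = (string) $sxe->contributor->b036;

        // Assumed mapping: the <othertext> with <d102> 01 holds the main
        // description, and the <measure> with <c093> 08 holds the weight
        $description = '';
        foreach ($sxe->othertext as $text) {
            if ((string) $text->d102 === '01') {
                $description = (string) $text->d104;
                break;
            }
        }
        $weight = 0;
        foreach ($sxe->measure as $measure) {
            if ((string) $measure->c093 === '08') {
                $weight = (float) $measure->c094;
                break;
            }
        }
        // etc. - parse any other required fields

        // now create the product in Magento
        $product = Mage::getModel('catalog/product');
        $product->setTypeId('simple');
        $product->setTaxClassId(0);             // none
        $product->setWebsiteIds(array(1));      // website ID(s)
        $product->setAttributeSetId(4);
        $product->setSku($sku);
        $product->setName($name);
        $product->setAuthor($author);           // assumes a custom 'author' attribute
        $product->setDescription($description);
        $product->setInDepth($description);     // assumes a custom 'in_depth' attribute
        $product->setPrice($price);
        $product->setWeight($weight);
        $product->setStatus(1);
        $product->setVisibility(4);
        $product->setMetaDescription(substr(strip_tags($description), 0, 255)); // 255 character limit
        $product->setMetaTitle($name);

        $productSaved = false;
        try {
            $product->save();
            $productSaved = true;
        } catch (Exception $e) {
            echo "$sku not added: " . $e->getMessage() . "\n";
        }

        if ($productSaved) {
            $stockItem = Mage::getModel('cataloginventory/stock_item');
            $stockItem->loadByProduct($product->getId());

            if (!$stockItem->getId()) {
                $stockItem->setProductId($product->getId())->setStockId(1);
            }
            $stockItem->setData('qty', 0);          // qty is hard-coded to 0 here
            $stockItem->setData('is_in_stock', 1);
            $stockItem->save();
        }
    }

    $reader->close();
    ?>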
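The loop above calls `read()` on every node, including every child of each `<product>`. The variant below uses `XMLReader::next('product')` to hop from one `<product>` to the next sibling without descending into each record. It leaves out the Magento calls so the parsing step can be tested on its own. This is a sketch that assumes the same ONIX layout as the sample record. The helper name `read_onix_products` and the chosen fields are hypothetical:

```php
<?php
// Sketch: jump between <product> records with next(), skipping each
// record's children, and pull a few fields out with SimpleXML.
// Assumes all <product> elements are siblings; names are hypothetical.
function read_onix_products($path)
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Advance to the first <product> start tag.
    while ($reader->read() && $reader->localName !== 'product') {
    }

    $products = array();
    $dom = new DOMDocument();
    while ($reader->localName === 'product') {
        // Build a SimpleXML view of this one record only.
        $sxe = simplexml_import_dom($dom->importNode($reader->expand(), true));
        $products[] = array(
            'sku'   => (string) $sxe->productidentifier->b244,
            'name'  => (string) $sxe->title->b203,
            'price' => (float) $sxe->supplydetail->price->j151,
        );
        // Move to the next <product> sibling without reading this record's children.
        $reader->next('product');
    }

    $reader->close();
    return $products;
}
```

Each returned array could then be passed to the `Mage::getModel('catalog/product')` code above. One caveat of this design is that `next('product')` skips over other elements' subtrees. It therefore only finds `<product>` elements that are siblings under the same parent, which holds for a flat feed like the sample.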
  3. SG1 says

    Juice, thanks for the example. I just tried it and it’s working perfectly, but I have a little problem: even if I change the value for qty, I always get 0 :( Any clue? Thanks in advance.

  4. juice says

    I’m not using Magento to manage stock, so I haven’t needed to set the qty to anything other than 0. But if you have configured Magento to manage stock through its admin interface, you may be able to try:

    $stockItem->setQty(100);


