Begin main content

Convert XML to SQL

I thought it might be worth crossposting my reply to a recent perlmonks question asking for a generalised XML to SQL conversion.

You can transform your xml into sql using xslt. Stealing some bits of code from elsewhere on the net, I can use the following xslt to convert my del.icio.us bookmarks into a set of insert statements:

<?xmlversion="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rss 'http://purl.org/rss/1.0/'>
]>
<xsl:stylesheetversion="1.0"xmlns:xsl="http://www.w3.org/1999/XSL/Transform"xmlns:rdf="&rdf;"xmlns:rss="&rss;">
  <xsl:outputmethod="text"/>

  <!-- main template -->

  <xsl:templatematch="/rdf:RDF">
      <xsl:apply-templatesselect="rss:item" />
  </xsl:template>

  <xsl:templatematch="rss:item">
    insert into links (url, title)
    values (
    '<xsl:call-templatename="sql-escape"><xsl:with-paramname="text"select="rss:title"/></xsl:call-template>',
    '<xsl:call-templatename="sql-escape"><xsl:with-paramname="text"select="rss:link"/></xsl:call-template>'
    );
  </xsl:template>

  <!-- utility functions -->

  <xsl:templatename="sql-escape">
    <xsl:paramname="text"/>
    <xsl:variablename="tmp">
      <xsl:call-templatename="replace-substring">
        <xsl:with-paramname="from">'</xsl:with-param>
        <xsl:with-paramname="to">''</xsl:with-param>
        <xsl:with-paramname="value"select="$text"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-ofselect="$tmp"/>
  </xsl:template>

  <xsl:templatename="replace-substring">
    <xsl:paramname="value" />
    <xsl:paramname="from" />
    <xsl:paramname="to" />
    <xsl:choose>
      <xsl:whentest="contains($value,$from)">
        <xsl:value-ofselect="substring-before($value,$from)" />
        <xsl:value-ofselect="$to" />
        <xsl:call-templatename="replace-substring">
          <xsl:with-paramname="value"select="substring-after($value,$from)" />
          <xsl:with-paramname="from"select="$from" />
          <xsl:with-paramname="to"select="$to" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-ofselect="$value" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
</xsl:stylesheet>
If I save that to a file called del2sql.xsl I can insert the links in the rss response like so:

wget -O - http://feeds.delicious.com/rss/aufflick | xsltproc del2sql.xsl - | psql links_db
Pretty neat huh! Unfortunately xslt is even less decipherable than golf perl, but thems the breaks.

The command pipeline above uses the postgres psql command, but you can just as easily use this approach with mysql, oracle, sybase, sql server or whatever. Most database vendors/projects these days have some sort of xml functionality built in, but this puts all the processing load on the part of your system that is most difficult to scale. Better, I think, to do as much processing as possible on your application server which you can more easily scale.

You might also be better using xslt to convert to CSV and then using some simple script to slurp the csv into your database - you could then handle insert/update errors gracefully in your script.

The utility sql-escape and replace-substring templates can be placed in a common file and included into your xsl stylesheets via an xsl:import statement. If you have an xsl 2 compatible xslt engine you can use the x:replace() function directly instead of the recursive replace-substring template.

07:41 AM, 03 Mar 2008 by Mark Aufflick Permalink | Comments (0)

XML

Blog Categories

software (40)
..cocoa (21)
  ..heads up 'tunes (5)
..ruby (6)
..lisp (4)
..perl (4)
..openacs (1)
mac (21)
embedded (2)
..microprocessor (2)
  ..avr (1)
electronics (3)
design (1)
photography (26)
..black and white (6)
..A day in Sydney (18)
..The Daily Shoot (6)
food (2)
Book Review (2)

Notifications

Icon of envelope Request notifications

Syndication Feed

XML

Recent Comments

  1. Unregistered Visitor: mmh
  2. Mark Aufflick: Thank you
  3. Unregistered Visitor: Filenames with hyphens
  4. Unregistered Visitor: normal
  5. Unregistered Visitor: mel kaye that died in 2011
  6. Unregistered Visitor: Contacts cats vs. email cats
  7. Mark Aufflick: Thanks for the update
  8. Unregistered Visitor: Correction...
  9. Unregistered Visitor: Update on Mel...
  10. Unregistered Visitor: Error