about photos bookshelf portfolio blog home
Begin main content

Convert XML to SQL

I thought it might be worth crossposting my reply to a recent perlmonks question asking for a generalised XML to SQL conversion.

You can transform your xml into sql using xslt. Stealing some bits of code from elsewhere on the net, I can use the following xslt to convert my del.icio.us bookmarks into a set of insert statements:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rss 'http://purl.org/rss/1.0/'>
]>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:rdf="&rdf;"
                xmlns:rss="&rss;">
  <xsl:output method="text"/>

  <!-- main template -->

  <xsl:template match="/rdf:RDF">
      <xsl:apply-templates select="rss:item" />
  </xsl:template>

  <xsl:template match="rss:item">
    insert into links (url, title)
    values (
    '<xsl:call-template name="sql-escape"><xsl:with-param name="text" select="rss:title"/></xsl:call-template>',
    '<xsl:call-template name="sql-escape"><xsl:with-param name="text" select="rss:link"/></xsl:call-template>'
    );
  </xsl:template>

  <!-- utility functions -->

  <xsl:template name="sql-escape">
    <xsl:param name="text"/>
    <xsl:variable name="tmp">
      <xsl:call-template name="replace-substring">
        <xsl:with-param name="from">'</xsl:with-param>
        <xsl:with-param name="to">''</xsl:with-param>
        <xsl:with-param name="value" select="$text"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:value-of select="$tmp"/>
  </xsl:template>

  <xsl:template name="replace-substring">
    <xsl:param name="value" />
    <xsl:param name="from" />
    <xsl:param name="to" />
    <xsl:choose>
      <xsl:when test="contains($value,$from)">
        <xsl:value-of select="substring-before($value,$from)" />
        <xsl:value-of select="$to" />
        <xsl:call-template name="replace-substring">
          <xsl:with-param name="value" select="substring-after($value,$from)" />
          <xsl:with-param name="from" select="$from" />
          <xsl:with-param name="to" select="$to" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$value" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
</xsl:stylesheet>
If I save that to a file called del2sql.xsl I can insert the links in the rss response like so:

wget -O - http://feeds.delicious.com/rss/aufflick | xsltproc del2sql.xsl - | psql links_db
Pretty neat huh! Unfortunately xslt is even less decipherable than golf perl, but thems the breaks.

The command pipeline above uses the postgres psql command, but you can just as easily use this approach with mysql, oracle, sybase, sql server or whatever. Most database vendors/projects these days have some sort of xml functionality built in, but this puts all the processing load on the part of your system that is most difficult to scale. Better, I think, to do as much processing as possible on your application server which you can more easily scale.

You might also be better using xslt to convert to CSV and then using some simple script to slurp the csv into your database - you could then handle insert/update errors gracefully in your script.

The utility sql-escape and replace-substring templates can be placed in a common file and included into your xsl stylesheets via an xsl:import statement. If you have an xsl 2 compatible xslt engine you can use the x:replace() function directly instead of the recursive replace-substring template.

11:41 PM, 03 Mar 2008 by Mark Aufflick Permalink | Comments (0)

XML

Blog Categories

software (4)
  ..heads up 'tunes (4)

Notifications

Icon of Envelope Request notifications

Syndication Feed

XML

Recent Comments

  1. Mark Aufflick: all good ideas
  2. Unregistered Visitor: Excellent!
  3. Mark Aufflick: Hey thanks
  4. Unregistered Visitor: Fantastic entry
  5. Mark Aufflick: Bah - dashboard widgets
  6. Unregistered Visitor: Nice
  7. Mark Aufflick: elegant maths (as opposed to elegant rabbit)
  8. Unregistered Visitor: Does that really matter?
  9. Mark Aufflick: Inspiration
  10. Unregistered Visitor: Perhaps...