XML Wizardry?

So I have this XML File, and I’d like to organize the nodes in it such as this

< license >
  < id >000000aabbcc< /id >
  < Key >qazxswedcvfrtgbnhyujm,blah. </key >
< /license >
< license >
  < id >001111aabbcc< /id >
  < Key >poiuytrdcvfrtgbnhyujm,kmeh. </key >
< /license >
< license >
  < id >000000112233< /id >
  < Key >98ujhy65cvfrtgbnhyujm,pheh. </key >
< /license >

except there are 1500 License nodes or so.

I’d like to sort it by the < id > tag, but I’m coming up blank on how to do this.

Thoughts, Suggestions, Liquor recommendations?

2 Likes

a naive google search suggests this python solution

https://effbot.org/zone/element-sort.htm

3 Likes

whatever you do, don’t convert it to a spreadsheet. could lose entire hours in excel playing with it.

2 Likes

I don’t know, I’ve had good look fixing things like this in Excel. For instance, if the data is genuinely as regular as the sample indicates:

  1. Copy and paste the whole thing into the first column.
  2. Create a second column that repeats the first if it is an id value, and repeats the value above it otherwise. For instance cell B2 would be something like =IF("<id>"=LEFT(TRIM(A2),4),A2,B1).
  3. Create a third column that just repeats the numbers 1 through 4. For instance cell C2 would be something like =IF(C1=4,1,C1+1).
  4. Change the second and third columns from formulae into values, and then sort by them.
  5. Probably clean up the top and bottom because I didn’t put that much thought into this.
2 Likes

Build a hash table of id to key mappings, sort, and rebuild the full XML from that (assuming the schema is really that simple)

2 Likes

{sheepishly looks at the floor}

Look, when all you have is a hammer Excel, everything looks like a nail Workbook.

@chenille
I think I’d started thinking along these lines after I posted and moved on to other things.

@jerwin
That looks promising as it really is a dead simple XML just one more element/ License node, so it should work.

Thanks all…if I survive to try this, I’ll report back.

3 Likes

As much as I despise it, XSLT might be the answer.
I modified your document slightly to give it a root element:

<licenses>
	<license>
		<id>000000aabbcc</id>
		<key>qazxswedcvfrtgbnhyujm,blah.</key>
	</license>
	<license>
		<id>001111aabbcc</id>
		<key>poiuytrdcvfrtgbnhyujm,kmeh.</key>
	</license>
	<license>
		<id>000000112233</id>
		<key>98ujhy65cvfrtgbnhyujm,pheh.</key>
	</license>
</licenses>

Here’s an XSLT file that just copies the input and sorts by the id.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:template match="licenses">
    <xsl:copy>
      <xsl:apply-templates>
          <xsl:sort select="id"/>
         </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Adapted from https://www.xml.com/pub/a/2002/07/03/transform.html

If you want to test this out, you can try it at http://www.xslfiddle.net/. Unfortunately, there doesn’t seem to be an actual counterpart to JSFiddle for XSLT, but it works well enough.

8 Likes

That’s the sort of thing I do as one-off scripts pretty often. Here’s a simple PHP version. It assumes input is valid and simple as described, and that a plain ascii sort is what’s wanted.

input.xml

<?xml version="1.0" encoding="UTF-8"?>
<licenses>
  <license>
    <id>000000aabbcc</id>
    <key>qazxswedcvfrtgbnhyujm,blah.</key>
  </license>
  <license>
    <id>001111aabbcc</id>
    <key>poiuytrdcvfrtgbnhyujm,kmeh.</key>
  </license>
  <license>
    <id>000000112233</id>
    <key>98ujhy65cvfrtgbnhyujm,pheh.</key>
  </license>
</licenses>

xmlsort.php

<?php
$input = simplexml_load_file('input.xml');
$licenses = [];
foreach ($input->children() as $license) {
  $licenses[(string)$license->id] = (string)$license->key;
}
ksort($licenses);
$output = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><licenses></licenses>');
foreach ($licenses as $id => $key) {
  $license = $output->addChild('license');
  $license->addChild('id', $id);
  $license->addChild('key', $key);
}
$dom = dom_import_simplexml($output)->ownerDocument;
$dom->formatOutput = true;
$dom->save('output.xml');

gives output.xml

<?xml version="1.0" encoding="UTF-8"?>
<licenses>
  <license>
    <id>000000112233</id>
    <key>98ujhy65cvfrtgbnhyujm,pheh.</key>
  </license>
  <license>
    <id>000000aabbcc</id>
    <key>qazxswedcvfrtgbnhyujm,blah.</key>
  </license>
  <license>
    <id>001111aabbcc</id>
    <key>poiuytrdcvfrtgbnhyujm,kmeh.</key>
  </license>
</licenses>
4 Likes