Large XML file splitter

The goal of this post is to present a simple way of splitting large XML files.

Have you ever had a large XML file that you needed to split into smaller parts for processing? Probably yes! Searching the internet, it is difficult to find a tool that fits your particular case (of course, there is always a case that nobody has thought of).

The example below splits a large XML file based on an element that you can configure. The number of elements per part file can be specified too: if you want only 1000 elements per file, you can set that value.

Let's consider the following XML example:

<?xml version="1.0" encoding="utf-8"?>
<items date="xxxx-xx-xx">
	<item size="2">First item</item>
	<item size="3">Second item</item>
	...
	<item size="2">Nth item</item>
</items>

Suppose you have 2 million "item" elements and you want smaller XML files of 10 thousand elements each. That means 200 XML files to produce, which is not something you want to do by hand. That's the case we'll solve here.
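
The number of part files follows directly from the element count and the chosen number of elements per file. A minimal sketch of that arithmetic, where the variable names are only illustrative and the values are the ones from the example above:

```powershell
# Illustrative values taken from the example in the text.
$TotalItems   = 2000000   # number of "item" elements in the source file
$ItemsPerFile = 10000     # desired number of elements per part file

# Round up so that a partial last chunk still gets its own file.
$FileCount = [int][math]::Ceiling($TotalItems / $ItemsPerFile)

$FileCount   # 200
```

If the element count is not an exact multiple of the chunk size, the last file simply holds the remainder.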

The solution, written in PowerShell, is presented below:

 

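# Load the whole source document into memory and select the elements to distribute.
$XmlDocPath = "D:\temp\big_file.xml"
$XmlDoc = [Xml](Get-Content $XmlDocPath)
$XmlNodes = $XmlDoc.SelectNodes("/items/item")
For ($Index = 0; $Index -lt $XmlNodes.Count;) {
    # The second placeholder stays as a literal "{0}" and is filled in when the part is saved.
    $OutputPath = Join-Path "D:\temp\xml_parts" ("items.{0}-{1}.xml" -f $Index,"{0}")
    # Empty output document with the same root element as the source.
    $XmlOutput = [Xml] @"
<?xml version="1.0" encoding="utf-8"?>
<items>
</items>
"@
    # Copy the next 1000 elements (the number of elements per file) into the output document.
    $XmlNodes | Select -First 1000 -Skip $Index | % {
        $XmlOutput.SelectSingleNode("/items").AppendChild($XmlOutput.ImportNode($_, $true)) | Out-Null
        $Index += 1
    }
    $XmlOutput.Save($OutputPath -f ($Index - 1))
}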
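
To try the approach without a big file, the same idea can be run on a tiny in-memory document. This is only a sketch under assumptions: the five sample items, the `$ChunkSize` and `$Parts` variables and the chunk size of 2 are made up for illustration, and the parts are collected as strings instead of being saved to disk.

```powershell
# Small in-memory stand-in for the big file (made-up sample data).
$XmlDoc = [Xml] '<items date="xxxx-xx-xx"><item size="2">A</item><item size="3">B</item><item size="2">C</item><item size="1">D</item><item size="4">E</item></items>'
$XmlNodes = $XmlDoc.SelectNodes("/items/item")
$ChunkSize = 2      # elements per part (illustrative value)
$Parts = @()        # collects the XML text of every part

for ($Index = 0; $Index -lt $XmlNodes.Count; $Index += $ChunkSize) {
    # Fresh output document with an empty root element.
    $XmlOutput = [Xml] '<items></items>'
    $XmlNodes | Select-Object -Skip $Index -First $ChunkSize | ForEach-Object {
        # ImportNode copies the node into the new document; $true requests a deep copy.
        $XmlOutput.DocumentElement.AppendChild($XmlOutput.ImportNode($_, $true)) | Out-Null
    }
    $Parts += $XmlOutput.OuterXml
}

$Parts.Count   # 3
$Parts[2]      # <items><item size="4">E</item></items>
```

Five items with two per part give three parts, the last one holding the single remaining item.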

The line:

$OutputPath = Join-Path "D:\temp\xml_parts" ("items.{0}-{1}.xml" -f $Index,"{0}")

specifies the folder in which all the XML parts are generated. The folder must already exist.
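
The same line also builds the file name in two passes: the first -f fills in the start index and leaves a literal "{0}" behind, which the Save call later fills with the index of the last element written. A small sketch of just the naming part, where `$NameTemplate` and `$FinalName` are illustrative names and the folder is left out:

```powershell
# First pass: the start index is known, the end index is not yet.
$Index = 0
$NameTemplate = "items.{0}-{1}.xml" -f $Index, "{0}"
$NameTemplate    # items.0-{0}.xml

# Second pass: after 1000 elements were appended, $Index has advanced to 1000.
$Index = 1000
$FinalName = $NameTemplate -f ($Index - 1)
$FinalName       # items.0-999.xml
```

Each part file therefore carries the range of element indexes it contains in its name.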

Feel free to ask questions via the contact form.

Good luck,

ArtifexSystem.com team

 

 

Image scaler by floating scale factor

 

This is a simple tool for scaling images. It can scale images of different formats (including TIFF files) up or down. Scaling down is performed using one of the following algorithms: area averaging or Lanczos.

The tool can be used from the command line or through the exposed Java interface. It is a heavily simplified fork of the camera image processor published years ago.

The tool uses OpenCV version 3.x and can be used with an x86 or x64 Java virtual machine (provided the matching OpenCV DLL is supplied). Linking against OpenCV 2.x won't work, because the Java API is not backward compatible.

Here is a list of features to come:

1. Resizing to fixed frame.

2. Expose interface for choosing the scale algorithm.

Feel free to ask questions or request new features via the contact form.

Good luck,

ArtifexSystem.com team