Suggested XML Libraries for Lua

Lately I’ve been writing code in Lua that extracts data from XML documents, most notably for the game I am working on. The Lua Users Wiki has information about XML that you will find useful. Today I want to talk about the two libraries I have used for dealing with XML in Lua software.

Disclaimer

Both libraries I will discuss allow you to create XML documents by constructing them in memory as a collection of node ‘objects’ and then converting all of that to a string. They also allow you to modify existing XML documents in this fashion. I have never used either library for this purpose. Therefore, my opinions are strictly concerned with the quality of the libraries in terms of how they let you read XML data.

LuaXML

LuaXML by Gerald Franz comes as a combination of C and Lua. The C code provides the XML parser; it is short, fits comfortably in one file, and compiles easily on major operating systems. However, as is common with hand-rolled XML parsers, it lacks the thorough standards compliance and features of mature XML libraries such as Expat, which has Lua bindings by the way.

LuaXML provides a Lua interface that is easy to learn and use. Let’s use this XML snippet as an example:

<!-- offset.xml -->
<offset>
    <x>10</x>
    <y>-20</y>
</offset>

We could use the following Lua code to create a table with these two values.

require("LuaXml")

-- This loads and parses the contents of the file, making 'document'
-- an XML object we can use to access the data within.
local document = xml.load("offset.xml")

-- The find() method is the most useful way to select a tag.
local offset = document:find("offset")

-- Now we can pull the numbers out of <x> and <y> tags and create a 2D
-- vector as a table containing those values.
local vector = {}
local x = offset:find("x")
local y = offset:find("y")

if x then vector.x = tonumber(x[1]) or 0 end
if y then vector.y = tonumber(y[1]) or 0 end

Notice how we access the textual contents of each tag via an array reference, e.g. x[1]. LuaXML provides nothing like an x:text() method, although it would not be difficult to implement ourselves. The example above contains no tag attributes; in LuaXML we access those as if they were named properties of a tag. So for example, if foo represents the node <foo label="bar"/> then we write foo.label to get the string "bar".
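As a rough sketch of the x:text() idea, the helper below returns the text of a node. It assumes only the layout described above: text content sits at index 1 and attributes are plain named fields. The name text and the hand-built tables standing in for parsed nodes are my own for illustration, not part of LuaXML.

```lua
-- Hypothetical helper, not part of LuaXML: return a node's text content,
-- or nil when the node is missing or its first entry is not a string
-- (for example, when the tag holds a child tag instead of text).
local function text(node)
    if type(node) ~= "table" then return nil end
    local first = node[1]
    if type(first) == "string" then return first end
    return nil
end

-- Hand-built stand-ins shaped like the nodes described above.
local x = { "10" }               -- <x>10</x>
local foo = { label = "bar" }    -- <foo label="bar"/>

print(text(x))      -- 10
print(text(foo))    -- nil
print(foo.label)    -- bar
```

Returning nil rather than an empty string keeps the helper compatible with the tonumber(...) or 0 pattern used earlier.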

I will sometimes patch my own utility methods into LuaXML by modifying its xml table. For example, here is one I use when extracting a lot of numbers from tags like in the example above.

function xml.getNumber(node, tag)
    local child = node:find(tag)

    if child then
        return tonumber(child[1])
    end

    return nil
end

-- Now I can write this. The helper already copes with a missing tag, so
-- the separate find() calls and 'if' guards are no longer needed.
vector.x = offset:getNumber("x") or 0
vector.y = offset:getNumber("y") or 0

I hope this demonstrates how simple LuaXML is to use. For a while it was my preferred choice for working with XML in Lua projects. However, LuaXML has one issue that deserves mentioning. Support for Lua 5.2 did not appear until just last month, and I have been unable to get LuaXML to work with 5.2. I was using LuaXML primarily with LÖVE, which uses Lua 5.1. I suspect more readers will be using 5.2, so that could be a deal-breaker, assuming it is not merely an isolated problem on my end.

Simple-XML

Cluain Krystian Szczęsny—creator of the game Archer—does not offer an official name for his Lua XML library. So I refer to it as ‘Simple-XML’. It is an improved version of the XML module in the Corona SDK.

Let’s begin by rewriting the first example from this article so that it uses Simple-XML:

local xml = require("xmlSimple").newParser()
local document = xml:loadFile("offset.xml")
local vector = {
    x = tonumber(document.offset.x:value()) or 0,
    y = tonumber(document.offset.y:value()) or 0,
}

Unlike LuaXML, Simple-XML allows us to navigate an XML document tree using the names of each node along the way, e.g. document.offset.x leads us to the <x> tag which is a child of <offset>. This eliminates the necessity of anything like LuaXML’s find() method.
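One caveat with chained names: if <offset> were missing, document.offset would be nil, and indexing it further raises an error. A small guard can walk the path one step at a time. This is only a sketch. The helper dig is a hypothetical name of mine, and the plain tables below merely imitate the shape of a parsed document so the snippet runs without the library.

```lua
-- Hypothetical helper: follow a chain of child names, returning nil as soon
-- as any step is missing instead of raising an "attempt to index" error.
local function dig(node, ...)
    for i = 1, select("#", ...) do
        if type(node) ~= "table" then return nil end
        node = node[(select(i, ...))]
    end
    return node
end

-- Plain tables imitating the shape of a parsed <offset> with <x> and <y>.
local document = { offset = { x = {}, y = {} } }

print(dig(document, "offset", "x") ~= nil)   -- true
print(dig(document, "offset", "z"))          -- nil
print(dig(document, "origin", "x"))          -- nil
```

With a real document you would then call value() on the result only after checking that it is not nil.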

In LuaXML we accessed the text content of the tags by using an array reference. In contrast, Simple-XML gives us the value() method which we must use to get that content. The method is a nice complement to Simple-XML’s name(), a method that gives us the name of a tag as a string. For example:

print(document.offset.x:name())    -- "x"
print(document.offset.x:value())   -- "10"

What if we did not know how many child tags there were under the <offset> tag? Maybe we later introduce a <z> tag to create some 3D data. Simple-XML gives us the choice to iterate over the children of any given tag if we want, and so we could replace our definition of vector in the previous example with this:

local vector = {}

for _,child in ipairs(document.offset:children()) do
    vector[child:name()] = tonumber(child:value())
end

Now let’s change our XML data to this:

<offset>
    <x type="absolute">10</x>
    <y type="relative">-20</y>
    <z type="absolute">5</z>
</offset>

This time we want to print the offset coordinates that we marked as ‘absolute’. The following code does the trick, demonstrating one way to access tag attributes using Simple-XML:

local vector = {}

for _,child in ipairs(document.offset:children()) do
    if child["@type"] == "absolute" then
        vector[child:name()] = tonumber(child:value())
    end
end

print(vector.x, vector.y, vector.z)

-- The above prints:
--
--    10    nil    5
--

We access a tag’s attributes the same way we access its child nodes, except to differentiate between attribute and tag names we prefix the @ character to the former. And just as we can use children() to access and iterate through all child nodes, we may call properties() on any tag to obtain a table of its attributes to iterate through.
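To illustrate iterating over attributes, here is a sketch that gathers them into a simple name-to-value table. It rests on an assumption worth checking against your copy of xmlSimple.lua: that each entry returned by properties() has a name field. The values are read back through the @-prefixed keys shown above. The function attributesOf, the stand-in node, and its unit attribute are hypothetical, made up so the snippet runs without the library.

```lua
-- Collect a tag's attributes into a name -> value map.
-- Assumption: properties() returns an array whose entries have a `name`
-- field. Values are read back through the "@name" keys.
local function attributesOf(node)
    local result = {}
    for _, prop in ipairs(node:properties()) do
        result[prop.name] = node["@" .. prop.name]
    end
    return result
end

-- Hand-built stand-in for <x type="absolute" unit="px">, illustration only.
local fakeNode = {
    ["@type"] = "absolute",
    ["@unit"] = "px",
    properties = function(self)
        return { { name = "type" }, { name = "unit" } }
    end,
}

local attrs = attributesOf(fakeNode)
print(attrs.type, attrs.unit)   -- absolute    px
```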

An Important Bug

Consider this code:

local xml = require("xmlSimple").newParser()

If we have a string of XML, we can parse it into a document object by writing local document = xml:ParseXmlText([[ … ]]). But you saw that I used loadFile() in the examples above. That method assumes Lua has access to the system module provided by the Corona SDK, since Simple-XML is based on Corona’s XML module. If you are not using Corona then you should rewrite loadFile() like so:

function XmlParser:loadFile(xmlFilename, base)
    -- 'base' is kept so existing calls still work; this version ignores it.
    local hFile, err = io.open(xmlFilename, "r")

    if not hFile then
        -- Report the failure to the caller rather than only printing it.
        return nil, err
    end

    local xmlText = hFile:read("*a") -- read the entire file
    hFile:close()
    return self:ParseXmlText(xmlText), nil
end

You will find this inside the xmlSimple.lua file. I will not claim that this is the best way to hack around this problem. But it works for me.
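If you would rather leave xmlSimple.lua untouched, another option is to read the file yourself and pass the resulting string to ParseXmlText(). The sketch below uses only standard Lua I/O. The helper name readFile is my own, and the parser calls are left in comments because they need the library to be installed.

```lua
-- Hypothetical helper: read a whole file into a string using only standard
-- Lua I/O. Returns the contents, or nil plus an error message on failure.
local function readFile(path)
    local file, err = io.open(path, "r")
    if not file then return nil, err end
    local contents = file:read("*a")
    file:close()
    return contents
end

-- Usage sketch (commented out because it needs xmlSimple.lua on the path):
-- local xml = require("xmlSimple").newParser()
-- local text, err = readFile("offset.xml")
-- local document = text and xml:ParseXmlText(text)
```

This keeps the file handling in your own code, so a later update to the library will not overwrite your change.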

In Conclusion

I believe both LuaXML and Simple-XML offer useful functionality for working with XML data in Lua programs. I have come to prefer Simple-XML because I believe it has a better, more thorough API. Simple-XML lets you access individual tags or attributes by name, and it lets you loop through them all if that better serves your purpose. LuaXML is less flexible in that regard. I suspect a lot of you will have to delete the Corona-specific use of system from XmlParser:loadFile(), but that issue does not hurt the value of Simple-XML too badly in my opinion.

Simple-XML is the library I would recommend. That said, I have not yet tried every available XML solution for Lua. So if you have one you prefer then please share in the comments.
