KDE Developer's Journals

Thought Experiment: XML integrated into a C-like language

tjansen's picture

Over the last few days I have been running the following thought experiment: how much easier could XML processing become if XML support were integrated into a Java/C#-like programming language?

I created the code snippet below to try out what such a language could look like. The syntax of this theoretical language:

  • Adds two hybrid base types, Node and NodeList, to the language. "Hybrid" means that they are objects, like java.lang.String in Java, but have their own literals and operators.
  • Node is similar to a DOM node, but uses the XPath data model (no DTD/Doctype, no entities, no CDATA sections, everything normalized).
  • Applying the index operator ([]) to a Node executes an XPath expression on it; the result is a NodeList.
  • A Node has the operators += (add as a child), + (create a node list of the two nodes), -= (remove node from children) and << (replace the node).
  • A NodeList is a list of references to nodes. It has operators like +, += (append a node list) and << (replace all nodes).
  • A normal XML node literal is contained in [[ ]] brackets. To avoid unnecessary escaping, you can use more than two brackets, e.g. [[[[ <element/> ]]]].
  • A Perl-string-like XML node expression, which allows base types to be inserted, is enclosed in single brackets [ ]. A simple node with content would be: [ <text>Blabla ${somevariable} $anothervariable</text> ]. Variables can be Nodes, NodeLists, Strings, numbers and so on.
  • You can cast any Node to a NodeList. A NodeList can be cast to a Node, but the cast throws an exception when the list has more than one member.
  • Nodes can be implicitly cast to Strings.
  • Strings can be implicitly cast to (text) nodes.
  • The keyword prefix defines an XML namespace prefix for use in XML node literals and XPath expressions. It can be used anywhere you can declare a const variable, and has the same scoping rules.
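
To get a feel for how these operators might behave without touching any grammar, here is a rough sketch in Python that fakes them with operator overloading on top of xml.etree.ElementTree. Everything in it is an assumption made for illustration: the class layout, the parse() helper standing in for a node literal, the single() method standing in for the NodeList-to-Node cast, and the use of ElementTree's limited XPath subset (relative paths only) instead of full XPath. The literals with interpolation and the prefix keyword are not covered.

```python
import copy
import xml.etree.ElementTree as ET


def _as_list(value):
    """Treat a single Node as a one-element NodeList (the Node -> NodeList cast)."""
    return value if isinstance(value, NodeList) else NodeList([value])


class NodeList:
    """Hypothetical stand-in for the proposed NodeList type."""

    def __init__(self, nodes=()):
        self.nodes = list(nodes)

    def __iter__(self):
        return iter(self.nodes)

    def __len__(self):
        return len(self.nodes)

    def __add__(self, other):
        # list + list (or list + node) yields a new list
        return NodeList(self.nodes + _as_list(other).nodes)

    def __lshift__(self, replacement):
        # list << node replaces every node in the list
        for node in self.nodes:
            node << replacement
        return self

    def single(self):
        # NodeList -> Node cast; this sketch also rejects an empty list
        if len(self.nodes) != 1:
            raise ValueError("expected exactly one node, got %d" % len(self.nodes))
        return self.nodes[0]


class Node:
    """Hypothetical stand-in for the proposed Node type, wrapping an Element."""

    def __init__(self, element, root=None):
        self.element = element
        # ElementTree has no parent pointers, so remember the document root
        self.root = element if root is None else root

    @classmethod
    def parse(cls, text):
        # stands in for the [[ ... ]] literal
        return cls(ET.fromstring(text))

    def __getitem__(self, path):
        # node[path] evaluates a (relative) path and returns a NodeList
        return NodeList(Node(e, self.root) for e in self.element.findall(path))

    def __iadd__(self, child):
        # node += child appends a copy of child
        self.element.append(copy.deepcopy(child.element))
        return self

    def __isub__(self, child):
        # node -= child removes child from the children
        self.element.remove(child.element)
        return self

    def __add__(self, other):
        # node + node creates a node list of the two nodes
        return NodeList([self]) + other

    def __lshift__(self, replacement):
        # node << other replaces this node in its parent
        parent, index = self._locate()
        new = copy.deepcopy(replacement.element)
        new.tail = self.element.tail
        parent[index] = new
        self.element = new
        return self

    def _locate(self):
        for parent in self.root.iter():
            for index, child in enumerate(parent):
                if child is self.element:
                    return parent, index
        raise ValueError("cannot replace the root or a detached node")

    def __str__(self):
        # Node -> String cast: the concatenated text content
        return "".join(self.element.itertext())
```

With these classes, doc["mascot/name"] returns a NodeList, tux += Node.parse("<species>Penguin</species>") adds a child, and doc["mascot/name"] << other replaces every selected node. It works, but the XML still lives in string arguments, which is exactly what the literals in the proposal are meant to avoid.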

The example assumes that you are familiar with XPath. Don't expect the code to be really useful; it's just meant to give a feel for the syntax. I think I could get used to something like this...

class Test {
        prefix ageext "urn:mascot-age-extension"; 

        static void main() {
                Node mascots = [[
        <mascotList>
                <mascot>
                        <name>Tux</name>
                        <species>Penguin</species>
                        <project>Linux</project>
                        <ageext:age>8</ageext:age>
                </mascot>
                <mascot>
                        <name>Konqi</name>
                        <species>Dragon</species>
                        <project>KDE</project>
                        <ageext:age>3</ageext:age>
                </mascot>
        </mascotList> 
]];

                workWithMascots(mascots, 4);
        }

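        static void workWithMascots(Node mascots, int minimumAge) {
                // select all mascots whose age is below minimumAge and replace
                // them (<<) with the value of minimumAge
                mascots[/mascotList/mascot[ageext:age < $minimumAge]] << minimumAge;

                // append a <summary> child to each mascot
                NodeList n = mascots[/mascotList/mascot];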
                foreach Node i in n {
                        Node summary = 
[
<summary>${i[name]} is a ${i[species]} and the mascot of ${i[project]}</summary>
];
                        i += summary;
                }
        
                // print all mascots
                int num = 0;
                foreach Node i in n {
                        num++; 
                        Console.println([Mascot Number $num: ${i[summary]}]);
                }
        }
};
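
For comparison, this is roughly what the selection, summary and print steps of the example look like in Python with xml.etree.ElementTree. It is only a sketch: the function names, the NS mapping and the MASCOTS_XML constant are made up for illustration, and the << replacement step is left out because ElementTree has no direct counterpart for it. Since ElementTree's XPath subset cannot express the ageext:age < $minimumAge predicate, the comparison is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Assumed mapping; plays the role of the "prefix ageext ..." declaration.
NS = {"ageext": "urn:mascot-age-extension"}

MASCOTS_XML = """
<mascotList xmlns:ageext="urn:mascot-age-extension">
  <mascot>
    <name>Tux</name>
    <species>Penguin</species>
    <project>Linux</project>
    <ageext:age>8</ageext:age>
  </mascot>
  <mascot>
    <name>Konqi</name>
    <species>Dragon</species>
    <project>KDE</project>
    <ageext:age>3</ageext:age>
  </mascot>
</mascotList>
"""


def select_younger(root, minimum_age):
    """Counterpart of mascots[/mascotList/mascot[ageext:age < $minimumAge]]."""
    selected = []
    for mascot in root.findall("mascot"):
        age = mascot.find("ageext:age", NS)
        if age is not None and int(age.text) < minimum_age:
            selected.append(mascot)
    return selected


def add_summaries(root):
    """Append a <summary> child to every mascot, like the first foreach loop."""
    for mascot in root.findall("mascot"):
        summary = ET.SubElement(mascot, "summary")
        summary.text = (
            f"{mascot.findtext('name')} is a {mascot.findtext('species')} "
            f"and the mascot of {mascot.findtext('project')}"
        )


def describe(root):
    """Build the lines that the second foreach loop would print."""
    return [
        f"Mascot Number {num}: {mascot.findtext('summary')}"
        for num, mascot in enumerate(root.findall("mascot"), start=1)
    ]


root = ET.fromstring(MASCOTS_XML)
add_summaries(root)
for line in describe(root):
    print(line)
```

The namespace mapping has to be passed to every lookup that uses a prefix, and the XML itself is an opaque string until it is parsed at run time. Those are the two things that the prefix keyword and the node literals would take care of.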

panzi's picture

cool but

I'm not able to post my comment here directly, so see: http://nopaste.php-q.net/30078

tjansen's picture

Wouldn't it be better to use

Wouldn't it be better to use an XML-like syntax like this:

Node mascots = <xml>
        <mascotList>
        ...
        </mascotList> 
</xml>;


That was my original idea, but as you said, parsing it is hell because of the less/greater operators. In a variable declaration it is not such a big problem, but in a complex expression (where '<' is still less than) it would be quite a mess and not possible without integrating XML itself into the language's grammar.



Or maybe we can do all this without syntax extensions, like the Boost parser generator.



I have seen something like this for Ruby, but I think the syntax is horrible :)

cies's picture

Wow, very good ideas indeed...

Title says it all.

Rock on Tim!

kervel's picture

nice

Nice idea indeed... I'd love to see it in a scripting language like Python; it would allow for even faster prototyping.
