I have a question about using JDOM to parse an XML document. The output is not what I expect.
I wrote the following program to parse an XML document. I treat the body element as the root of the content I want to process.
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.*;
import java.io.File;
import java.io.IOException;
import java.util.*;

public class Ex04 {
    public static void main(String[] args) {
        String filename = "Test.xml";
        SAXBuilder b = new SAXBuilder();
        try {
            Document doc = b.build(new File(filename));
            Element root = doc.getRootElement();
            Element body = root.getChild("body");
            bodyExtract(body);
        }
        // indicates a well-formedness error
        catch (JDOMException e) {
            // use filename here: args may be empty, so args[0] could throw
            System.out.println(filename + " is not well-formed.");
            System.out.println(e.getMessage());
        }
        catch (IOException e) {
            System.out.println(e);
        }
    }

    public static void bodyExtract(Element current) {
        // getText() returns only the text directly inside this element,
        // not the text of its child elements
        String aaa = current.getText();
        System.out.println(aaa);
        // getChildren() returns only the child elements, not the text between them
        List children = current.getChildren();
        Iterator iterator = children.iterator();
        while (iterator.hasNext()) {
            Element child = (Element) iterator.next();
            bodyExtract(child);
        }
    }
}
#######################################################################
Part of the original Test.xml file is:
...
<body>
The
Linux is an open-source operating system, created by
Linus Torvalds in the 80’s.
...
The output of the program above is:
The is an open-source operating system, created by in the 80’s.
Linux
Linus Torvalds
I want to analyze the sentences semantically, so I need the output to look something like this:
The Linux is an open-source operating system, created by Linus Torvalds
in the 80’s.
How can I solve this problem?
Thanks for your help,
MP
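
A possible direction, sketched under assumptions: the output above is what you would expect if the words "Linux" and "Linus Torvalds" sit inside child elements of body. getText() returns only the text directly inside an element, and getChildren() returns only the child elements, so the text of each child is handled after all of the parent's own text. To keep document order, the text nodes and the child elements have to be visited together, in the order they appear. In JDOM, Element.getContent() returns that mixed list, and Element.getValue() returns the concatenated text of all descendants. The sketch below shows the same traversal with the JDK's built-in DOM API instead of JDOM, only so that it runs without extra jars. The class name MixedContentText and the <b> and <i> tags are hypothetical, since the real markup of Test.xml is not shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text of an element in document order.
final class MixedContentText {

    // Appends the text of all descendants of 'node', visiting text nodes and
    // child elements in the order they appear in the document.
    static void collect(Node node, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                collect(child, out); // descend at this position, not afterwards
            }
        }
    }

    // Parses the XML string and returns the root element's text with runs of
    // whitespace collapsed to single spaces.
    static String extract(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString().trim().replaceAll("\\s+", " ");
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed markup: the <b> and <i> tags stand in for the unknown ones.
        String xml = "<body>The\n<b>Linux</b> is an open-source operating system, "
                + "created by\n<i>Linus Torvalds</i> in the 80's.</body>";
        System.out.println(extract(xml));
    }
}
```

With JDOM itself, calling getValue() on the body element, or looping over getContent() and handling Text and Element items in turn, should give the same ordering.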