JDOM extract sentence correctly

by perez

I have a question about using JDOM to parse an XML document. The output is not what I expect.

I wrote the following program to parse an XML document. It treats the body element as the root of the text I want to extract:

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class Ex04 {

    public static void main(String[] args) {
        String filename = "Test.xml";
        SAXBuilder b = new SAXBuilder();

        try {
            Document doc = b.build(new File(filename));
            Element root = doc.getRootElement();

            Element body = root.getChild("body");
            bodyExtract(body);
        }
        // indicates a well-formedness error
        catch (JDOMException e) {
            System.out.println(filename + " is not well-formed.");
            System.out.println(e.getMessage());
        }
        catch (IOException e) {
            System.out.println(e);
        }
    }

    // Prints the text directly inside the element, then recurses into its children.
    public static void bodyExtract(Element current) {
        // getText() gives the element's own text, without the text of child elements
        System.out.println(current.getText());

        List children = current.getChildren();
        Iterator iterator = children.iterator();
        while (iterator.hasNext()) {
            Element child = (Element) iterator.next();
            bodyExtract(child);
        }
    }
}

#######################################################################

Part of the original Test.xml file is:
...
<body>
The Linux is an open-source operating system, created by Linus Torvalds in the 80’s.
...

The output of the program above is:

The is an open-source operating system, created by in the 80’s.
Linux
Linus Torvalds


I want to analyze the sentences semantically, so I need the output to look like this:

The Linux is an open-source operating system, created by Linus Torvalds in the 80’s.

How can I solve this problem?

Thanks for your help.

MP
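
A minimal sketch of one possible approach: read the string value of the whole element instead of printing each element's own text separately. To keep it runnable without extra jars, it uses the JDK's built-in DOM API (getTextContent()) rather than JDOM. In JDOM the analogous call should be Element.getValue(), but treat that as an assumption to check against the JDOM version in use. The element names name and person, the doc wrapper, and the class name SentenceText are hypothetical, since the markup of Test.xml is not shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SentenceText {

    // Returns the text of the first <body> element, including the text of
    // all nested elements, in document order, with whitespace collapsed.
    public static String bodyText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element body = (Element) doc.getElementsByTagName("body").item(0);
        // getTextContent() concatenates every descendant text node
        return body.getTextContent().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup: the real tag names in Test.xml are unknown
        String xml = "<doc><body>The <name>Linux</name> is an open-source "
                + "operating system, created by <person>Linus Torvalds</person> "
                + "in the 80's.</body></doc>";
        // prints: The Linux is an open-source operating system, created by Linus Torvalds in the 80's.
        System.out.println(bodyText(xml));
    }
}
```

Because the text is taken from the element as a whole, the words inside child elements stay in their original position in the sentence.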
