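Scanning an XML document with DOM usually produces a document represented as a tree of nodes. Every element, entity, PCDATA section, and attribute in the XML becomes a node, and each node is an object of a class that implements the Node interface. Reference code: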
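import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public static Document getScanner(String address) {
    // create a DocumentBuilderFactory
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // create a DocumentBuilder
    DocumentBuilder db = null;
    try {
        db = dbf.newDocumentBuilder();
    } catch (ParserConfigurationException pce) {
        System.err.println(pce);
        System.exit(1);
    }
    // parse the input file into a DOM Document
    Document doc = null;
    try {
        doc = db.parse(new File(address));
    } catch (SAXException se) {
        System.err.println(se.getMessage());
        // to be replaced by the log method
        System.exit(1);
    } catch (IOException ioe) {
        System.err.println(ioe);
        // to be replaced by the log method
        System.exit(1);
    }
    // return the tree so the caller can walk it, e.g. printxml(getScanner(path))
    return doc;
}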
public static void printxml(Node n) {
    // recursive routine to print out DOM tree nodes
    int type = n.getNodeType();
    switch (type) {
        case Node.DOCUMENT_NODE:
            System.out.print("DOC:");
            break;
        case Node.DOCUMENT_TYPE_NODE:
            System.out.print("DOC_TYPE:");
            break;
        case Node.ELEMENT_NODE:
            System.out.print("ELEM:");
            break;
        case Node.TEXT_NODE:
            System.out.print("TEXT:");
            break;
        default:
            System.out.print("Other Node:" + type);
            break;
    }
    System.out.print(" nodeName=\"" + n.getNodeName() + "\"");
    // print the value only if it is non-null and not just whitespace
    String val = n.getNodeValue();
    if (val != null && !val.trim().isEmpty()) {
        System.out.print(" nodeValue=\"" + val + "\"");
    }
    System.out.println();
    // print children, if any, depth-first
    for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
        printxml(child);
    }
}
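printxml walks only getFirstChild()/getNextSibling(), so the attribute nodes mentioned at the start never show up in its output: in DOM, Attr nodes are not children of their element and are reached through getAttributes() instead. The following is a minimal, self-contained sketch of that distinction. The class name AttributeNodes and the helpers parse, rootAttributes, and attributeNodesAmongChildren are hypothetical names for illustration, and the sketch parses an in-memory string rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class AttributeNodes {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collects "name=value" for every Attr node of the root element via getAttributes().
    static List<String> rootAttributes(Document doc) {
        List<String> result = new ArrayList<>();
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            result.add(a.getNodeName() + "=" + a.getNodeValue());
        }
        return result;
    }

    // Counts ATTRIBUTE_NODE entries among the root's children, using the same walk as printxml.
    static int attributeNodesAmongChildren(Document doc) {
        int count = 0;
        for (Node c = doc.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ATTRIBUTE_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<GenericProfileOfVideoCodecSettings"
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
                + "<Name>1P-Goodquality</Name></GenericProfileOfVideoCodecSettings>");
        System.out.println(rootAttributes(doc));
        System.out.println(attributeNodesAmongChildren(doc));
    }
}
```

To print attributes as well, printxml could loop over n.getAttributes() for element nodes before descending into the children.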
For example, suppose the imported XML document is:
<?xml version="1.0"?>
<GenericProfileOfVideoCodecSettings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Name>1P-Goodquality</Name>
<Test>test</Test>
</GenericProfileOfVideoCodecSettings>
The generated document tree looks like this:
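In practice, however, every child of GenericProfileOfVideoCodecSettings comes with a blank sibling node. This "blank" node is a text node holding the spaces and line breaks that sit between two elements in the XML source. As a result, the output above contains many lines like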
TEXT: nodeName="#text"
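One common way to get rid of these whitespace-only text nodes, not taken from the original post, is to skip them during traversal. DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) looks like the obvious switch, but per its Javadoc it relies on the content model and requires a validating parser, so without a DTD it leaves these nodes in place. The sketch below assumes the hypothetical class SkipWhitespaceNodes and helpers parse, isWhitespaceText, and dump; dump mirrors printxml but writes into a StringBuilder so the result can be checked.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class SkipWhitespaceNodes {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // True for text nodes containing only whitespace, e.g. newlines and indentation between elements.
    static boolean isWhitespaceText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty();
    }

    // Same depth-first walk as printxml, but skips whitespace-only text nodes.
    static void dump(Node n, StringBuilder out) {
        if (isWhitespaceText(n)) {
            return;
        }
        switch (n.getNodeType()) {
            case Node.DOCUMENT_NODE: out.append("DOC:"); break;
            case Node.DOCUMENT_TYPE_NODE: out.append("DOC_TYPE:"); break;
            case Node.ELEMENT_NODE: out.append("ELEM:"); break;
            case Node.TEXT_NODE: out.append("TEXT:"); break;
            default: out.append("Other Node:").append(n.getNodeType()); break;
        }
        out.append(" nodeName=\"").append(n.getNodeName()).append("\"");
        String val = n.getNodeValue();
        if (val != null && !val.trim().isEmpty()) {
            out.append(" nodeValue=\"").append(val).append("\"");
        }
        out.append('\n');
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            dump(child, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<GenericProfileOfVideoCodecSettings>\n"
                + "  <Name>1P-Goodquality</Name>\n"
                + "  <Test>test</Test>\n"
                + "</GenericProfileOfVideoCodecSettings>\n";
        StringBuilder out = new StringBuilder();
        dump(parse(xml), out);
        System.out.print(out);
    }
}
```

The trade-off is that this also drops whitespace-only text inside mixed content (for example, a single space between two inline elements), which may matter for document-style XML.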