2008-11-05

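The Node Tree DOM Builds After Scanning XML

  Scanning an XML document with DOM normally produces a document represented as a tree of nodes. Every element, entity, piece of PCDATA, and attribute in the XML gets a node, and each node's type is a class that implements the Node interface. Attributes are Node objects too, but they are reached through getAttributes() rather than as children, so a child-by-child walk like the one below does not print them. Sample code: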


import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Wrapper class so the two methods compile together; the name is arbitrary.
public class DomScanner {

    public static void getScanner(String address) throws Exception {
        // Create a DocumentBuilderFactory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        // Create a DocumentBuilder
        DocumentBuilder db = null;
        try {
            db = dbf.newDocumentBuilder();
        } catch (ParserConfigurationException pce) {
            System.err.println(pce);
            System.exit(1);
        }

        // Parse the input file
        Document doc = null;
        try {
            doc = db.parse(new File(address));
        } catch (SAXException se) {
            System.err.println(se.getMessage());
            // to be replaced by the log method
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println(ioe);
            // to be replaced by the log method
            System.exit(1);
        }

        // Print the resulting node tree, starting at the document node
        printxml(doc);
    }

    // Recursive routine to print out DOM tree nodes
    public static void printxml(Node n) {
        int type = n.getNodeType();

        switch (type) {
            case Node.DOCUMENT_NODE:
                System.out.print("DOC:");
                break;
            case Node.DOCUMENT_TYPE_NODE:
                System.out.print("DOC_TYPE:");
                break;
            case Node.ELEMENT_NODE:
                System.out.print("ELEM:");
                break;
            case Node.TEXT_NODE:
                System.out.print("TEXT:");
                break;
            default:
                System.out.print("Other Node:" + type);
                break;
        }

        System.out.print(" nodeName=\"" + n.getNodeName() + "\"");

        // Print the value only when it is not null and not pure whitespace
        String val = n.getNodeValue();
        if (val != null && !val.trim().equals("")) {
            System.out.print(" nodeValue=\"" + val + "\"");
        }
        System.out.println();

        // Print children, if any
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            printxml(child);
        }
    }
}
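To make the tree shape easier to see, here is a minimal sketch that parses XML held in a string and records one line per node, indented by depth. The class name DomTreeDemo and the methods parse and describe are hypothetical names chosen for this illustration, and the sample XML is a simplified version of my file without the namespace attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original code.
class DomTreeDemo {

    // Parse XML held in a string instead of a file on disk.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Append one line per node, indented two spaces per depth level.
    static void describe(Node n, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(n.getNodeName());
        String val = n.getNodeValue();
        if (val != null && !val.trim().isEmpty()) {
            line.append(" = ").append(val.trim());
        }
        out.add(line.toString());
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            describe(c, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<GenericProfileOfVideoCodecSettings>\n"
                + "<Name>1P-Goodquality</Name>\n"
                + "<Test>test</Test>\n"
                + "</GenericProfileOfVideoCodecSettings>";
        List<String> lines = new ArrayList<>();
        describe(parse(xml), 0, lines);
        for (String l : lines) {
            System.out.println(l);
        }
    }
}
```

Collecting the lines in a list instead of printing them directly is only a convenience, so the result can be inspected or compared in code.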


For example, the XML document I imported looks like this:

<?xml version="1.0"?>

<GenericProfileOfVideoCodecSettings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<Name>1P-Goodquality</Name>

<Test>test</Test>

</GenericProfileOfVideoCodecSettings>


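Conceptually, the resulting document tree looks like this: the document node is the root, GenericProfileOfVideoCodecSettings is its single element child, and Name and Test are children of that element, each holding one text node.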

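In practice, however, every child of GenericProfileOfVideoCodecSettings comes with a blank sibling node. This "blank" is simply the spaces and line breaks between two elements in the XML source, which the parser keeps as text nodes. As a result, the output above contains many lines like
TEXT: nodeName="#text"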
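One way to deal with these blank nodes is to skip whitespace-only text nodes during the walk. The sketch below is a minimal illustration under that assumption. WhitespaceFilterDemo, isBlankText, and childNames are hypothetical names, not something from my original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class WhitespaceFilterDemo {

    // True for text nodes that contain nothing but spaces and line breaks.
    static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
    }

    // Names of the direct children of parent, leaving out blank text nodes.
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!isBlankText(c)) {
                names.add(c.getNodeName());
            }
        }
        return names;
    }

    // Parse XML held in a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Root>\n<Name>1P-Goodquality</Name>\n<Test>test</Test>\n</Root>");
        Node root = doc.getDocumentElement();
        System.out.println("all children: " + root.getChildNodes().getLength());
        System.out.println("non-blank children: " + childNames(root));
    }
}
```

DocumentBuilderFactory also has a setIgnoringElementContentWhitespace(true) switch, but as far as I know it depends on a validating parser with a DTD to tell which whitespace is ignorable, so for a plain file like mine the manual check is the simpler route.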
