XML::DOM

XML::DOM(3)           User Contributed Perl Documentation          XML::DOM(3)



NAME
       XML::DOM - A perl module for building DOM Level 1 compliant document
       structures

SYNOPSIS
        use XML::DOM;

        my $parser = XML::DOM::Parser->new;
        my $doc = $parser->parsefile ("file.xml");

        # print all HREF attributes of all CODEBASE elements
        my $nodes = $doc->getElementsByTagName ("CODEBASE");
        my $n = $nodes->getLength;

        for (my $i = 0; $i < $n; $i++)
        {
            my $node = $nodes->item ($i);
            my $href = $node->getAttributeNode ("HREF");
            # getAttributeNode returns undef if there is no HREF attribute
            print $href->getValue . "\n" if defined $href;
        }

        # Print doc file
        $doc->printToFile ("out.xml");

        # Print to string
        print $doc->toString;

        # Avoid memory leaks - cleanup circular references for garbage collection
        $doc->dispose;

DESCRIPTION
       This module extends the XML::Parser module by Clark Cooper.  The
       XML::Parser module is built on top of XML::Parser::Expat, which is a
       lower level interface to James Clark's expat library.

       XML::DOM::Parser is derived from XML::Parser. It parses XML strings or
       files and builds a data structure that conforms to the API of the
       Document Object Model as described at
       http://www.w3.org/TR/REC-DOM-Level-1.  See the XML::Parser manpage for
       other available features of the XML::DOM::Parser class.  Note that the
       'Style' property should not be used (it is set internally).

       The XML::Parser NoExpand option is partially supported: it generates
       EntityReference objects whenever an entity reference is encountered
       in character data. I'm not sure how useful this is. Any comments are
       welcome.

       As shown in the synopsis, once you have created an XML::DOM::Parser
       object, its parse and parsefile methods create an XML::DOM::Document
       object from an XML string or file, respectively.  This Document object
       can then be examined, modified and written back out to a file or
       converted to a string.
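
       Below is a minimal sketch of that examine/modify/write cycle.  It parses
       from a string rather than a file and assumes XML::DOM (and therefore
       XML::Parser) is installed.  The <config> and <item> names are made up
       for illustration, and only standard DOM Level 1 methods are used.

```perl
        use strict;
        use warnings;
        use XML::DOM;

        my $parser = XML::DOM::Parser->new;

        # parse() takes a string; parsefile() takes a file name
        my $doc = $parser->parse ('<config><item name="a"/></config>');

        # Examine: fetch the root element and its first <item> child
        my $root  = $doc->getDocumentElement;
        my $items = $root->getElementsByTagName ("item");
        my $item  = $items->item (0);

        # Modify: change an attribute and append a new element
        $item->setAttribute ("name", "b");
        my $extra = $doc->createElement ("item");
        $extra->setAttribute ("name", "c");
        $root->appendChild ($extra);

        # Write out: serialize the root element to a string
        my $xml = $root->toString;
        print "$xml\n";

        # Break circular references so the tree can be freed
        $doc->dispose;
```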

       When using XML::DOM with XML::Parser version 2.19 and up, setting the
       XML::DOM::Parser option KeepCDATA to 1 will store CDATASections in
       CDATASection nodes, instead of converting them to Text nodes.
       Consecutive CDATASection nodes will be merged into one. Let me know if
       this is a problem.
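
       The sketch below assumes XML::Parser 2.19 or later.  It parses a small
       document with KeepCDATA enabled and checks that the element's child is
       a CDATASection holding the raw, unescaped text.  The <code> element
       name is invented for the example.

```perl
        use strict;
        use warnings;
        use XML::DOM;    # also exports the node type constants

        # KeepCDATA => 1 keeps CDATA sections as CDATASection nodes
        my $parser = XML::DOM::Parser->new (KeepCDATA => 1);
        my $doc = $parser->parse ('<code><![CDATA[if (a < b) { f(); }]]></code>');

        my $child    = $doc->getDocumentElement->getFirstChild;
        my $is_cdata = $child->getNodeType == CDATA_SECTION_NODE;
        my $text     = $child->getData;    # raw text, "<" is not escaped

        print $is_cdata ? "CDATASection: $text\n" : "Text: $text\n";
        $doc->dispose;
```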

       When using XML::Parser 2.27 and above, you can suppress expansion of
       parameter entity references (e.g. %pent;) in the DTD by setting
       ParseParamEnt to 1 and ExpandParamEnt to 0. See "Hidden Nodes" under
       IMPLEMENTATION DETAILS for details.

       A Document has a tree structure consisting of Node objects. A Node may
       contain other nodes, depending on its type.  A Document may have Ele-
       ment, Text, Comment, and CDATASection nodes.  Element nodes may have
       Attr, Element, Text, Comment, and CDATASection nodes.  The other nodes
       may not have any child nodes.
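
       To make the tree structure visible, here is a small recursive walk that
       builds an indented outline of the element names.  element_outline is a
       hypothetical helper, not part of XML::DOM.  It uses getFirstChild and
       getNextSibling and only descends into Element nodes, so Text and
       Comment nodes are skipped.

```perl
        use strict;
        use warnings;
        use XML::DOM;

        # Hypothetical helper: collect tag names, indented two spaces per level
        sub element_outline
        {
            my ($node, $depth, $out) = @_;
            push @$out, ("  " x $depth) . $node->getTagName;
            for (my $kid = $node->getFirstChild; defined $kid;
                 $kid = $kid->getNextSibling)
            {
                # Only recurse into Element children
                element_outline ($kid, $depth + 1, $out)
                    if $kid->getNodeType == ELEMENT_NODE;
            }
            return $out;
        }

        my $doc = XML::DOM::Parser->new->parse (
            '<book><title>T</title><!-- note --><chapter><para>x</para></chapter></book>');
        my $lines = element_outline ($doc->getDocumentElement, 0, []);
        print "$_\n" for @$lines;
        $doc->dispose;
```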

       This module adds several node types that are not part of the DOM spec
       (yet).  These are: ElementDecl (for <!ELEMENT ...> declarations),
       AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?>
       declarations) and AttDef (for attribute definitions in an AttlistDecl).

XML::DOM Classes
       The XML::DOM module stores XML documents in a tree structure with a
       root node of type XML::DOM::Document. Different nodes in the tree
       represent different parts of the XML file. The DOM Level 1
       Specification defines the following node types:

       * XML::DOM::Node - Super class of all node types
       * XML::DOM::Document - The root of the XML document
       * XML::DOM::DocumentType - Describes the document structure: <!DOCTYPE
       root [ ... ]>
       * XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>
       * XML::DOM::Attr - An XML element attribute: name="value"
       * XML::DOM::CharacterData - Super class of Text, Comment and
       CDATASection
       * XML::DOM::Text - Text in an XML element
       * XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>
       * XML::DOM::Comment - An XML comment: <!-- comment -->
       * XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;
       * XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>
       * XML::DOM::ProcessingInstruction - A processing instruction:
       <?target data?>
       * XML::DOM::DocumentFragment - Lightweight node for cut & paste
       * XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

       In addition, the XML::DOM module contains the following nodes that are
       not part of the DOM Level 1 Specification:

       * XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>
       * XML::DOM::AttlistDecl - Defines one or more attributes in an
       <!ATTLIST ...>
       * XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>
       * XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...?>

       Other classes that are part of the DOM Level 1 Spec:

       * XML::DOM::Implementation - Provides information about this
       implementation. Currently it doesn't do much.
       * XML::DOM::NodeList - Used internally to store a node's child nodes.
       Also returned by getElementsByTagName.
       * XML::DOM::NamedNodeMap - Used internally to store an element's
       attributes.

       Other classes that are not part of the DOM Level 1 Spec:

       * XML::DOM::Parser - A non-validating XML parser that creates
       XML::DOM::Documents
       * XML::DOM::ValParser - A validating XML parser that creates
       XML::DOM::Documents. It uses XML::Checker to check against the
       DocumentType (DTD)
       * XML::Handler::BuildDOM - A PerlSAX handler that creates
       XML::DOM::Documents.

XML::DOM package
       Constant definitions
           The following predefined constants identify the type of a node, as
           returned by its getNodeType method.

        UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)

        ELEMENT_NODE (1)                The node is an Element.
        ATTRIBUTE_NODE (2)              The node is an Attr.
        TEXT_NODE (3)                   The node is a Text node.
        CDATA_SECTION_NODE (4)          The node is a CDATASection.
        ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
        ENTITY_NODE (6)                 The node is an Entity.
        PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
        COMMENT_NODE (8)                The node is a Comment.
        DOCUMENT_NODE (9)               The node is a Document.
        DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
        DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
        NOTATION_NODE (12)              The node is a Notation.

        ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
        ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
        XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
        ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

        Usage:

          if ($node->getNodeType == ELEMENT_NODE)
          {
              print "It's an Element";
          }

       Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite
       frankly, you should never encounter it. The last 4 node types were
       added to support the 4 added node classes.
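
       Because "use XML::DOM" exports these constants, they can be compared
       with getNodeType directly.  The sketch below labels each child of an
       element.  The %label table and its strings are illustrative only.  The
       () after each constant name matters: without it, => would quote the
       name as a plain string.

```perl
        use strict;
        use warnings;
        use XML::DOM;    # exports ELEMENT_NODE, TEXT_NODE, ... by default

        # Illustrative labels for a few node types (not part of XML::DOM)
        my %label = (
            ELEMENT_NODE()                => "element",
            TEXT_NODE()                   => "text",
            COMMENT_NODE()                => "comment",
            PROCESSING_INSTRUCTION_NODE() => "pi",
        );

        my $doc = XML::DOM::Parser->new->parse ('<r><a/>hi<!-- c --><?app go?></r>');

        my @kinds;
        for (my $n = $doc->getDocumentElement->getFirstChild; defined $n;
             $n = $n->getNextSibling)
        {
            my $type = $n->getNodeType;
            push @kinds, exists $label{$type} ? $label{$type} : "other";
        }
        print join (" ", @kinds), "\n";
        $doc->dispose;
```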

       Global Variables


       $VERSION
           The variable $XML::DOM::VERSION contains the version number of this
           implementation, e.g. "1.43".

       METHODS

       These methods are not part of the DOM Level 1 Specification.

       getIgnoreReadOnly and ignoreReadOnly (readOnly)
           The DOM Level 1 Spec does not allow you to edit certain sections of
           the document, e.g. the DocumentType, so by default this
           implementation throws DOMExceptions (i.e. NO_MODIFICATION_ALLOWED_ERR)
           when you try to edit a read-only node.  These read-only checks can
           be disabled by (temporarily) setting the global IgnoreReadOnly flag.

           The ignoreReadOnly method sets the global IgnoreReadOnly flag and
           returns its previous value. The getIgnoreReadOnly method simply
           returns its current value.

            my $oldIgnore = XML::DOM::ignoreReadOnly (1);
            eval {
                # ... do whatever you want; eval catches any other exceptions ...
            };
            XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

           Another way to do it, using a local variable:

            { # start new scope
               local $XML::DOM::IgnoreReadOnly = 1;
               # ... do whatever you want, don't worry about exceptions ...
            } # end of scope ($IgnoreReadOnly is set back to its previous value)
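
           The eval form above can be wrapped in a small helper that restores
           the flag even if the code dies.  with_ignore_read_only is a
           hypothetical name, not part of XML::DOM.  Unlike the eval example,
           which swallows the error, this sketch re-throws it after restoring
           the previous value.

```perl
            use strict;
            use warnings;
            use XML::DOM;

            # Hypothetical helper (not part of XML::DOM): run $code with the
            # IgnoreReadOnly flag set, always restore the old value, then
            # re-throw any error raised by $code.
            sub with_ignore_read_only
            {
                my ($code) = @_;
                my $old = XML::DOM::ignoreReadOnly (1);
                my @result = eval { $code->() };
                my $err = $@;
                XML::DOM::ignoreReadOnly ($old);
                die $err if $err;
                return @result;
            }

            my ($inside) = with_ignore_read_only (sub { XML::DOM::getIgnoreReadOnly () });
            my $after    = XML::DOM::getIgnoreReadOnly ();
            print "inside: $inside, after: $after\n";
```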

       isValidName (name)
           Returns whether the specified name is a valid "Name" as specified
           in the XML spec.  Characters with Unicode values > 127 are now also
           supported.
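
           Here is a quick check of a few candidate names.  isValidName is
           called as a plain function in the XML::DOM package, not as a
           method.  The sample names are arbitrary.

```perl
            use strict;
            use warnings;
            use XML::DOM;

            my %valid;
            for my $name ("item", "_private", "a.b-c", "1st", "two words")
            {
                # A Name may not start with a digit or contain spaces
                $valid{$name} = XML::DOM::isValidName ($name) ? 1 : 0;
                printf "%-10s %s\n", $name, $valid{$name} ? "valid" : "invalid";
            }
```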

       getAllowReservedNames and allowReservedNames (boolean)
           The first method returns whether reserved names are allowed.  The
           second takes a boolean argument and sets whether reserved names are
           allowed.  The initial value is 1 (i.e. allow reserved names.)

           The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are
           reserved for future use. (Amusingly enough, the XML version of the
           XML spec (REC-xml-19980210.xml) breaks that very rule by defining
           an ENTITY with the name 'xmlpio'.)  A "Name" in this context means
           the Name token as found in the BNF rules in the XML spec.

           XML::DOM only checks for errors when you modify the DOM tree, not
           when the DOM tree is built by the XML::DOM::Parser.

       setTagCompression (funcref)
           There are 3 possible styles for printing empty Element tags:

           Style 0
                <empty/> or <empty attr="val"/>

               XML::DOM uses this style by default for all Elements.

           Style 1
                 <empty></empty> or <empty attr="val"></empty>

           Style 2
                 <empty /> or <empty attr="val" />

               This style is sometimes desired when using XHTML.  (Note the
               extra space before the slash "/".)  See
               <http://www.w3.org/TR/xhtml1> Appendix C for more details.

           By default XML::DOM compresses all empty Element tags (style 0.)
           You can control which style is used for a particular Element by
           calling XML::DOM::setTagCompression with a reference to a function
           that takes 2 arguments. The first is the tag name of the Element,
           the second is the XML::DOM::Element that is being printed.  The
           function should return 0, 1 or 2 to indicate which style should be
           used to print the empty tag. E.g.

            XML::DOM::setTagCompression (\&my_tag_compression);

            sub my_tag_compression
            {
               my ($tag, $elem) = @_;

               # Print empty br, hr and img tags like this: <br />
               return 2 if $tag =~ /^(br|hr|img)$/;

               # Print other empty tags like this: <empty></empty>
               return 1;
            }
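
           Below is a runnable version of the same policy.  It uses an
           anonymous sub and prints a small parsed fragment.  Passing sub { 0 }
           at the end to return every element to style 0 is an assumption
           about how to restore the default, not a documented reset call.

```perl
            use strict;
            use warnings;
            use XML::DOM;

            # XHTML-style "<br />" for br/hr/img, "<tag></tag>" for the rest
            XML::DOM::setTagCompression (sub {
                my ($tag, $elem) = @_;
                return 2 if $tag =~ /^(br|hr|img)$/;
                return 1;
            });

            my $doc  = XML::DOM::Parser->new->parse ('<p>a<br/>b<span/></p>');
            my $html = $doc->getDocumentElement->toString;
            print "$html\n";

            # Go back to style 0 (<empty/>) for every element
            XML::DOM::setTagCompression (sub { 0 });
            $doc->dispose;
```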

IMPLEMENTATION DETAILS
       * Perl Mappings
           The value undef is used wherever the DOM Spec says null.

           The DOM Spec says: Applications must encode DOMString using UTF-16
           (defined in Appendix C.3 of [UNICODE] and Amendment 1 of
           [ISO-10646]).  In this implementation we use plain old Perl strings
           encoded in UTF-8 instead of UTF-16.

       * Text and CDATASection nodes
           The Expat parser expands entity references and CDATA sections to
           raw strings and does not indicate where they were found.  This
           implementation therefore converts both to Text nodes at parse time
           (unless the KeepCDATA or NoExpand options described under
           DESCRIPTION are used).  CDATASection and EntityReference nodes that
           are added to an existing Document (by the user) will be preserved.

           Also, adjacent Text nodes are always merged at parse time. Text
           nodes that are added later can be merged with the normalize method.
           Consider using the addText method when adding Text nodes.
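
           The sketch below illustrates the difference.  Appending two Text
           nodes with appendChild leaves two children, and normalize merges
           them.  It also assumes that addText appends to a trailing Text
           child rather than creating a new node.  The element name and text
           are invented for the example.

```perl
            use strict;
            use warnings;
            use XML::DOM;

            my $doc  = XML::DOM::Parser->new->parse ('<p/>');
            my $para = $doc->getDocumentElement;

            # Two appendChild calls with Text nodes give two separate children
            $para->appendChild ($doc->createTextNode ("Hello, "));
            $para->appendChild ($doc->createTextNode ("world"));
            my $kids   = $para->getChildNodes;
            my $before = $kids->getLength;

            # normalize merges adjacent Text nodes into one
            $para->normalize;
            $kids = $para->getChildNodes;
            my $after = $kids->getLength;

            # addText extends the last Text child instead of adding a node
            $para->addText ("!");
            $kids = $para->getChildNodes;
            my $final = $kids->getLength;
            my $text  = $para->getFirstChild->getData;

            print "before=$before after=$after final=$final text=$text\n";
            $doc->dispose;
```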

       * Printing and toString
           When printing (and converting an XML Document to a string) the
           strings have to be encoded differently depending on where they
           occur.  E.g. in a CDATASection all substrings are allowed except
           for "]]>".  In regular text, certain characters are not allowed,
           e.g. "<" has to be converted to "&lt;" and "&" to "&amp;".  These
           routines should be verified by someone who knows the details.
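
           The sketch below puts the same string into a Text node and a
           CDATASection, then prints the parent element so the two encodings
           can be compared.  The Text node should come out with "<" and "&"
           escaped as entities, while the CDATASection keeps them verbatim.

```perl
            use strict;
            use warnings;
            use XML::DOM;

            my $doc  = XML::DOM::Parser->new->parse ('<r/>');
            my $root = $doc->getDocumentElement;

            # Same characters, two different node types
            $root->appendChild ($doc->createTextNode ('a < b & c'));
            $root->appendChild ($doc->createCDATASection ('a < b & c'));

            my $xml = $root->toString;
            print "$xml\n";
            $doc->dispose;
```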

       * Quotes
           Certain sections in XML are quoted, like attribute values in an
           Element.  XML::Parser strips these quotes and the print methods in
           this implementation always use double quotes, so when parsing and
           printing a document, single quotes may be converted to double
           quotes. The default value of an attribute definition (AttDef) in an
           AttlistDecl, however, will maintain its quotes.

       * AttlistDecl
           Attribute declarations for a certain Element are always merged into
           a single AttlistDecl object.

       * Comments
           Comments in the DOCTYPE section are not kept in the right place.
           They will become child nodes of the Document.

       * Hidden Nodes
           Previous versions of XML::DOM would expand parameter entity
           references (like %pent;), so when printing the DTD, it would print
           the contents of the external entity, instead of the parameter
           entity reference.  Starting with release 1.27, you can prevent this
           by setting the XML::DOM::Parser options ParseParamEnt => 1 and
           ExpandParamEnt => 0.

           When it is parsing the contents of the external entities, it *DOES*
           still add the nodes to the DocumentType, but it marks these nodes
           by setting the 'Hidden' property. In addition, it adds an
           EntityReference node to the DocumentType node.

           When printing the DocumentType node (or when using to_expat() or
           to_sax()), the 'Hidden' nodes are suppressed, so you will see the
           parameter entity reference instead of the contents of the external
           entities. See test case t/dom_extent.t for an example.

           The reason for adding the 'Hidden' nodes to the DocumentType node
           is that the nodes may contain <!ENTITY> definitions that are
           referenced further in the document. (Simply not adding the nodes to
           the DocumentType could cause such entity references to be expanded
           incorrectly.)

           Note that you need XML::Parser 2.27 or higher for this to work
           correctly.

SEE ALSO
       The Japanese version of this document by Takanori Kawai (Hippo2000) at
       <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

       The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

       The XML spec (Extensible Markup Language 1.0) at
       <http://www.w3.org/TR/REC-xml>

       The XML::Parser and XML::Parser::Expat manual pages.

       XML::LibXML also provides a DOM parser; it is significantly faster
       than XML::DOM and is under active development.  It requires the GNOME
       libxml2 library.

       XML::GDOME will provide the DOM Level 2 Core API, and should be as fast
       as XML::LibXML, but more robust, since it uses the memory management
       functions of libgdome.  For more details see
       <http://tjmather.com/xml-gdome/>

CAVEATS
       The method getElementsByTagName() does not return a "live" NodeList.
       Whether this is an actual caveat is debatable, but a few people on the
       www-dom mailing list seemed to think so. I haven't decided yet. It's a
       pain to implement, it slows things down and the benefits seem marginal.
       Let me know what you think.
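
       The sketch below shows what "not live" means in practice.  A NodeList
       returned by getElementsByTagName is a snapshot, so elements added
       afterwards only show up in a fresh call.  The element names are
       invented for the example.

```perl
        use strict;
        use warnings;
        use XML::DOM;

        my $doc  = XML::DOM::Parser->new->parse ('<list><item/></list>');
        my $root = $doc->getDocumentElement;

        # Take a NodeList, then add another <item> to the tree
        my $items = $doc->getElementsByTagName ("item");
        $root->appendChild ($doc->createElement ("item"));

        # The earlier NodeList is not updated; a new call sees both items
        my $stale = $items->getLength;
        my $again = $doc->getElementsByTagName ("item");
        my $fresh = $again->getLength;

        print "stale=$stale fresh=$fresh\n";
        $doc->dispose;
```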

AUTHOR
       Enno Derksen is the original author.

       Send patches to T.J. Mather at <tjmather@maxmind.com>.

       Paid support is available directly from the maintainers of this
       package.  Please see <http://www.maxmind.com/app/opensourceservices>
       for more details.

       Thanks to Clark Cooper for his help with the initial version.



perl v5.8.6                       2002-02-08                       XML::DOM(3)