XML::Parser::PerlSAX
XML::Parser::PerlSAX(3)  User Contributed Perl Documentation  XML::Parser::PerlSAX(3)
NAME
XML::Parser::PerlSAX - Perl SAX parser using XML::Parser
SYNOPSIS
use XML::Parser::PerlSAX;
$parser = XML::Parser::PerlSAX->new( [OPTIONS] );
$result = $parser->parse( [OPTIONS] );
$result = $parser->parse($string);
DESCRIPTION
"XML::Parser::PerlSAX" is a PerlSAX parser using the XML::Parser module.
This man page summarizes the specific options, handlers, and
properties supported by "XML::Parser::PerlSAX"; please refer to the
PerlSAX standard in "PerlSAX.pod" for general usage information.
METHODS
new Creates a new parser object. Default options for parsing,
described below, are passed as key-value pairs or as a single hash.
Options may be changed directly in the parser object unless stated
otherwise. Options passed to "parse()" override the default
options in the parser object for the duration of the parse.
parse
Parses a document. Options, described below, are passed as
key-value pairs or as a single hash. Options passed to "parse()"
override default options in the parser object.
location
Returns the location as a hash:
ColumnNumber The column number of the parse.
LineNumber The line number of the parse.
BytePosition The current byte position of the parse.
PublicId A string containing the public identifier, or undef
if none is available.
SystemId A string containing the system identifier, or undef
if none is available.
Base The current value of the base for resolving relative
URIs.
ALPHA WARNING: The "SystemId" and "PublicId" properties
returned are the system and public identifiers of the document
passed to "parse()", not the identifiers of the external entity
currently being parsed. The column, line, and byte positions are
those of the entity currently being parsed.
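The three methods above can be sketched together as follows. This is a
minimal illustration, not a reference implementation: the "ElementLogger"
class and its "Parser" and "Seen" fields are hypothetical names invented for
the example, and it assumes that "parse()" hands back whatever the handler's
"end_document()" returns, as the PerlSAX convention describes. The handler
keeps a (weakened) reference to the parser so that it can call "location()"
while an event is being delivered.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);
use XML::Parser::PerlSAX;

# Hypothetical handler class (not part of the module): records the name
# and line number of every element start it is told about.
package ElementLogger;

sub new {
    my ($class) = @_;
    return bless { Parser => undef, Seen => [] }, $class;
}

# Defined so that the handler also receives the start-of-document event.
sub start_document { }

sub start_element {
    my ($self, $element) = @_;
    my $location = $self->{Parser}->location;
    push @{ $self->{Seen} },
        sprintf( '%s:%s', $element->{Name}, $location->{LineNumber} );
}

# Assumption: the value returned here becomes the result of parse().
sub end_document {
    my ($self) = @_;
    return $self->{Seen};
}

package main;

my $handler = ElementLogger->new;
my $parser  = XML::Parser::PerlSAX->new( Handler => $handler );

# Let the handler reach the parser without creating a reference cycle.
$handler->{Parser} = $parser;
weaken( $handler->{Parser} );

# A single string argument is treated as the document text.
my $seen = $parser->parse("<doc>\n  <para>text</para>\n</doc>");
print "$_\n" for @$seen;
```

Each printed line pairs an element name with the line on which the parser
reported it.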
OPTIONS
The following options are supported by "XML::Parser::PerlSAX":
Handler default handler to receive events
DocumentHandler handler to receive document events
DTDHandler handler to receive DTD events
ErrorHandler handler to receive error events
EntityResolver handler to resolve entities
Locale locale to provide localisation for errors
Source hash containing the input source for parsing
UseAttributeOrder set to true to provide AttributeOrder and Defaulted
properties in "start_element()"
If no handlers are provided then all events will be silently ignored,
except for "fatal_error()", which will cause a "die()" to be called
after calling "end_document()".
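Because a fatal error ends in a "die()", callers that want to survive a
malformed document usually wrap the parse in "eval". The following is a
minimal sketch under that assumption; the document text and the variable
names are made up for the example, and no handlers are supplied, so every
other event is discarded.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# No handlers: ordinary events are ignored, but the mismatched tag below
# is a fatal error and surfaces as an exception.
my $parser = XML::Parser::PerlSAX->new;

my $error;
my $ok = eval {
    $parser->parse('<doc><para>unclosed</doc>');
    1;
};
$error = $@ unless $ok;

if ($ok) {
    print "document parsed\n";
}
else {
    print "parse failed: $error\n";
}
```

A fresh parser object is used for each attempt here, so nothing depends on
the state of a parser whose parse was interrupted.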
If a single string argument is passed to the "parse()" method, it is
treated as if a "Source" option was given with a "String" parameter.
The "Source" hash may contain the following parameters:
ByteStream The raw byte stream (file handle) containing the
document.
String A string containing the document.
SystemId The system identifier (URI) of the document.
PublicId The public identifier.
Encoding A string describing the character encoding.
If more than one of "ByteStream", "String", or "SystemId" is given,
then preference is given first to "ByteStream", then "String", then
"SystemId".
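The "Source" precedence rule can be observed with a short experiment. This
sketch supplies both a "ByteStream" and a "String" holding different
documents and reports which root element the handler saw. The "RootName"
class and the element names are hypothetical, a temporary file stands in for
a real input file, and the example again assumes that "parse()" returns the
value of "end_document()".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Parser::PerlSAX;

# Hypothetical handler: remembers the name of the first element it sees.
package RootName;

sub new {
    my ($class) = @_;
    return bless { Root => undef }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    $self->{Root} = $element->{Name} unless defined $self->{Root};
}

sub end_document {
    my ($self) = @_;
    return $self->{Root};
}

package main;

# Write one document to a temporary file ...
my ($out, $path) = tempfile( UNLINK => 1 );
print {$out} '<from-file/>';
close $out or die "cannot close $path: $!";

# ... and offer it as a byte stream next to a different in-memory string.
open my $in, '<', $path or die "cannot open $path: $!";

my $parser = XML::Parser::PerlSAX->new( Handler => RootName->new );
my $root   = $parser->parse(
    Source => {
        ByteStream => $in,
        String     => '<from-string/>',
    },
);
close $in;

print "root element: $root\n";
```

According to the rule stated above, the byte stream should win, so the root
element reported is the one written to the file.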
HANDLERS
The following handlers and properties are supported by
"XML::Parser::PerlSAX":
DocumentHandler methods
start_document
Receive notification of the beginning of a document.
No properties defined.
end_document
Receive notification of the end of a document.
No properties defined.
start_element
Receive notification of the beginning of an element.
Name The element type name.
Attributes A hash containing the attributes attached to the
element, if any.
The "Attributes" hash contains only string values.
If the "UseAttributeOrder" parser option is true, the following
properties are also passed to "start_element":
AttributeOrder An array of attribute names in the order they were
specified, followed by the defaulted attribute
names.
Defaulted The index number of the first defaulted attribute in
"AttributeOrder". If this index is equal to the
length of "AttributeOrder", there were no defaulted
values.
Note to "XML::Parser" users: "Defaulted" will be half the
value of "XML::Parser::Expat"'s "specified_attr()" function
because only attribute names are provided, not their values.
end_element
Receive notification of the end of an element.
Name The element type name.
characters
Receive notification of character data.
Data The characters from the XML document.
processing_instruction
Receive notification of a processing instruction.
Target The processing instruction target.
Data The processing instruction data, if any.
comment
Receive notification of a comment.
Data The comment data, if any.
start_cdata
Receive notification of the start of a CDATA section.
No properties defined.
end_cdata
Receive notification of the end of a CDATA section.
No properties defined.
entity_reference
Receive notification of an internal entity reference. If this
handler is defined, internal entities will not be expanded and
will not be passed to the "characters()" handler. If this handler
is not defined, internal entities will be expanded if possible and
passed to the "characters()" handler.
Name The entity reference name.
Value The entity reference value.
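As an illustration of the "start_element" properties described above, the
following sketch turns on "UseAttributeOrder" and uses "Defaulted" to split
"AttributeOrder" into attributes written in the document and attributes
supplied by the DTD. The "AttributeReporter" class, the sample document, and
its attribute names are all invented for the example; it is one possible
way to consume these properties, not the only one.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# Hypothetical handler: labels each attribute as specified or defaulted.
package AttributeReporter;

sub new {
    my ($class) = @_;
    return bless { Report => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    my @order     = @{ $element->{AttributeOrder} };
    my $defaulted = $element->{Defaulted};

    for my $index ( 0 .. $#order ) {
        my $name   = $order[$index];
        # Names at or beyond the Defaulted index came from the DTD.
        my $origin = $index < $defaulted ? 'specified' : 'defaulted';
        push @{ $self->{Report} },
            sprintf( '%s=%s (%s)', $name, $element->{Attributes}{$name}, $origin );
    }
}

sub end_document {
    my ($self) = @_;
    return $self->{Report};
}

package main;

# The internal subset declares a default for "lang"; the element itself
# only writes "title" and "id".
my $xml = <<'XML';
<!DOCTYPE doc [
  <!ATTLIST doc lang CDATA "en">
]>
<doc title="Notes" id="n1"/>
XML

my $parser = XML::Parser::PerlSAX->new(
    Handler           => AttributeReporter->new,
    UseAttributeOrder => 1,
);
my $report = $parser->parse($xml);
print "$_\n" for @$report;
```

If "Defaulted" equals the number of names in "AttributeOrder", the loop
labels every attribute as specified, matching the description above.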
DTDHandler methods
notation_decl
Receive notification of a notation declaration event.
Name The notation name.
PublicId The notation's public identifier, if any.
SystemId The notation's system identifier, if any.
Base The base for resolving a relative URI, if any.
unparsed_entity_decl
Receive notification of an unparsed entity declaration event.
Name The unparsed entity's name.
SystemId The entity's system identifier.
PublicId The entity's public identifier, if any.
Base The base for resolving a relative URI, if any.
entity_decl
Receive notification of an entity declaration event.
Name The entity name.
Value The entity value, if any.
PublicId The entity's public identifier, if any.
SystemId The entity's system identifier, if any.
Notation The notation declared for this entity, if any.
For internal entities, the "Value" parameter will contain the
value, and "PublicId", "SystemId", and "Notation" will
be undefined. For external entities, the "Value" parameter
will be undefined, the "SystemId" parameter will have the
system id, the "PublicId" parameter will have the public id
if it was provided (it will be undefined otherwise), and the
"Notation" parameter will contain the notation name for
unparsed entities. If this is a parameter entity declaration,
then a '%' will be prefixed to the entity name.
Note that "entity_decl()" and "unparsed_entity_decl()"
overlap. If both methods are implemented by a handler, then
"entity_decl()" will not be called for unparsed entities.
element_decl
Receive notification of an element declaration event.
Name The element type name.
Model The content model as a string.
attlist_decl
Receive notification of an attribute list declaration event.
This handler is called for each attribute in an ATTLIST
declaration found in the internal subset, so an ATTLIST declaration
that has multiple attributes will generate multiple calls to
this handler.
ElementName The element type name.
AttributeName The attribute name.
Type The attribute type.
Fixed True if this is a fixed attribute.
The attribute's default is reported as "#REQUIRED", "#IMPLIED", or a
quoted string (that is, the returned string will begin and end with
a quote character).
doctype_decl
Receive notification of a DOCTYPE declaration event.
Name The document type name.
SystemId The document's system identifier.
PublicId The document's public identifier, if any.
Internal The internal subset as a string, if any.
"Internal" will contain all whitespace, comments, processing
instructions, and declarations seen in the internal subset. The
declarations will be there whether or not they have been
processed by another handler (except for unparsed entities
processed by the unparsed entity handler). However, comments and
processing instructions will not appear if they have been processed
by their respective handlers.
xml_decl
Receive notification of an XML declaration event.
Version The version.
Encoding The encoding string, if any.
Standalone True, false, or undefined if not declared.
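A DTD handler is an ordinary object passed through the "DTDHandler" option.
As a minimal sketch of the "entity_decl" rules described above, the handler
below classifies each declared entity as internal or external by checking
whether "Value" is defined. The "EntityCatalog" class, its "Kinds" field,
and the entity names are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# Hypothetical DTD handler: records whether each entity is internal
# (it has a Value) or external (it has no Value).
package EntityCatalog;

sub new {
    my ($class) = @_;
    return bless { Kinds => {} }, $class;
}

sub entity_decl {
    my ($self, $decl) = @_;
    my $kind = defined $decl->{Value} ? 'internal' : 'external';
    $self->{Kinds}{ $decl->{Name} } = $kind;
}

package main;

# "chapter" is declared but never referenced, so nothing is fetched.
my $xml = <<'XML';
<!DOCTYPE doc [
  <!ENTITY greeting "hello">
  <!ENTITY chapter SYSTEM "chapter1.xml">
]>
<doc>&greeting;</doc>
XML

my $catalog = EntityCatalog->new;
my $parser  = XML::Parser::PerlSAX->new( DTDHandler => $catalog );
$parser->parse($xml);

for my $name ( sort keys %{ $catalog->{Kinds} } ) {
    print "$name is $catalog->{Kinds}{$name}\n";
}
```

Only the DTD events are observed here; no document handler is installed, so
element and character events are ignored.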
EntityResolver
resolve_entity
Allow the handler to resolve external entities.
Name The entity name.
SystemId The entity's system identifier.
PublicId The entity's public identifier, if any.
Base The base for resolving a relative URI, if any.
"resolve_entity()" should return either undef, to request that the
parser open a regular URI connection to the system identifier,
or a hash describing the new input source. This hash has the
same properties as the "Source" parameter to "parse()":
PublicId The public identifier of the external entity being
referenced, or undef if none was supplied.
SystemId The system identifier of the external entity being
referenced.
String A string containing XML text.
ByteStream An open file handle.
CharacterStream
An open file handle.
Encoding The character encoding, if known.
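The resolver contract above can be sketched with an in-memory lookup table.
This is an illustration under stated assumptions rather than a definitive
implementation: the "InMemoryResolver" and "ElementNames" classes, the
"Documents" field, and the file name "chapter1.xml" are hypothetical. The
resolver returns a hash with a "String" entry for system identifiers it
knows about and undef for anything else.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# Hypothetical resolver: serves external entities from a hash keyed by
# system identifier instead of opening a URI.
package InMemoryResolver;

sub new {
    my ($class, %documents) = @_;
    return bless { Documents => {%documents} }, $class;
}

sub resolve_entity {
    my ($self, $entity) = @_;
    my $text = $self->{Documents}{ $entity->{SystemId} };

    # undef asks the parser to fall back to its own lookup.
    return undef unless defined $text;

    # Otherwise describe the replacement input source.
    return { String => $text };
}

# Hypothetical document handler: collects element names in order.
package ElementNames;

sub new {
    my ($class) = @_;
    return bless { Names => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    push @{ $self->{Names} }, $element->{Name};
}

package main;

my $xml = <<'XML';
<!DOCTYPE book [
  <!ENTITY chapter SYSTEM "chapter1.xml">
]>
<book>&chapter;</book>
XML

my $names    = ElementNames->new;
my $resolver = InMemoryResolver->new(
    'chapter1.xml' => '<chapter>First chapter</chapter>',
);

my $parser = XML::Parser::PerlSAX->new(
    DocumentHandler => $names,
    EntityResolver  => $resolver,
);
$parser->parse($xml);

print join( ' ', @{ $names->{Names} } ), "\n";
```

Elements from the resolved entity are delivered to the same document
handler as the rest of the document, so they appear in the collected list
alongside the elements of the main document.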
AUTHOR
Ken MacLeod, ken@bitsko.slc.ut.us
SEE ALSO
perl(1), PerlSAX.pod(3)
Extensible Markup Language (XML) <http://www.w3c.org/XML/>
SAX 1.0: The Simple API for XML <http://www.megginson.com/SAX/>
perl v5.8.6 2003-10-21 XML::Parser::PerlSAX(3)