XML::UM(3)            User Contributed Perl Documentation           XML::UM(3)



NAME
       XML::UM - Convert UTF-8 strings to any encoding supported by
       XML::Encoding

SYNOPSIS
        use XML::UM;

        # Set the directory with the .xml files that come with the
        # XML::Encoding distribution
        # Always include the trailing slash!
        $XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';

        # Create the encoding routine
        my $encode = XML::UM::get_encode (
               Encoding => 'ISO-8859-2',
               EncodeUnmapped => \&XML::UM::encode_unmapped_dec);

        # Convert a string from UTF-8 to the specified Encoding
        my $encoded_str = $encode->($utf8_str);

        # Remove circular references for garbage collection
        XML::UM::dispose_encoding ('ISO-8859-2');

DESCRIPTION
       This module provides methods to convert UTF-8 strings to any XML
       encoding that XML::Encoding supports. It creates mapping routines from
       the .xml files found in the maps/ directory of the XML::Encoding
       distribution. Note that the XML::Encoding distribution installs the
       .enc files in your perl directory, but not the .xml files they were
       created from. That is why you have to set $ENCDIR as shown in the
       SYNOPSIS.

       This implementation uses the XML::Encoding class to parse the .xml file
       and creates a hash that maps UTF-8 characters (each consisting of up to
       4 bytes) to their equivalent byte sequence in the specified encoding.
       Note that large mappings may consume a lot of memory!
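
       The hash-based approach described above can be sketched in plain Perl.
       This is a minimal illustration, not the module's own code: the
       two-entry table, the $multibyte pattern, the encode_with_map() name,
       and the '?' fallback are all assumptions made for the example. A real
       table would be built from one of the .xml map files.

```perl
use strict;
use warnings;

# Hypothetical two-entry table: UTF-8 byte sequence => ISO-8859-2 byte.
my %map = (
    "\xC4\x84" => "\xA1",   # U+0104 LATIN CAPITAL LETTER A WITH OGONEK
    "\xC5\x81" => "\xA3",   # U+0141 LATIN CAPITAL LETTER L WITH STROKE
);

# Matches one multi-byte UTF-8 sequence (2 to 4 bytes).
# Single-byte (ASCII) characters are left untouched in this sketch.
my $multibyte = qr/
      [\xC0-\xDF][\x80-\xBF]
    | [\xE0-\xEF][\x80-\xBF]{2}
    | [\xF0-\xF7][\x80-\xBF]{3}
/x;

sub encode_with_map {
    my ($bytes) = @_;
    # Look up each multi-byte sequence; unmapped ones become '?' here.
    $bytes =~ s/($multibyte)/exists $map{$1} ? $map{$1} : '?'/ge;
    return $bytes;
}
```

       Because every mapped character needs its own hash entry, the memory
       cost grows with the size of the target character set.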

       Future implementations may parse the .enc files directly, or do the
       conversions entirely in XS (i.e. C code.)

get_encode (Encoding => STRING, EncodeUnmapped => SUB)
       The central entry point to this module is the XML::UM::get_encode()
       method.  It forwards the call to the global $XML::UM::FACTORY, which is
       defined as an instance of XML::UM::SlowMapperFactory by default.
       Override this variable to plug in your own mapper factory.

       The XML::UM::SlowMapperFactory creates an instance of
       XML::UM::SlowMapper (and caches it for subsequent use) that reads in
       the .xml encoding file and creates a hash that maps UTF-8 characters to
       encoded characters.

       Finally, the get_encode() method of XML::UM::SlowMapper is called,
       which generates an anonymous subroutine that uses the hash to convert
       multi-character UTF-8 blocks to the proper encoding.
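
       The factory, cache, and closure arrangement described above might look
       roughly like the following sketch. The package My::MapperFactory, the
       load_table() placeholder, the $loads counter, and the toy table are
       hypothetical names introduced for illustration. They are not part of
       XML::UM, and the interface a real replacement factory must provide
       should be checked against the module source.

```perl
use strict;
use warnings;

package My::MapperFactory;   # hypothetical stand-in for a mapper factory

our $loads = 0;              # counts table builds, to make caching visible

sub new {
    my ($class) = @_;
    return bless { cache => {} }, $class;
}

# Placeholder for parsing the .xml map file of the named encoding.
sub load_table {
    my ($name) = @_;
    $loads++;
    return { "\xC4\x84" => "\xA1" };   # toy table, assumed for illustration
}

sub get_encode {
    my ($self, %args) = @_;
    my $name = $args{Encoding};
    # Build the table once per encoding name and reuse it afterwards.
    my $table    = $self->{cache}{$name} ||= load_table($name);
    my $unmapped = $args{EncodeUnmapped} || sub { '?' };
    # The returned closure captures $table and $unmapped.
    return sub {
        my ($str) = @_;
        $str =~ s/([\xC0-\xF7][\x80-\xBF]+)/
                  exists $table->{$1} ? $table->{$1} : $unmapped->($1)/gex;
        return $str;
    };
}

package main;

# A replaceable global, in the spirit of a pluggable factory hook.
our $FACTORY = My::MapperFactory->new;

sub get_encode { return $FACTORY->get_encode(@_) }
```

       Assigning a different object to $FACTORY changes what every later
       get_encode() call returns, which is the point of routing the call
       through a global.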

dispose_encoding ($encoding_name)
       Call this to free the memory used by the SlowMapper for a specific
       encoding.  Note that in order to free the big conversion hash, the user
       should no longer have references to the subroutines generated by
       get_encode().
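
       The following sketch shows why a leftover subroutine reference keeps
       the conversion hash alive. It does not use XML::UM itself: the %cache
       hash and make_encoder() are hypothetical stand-ins, and a weakened
       reference (Scalar::Util::weaken) serves only as a probe to observe
       when the table is really freed.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my %cache;   # hypothetical per-encoding cache, as a factory might keep

sub make_encoder {
    my ($name) = @_;
    my $table = $cache{$name} ||= { "\xC4\x84" => "\xA1" };   # toy table
    # The closure holds its own strong reference to $table.
    return sub {
        my ($s) = @_;
        return exists $table->{$s} ? $table->{$s} : $s;
    };
}

my $encode = make_encoder('ISO-8859-2');

# Weak probe: becomes undef as soon as the table is garbage collected.
my $probe = $cache{'ISO-8859-2'};
weaken($probe);

# Roughly what disposing of an encoding does to the cache.
delete $cache{'ISO-8859-2'};
my $alive_after_dispose = defined $probe;   # true: $encode still holds it

# Drop the last generated subroutine; now the table can be freed.
undef $encode;
my $alive_after_undef = defined $probe;     # false
```

       In practice this means calling dispose_encoding() and also letting
       every subroutine returned by get_encode() go out of scope.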

       The parameters to the get_encode() method (defined as name/value pairs)
       are:

       o Encoding
           The name of the desired encoding, e.g. 'ISO-8859-2'

       o EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
           Defines how Unicode characters not found in the mapping file (of
           the specified encoding) are printed.  By default, they are
           converted to decimal entity references, like '&#123;'.

           Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like
           '&#xAB;'.
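
       To illustrate the two output styles, here is a minimal sketch of a
       pair of handlers that turn one UTF-8 character into a decimal or
       hexadecimal character reference. The names utf8_to_codepoint(),
       unmapped_dec(), and unmapped_hex() are hypothetical. The sketch
       assumes that the handler receives the UTF-8 bytes of a single
       character as its first argument, which may differ from what XML::UM
       actually passes.

```perl
use strict;
use warnings;

# Decode the UTF-8 bytes of one character into its Unicode code point.
sub utf8_to_codepoint {
    my @b    = unpack 'C*', $_[0];
    my $lead = shift @b;
    # Strip the length marker bits from the lead byte.
    my $n = @b == 0 ? $lead
          : @b == 1 ? $lead & 0x1F
          : @b == 2 ? $lead & 0x0F
          :           $lead & 0x07;
    # Each continuation byte contributes its low six bits.
    $n = ($n << 6) | ($_ & 0x3F) for @b;
    return $n;
}

# Decimal character reference, e.g. &#171;
sub unmapped_dec { return '&#' . utf8_to_codepoint($_[0]) . ';' }

# Hexadecimal character reference, e.g. &#xAB;
sub unmapped_hex { return sprintf '&#x%X;', utf8_to_codepoint($_[0]) }
```

       A handler of this shape could be passed by reference as the
       EncodeUnmapped value if the calling convention matches.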

CAVEATS
       I'm not exactly sure about which Unicode characters in the range (0 ..
       127) should be mapped to themselves. See comments in XML/UM.pm near
       %DEFAULT_ASCII_MAPPINGS.

       The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1)
       are currently not supported, because there are no .enc files available
       for these encodings.  This module needs some more work. If you have the
       time, please help!

AUTHOR
       Send bug reports, hints, tips, suggestions to Enno Derksen at
       <enno@att.com>.



perl v5.8.0                       2000-02-17                        XML::UM(3)