Including IDN with Solaris
Stefan Teleman <Stefan.Teleman at Sun.COM>
15 March 2007
1. Summary and motivation
The inclusion of PHP5 in Solaris has identified a number of
missing capabilities. One of these is a generic implementation of
the Stringprep, Punycode and IDNA specifications, as defined by the
IETF Internationalized Domain Names (IDN) Working Group. LibIDN
provides such an implementation in a portable and
platform-independent manner. According to the LibIDN web page at
GNU.org, LibIDN is known to run on over 20 UNIX-like platforms.
This FastTrack case proposes the integration of LibIDN in Solaris.
LibIDN is GNU Software [http://www.gnu.org/software/libidn/] [1]
and is developed outside of SMI. As such, the SFW Consolidation is
the natural choice for LibIDN.
This case proposes the most recent stable release of LibIDN,
0.6.8.
This case seeks Micro/Patch Release Binding.
2. Technical issues
2.1. Key objects
/usr/bin/idn
/usr/lib/libidn.so.11.5.22
/usr/lib/libidn.so.11 -> libidn.so.11.5.22
/usr/lib/libidn.so -> libidn.so.11.5.22
/usr/include/idn/stringprep.h
/usr/include/idn/idna.h
/usr/include/idn/punycode.h
/usr/include/idn/idn-free.h
/usr/include/idn/pr29.h
/usr/include/idn/tld.h
/usr/include/idn/idn-int.h
/usr/share/lib/java/libidn-0.6.8.jar
/usr/share/man/man1/idn.1
/usr/share/man/man3/idna_strerror.3
/usr/share/man/man3/idna_to_ascii_4i.3
/usr/share/man/man3/idna_to_ascii_4z.3
/usr/share/man/man3/idna_to_ascii_8z.3
/usr/share/man/man3/idna_to_ascii_lz.3
/usr/share/man/man3/idna_to_unicode_44i.3
/usr/share/man/man3/idna_to_unicode_4z4z.3
/usr/share/man/man3/idna_to_unicode_8z4z.3
/usr/share/man/man3/idna_to_unicode_8z8z.3
/usr/share/man/man3/idna_to_unicode_8zlz.3
/usr/share/man/man3/idna_to_unicode_lzlz.3
/usr/share/man/man3/pr29_4.3
/usr/share/man/man3/pr29_4z.3
/usr/share/man/man3/pr29_8z.3
/usr/share/man/man3/pr29_strerror.3
/usr/share/man/man3/punycode_decode.3
/usr/share/man/man3/punycode_encode.3
/usr/share/man/man3/punycode_strerror.3
/usr/share/man/man3/stringprep.3
/usr/share/man/man3/stringprep_4i.3
/usr/share/man/man3/stringprep_4zi.3
/usr/share/man/man3/stringprep_check_version.3
/usr/share/man/man3/stringprep_convert.3
/usr/share/man/man3/stringprep_locale_charset.3
/usr/share/man/man3/stringprep_locale_to_utf8.3
/usr/share/man/man3/stringprep_profile.3
/usr/share/man/man3/stringprep_strerror.3
/usr/share/man/man3/stringprep_ucs4_nfkc_normalize.3
/usr/share/man/man3/stringprep_ucs4_to_utf8.3
/usr/share/man/man3/stringprep_unichar_to_utf8.3
/usr/share/man/man3/stringprep_utf8_nfkc_normalize.3
/usr/share/man/man3/stringprep_utf8_to_locale.3
/usr/share/man/man3/stringprep_utf8_to_ucs4.3
/usr/share/man/man3/stringprep_utf8_to_unichar.3
/usr/share/man/man3/tld_check_4.3
/usr/share/man/man3/tld_check_4t.3
/usr/share/man/man3/tld_check_4tz.3
/usr/share/man/man3/tld_check_4z.3
/usr/share/man/man3/tld_check_8z.3
/usr/share/man/man3/tld_check_lz.3
/usr/share/man/man3/tld_default_table.3
/usr/share/man/man3/tld_get_4.3
/usr/share/man/man3/tld_get_4z.3
/usr/share/man/man3/tld_get_table.3
/usr/share/man/man3/tld_get_z.3
/usr/share/man/man3/tld_strerror.3
LibIDN's functionality is provided by one executable [idn], and
one shared library [libidn.so.*]. Key aspects of the facilities
provided by LibIDN are discussed below.
2.2. Specifications
LibIDN implements the Stringprep, Punycode and IDNA specifications.
The Stringprep specification is defined by RFC 3454
[http://www.ietf.org/rfc/rfc3454.txt] [5]. According to the
specification, Stringprep "specifies a framework of processing rules
for Unicode text. Other protocols can create profiles of these rules;
these profiles will allow users to enter internationalized text strings
in applications and have the highest chance of getting the content of
the strings correct." In other words, in and of itself, Stringprep is
merely a foundation library for Unicode character conversion and
representation. It does not implement any protocols. Custom "profiles"
implementing formalized protocols can be constructed on top of, and
pursuant to, the Stringprep specification.
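As a rough illustration of what a single Stringprep processing rule
looks like, the sketch below implements only the "map to nothing" part
of the mapping step, using the code points listed in RFC 3454, Table
B.1. The function names are hypothetical and are not LibIDN
interfaces. A real profile also applies case folding, normalization,
prohibited-output checks and bidirectional checks, none of which is
shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 3454 Table B.1: code points "commonly mapped to nothing". */
static int is_mapped_to_nothing(uint32_t cp)
{
    switch (cp) {
    case 0x00AD: case 0x034F: case 0x1806:
    case 0x180B: case 0x180C: case 0x180D:
    case 0x200B: case 0x200C: case 0x200D:
    case 0x2060: case 0xFEFF:
        return 1;
    default:
        /* Variation selectors U+FE00..U+FE0F are also in Table B.1. */
        return cp >= 0xFE00 && cp <= 0xFE0F;
    }
}

/*
 * Hypothetical helper, not part of LibIDN: removes, in place, every
 * code point that Table B.1 maps to nothing. Returns the new length.
 */
size_t map_to_nothing(uint32_t *cps, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++)
        if (!is_mapped_to_nothing(cps[in]))
            cps[out++] = cps[in];
    return out;
}
```

For example, an input of 'a', U+00AD (soft hyphen), 'b' is reduced to
'a', 'b', so two strings that differ only by an invisible soft hyphen
compare equal after this step.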
The Nameprep specification is defined by RFC 3491
[http://www.ietf.org/rfc/rfc3491.txt] [6]. Nameprep specifies the
processing rules which allow user input of internationalized domain
names (IDNs) into applications, providing the highest success rate
of correct string conversion. Nameprep is a Stringprep profile.
The Nameprep processing rules are intended solely for internationalized
domain names; they are not suitable for, nor do they support, arbitrary
text.
The Nameprep profile defines the following (as required by
Stringprep):
Intended applicability: Internationalized Domain Names [IDN]
Character repertoire for input and output to Stringprep:
Unicode 3.2 [http://www.unicode.org/reports/tr28/tr28-3.html] [4]
Other profiles exist for Stringprep:
Internet Small Computer Systems Interface [iSCSI] Names,
[http://www.ietf.org/rfc/rfc3722.txt] [RFC 3722] [7]
Extensible Messaging and Presence Protocol [XMPP] Core,
[http://www.ietf.org/rfc/rfc3920.txt] [RFC 3920] [8]
Stringprep Profile for User Names and Passwords [SASL],
[http://www.ietf.org/rfc/rfc4013.txt] [RFC 4013] [9]
None of the profiles enumerated above pertain directly to LibIDN,
and, for the purposes of this document, they will not be discussed
further.
The Punycode Specification is defined by RFC 3492
[http://www.ietf.org/rfc/rfc3492.txt] [10]. Punycode specifies an
Internet Standard for implementing a "simple and efficient
transfer encoding syntax designed for use with Internationalized
Domain Names in Applications [IDNA]". Simply put, Punycode
transforms a Unicode string into an ASCII string, and the
transformation is reversible. Characters that are already ASCII
are represented literally; non-ASCII characters are encoded
using the ASCII characters allowed in host name labels
[letters, digits, and hyphens].
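To make the transformation concrete, the following is a minimal
sketch of the encoding direction, following the algorithm and
parameter values given in RFC 3492. It is not LibIDN's code: the
function name punycode_encode_sketch and its signature are
illustrative assumptions, chosen to avoid confusion with LibIDN's own
punycode_encode interface. The input is assumed to be an array of
Unicode code points, and no case flags are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Bootstring parameters for Punycode (RFC 3492, section 5). */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'. */
static char encode_digit(uint32_t d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function (RFC 3492, section 6.1). */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;

    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/*
 * Illustrative encoder, not a LibIDN interface. Writes a
 * NUL-terminated ASCII string to out. Returns 0 on success, or -1 on
 * arithmetic overflow or insufficient output space.
 */
int punycode_encode_sketch(const uint32_t *in, size_t in_len,
                           char *out, size_t out_cap)
{
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t h, b, j, o = 0;

    if (out_cap == 0)
        return -1;

    /* Copy the basic (ASCII) code points literally. */
    for (j = 0; j < in_len; j++) {
        if (in[j] < 0x80) {
            if (o + 1 >= out_cap) return -1;
            out[o++] = (char)in[j];
        }
    }
    h = b = o;              /* h: code points handled so far */
    if (b > 0) {
        if (o + 1 >= out_cap) return -1;
        out[o++] = '-';     /* delimiter after the literal part */
    }

    while (h < in_len) {
        uint32_t m = UINT32_MAX;

        /* Find the smallest code point not yet handled. */
        for (j = 0; j < in_len; j++)
            if (in[j] >= n && in[j] < m) m = in[j];
        if (m - n > (UINT32_MAX - delta) / (h + 1)) return -1;
        delta += (m - n) * (uint32_t)(h + 1);
        n = m;

        for (j = 0; j < in_len; j++) {
            if (in[j] < n && ++delta == 0) return -1;
            if (in[j] == n) {
                /* Emit delta as a variable-length integer. */
                uint32_t q = delta, k;

                for (k = BASE; ; k += BASE) {
                    uint32_t t = k <= bias ? TMIN :
                        k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o + 1 >= out_cap) return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 1 >= out_cap) return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    out[o] = '\0';
    return 0;
}
```

With this sketch, the six code points of "bücher" (b, U+00FC, c, h,
e, r) encode to "bcher-kva": the ASCII letters appear literally before
the hyphen, and the suffix records where the non-ASCII character
belongs.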
The IDNA Specification is defined by RFC 3490
[http://www.ietf.org/rfc/rfc3490.txt] [11]. IDNA formalizes a Standard
for Internationalized Domain Names [IDNs], and a mechanism named
Internationalizing Domain Names in Applications [IDNA] for handling
IDNs in a standard manner. IDNs can use characters available in the
Unicode Universe. IDNA allows the non-ASCII characters to be
represented using only the ASCII character set allowed in host
name labels. This backward-compatible conversion and representation
is required by existing protocols like DNS. This way, IDNs can be
introduced with no changes to the existing infrastructure. IDNA
is meant solely for processing domain names, and is not suitable for,
nor does it support, free text.
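The ASCII-compatible form of a label can be sketched as follows. RFC
3490 marks an encoded label with the ACE prefix "xn--" and limits a
label to 63 octets. The two helpers below are hypothetical and are not
LibIDN interfaces: the first builds a label from text that has already
been Nameprep-processed and Punycode-encoded elsewhere, and the second
recognizes such a label. Nameprep and Punycode processing themselves
are outside the scope of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define ACE_PREFIX     "xn--"   /* ACE prefix defined by RFC 3490 */
#define ACE_PREFIX_LEN 4
#define MAX_LABEL_LEN  63       /* host name label limit, in octets */

/*
 * Hypothetical helper: returns 1 if the label starts with the ACE
 * prefix, compared case-insensitively, and 0 otherwise.
 */
int has_ace_prefix(const char *label, size_t len)
{
    size_t i;

    if (len < ACE_PREFIX_LEN)
        return 0;
    for (i = 0; i < ACE_PREFIX_LEN; i++)
        if (tolower((unsigned char)label[i]) != ACE_PREFIX[i])
            return 0;
    return 1;
}

/*
 * Hypothetical helper: prepends the ACE prefix to an already
 * Punycode-encoded label. Returns 0 on success, or -1 if the input
 * is empty, the result would exceed 63 octets, or out is too small.
 */
int make_ace_label(const char *encoded, char *out, size_t out_cap)
{
    size_t n = strlen(encoded);
    size_t total = ACE_PREFIX_LEN + n;

    if (n == 0 || total > MAX_LABEL_LEN || total + 1 > out_cap)
        return -1;
    memcpy(out, ACE_PREFIX, ACE_PREFIX_LEN);
    memcpy(out + ACE_PREFIX_LEN, encoded, n + 1);  /* copies the NUL */
    return 0;
}
```

Given the Punycode text "bcher-kva", make_ace_label produces
"xn--bcher-kva", which contains only letters, digits and hyphens and
can therefore be carried by DNS unchanged.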
2.3. Language bindings
LibIDN is written in C. Language bindings for Java are included.
2.4. Documentation
LibIDN provides an extensive and detailed set of man pages for
all its interfaces. These manual pages will be installed in the
default Solaris manual page location. Additionally, detailed
documentation of LibIDN's APIs is provided in HTML format.
3. Interfaces
3.1. Interface Stability
LibIDN is an Open Source project, and is controlled by a group of
developers external to SMI [GNU/FSF/Simon Josefsson]. Although
LibIDN strives to maintain ABI and API compatibility between releases,
no explicit guarantees of backwards compatibility between releases
are offered by LibIDN's developers.
3.2. Imported interfaces
LibIDN imports Standard C Library interfaces, as well as Socket
[-lsocket] and Network Services Library [-lnsl] interfaces.
NAME                        NOTES
Java [for Java bindings]    PSARC/2002/727
3.3. Exported interfaces
NAME                                    STABILITY     NOTES
SUNWgnu-idn                             Uncommitted   Package name
/usr/bin/idn                            Uncommitted   Executable location
/usr/lib/libidn.so.11.5.22              Uncommitted   Library location
/usr/lib/libidn.so.11                   Uncommitted   Symbolic link
/usr/lib/libidn.so                      Uncommitted   Symbolic link
/usr/share/lib/java/libidn-0.6.8.jar    Uncommitted   JAR file
/usr/include/idn/                       Uncommitted   Include files
4. References
[1] http://www.gnu.org/software/libidn/
[2] http://www.gnu.org/software/libidn/manual/libidn.html
[3] http://www.gnu.org/software/libidn/reference/
[4] http://www.unicode.org/reports/tr28/tr28-3.html
[5] http://www.ietf.org/rfc/rfc3454.txt
[6] http://www.ietf.org/rfc/rfc3491.txt
[7] http://www.ietf.org/rfc/rfc3722.txt
[8] http://www.ietf.org/rfc/rfc3920.txt
[9] http://www.ietf.org/rfc/rfc4013.txt
[10] http://www.ietf.org/rfc/rfc3492.txt
[11] http://www.ietf.org/rfc/rfc3490.txt
--
Stefan Teleman
Sun Microsystems, Inc.
Stefan.Teleman at Sun.COM