git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/branches/WX_2_2_BRANCH@7708 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
		
			
				
	
	
		
			897 lines
		
	
	
		
			32 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| <HTML>
 | |
| <HEAD>
 | |
| <!-- This HTML file has been created by texi2html 1.54
 | |
|      from gettext.texi on 25 January 1999 -->
 | |
| 
 | |
| <TITLE>GNU gettext utilities - The Programmer's View</TITLE>
 | |
| <link href="gettext_9.html" rel=Next>
 | |
| <link href="gettext_7.html" rel=Previous>
 | |
| <link href="gettext_toc.html" rel=ToC>
 | |
| 
 | |
| </HEAD>
 | |
| <BODY>
 | |
| <p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
 | |
| <P><HR><P>
 | |
| 
 | |
| 
 | |
| <H1><A NAME="SEC39" HREF="gettext_toc.html#TOC39">The Programmer's View</A></H1>
 | |
| 
 | |
| <P>
 | |
| One aim of the current message catalog implementation provided by
 | |
| GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
 | |
| installer wishes to do so.  So we perhaps should first take a look at
 | |
| the solutions we know about.  The people in the POSIX committee did not
 | |
| manage to agree on either of the semi-official standards which we'll
 | |
| describe below.  In fact they couldn't agree on anything, so they
 | |
| decided only to include an example of an interface.  The major Unix vendors
 | |
| are split in the usage of the two most important specifications: X/Open's
 | |
| catgets vs. Uniforum's gettext interface.  We'll describe them both and
 | |
| later explain our solution to this dilemma.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">About <CODE>catgets</CODE></A></H2>
 | |
| 
 | |
| <P>
 | |
| The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
 | |
| Guide, Volume 3, XSI Supplementary Definitions, Chapter 5.  But the
 | |
| process of creating this standard seemed to be too slow for some of
 | |
| the Unix vendors so they based their implementations on preliminary
 | |
| versions of the standard.  Of course this leads again to problems when
 | |
| writing platform independent programs: even the usage of <CODE>catgets</CODE>
 | |
| does not guarantee a uniform interface.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Another, personal comment on this: only a bunch of committee members
 | |
| could have made this interface.  They never really tried to program
 | |
| using this interface.  It is a fast, memory-saving implementation; a
 | |
| user can happily live with it.  But programmers hate it (at least I and
 | |
| some others do...)
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| But we must not forget one point: after all the trouble with transferring
 | |
| the rights to Unix(tm) they at last came to X/Open, the very same body that
 | |
| published this specification.  This leads me to predict
 | |
| that this interface will be in future Unix standards (e.g. Spec1170) and
 | |
| therefore part of all Unix implementations (implementations which are
 | |
| <EM>allowed</EM> to bear this name).
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">The Interface</A></H3>
 | |
| 
 | |
| <P>
 | |
| The interface to the <CODE>catgets</CODE> implementation consists of three
 | |
| functions which correspond to those used in file access: <CODE>catopen</CODE>
 | |
| to open the catalog for use, <CODE>catgets</CODE> for accessing the message
 | |
| tables, and <CODE>catclose</CODE> for closing after work is done.  Prototypes
 | |
| for the functions and the needed definitions are in the
 | |
| <CODE><nl_types.h></CODE> header file.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| <CODE>catopen</CODE> is used like this:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| nl_catd catd = catopen ("catalog_name", 0);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| The function takes as its first argument the name of the catalog.  This usually
 | |
| refers to the name of the program or the package.  The second parameter
 | |
| is not further specified in the standard.  I don't even know whether it
 | |
| is implemented consistently among various systems.  So the common advice
 | |
| is to use <CODE>0</CODE> as the value.  The return value is a handle to the
 | |
| message catalog, analogous to the file handles returned by <CODE>open</CODE>.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| This handle is of course used in the <CODE>catgets</CODE> function which can
 | |
| be used like this:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| char *translation = catgets (catd, set_no, msg_id, "original string");
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| The first parameter is this catalog descriptor.  The second parameter
 | |
| specifies the set of messages in this catalog from which the message
 | |
| identified by <CODE>msg_id</CODE> is obtained.  <CODE>catgets</CODE> therefore uses a
 | |
| three-stage addressing:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| catalog name => set number => message ID => translation
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| The fourth argument is not used to address the translation.  It is given
 | |
| as a default value in case one of the addressing stages fails.  One
 | |
| important thing to remember is that although the return type of catgets
 | |
| is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed.  It
 | |
| would better be <CODE>const char *</CODE>, but the standard was published in
 | |
| 1988, one year before ANSI C.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| The last of these functions is used and behaves as expected:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| catclose (catd);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| After this no <CODE>catgets</CODE> call using the descriptor is legal anymore.
 | |
| 
 | |
| </P>
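 | |
| <P>
 | |
| The three calls described above can be combined as in the following
 | |
| minimal sketch.  The helper name <CODE>lookup_message</CODE> and the
 | |
| caller-supplied buffer are assumptions made for illustration; they are not
 | |
| part of the X/Open interface.  The result is copied because the string
 | |
| returned by <CODE>catgets</CODE> must not be modified and the descriptor is
 | |
| closed before the helper returns.
 | |
| </P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message (set_no, msg_id) from the named catalog
   into buf, falling back to default_text when the catalog or the message
   is missing.  Returns buf.  */
char *
lookup_message (const char *catalog_name, int set_no, int msg_id,
                const char *default_text, char *buf, size_t buf_size)
{
  nl_catd catd = catopen (catalog_name, 0);

  if (catd == (nl_catd) -1)
    {
      /* No catalog available: use the untranslated text.  */
      snprintf (buf, buf_size, "%s", default_text);
      return buf;
    }

  /* catgets returns default_text itself if the lookup fails.  The result
     must not be modified, so copy it before closing the catalog.  */
  snprintf (buf, buf_size, "%s",
            catgets (catd, set_no, msg_id, default_text));
  catclose (catd);
  return buf;
}
```

| <P>
 | |
| Opening and closing the catalog for every message is wasteful; a real
 | |
| program would more likely open it once at start-up and keep the descriptor.
 | |
| </P>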
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
 | |
| 
 | |
| <P>
 | |
| Now that this description seemed to be really easy, where are the
 | |
| problems we speak of?  In fact the interface could be used in a
 | |
| reasonable way, but constructing the message catalogs is a pain.  The
 | |
| reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
 | |
| message ID.  This has to be a numeric value, unique among all messages in a single
 | |
| set.  Perhaps you can imagine the problems of keeping such a list while
 | |
| changing the source code.  Add a new message here, remove one there.  Of
 | |
| course a lot of tools have been developed to help organize this
 | |
| chaos, but each of them fails in one aspect or another.  We don't
 | |
| want to say that the other approach has no problems but they are far
 | |
| easier to manage.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC43" HREF="gettext_toc.html#TOC43">About <CODE>gettext</CODE></A></H2>
 | |
| 
 | |
| <P>
 | |
| The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
 | |
| proposal and it is followed by at least one major Unix vendor
 | |
| (Sun) in its latest developments.  It is not specified in any official
 | |
| standard, though.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| The main points about this solution are that it does not follow the
 | |
| method of normal file handling (open-use-close) and that it does not
 | |
| burden the programmer with so many tasks, especially the unique key handling.
 | |
| Of course a unique key is needed here too, but this key is the
 | |
| message itself (however long or short it is).  See section <A HREF="gettext_8.html#SEC48">Comparing the Two Interfaces</A> for a
 | |
| more detailed comparison of the two methods.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| The following section contains a rather detailed description of the
 | |
| interface.  We make it that detailed because this is the interface
 | |
| we chose for the GNU <CODE>gettext</CODE> Library.  Programmers interested
 | |
| in using this library will be interested in this description.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">The Interface</A></H3>
 | |
| 
 | |
| <P>
 | |
| The minimal functionality an interface must have is a) to select a
 | |
| domain the strings are coming from (a single domain for all programs is
 | |
| not reasonable because its construction and maintenance is difficult,
 | |
| perhaps impossible) and b) to access a string in a selected domain.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| This is essentially the description of the <CODE>gettext</CODE> interface.  It
 | |
| has a global domain which unqualified usages reference.  Of course this
 | |
| domain is selectable by the user.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| char *textdomain (const char *domain_name);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| This provides the possibility to change or query the current status of
 | |
| the current global domain of the <CODE>LC_MESSAGES</CODE> category.  The
 | |
| argument is a null-terminated string, whose characters must be legal for
 | |
| use in filenames.  If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
 | |
| the function returns the current value.  If no value has been set
 | |
| before, the name of the default domain is returned: <EM>messages</EM>.
 | |
| Please note that although the return value of <CODE>textdomain</CODE> is of
 | |
| type <CODE>char *</CODE> no changing is allowed.  It is also important to know
 | |
| that no checks of the availability are made.  If the name is not
 | |
| available you will see this by the fact that no translations are provided.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| To use a domain set by <CODE>textdomain</CODE> the function
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| char *gettext (const char *msgid);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| is to be used.  This is the simplest reasonable form one can imagine.
 | |
| The translation of the string <VAR>msgid</VAR> is returned if it is available
 | |
| in the current domain.  If not available the argument itself is
 | |
| returned.  If the argument is <CODE>NULL</CODE> the result is undefined.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| One thing which should come to mind is that no explicit dependency on
 | |
| the used domain is given.  The current value of the domain for the
 | |
| <CODE>LC_MESSAGES</CODE> locale is used.  If this changes between two
 | |
| executions of the same <CODE>gettext</CODE> call in the program, both calls
 | |
| reference a different message catalog.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| For the easiest case, which is normally used in internationalized
 | |
| packages, once at the beginning of execution a call to <CODE>textdomain</CODE>
 | |
| is issued, setting the domain to a unique name, normally the package
 | |
| name.  In the following code all strings which have to be translated are
 | |
| filtered through the gettext function.  That's all, the package speaks
 | |
| your language.
 | |
| 
 | |
| </P>
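 | |
| <P>
 | |
| The easiest case described above might look like the following sketch.
 | |
| The helper names and the package name passed in are hypothetical, and the
 | |
| call to <CODE>setlocale</CODE> is an assumption about how the program picks
 | |
| up the user's locale; it is not discussed in the text above.
 | |
| </P>

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical start-up helper: select the user's locale and make
   package_name the global text domain.  Returns the domain now in
   effect, or NULL on failure.  The result must not be modified.  */
const char *
init_messages (const char *package_name)
{
  /* Take the locale from the environment (LANG, LC_MESSAGES, ...).  */
  setlocale (LC_ALL, "");
  return textdomain (package_name);
}

/* Translate a fixed message in the current global domain.  When no
   catalog is found, gettext returns its argument, so the program keeps
   working without translations.  */
const char *
greeting (void)
{
  return gettext ("Hello world");
}
```

| <P>
 | |
| A program would call <CODE>init_messages</CODE> once at the beginning of
 | |
| execution and pass every user-visible string through <CODE>gettext</CODE>
 | |
| afterwards.
 | |
| </P>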
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Solving Ambiguities</A></H3>
 | |
| 
 | |
| <P>
 | |
| While this single name domain works well for most applications there
 | |
| might be the need to get translations from more than one domain.  Of
 | |
| course one could switch between different domains with calls to
 | |
| <CODE>textdomain</CODE>, but this is neither convenient nor fast.  One
 | |
| possible situation was under discussion at the time of this writing:  all
 | |
| error messages of functions in the set of commonly used functions should
 | |
| go into a separate domain <CODE>error</CODE>.  By this means we would only need
 | |
| to translate them once.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| For this reason there are two more functions to retrieve strings:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| char *dgettext (const char *domain_name, const char *msgid);
 | |
| char *dcgettext (const char *domain_name, const char *msgid,
 | |
|                  int category);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| Both take an additional argument at the first place, which corresponds
 | |
| to the argument of <CODE>textdomain</CODE>.  The third argument of
 | |
| <CODE>dcgettext</CODE> allows using a locale category other than <CODE>LC_MESSAGES</CODE>.
 | |
| But I really don't know where this can be useful.  If the
 | |
| <VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
 | |
| the known ones, the result is undefined.  It should also be noted that
 | |
| this function is not part of the second known implementation of this
 | |
| function family, the one found in Solaris.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| A second ambiguity can arise from the fact that perhaps more than one
 | |
| domain has the same name.  This can be solved by specifying where the
 | |
| needed message catalog files can be found.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| char *bindtextdomain (const char *domain_name,
 | |
|                       const char *dir_name);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| Calling this function binds the given domain to a file in the specified
 | |
| directory (how this file is determined follows below).  In particular, a
 | |
| file in the system's default place is no longer favored over the specified
 | |
| file (as it would be by solely using <CODE>textdomain</CODE>).  A
 | |
| <CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
 | |
| associated with <VAR>domain_name</VAR>.  If <VAR>domain_name</VAR> itself is
 | |
| <CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned.  Here
 | |
| again, as for all the other functions, none of the return
 | |
| values may be changed!
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| It is important to remember that relative path names for the
 | |
| <VAR>dir_name</VAR> parameter can be trouble.  Since the path is always
 | |
| computed relative to the current directory different results will be
 | |
| achieved when the program executes a <CODE>chdir</CODE> command.  Relative
 | |
| paths should always be avoided to avoid dependencies and
 | |
| unreliabilities.
 | |
| 
 | |
| </P>
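 | |
| <P>
 | |
| As a sketch of how <CODE>bindtextdomain</CODE> and <CODE>dgettext</CODE> might
 | |
| be used together for a shared error domain: the domain name
 | |
| <CODE>example_errors</CODE>, the directory and both helper names below are
 | |
| hypothetical.  The directory is absolute, following the advice above.
 | |
| </P>

```c
#include <libintl.h>

/* Hypothetical names: a shared error domain installed under an
   absolute directory.  */
#define ERROR_DOMAIN    "example_errors"
#define ERROR_LOCALEDIR "/opt/example/share/locale"

/* Bind the shared domain once at start-up.  Returns the directory now
   bound to the domain, or NULL on failure.  The result must not be
   modified.  */
const char *
bind_error_domain (void)
{
  return bindtextdomain (ERROR_DOMAIN, ERROR_LOCALEDIR);
}

/* Look up an error message in the shared domain without touching the
   global domain selected by textdomain.  */
const char *
error_text (const char *msgid)
{
  return dgettext (ERROR_DOMAIN, msgid);
}
```

| <P>
 | |
| Because <CODE>dgettext</CODE> names its domain explicitly, the program's own
 | |
| messages can keep using plain <CODE>gettext</CODE> with the global domain.
 | |
| </P>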
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Locating Message Catalog Files</A></H3>
 | |
| 
 | |
| <P>
 | |
| Because many different languages for many different packages have to be
 | |
| stored we need some way to add this information to message catalog
 | |
| files.  The way usually used in Unix environments is to encode it
 | |
| in the file name.  This is also done here.  The directory name given in
 | |
| <CODE>bindtextdomain</CODE>'s second argument (or the default directory),
 | |
| followed by the value and name of the locale and the domain name are
 | |
| concatenated:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| <VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| The default value for <VAR>dir_name</VAR> is system specific.  For the GNU
 | |
| library, and for packages adhering to its conventions, it's:
 | |
| 
 | |
| <PRE>
 | |
| /usr/local/share/locale
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| <VAR>locale</VAR> is the value of the locale whose name is this
 | |
| <CODE>LC_<VAR>category</VAR></CODE>.  For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
 | |
| locale is always <CODE>LC_MESSAGES</CODE>.  <CODE>dcgettext</CODE> specifies the
 | |
| locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
 | |
| 
 | |
| </P>
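 | |
| <P>
 | |
| The concatenation described above can be illustrated with a small sketch.
 | |
| The function <CODE>catalog_path</CODE> is hypothetical and is not how the
 | |
| library itself is written; it only shows how the four parts combine into
 | |
| one file name.
 | |
| </P>

```c
#include <stdio.h>

/* Hypothetical helper: build the catalog file name from its parts.
   category_name is the full category name, e.g. "LC_MESSAGES".
   Returns 0 on success and -1 if buf is too small.  */
int
catalog_path (char *buf, size_t buf_size, const char *dir_name,
              const char *locale, const char *category_name,
              const char *domain_name)
{
  int n = snprintf (buf, buf_size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category_name, domain_name);

  return (n < 0 || (size_t) n >= buf_size) ? -1 : 0;
}
```

| <P>
 | |
| With the default directory, the locale <CODE>de</CODE>, the category
 | |
| <CODE>LC_MESSAGES</CODE> and a domain named <CODE>hello</CODE>, the sketch
 | |
| yields <TT>`/usr/local/share/locale/de/LC_MESSAGES/hello.mo'</TT>.
 | |
| </P>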
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Optimization of the *gettext functions</A></H3>
 | |
| 
 | |
| <P>
 | |
| At this point of the discussion we should talk about an advantage of the
 | |
| GNU <CODE>gettext</CODE> implementation.  Some readers might have pointed out
 | |
| that an internationalized program might have a poor performance if some
 | |
| string has to be translated in an inner loop.  While this is unavoidable
 | |
| when the string varies from one run of the loop to the other it is
 | |
| simply a waste of time when the string is always the same.  Take the
 | |
| following example:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| {
 | |
|   while (...)
 | |
|     {
 | |
|       puts (gettext ("Hello world"));
 | |
|     }
 | |
| }
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| When the locale selection does not change between two runs the resulting
 | |
| string is always the same.  One way to use this is:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| {
 | |
|   str = gettext ("Hello world");
 | |
|   while (...)
 | |
|     {
 | |
|       puts (str);
 | |
|     }
 | |
| }
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| But this solution is not usable in all situations (e.g. when the locale
 | |
| selection changes) nor is it very readable.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| The GNU C compiler, version 2.7 and above, provides another solution for
 | |
| this.  To describe this we show here some lines of the
 | |
| <TT>`intl/libgettext.h'</TT> file.  For an explanation of the expression
 | |
| command block see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| #  if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
 | |
| extern int _nl_msg_cat_cntr;
 | |
| #   define	dcgettext(domainname, msgid, category)           \
 | |
|   (__extension__                                                 \
 | |
|    ({                                                            \
 | |
|      char *result;                                               \
 | |
|      if (__builtin_constant_p (msgid))                           \
 | |
|        {                                                         \
 | |
|          static char *__translation__;                           \
 | |
|          static int __catalog_counter__;                         \
 | |
|          if (! __translation__                                   \
 | |
|              || __catalog_counter__ != _nl_msg_cat_cntr)         \
 | |
|            {                                                     \
 | |
|              __translation__ =                                   \
 | |
|                dcgettext__ ((domainname), (msgid), (category));  \
 | |
|              __catalog_counter__ = _nl_msg_cat_cntr;             \
 | |
|            }                                                     \
 | |
|          result = __translation__;                               \
 | |
|        }                                                         \
 | |
|      else                                                        \
 | |
|        result = dcgettext__ ((domainname), (msgid), (category)); \
 | |
|      result;                                                     \
 | |
|     }))
 | |
| #  endif
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate.
 | |
| This is evaluated at compile time and so optimization can take place
 | |
| immediately.  Two cases are distinguished here.  If the argument to
 | |
| <CODE>gettext</CODE> is not a constant value, the function
 | |
| <CODE>dcgettext__</CODE>, the real implementation of the
 | |
| <CODE>dcgettext</CODE> function, is simply called.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| If the string argument <EM>is</EM> constant we can reuse the translation
 | |
| obtained earlier as long as the locale selection has not changed.  This is exactly
 | |
| what is done here.  The <CODE>_nl_msg_cat_cntr</CODE> variable is defined in
 | |
| <TT>`loadmsgcat.c'</TT>, which is part of <TT>`libintl.a'</TT>, and is
 | |
| changed whenever a new message catalog is loaded.
 | |
| 
 | |
| </P>
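 | |
| <P>
 | |
| The caching idea can also be sketched in portable C, without the GNU C
 | |
| extensions.  Everything in this sketch is hypothetical:
 | |
| <CODE>catalog_generation</CODE> stands in for <CODE>_nl_msg_cat_cntr</CODE>,
 | |
| and <CODE>slow_lookup</CODE> stands in for the real catalog search.  It is
 | |
| not the code of the GNU library; it only shows the counter comparison.
 | |
| </P>

```c
#include <stddef.h>

/* Hypothetical stand-in for _nl_msg_cat_cntr: incremented whenever a
   new message catalog becomes active.  */
int catalog_generation = 0;

/* Counts how often the expensive lookup really runs.  */
int lookup_calls = 0;

/* Hypothetical stand-in for the real catalog search.  Like gettext
   without a catalog, it simply returns msgid.  */
const char *
slow_lookup (const char *msgid)
{
  ++lookup_calls;
  return msgid;
}

/* One cache slot per constant message.  */
struct cached_msg
{
  const char *msgid;
  const char *translation;  /* NULL until the first lookup */
  int generation;           /* catalog_generation at lookup time */
};

/* Return the translation, searching again only when nothing is cached
   yet or the catalog has changed since the last search.  */
const char *
cached_translate (struct cached_msg *c)
{
  if (c->translation == NULL || c->generation != catalog_generation)
    {
      c->translation = slow_lookup (c->msgid);
      c->generation = catalog_generation;
    }
  return c->translation;
}
```

| <P>
 | |
| Unlike the macro, this version needs an explicit cache slot for every
 | |
| message, which is the bookkeeping the statement expression hides.
 | |
| </P>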
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC48" HREF="gettext_toc.html#TOC48">Comparing the Two Interfaces</A></H2>
 | |
| 
 | |
| <P>
 | |
| The following discussion is perhaps a little bit colored.  As said
 | |
| above we implemented GNU <CODE>gettext</CODE> following the Uniforum
 | |
| proposal and this surely has its reasons.  But it should show how we
 | |
| came to this decision.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| First we take a look at the development process.  When we write an
 | |
| application using NLS provided by <CODE>gettext</CODE> we proceed as always.
 | |
| Only when we come to a string which might be seen by the users and thus
 | |
| has to be translated we use <CODE>gettext("...")</CODE> instead of
 | |
| <CODE>"..."</CODE>.  At the beginning of each source file (or in a central
 | |
| header file) we define
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| #define gettext(String) (String)
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| Even this definition can be avoided when the system supports the
 | |
| <CODE>gettext</CODE> function in its C library.  When we compile this code the
 | |
| result is the same as if no NLS code is used.  When  you take a look at
 | |
| the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
 | |
| instead of <CODE>gettext("...")</CODE>.  This reduces the number of
 | |
| additional characters per translatable string to <EM>3</EM> (in words:
 | |
| three).
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| When now a production version of the program is needed we simply replace
 | |
| the definition
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| #define _(String) (String)
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| by
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| #include <libintl.h>
 | |
| #define _(String) gettext (String)
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| Additionally we run the program <TT>`xgettext'</TT> on all source code files
 | |
| which contain translatable strings and that's it: we have a running
 | |
| program which does not depend on translations to be available, but which
 | |
| can use any that becomes available.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| The same procedure can be done for the <CODE>gettext_noop</CODE> invocations
 | |
| (see section <A HREF="gettext_3.html#SEC18">Special Cases of Translatable Strings</A>).  First you can define <CODE>gettext_noop</CODE> to a
 | |
| no-op macro and later use the definition from <TT>`libintl.h'</TT>.  Because
 | |
| this name is not used in Sun's implementation of <TT>`libintl.h'</TT>,
 | |
| you should consider the following code for your project:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| #ifdef gettext_noop
 | |
| # define N_(String) gettext_noop (String)
 | |
| #else
 | |
| # define N_(String) (String)
 | |
| #endif
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| <CODE>N_</CODE> is a short form similar to <CODE>_</CODE>.  The <TT>`Makefile'</TT> in
 | |
| the <TT>`po/'</TT> directory of GNU gettext knows by default both of the
 | |
| mentioned short forms so you are invited to follow this proposal for
 | |
| your own ease.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Now to <CODE>catgets</CODE>.  The main problem is the work for the
 | |
| programmer.  Every time he comes to a translatable string he has to
 | |
| define a number (or a symbolic constant) which also has to be defined in
 | |
| the message catalog file.  He also has to take care of duplicate
 | |
| entries, duplicate message IDs etc.  If he wants to have the same
 | |
| quality in the message catalog as the GNU <CODE>gettext</CODE> program
 | |
| provides he also has to put the descriptive comments for the strings and
 | |
| the location in all source code files in the message catalog.  This is
 | |
| nearly a Mission: Impossible.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| But there are also some points people might call advantages speaking for
 | |
| <CODE>catgets</CODE>.  If you have a single word in a string and this string
 | |
| is used in different contexts it is likely that in one or the other
 | |
| language the word has different translations.  Example:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| printf ("%s: %d", gettext ("number"), number_of_errors);
 | |
| 
 | |
| printf ("you should see %d %s", number_count,
 | |
|         number_count == 1 ? gettext ("number") : gettext ("numbers"));
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| Here we have to translate the string <CODE>"number"</CODE> twice.  Even
 | |
| if you do not speak a language other than English it might be possible to
 | |
| recognize that the two occurrences have different meanings.  In German the
 | |
| first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
 | |
| to <CODE>"Zahl"</CODE>.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Now you can say that this example is really esoteric.  And you are
 | |
| right!  This is exactly how we felt about this problem and decided that
 | |
| it does not weigh that much.  The solution for the above problem could
 | |
| be very easy:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| printf ("%s %d", gettext ("number:"), number_of_errors);
 | |
| 
 | |
| printf (number_count == 1 ? gettext ("you should see %d number")
 | |
|                           : gettext ("you should see %d numbers"),
 | |
|         number_count);
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| We believe that we can solve all conflicts with this method.  If it is
 | |
| difficult one can also consider changing one of the conflicting strings a
 | |
| little bit.  But it is not impossible to overcome.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Translator note: It is perhaps appropriate here to tell those English
 | |
| speaking programmers that the plural form of a noun cannot in general be formed by
 | |
| appending a single `s'.  Most other languages use different methods.
 | |
| Even the above form is not general enough to cope with all languages.
 | |
| Rafal Maszkowski <rzm@mat.uni.torun.pl> reports:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <BLOCKQUOTE>
 | |
| <P>
 | |
| In Polish we use e.g. plik (file) this way:
 | |
| 
 | |
| <PRE>
 | |
| 1 plik
 | |
| 2,3,4 pliki
 | |
| 5-21 pliko'w
 | |
| 22-24 pliki
 | |
| 25-31 pliko'w
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| and so on (o' means 8859-2 oacute which should be rather okreska,
 | |
| similar to aogonek).
 | |
| </BLOCKQUOTE>
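 | |
| <P>
 | |
| To make the reported pattern concrete, here is a sketch of a function
 | |
| that picks one of the three Polish forms for a given count.  The function
 | |
| name is hypothetical, and the rule is inferred from the table above and
 | |
| extended beyond 31 on the assumption that the pattern repeats; it is not
 | |
| part of the <CODE>gettext</CODE> interface described in this chapter.
 | |
| </P>

```c
/* Hypothetical helper: index of the Polish form to use for n items.
   0 selects "plik", 1 selects "pliki", 2 selects "pliko'w".  */
int
polish_plural_index (unsigned long n)
{
  if (n == 1)
    return 0;

  /* Counts ending in 2, 3 or 4 take "pliki", except 12, 13 and 14
     (and 112, 113, 114, ...).  */
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 12 || n % 100 > 14))
    return 1;

  return 2;
}
```

| <P>
 | |
| A two-way choice between a singular and a plural string, as in the
 | |
| <CODE>printf</CODE> example above, cannot express this three-way rule.
 | |
| </P>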
 | |
| 
 | |
| <P>
 | |
| A workable approach might be to consider methods like the one used for
 | |
| <CODE>LC_TIME</CODE> in the POSIX.2 standard.  The value of the
 | |
| <CODE>alt_digits</CODE> field can be up to 100 strings which represent the
 | |
| numbers 1 to 100.  Using this in a situation of an internationalized
 | |
| program means that an array of translatable strings should be indexed by
 | |
| the number they should represent.  A small example:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
| void
 | |
| print_month_info (int month)
 | |
| {
 | |
|   const char *month_pos[12] =
 | |
|   { N_("first"), N_("second"), N_("third"),    N_("fourth"),
 | |
|     N_("fifth"), N_("sixth"),  N_("seventh"),  N_("eighth"),
 | |
|     N_("ninth"), N_("tenth"),  N_("eleventh"), N_("twelfth") };
 | |
|   printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month),
 | |
|           _(month_pos[month]));
 | |
| }
 | |
| </PRE>
 | |
| 
 | |
| <P>
 | |
| It should be obvious that this method is only reasonable for small
 | |
| ranges of numbers.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC49" HREF="gettext_toc.html#TOC49">Using libintl.a in own programs</A></H2>
 | |
| 
 | |
| <P>
 | |
| Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be
 | |
| self-contained.  I.e., you can use it in your own programs without
 | |
| providing additional functions.  The <TT>`Makefile'</TT> will put the header
 | |
| and the library in directories selected using the <CODE>$(prefix)</CODE>.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| One exception to the above is found on HP-UX systems.  Here the C library
 | |
| does not contain the <CODE>alloca</CODE> function (and the HP compiler does
 | |
| not generate it inlined).  But it is not intended to rewrite the whole
 | |
| library just because of this dumb system.  Instead include the
 | |
| <CODE>alloca</CODE> function in every package in which you use <CODE>libintl.a</CODE>.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC50" HREF="gettext_toc.html#TOC50">Being a <CODE>gettext</CODE> grok</A></H2>
 | |
| 
 | |
| <P>
 | |
| To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
 | |
| is surely helpful to read the source code.  But for those who don't want
 | |
| to spend that much time in reading the (sometimes complicated) code here
 | |
| is a list of comments:
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <UL>
 | |
| <LI>Changing the language at runtime
 | |
| 
 | |
| For interactive programs it might be useful to offer a selection of the
 | |
| used language at runtime.  To understand how to do this one needs to know
 | |
| how the used language is determined while executing the <CODE>gettext</CODE>
 | |
| function.  The method which is presented here only works correctly
 | |
| with the GNU implementation of the <CODE>gettext</CODE> functions.  It is not
 | |
| possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE>
 | |
| functions from the system's C library.  The exception is of course the
 | |
| GNU C Library which uses the GNU <CODE>gettext</CODE> Library for message handling.
 | |
| 
 | |
| In the function <CODE>dcgettext</CODE> at every call the current setting of
 | |
| the highest priority environment variable is determined and used.
 | |
| Highest priority means here the following list with decreasing
 | |
| priority:
 | |
| 
 | |
| 
 | |
| <OL>
 | |
| <LI><CODE>LANGUAGE</CODE>
 | |
| 
 | |
| <LI><CODE>LC_ALL</CODE>
 | |
| 
 | |
| <LI><CODE>LC_xxx</CODE>, according to selected locale
 | |
| 
 | |
| <LI><CODE>LANG</CODE>
 | |
| 
 | |
| </OL>
 | |
| 
 | |
| Afterwards the path is constructed using the found value and the
 | |
| translation file is loaded if available.
 | |
| 
 | |
| What happens now when the value of, say, <CODE>LANGUAGE</CODE> changes?  According
 | |
| to the process explained above the new value of this variable is found
 | |
| as soon as the <CODE>dcgettext</CODE> function is called.  But this also means
 | |
| that a (perhaps) different message catalog file is loaded.  In other
 | |
| words: the used language is changed.
 | |
| 
 | |
| But there is one little catch.  The code for gcc-2.7.0 and up provides
 | |
| some optimization.  This optimization normally prevents the calling of
 | |
| the <CODE>dcgettext</CODE> function as long as no new catalog is loaded.  But
 | |
| if <CODE>dcgettext</CODE> is not called the program also cannot notice that the
 | |
| <CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_8.html#SEC47">Optimization of the *gettext functions</A>).  A
 | |
| solution for this is very easy.  Include the following code in the
 | |
| language switching function.
 | |
| 
 | |
| 
 | |
| <PRE>
 | |
|   /* Change language.  */
 | |
|   setenv ("LANGUAGE", "fr", 1);
 | |
| 
 | |
|   /* Make change known.  */
 | |
|   {
 | |
|     extern int  _nl_msg_cat_cntr;
 | |
|     ++_nl_msg_cat_cntr;
 | |
|   }
 | |
| </PRE>
 | |
| 
 | |
| The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>.
 | |
| The programmer will find himself in need of a construct like this only
 | |
| when developing long-running programs which allow the user to
 | |
| select the language at runtime.  Non-interactive programs (like all
 | |
| these little Unix tools) should never need this.
 | |
| 
 | |
| </UL>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC51" HREF="gettext_toc.html#TOC51">Temporary Notes for the Programmers Chapter</A></H2>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">Temporary - Two Possible Implementations</A></H3>
 | |
| 
 | |
| <P>
 | |
| There are two competing methods for language independent messages:
 | |
| the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
 | |
| method.  The <CODE>catgets</CODE> method indexes messages by integers; the
 | |
| <CODE>gettext</CODE> method indexes them by their original English strings.
 | |
| The <CODE>catgets</CODE> method has been around longer and is supported
 | |
| by more vendors.  The <CODE>gettext</CODE> method is supported by Sun,
 | |
| and it has been heard that the COSE multi-vendor initiative is
 | |
| supporting it.  Neither method is a POSIX standard; the POSIX.1
 | |
| committee had a lot of disagreement in this area.
 | |
| 
 | |
| </P>
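 | |
| <P>
 | |
| The difference in indexing can be seen side by side in a minimal sketch.
 | |
| The helper names, the set and message numbers, and the message text are
 | |
| illustrative assumptions; only <CODE>catgets</CODE> and <CODE>gettext</CODE>
 | |
| themselves come from the two interfaces.
 | |
| </P>

```c
#include <libintl.h>
#include <nl_types.h>

/* catgets style: the message is identified by a (set, number) pair.
   The string argument is only a fallback for a missing catalog or entry.  */
static const char *
greeting_catgets (nl_catd catd)
{
  /* Skip the lookup entirely if catopen failed earlier.  */
  if (catd == (nl_catd) -1)
    return "Hello, world!";

  return catgets (catd, 1, 1, "Hello, world!");
}

/* gettext style: the original string itself is the lookup key, and it
   is returned unchanged when no translation is available.  */
static const char *
greeting_gettext (void)
{
  return gettext ("Hello, world!");
}
```

 | |
| <P>
 | |
| With no catalog installed, both helpers return the English text.  With
 | |
| <CODE>catgets</CODE>, however, the numbers in the source must be kept in step
 | |
| with the catalog by hand, whereas <CODE>gettext</CODE> needs only the string.
 | |
| </P>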
 | |
| <P>
 | |
| Neither one is in the POSIX standard.  There was much disagreement
 | |
| in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
 | |
| vs. <CODE>catgets</CODE> (XPG).  In the end the committee couldn't
 | |
| agree on anything, so no messaging system was included as part
 | |
| of the standard.  I believe the informative annex of the standard
 | |
| includes the XPG3 messaging interfaces, "...as an example of
 | |
| a messaging system that has been implemented..."
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| They were very careful not to say anywhere that you should use one
 | |
| set of interfaces over the other.  For more on this topic please
 | |
| see the Programming for Internationalization FAQ.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - About <CODE>catgets</CODE></A></H3>
 | |
| 
 | |
| <P>
 | |
| There have been a few discussions of late on the use of
 | |
| <CODE>catgets</CODE> as a base.  I think it important to present both
 | |
| sides of the argument and hence am opting to play devil's advocate
 | |
| for a little bit.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| I'll not deny the fact that <CODE>catgets</CODE> could have been designed
 | |
| a lot better.  It currently has quite a number of limitations and
 | |
| these have already been pointed out.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| However there is a great deal to be said for consistency and
 | |
| standardization.  A common recurring problem when writing Unix
 | |
| software is the myriad portability problems across Unix platforms.
 | |
| It seems as if every Unix vendor had a look at the operating system
 | |
| and found parts they could improve upon.  No doubt these
 | |
| modifications are innovative and solve real problems.
 | |
| However, software developers have a hard time keeping up with all
 | |
| these changes across so many platforms.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| And this has prompted the Unix vendors to begin to standardize their
 | |
| systems.  Hence the impetus for Spec1170.  Every major Unix vendor
 | |
| has committed to supporting this standard and every Unix software
 | |
| developer awaits with glee the day they can write software to this
 | |
| standard and simply recompile (without having to use autoconf)
 | |
| across different platforms.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| As I understand it, Spec1170 is roughly based upon version 4 of the
 | |
| X/Open Portability Guidelines (XPG4).  Because <CODE>catgets</CODE> and
 | |
| friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
 | |
| is a part of Spec1170 and hence will become a standardized component
 | |
| of all Unix systems.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC54" HREF="gettext_toc.html#TOC54">Temporary - Why a single implementation</A></H3>
 | |
| 
 | |
| <P>
 | |
| Now it seems kind of wasteful to me to have two different systems
 | |
| installed for accessing message catalogs.  If we do want to remedy
 | |
| the deficiencies of <CODE>catgets</CODE>, why don't we try to expand <CODE>catgets</CODE>
 | |
| (in a compatible manner) rather than implement an entirely new system?
 | |
| Otherwise, we'll end up with two message catalog access systems installed
 | |
| with an operating system - one set of routines for packages using GNU
 | |
| <CODE>gettext</CODE> for their internationalization, and another set of routines
 | |
| (catgets) for all other software.  Bloated?
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Suppose another catalog access system is implemented.  Which do
 | |
| we recommend?  At least for Linux, we need to attract as many
 | |
| software developers as possible.  Hence we need to make it as easy
 | |
| for them to port their software as possible, which means supporting
 | |
| <CODE>catgets</CODE>.  We will be implementing the <CODE>glocale</CODE> code
 | |
| within our <CODE>libc</CODE>, but does this mean we also have to incorporate
 | |
| another message catalog access scheme within our <CODE>libc</CODE> as well?
 | |
| And what about people who are going to be using the <CODE>glocale</CODE>
 | |
| + non-<CODE>catgets</CODE> routines?  When they port their software to
 | |
| other platforms, they're now going to have to include the front-end
 | |
| (<CODE>glocale</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
 | |
| access routines) with their software instead of just including the
 | |
| <CODE>glocale</CODE> code with their software.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Message catalog support is however only the tip of the iceberg.
 | |
| What about the data for the other locale categories?  They also have
 | |
| a number of deficiencies.  Are we going to abandon them as well and
 | |
| develop another duplicate set of routines (should <CODE>glocale</CODE>
 | |
| expand beyond message catalog support)?
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Like many parts of Unix that can be improved upon, we're stuck with balancing
 | |
| compatibility with the past with useful improvements and innovations for
 | |
| the future.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H3><A NAME="SEC55" HREF="gettext_toc.html#TOC55">Temporary - Notes</A></H3>
 | |
| 
 | |
| <P>
 | |
| X/Open agreed very late on the standard form so that many
 | |
| implementations differ from the final form.  Both of my systems (old
 | |
| Linux catgets and Ultrix-4) have a strange variation.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| OK.  After incorporating the last changes I have to spend some time on
 | |
| making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions.  So in the future
 | |
| Solaris will not be the only system having <CODE>gettext</CODE>.
 | |
| 
 | |
| </P>
 | |
| <P><HR><P>
 | |
| <p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
 | |
| </BODY>
 | |
| </HTML>
 |