git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@7748 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
		
			
				
	
	
		
			
		
		
	
	
| <HTML>
 | |
| <HEAD>
 | |
| <!-- This HTML file has been created by texi2html 1.54
 | |
|      from gettext.texi on 25 January 1999 -->
 | |
| 
 | |
| <TITLE>GNU gettext utilities - Producing Binary MO Files</TITLE>
 | |
| <link href="gettext_7.html" rel=Next>
 | |
| <link href="gettext_5.html" rel=Previous>
 | |
| <link href="gettext_toc.html" rel=ToC>
 | |
| 
 | |
| </HEAD>
 | |
| <BODY>
 | |
| <p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
 | |
| <P><HR><P>
 | |
| 
 | |
| 
 | |
| <H1><A NAME="SEC32" HREF="gettext_toc.html#TOC32">Producing Binary MO Files</A></H1>
 | |
| 
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC33" HREF="gettext_toc.html#TOC33">Invoking the <CODE>msgfmt</CODE> Program</A></H2>
 | |
| 
 | |
| 
 | |
| <PRE>
 | |
| Usage: msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ...
 | |
| </PRE>
 | |
| 
 | |
| <DL COMPACT>
 | |
| 
 | |
| <DT><SAMP>`-a <VAR>number</VAR>'</SAMP>
 | |
| <DD>
 | |
| <DT><SAMP>`--alignment=<VAR>number</VAR>'</SAMP>
 | |
| <DD>
 | |
| Align strings to <VAR>number</VAR> bytes (default: 1).
 | |
| 
 | |
| <DT><SAMP>`-h'</SAMP>
 | |
| <DD>
 | |
| <DT><SAMP>`--help'</SAMP>
 | |
| <DD>
 | |
| Display this help and exit.
 | |
| 
 | |
| <DT><SAMP>`--no-hash'</SAMP>
 | |
| <DD>
 | |
| Binary file will not include the hash table.
 | |
| 
 | |
| <DT><SAMP>`-o <VAR>file</VAR>'</SAMP>
 | |
| <DD>
 | |
| <DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP>
 | |
| <DD>
 | |
| Specify output file name as <VAR>file</VAR>.
 | |
| 
 | |
| <DT><SAMP>`--strict'</SAMP>
 | |
| <DD>
 | |
| Direct the program to work strictly following the Uniforum/Sun
 | |
| implementation.  Currently this only affects the naming of the output
 | |
| file.  If this option is not given the name of the output file is the
 | |
| same as the domain name.  If the strict Uniforum mode is enabled the
 | |
| suffix <TT>`.mo'</TT> is added to the file name if it is not already
 | |
| present.
 | |
| 
 | |
| We find this behaviour of Sun's implementation rather silly and so by
 | |
| default this mode is <EM>not</EM> selected.
 | |
| 
 | |
| <DT><SAMP>`-v'</SAMP>
 | |
| <DD>
 | |
| <DT><SAMP>`--verbose'</SAMP>
 | |
| <DD>
 | |
| Detect and diagnose input file anomalies which might represent
 | |
| translation errors.  The <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings are
 | |
| studied and compared.  It is considered abnormal that one string
 | |
| starts or ends with a newline while the other does not.
 | |
| 
 | |
| Also, if the string represents a format string used in a
 | |
| <CODE>printf</CODE>-like function both strings should have the same number of
 | |
| <SAMP>`%'</SAMP> format specifiers, with matching types.  If the flag
 | |
| <CODE>c-format</CODE> or <CODE>possible-c-format</CODE> appears in the special
 | |
| comment <KBD>#,</KBD> for this entry, a check is performed.  For example, the
 | |
| check will diagnose using <SAMP>`%.*s'</SAMP> against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP>
 | |
| against <SAMP>`%s'</SAMP>, or <SAMP>`%d'</SAMP> against <SAMP>`%x'</SAMP>.  It can even handle
 | |
| positional parameters.
 | |
| 
 | |
| Normally the <CODE>xgettext</CODE> program automatically decides whether a
 | |
| string is a format string or not.  This algorithm is not perfect,
 | |
| though.  It might regard a string as a format string though it is not
 | |
| used in a <CODE>printf</CODE>-like function and so <CODE>msgfmt</CODE> might report
 | |
| errors where there are none.  Or the other way round: a string is not
 | |
| regarded as a format string but it is used in a <CODE>printf</CODE>-like
 | |
| function.
 | |
| 
 | |
| To solve this problem the programmer can dictate the decision to the
 | |
| <CODE>xgettext</CODE> program (see section <A HREF="gettext_3.html#SEC17">Special Comments preceding Keywords</A>).  The translator should not
 | |
| consider removing the flag from the <KBD>#,</KBD> line.  This "fix" would be
 | |
| reversed again as soon as <CODE>msgmerge</CODE> is called the next time.
 | |
| 
 | |
| <DT><SAMP>`-V'</SAMP>
 | |
| <DD>
 | |
| <DT><SAMP>`--version'</SAMP>
 | |
| <DD>
 | |
| Output version information and exit.
 | |
| 
 | |
| </DL>
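<P>
The checks described for <SAMP>`--verbose'</SAMP> can be illustrated with a
small Python sketch.  This is not the code <CODE>msgfmt</CODE> uses: the
names <CODE>format_signature</CODE> and <CODE>check_entry</CODE> are
hypothetical, the directive pattern is deliberately simplified, positional
parameters are not handled, and skipping untranslated entries is an
assumption.
</P>

```python
import re

# Simplified printf-style directive: flags, width, precision, length
# modifier, conversion character.  "%%" is a literal percent sign.
# Assumption: positional parameters such as %1$s are NOT handled here.
_DIRECTIVE = re.compile(
    r"%(?:%|[-+ #0]*(\*|\d+)?(?:\.(\*|\d+))?(?:hh|h|ll|l|L|z|j|t)?"
    r"([diouxXeEfgGcsp]))"
)


def format_signature(text):
    """Return the list of argument kinds a format string would consume."""
    signature = []
    for match in _DIRECTIVE.finditer(text):
        if match.group(0) == "%%":
            continue  # literal percent sign, consumes no argument
        if match.group(1) == "*":
            signature.append("*")  # width taken from an argument
        if match.group(2) == "*":
            signature.append("*")  # precision taken from an argument
        signature.append(match.group(3))
    return signature


def check_entry(msgid, msgstr, c_format=False):
    """Return a list of human-readable problems found in one entry."""
    if not msgstr:
        return []  # assumption: untranslated entries are not checked
    problems = []
    if msgid.startswith("\n") != msgstr.startswith("\n"):
        problems.append("only one string starts with a newline")
    if msgid.endswith("\n") != msgstr.endswith("\n"):
        problems.append("only one string ends with a newline")
    # The format comparison only applies when the entry is flagged c-format.
    if c_format and format_signature(msgid) != format_signature(msgstr):
        problems.append("format specifiers do not match")
    return problems
```

<P>
In this sketch <SAMP>`%.*s'</SAMP> consumes two arguments while
<SAMP>`%s'</SAMP> consumes one, so the pair is reported as a mismatch, in
line with the example given for the option.
</P>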
 | |
| 
 | |
| <P>
 | |
| If the input file is <SAMP>`-'</SAMP>, standard input is read.  If the output file
 | |
| is <SAMP>`-'</SAMP>, output is written to standard output.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| 
 | |
| <H2><A NAME="SEC34" HREF="gettext_toc.html#TOC34">The Format of GNU MO Files</A></H2>
 | |
| 
 | |
| <P>
 | |
| The format of the generated MO files is best described by a picture,
 | |
| which appears below.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| The first two words serve to identify the file.  The magic
 | |
| number will always signal GNU MO files.  The number is stored in the
 | |
| byte order of the generating machine, so the magic number really is
 | |
| two numbers: <CODE>0x950412de</CODE> and <CODE>0xde120495</CODE>.  The second
 | |
| word describes the current revision of the file format.  For now the
 | |
| revision is 0.  This might change in future versions, and ensures
 | |
| that the readers of MO files can distinguish new formats from old
 | |
| ones, so that both can be handled correctly.  The version is kept
 | |
| separate from the magic number, instead of using different magic
 | |
| numbers for different formats, mainly because <TT>`/etc/magic'</TT> is
 | |
| not updated often.  It might be better to have magic separated from
 | |
| internal format version identification.
 | |
| 
 | |
| </P>
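<P>
Because the magic number is stored in the byte order of the generating
machine, a reader can use it to detect that byte order.  The following
Python sketch shows one way to do so; the function name
<CODE>read_mo_header</CODE> is hypothetical and error handling is minimal.
</P>

```python
import struct

MO_MAGIC = 0x950412DE          # magic number read in the writer's byte order
MO_MAGIC_SWAPPED = 0xDE120495  # same four bytes read in the opposite order


def read_mo_header(data):
    """Return (byte_order, revision); byte_order is '<' or '>' for struct."""
    if len(data) < 8:
        raise ValueError("too short to be a GNU MO file")
    # Read the first word as little-endian and see which magic value appears.
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == MO_MAGIC:
        order = "<"   # file was written on a little-endian machine
    elif magic == MO_MAGIC_SWAPPED:
        order = ">"   # file was written on a big-endian machine
    else:
        raise ValueError("not a GNU MO file")
    # The second word is the file format revision, in the same byte order.
    (revision,) = struct.unpack_from(order + "I", data, 4)
    return order, revision
```

<P>
All later header fields would then be unpacked with the returned byte order
prefix.
</P>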
 | |
| <P>
 | |
| Next come a number of pointers to later tables in the file, allowing
 | |
| for the extension of the prefix part of MO files without having to
 | |
| recompile programs reading them.  This might become useful for later
 | |
| inserting a few flag bits, an indication of the charset used, new
 | |
| tables, or other things.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables
 | |
| of string descriptors can be found.  In both tables, each string
 | |
| descriptor uses two 32-bit integers, one for the string length,
 | |
| another for the offset of the string in the MO file, counting in bytes
 | |
| from the start of the file.  The first table contains descriptors
 | |
| for the original strings, and is sorted so the original strings
 | |
| are in increasing lexicographical order.  The second table contains
 | |
| descriptors for the translated strings, and is parallel to the first
 | |
| table: to find the corresponding translation one has to access the
 | |
| array slot in the second array with the same index.
 | |
| 
 | |
| </P>
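<P>
As an illustration of the two parallel descriptor tables, the following
Python sketch walks both tables and pairs each original string with its
translation.  The function name <CODE>read_string_tables</CODE> is
hypothetical, and the sketch assumes the whole file is already in memory as
a bytes object.
</P>

```python
import struct


def read_string_tables(data):
    """Return a list of (original, translation) byte-string pairs."""
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic == 0x950412DE:
        order = "<"
    elif magic == 0xDE120495:
        order = ">"
    else:
        raise ValueError("not a GNU MO file")
    # N, O and T are the words at byte offsets 8, 12 and 16.
    count, orig_table, trans_table = struct.unpack_from(order + "3I", data, 8)

    def fetch(table, index):
        # Each descriptor is two 32-bit integers: length, then file offset.
        length, offset = struct.unpack_from(order + "2I", data, table + 8 * index)
        return data[offset:offset + length]  # terminating NUL is not counted

    # The same index selects matching entries in both tables.
    return [(fetch(orig_table, i), fetch(trans_table, i)) for i in range(count)]
```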
 | |
| <P>
 | |
| Having the original strings sorted enables the use of simple binary
 | |
| search, for when the MO file does not contain a hashing table, or
 | |
| for when it is not practical to use the hashing table provided in
 | |
| the MO file.  This also has another advantage, as the empty string
 | |
| in a GNU <CODE>gettext</CODE> PO file is usually <EM>translated</EM> into
 | |
| some system information attached to that particular MO file, and the
 | |
| empty string necessarily becomes the first in both the original and
 | |
| translated tables, making the system information very easy to find.
 | |
| 
 | |
| </P>
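<P>
The binary search over the sorted original strings can be sketched in
Python as follows.  The function name <CODE>lookup</CODE> is hypothetical,
the two lists stand in for the strings already read from the two tables,
and the sketch assumes the originals are sorted by plain byte-wise
comparison.
</P>

```python
from bisect import bisect_left


def lookup(originals, translations, msgid):
    """Find msgid in the sorted originals; return its translation or None."""
    # bisect_left performs a binary search on the sorted list.
    index = bisect_left(originals, msgid)
    if index < len(originals) and originals[index] == msgid:
        return translations[index]  # same index in the parallel table
    return None


# The empty string sorts before every other string, so the system
# information attached to it is always found at index 0.
originals = [b"", b"cat", b"dog"]
translations = [b"Content-Type: text/plain; charset=UTF-8\n", b"chat", b"chien"]
system_information = translations[0]
```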
 | |
| <P>
 | |
| The size <VAR>S</VAR> of the hash table can be zero.  In this case, the
 | |
| hash table itself is not contained in the MO file.  Some people might
 | |
| prefer this because a precomputed hashing table takes disk space, and
 | |
| does not win <EM>that</EM> much speed.  The hash table contains indices
 | |
| to the sorted array of strings in the MO file.  Conflict resolution is
 | |
| done by double hashing.  The precise hashing algorithm used is fairly
 | |
| dependent on GNU <CODE>gettext</CODE> code, and is not documented here.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| As for the strings themselves, they follow the hash table, and each
 | |
| is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in
 | |
| the length which appears in the string descriptor.  The <CODE>msgfmt</CODE>
 | |
| program has an option selecting the alignment for MO file strings.
 | |
| With this option, each string is separately aligned so it starts at
 | |
| an offset which is a multiple of the alignment value.  On some RISC
 | |
| machines, a correct alignment will speed things up.
 | |
| 
 | |
| </P>
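<P>
The effect of the alignment option on string offsets can be sketched in
Python.  The names <CODE>align_up</CODE> and <CODE>layout_strings</CODE>
are hypothetical, and padding the gaps with <KBD>NUL</KBD> bytes is an
assumption of this sketch rather than something stated above.
</P>

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_strings(strings, start, alignment=1):
    """Place NUL-terminated strings from file offset start onwards.

    Returns (descriptors, pool): descriptors is a list of (length, offset)
    pairs and pool holds the bytes to write at offset start.
    """
    descriptors = []
    pool = bytearray()
    for text in strings:
        offset = align_up(start + len(pool), alignment)
        # Assumption: gaps between strings are filled with NUL bytes.
        pool.extend(b"\0" * (offset - start - len(pool)))
        descriptors.append((len(text), offset))  # length excludes the NUL
        pool.extend(text + b"\0")
    return descriptors, bytes(pool)
```

<P>
With an alignment of 1, the default, no padding is inserted and each string
starts right after the <KBD>NUL</KBD> of the previous one.
</P>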
 | |
| <P>
 | |
| Nothing prevents an MO file from having embedded <KBD>NUL</KBD>s in strings.
 | |
| However, the program interface currently used already presumes
 | |
| that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are
 | |
| somewhat useless.  But the MO file format is general enough that other
 | |
| interfaces would be possible later if, for example, we ever want to
 | |
| implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may
 | |
| accidentally appear.
 | |
| 
 | |
| </P>
 | |
| <P>
 | |
| This particular issue has been strongly debated in the GNU
 | |
| <CODE>gettext</CODE> development forum, and it is to be expected that the MO file
 | |
| format will evolve or change over time.  It is even possible that many
 | |
| formats may later be supported concurrently.  But surely, we have to
 | |
| start somewhere, and the MO file format described here is a good start.
 | |
| Nothing is cast in concrete, and the format may later evolve fairly
 | |
| easily, so we should feel comfortable with the current approach.
 | |
| 
 | |
| </P>
 | |
| 
 | |
| <PRE>
 | |
|         byte
 | |
|              +------------------------------------------+
 | |
|           0  | magic number = 0x950412de                |
 | |
|              |                                          |
 | |
|           4  | file format revision = 0                 |
 | |
|              |                                          |
 | |
|           8  | number of strings                        |  == N
 | |
|              |                                          |
 | |
|          12  | offset of table with original strings    |  == O
 | |
|              |                                          |
 | |
|          16  | offset of table with translation strings |  == T
 | |
|              |                                          |
 | |
|          20  | size of hashing table                    |  == S
 | |
|              |                                          |
 | |
|          24  | offset of hashing table                  |  == H
 | |
|              |                                          |
 | |
|              .                                          .
 | |
|              .    (possibly more entries later)         .
 | |
|              .                                          .
 | |
|              |                                          |
 | |
|           O  | length & offset 0th string  ----------------.
 | |
|       O + 8  | length & offset 1st string  ------------------.
 | |
|               ...                                    ...   | |
 | |
| O + ((N-1)*8)| length & offset (N-1)th string           |  | |
 | |
|              |                                          |  | |
 | |
|           T  | length & offset 0th translation  ---------------.
 | |
|       T + 8  | length & offset 1st translation  -----------------.
 | |
|               ...                                    ...   | | | |
 | |
| T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
 | |
|              |                                          |  | | | |
 | |
|           H  | start hash table                         |  | | | |
 | |
|               ...                                    ...   | | | |
 | |
|   H + S * 4  | end hash table                           |  | | | |
 | |
|              |                                          |  | | | |
 | |
|              | NUL terminated 0th string  <----------------' | | |
 | |
|              |                                          |    | | |
 | |
|              | NUL terminated 1st string  <------------------' | |
 | |
|              |                                          |      | |
 | |
|               ...                                    ...       | |
 | |
|              |                                          |      | |
 | |
|              | NUL terminated 0th translation  <---------------' |
 | |
|              |                                          |        |
 | |
|              | NUL terminated 1st translation  <-----------------'
 | |
|              |                                          |
 | |
|               ...                                    ...
 | |
|              |                                          |
 | |
|              +------------------------------------------+
 | |
| </PRE>
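<P>
To tie the picture together, here is a minimal Python sketch that writes a
file with this layout.  It is not how <CODE>msgfmt</CODE> is implemented:
the function name <CODE>build_mo</CODE> is hypothetical, the output is
always little-endian, no hash table is written (<VAR>S</VAR> is zero),
strings are encoded as UTF-8, and no alignment is applied.
</P>

```python
import struct


def build_mo(catalog):
    """Serialize a dict of msgid -> msgstr into GNU MO format (revision 0)."""
    # Originals must be sorted; translations follow the same order.
    items = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in catalog.items())
    count = len(items)
    orig_table = 28                      # O: right after the 7-word header
    trans_table = orig_table + 8 * count # T: right after the first table
    hash_table = trans_table + 8 * count # H: empty, since S is zero
    strings_start = hash_table           # strings follow the (empty) hash table

    pool = bytearray()
    descriptors = []
    # All originals first, then all translations, as in the picture.
    for column in (0, 1):
        for pair in items:
            text = pair[column]
            descriptors.append((len(text), strings_start + len(pool)))
            pool += text + b"\0"         # NUL is not counted in the length

    out = struct.pack("<7I", 0x950412DE, 0, count,
                      orig_table, trans_table, 0, hash_table)
    for length, offset in descriptors:
        out += struct.pack("<2I", length, offset)
    return out + bytes(pool)
```

<P>
A file produced this way should be readable by any reader that falls back
to binary search when no hash table is present; for instance, Python's
standard <CODE>gettext.GNUTranslations</CODE> class accepts it when given a
file object over the returned bytes.
</P>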
 | |
| 
 | |
| <P><HR><P>
 | |
| <p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_5.html">previous</A>, <A HREF="gettext_7.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
 | |
| </BODY>
 | |
| </HTML>
 |