git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@1306 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Implementation notes
--------------------

Files
-----

The library tex2any.lib contains the generic LaTeX parser.
It comprises tex2any.cc, tex2any.h and texutils.cc.

The executable Tex2RTF is made up of tex2any.lib,
tex2rtf.cc (main driver and user interface), and specific
drivers for generating output: rtfutils.cc, htmlutil.cc
and xlputils.cc.

Data structures
---------------

Class declarations are found in tex2any.h.

TexMacroDef holds a macro (LaTeX command) definition: name, identifier,
number of arguments, whether it should be ignored, etc. Each LaTeX
command is given an integer identifier so that output generation can
test integers rather than compare strings. A hash table, MacroDefs,
stores all the TexMacroDefs, indexed on command name.

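A minimal sketch of this arrangement, assuming simplified stand-ins: MacroId, MacroDefSketch, AddMacroDef and LookupMacro are hypothetical names, not the declarations in tex2any.h, and std::unordered_map plays the part of the MacroDefs hash table. It shows the name-to-definition lookup and the integer identifier that output code can switch on.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical integer identifiers, one per LaTeX command.
enum MacroId { kMacroUndefined = 0, kMacroBold, kMacroSection, kMacroCenter };

// Simplified stand-in for TexMacroDef (not the real declaration).
struct MacroDefSketch {
    std::string name;  // command name without the backslash, e.g. "bf"
    MacroId id;        // integer identifier used during generation
    int noArgs;        // number of arguments the command takes
    bool ignore;       // true if the command should be skipped
};

// Stand-in for the MacroDefs hash table, keyed on command name.
using MacroTable = std::unordered_map<std::string, MacroDefSketch>;

void AddMacroDef(MacroTable& table, const std::string& name, MacroId id,
                 int noArgs, bool ignore = false) {
    table[name] = MacroDefSketch{name, id, noArgs, ignore};
}

// Returns the definition for a command name, or nullptr if it is unknown.
const MacroDefSketch* LookupMacro(const MacroTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}
```

A generator would look a name up once while parsing and afterwards switch on the stored id, rather than comparing strings for every chunk it outputs.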
Each unit of a LaTeX file is stored in a TexChunk. A TexChunk can be
a macro, an argument or just a string: a macro chunk has child
chunks for its arguments, and each argument has one or more
children representing further commands or plain strings.

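The tree shape can be sketched as follows, assuming a hypothetical ChunkSketch type rather than the real TexChunk class. For example, {\bf thing} becomes a macro chunk "bf" with one argument chunk, whose single child is the string "thing".

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified stand-in for TexChunk (not the real class).
enum class ChunkKind { Macro, Arg, String };

struct ChunkSketch {
    ChunkKind kind;
    std::string value;  // macro name for Macro chunks, text for String chunks
    std::vector<std::unique_ptr<ChunkSketch>> children;

    ChunkSketch(ChunkKind k, std::string v = "") : kind(k), value(std::move(v)) {}

    // Appends a child and returns a pointer to it so the tree can be built up.
    ChunkSketch* Add(ChunkKind k, std::string v = "") {
        children.push_back(std::make_unique<ChunkSketch>(k, std::move(v)));
        return children.back().get();
    }
};

// Concatenates all string leaves in document order.
std::string CollectText(const ChunkSketch& chunk) {
    if (chunk.kind == ChunkKind::String) return chunk.value;
    std::string out;
    for (const auto& child : chunk.children) out += CollectText(*child);
    return out;
}
```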
Parsing
-------

Parsing is relatively ad hoc. read_a_line reads in one line at a time,
doing some processing for file commands (e.g. input, verbatiminclude).
File handles are kept on a stack, so file input commands may be nested.

ParseArg parses an argument (which might be the whole LaTeX input,
which is treated as an argument), a single command, or a command
argument. The parsing gets a little hairy because an environment,
a normal command and a bracketed command (e.g. {\bf thing}) are all
parsed into the same format: an environment, for example, is usually
represented as a one-argument command, as is {\bf thing}. ParseArg
also deals with user-defined macros.

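To make the 'same format' point concrete, here is a hypothetical mini-parser (Node and MiniParser are illustrative names, not the real ParseArg) in which {\bf thing}, \bf{thing} and \begin{center}thing\end{center} all come out as a named node with one argument. It is deliberately small: it ignores optional arguments and user-defined macros, and treats every braced group after a command as an argument.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse-tree node: either plain text or a command with arguments.
struct Node {
    std::string name;                     // command/environment name; empty for text
    std::string text;                     // contents of a plain-text node
    std::vector<std::vector<Node>> args;  // each argument is a sequence of nodes
};

class MiniParser {
public:
    explicit MiniParser(std::string src) : s_(std::move(src)) {}

    // Parses nodes until end of input, an unmatched '}' or an "\end{".
    std::vector<Node> ParseSeq() {
        std::vector<Node> out;
        while (i_ < s_.size() && s_[i_] != '}' && !AtEnvEnd()) {
            Node n;
            if (s_[i_] == '{' && i_ + 1 < s_.size() && s_[i_ + 1] == '\\') {
                ++i_;                                   // {\bf thing}
                n.name = ReadCommandName();
                while (i_ < s_.size() && s_[i_] == ' ') ++i_;
                n.args.push_back(ParseSeq());
                Skip('}');
            } else if (s_[i_] == '{') {
                ++i_;                                   // plain group: keep contents
                std::vector<Node> inner = ParseSeq();
                Skip('}');
                out.insert(out.end(), inner.begin(), inner.end());
                continue;
            } else if (s_[i_] == '\\') {
                n.name = ReadCommandName();
                if (n.name == "begin") {                // \begin{env}...\end{env}
                    n.name = ReadGroupText();
                    n.args.push_back(ParseSeq());
                    ReadCommandName();                  // consumes "end"
                    ReadGroupText();                    // consumes "{env}"
                } else {
                    while (i_ < s_.size() && s_[i_] == '{') {  // \bf{thing}
                        ++i_;
                        n.args.push_back(ParseSeq());
                        Skip('}');
                    }
                }
            } else {
                while (i_ < s_.size() && s_[i_] != '\\' && s_[i_] != '{' && s_[i_] != '}')
                    n.text += s_[i_++];
            }
            out.push_back(std::move(n));
        }
        return out;
    }

private:
    bool AtEnvEnd() const { return s_.compare(i_, 5, "\\end{") == 0; }
    void Skip(char c) { if (i_ < s_.size() && s_[i_] == c) ++i_; }

    std::string ReadCommandName() {
        std::string name;
        if (i_ >= s_.size() || s_[i_] != '\\') return name;
        ++i_;
        while (i_ < s_.size() && std::isalpha(static_cast<unsigned char>(s_[i_])))
            name += s_[i_++];
        if (name.empty() && i_ < s_.size()) name += s_[i_++];  // control symbol, e.g. \&
        return name;
    }

    std::string ReadGroupText() {
        std::string text;
        if (i_ >= s_.size() || s_[i_] != '{') return text;
        ++i_;
        while (i_ < s_.size() && s_[i_] != '}') text += s_[i_++];
        Skip('}');
        return text;
    }

    std::string s_;
    std::size_t i_ = 0;
};
```

Whichever of the three spellings is parsed, the result is a single node named "bf" (or "center") whose one argument holds the text node "thing", so later stages need only one code path.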
Whilst parsing, the function MatchMacro is called to
attempt to find a command following a backslash (or the
start of an environment). ParseMacroBody parses the
arguments of a command once one has been found.

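A sketch of the matching step, assuming a hypothetical MatchMacroSketch function and a plain name-to-argument-count map instead of the real MacroDefs table. An empty result corresponds to the 'Macro not found' situation described under Common errors.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical result of a successful match.
struct MacroMatch {
    std::string name;  // command or environment name
    int noArgs;        // argument count from the definition table
    std::size_t end;   // index just past the matched text
};

// `pos` must index a backslash in `text`. Reads the command name (or the
// environment name after "\begin{") and looks it up in `defs`.
std::optional<MacroMatch> MatchMacroSketch(
        const std::string& text, std::size_t pos,
        const std::unordered_map<std::string, int>& defs) {
    if (pos >= text.size() || text[pos] != '\\') return std::nullopt;
    std::size_t i = pos + 1;
    std::string name;
    while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i])))
        name += text[i++];
    if (name == "begin" && i < text.size() && text[i] == '{') {
        // Environment: the name to look up is inside the braces.
        std::size_t close = text.find('}', i);
        if (close == std::string::npos) return std::nullopt;
        name = text.substr(i + 1, close - i - 1);
        i = close + 1;
    }
    auto it = defs.find(name);
    if (it == defs.end()) return std::nullopt;  // "Macro not found"
    return MacroMatch{name, it->second, i};
}
```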
Generation
----------

The upshot of parsing is a hierarchy of TexChunks.
TraverseFromDocument, which the 'client' converter application calls
to start the generation process, calls the recursive TraverseFromChunk.
TraverseFromChunk calls the two functions OnMacro and OnArgument
twice for each chunk, once before and once after its children, to allow
for preprocessing and postprocessing of each macro or argument.

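The before/after pattern can be sketched like this, assuming hypothetical Chunk and Client types whose std::function members stand in for OnMacro, OnArgument and TexOutput; the real functions' signatures may differ.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical chunk tree used only for this sketch.
enum class Kind { Macro, Arg, String };

struct Chunk {
    Kind kind;
    int id = 0;         // macro identifier (Macro and Arg chunks)
    int argNo = 0;      // argument number (Arg chunks)
    std::string text;   // String chunks
    std::vector<Chunk> children;
};

// Hypothetical callbacks standing in for OnMacro / OnArgument / TexOutput.
struct Client {
    std::function<void(int id, bool start)> onMacro;
    std::function<void(int id, int argNo, bool start)> onArgument;
    std::function<void(const std::string&)> output;
};

// Each Macro or Arg chunk triggers its callback twice: before
// (start = true) and after (start = false) its children are traversed.
void TraverseSketch(const Chunk& c, const Client& client) {
    switch (c.kind) {
    case Kind::String:
        client.output(c.text);
        break;
    case Kind::Macro:
        client.onMacro(c.id, true);
        for (const Chunk& child : c.children) TraverseSketch(child, client);
        client.onMacro(c.id, false);
        break;
    case Kind::Arg:
        client.onArgument(c.id, c.argNo, true);
        for (const Chunk& child : c.children) TraverseSketch(child, client);
        client.onArgument(c.id, c.argNo, false);
        break;
    }
}
```

Because the macro callback fires once with start = true and once with start = false, an RTF client can, for instance, open a bold group before the argument is output and close it afterwards.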
The client defines OnMacro and OnArgument to test
the command identifier and output the appropriate
code. To help with this, the function TexOutput
writes to the current output stream(s), and
SetCurrentOutput(s) selects one or two
streams for the output to be sent to.
Two outputs at a time are usually sufficient for
hypertext applications, where a title is likely
to appear both in an index and as a section header.

There are support functions for getting the string
data of the current argument (GetArgData) and the
current argument chunk itself (GetArgChunk). If you have a handle
on a chunk, you can output it several times by calling
TraverseChildrenFromChunk (not TraverseFromChunk, which, called
from within the handler for that chunk, would invoke the same
handler again and cause infinite recursion).

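The difference between the two traversal functions can be sketched with a hypothetical tree (TNode, Traverse and TraverseChildren are illustrative names). Traverse calls the handler for the node itself, so calling it on the node currently being handled would re-enter the same handler; TraverseChildren only visits the children, so it can safely output a saved chunk a second time.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical tree: a leaf carries text, an inner node carries children.
struct TNode {
    std::string text;
    std::vector<TNode> children;
};

using Handler = std::function<void(const TNode&, bool start)>;

void TraverseChildren(const TNode& n, const Handler& h, std::string& out);

// Like TraverseFromChunk: invokes the handler for `n` itself.
void Traverse(const TNode& n, const Handler& h, std::string& out) {
    h(n, true);
    if (n.children.empty()) out += n.text;
    TraverseChildren(n, h, out);
    h(n, false);
}

// Like TraverseChildrenFromChunk: visits only the children of `n`,
// so the handler for `n` is not called again.
void TraverseChildren(const TNode& n, const Handler& h, std::string& out) {
    for (const TNode& child : n.children) Traverse(child, h, out);
}
```

In the usage below, a handler repeats a title when it finishes; replacing TraverseChildren with Traverse in that handler would call the handler for the title again and never terminate.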
The client (here, Tex2RTF) also defines OnError and OnInform output
functions appropriate to the desired user interface.

References
----------

Adding, finding and resolving references are supported
by functions in texutils.cc. WriteTexReferences
and ReadTexReferences allow references to be saved and read back
between conversion runs, rather like LaTeX does with its .aux file.

Bibliography
------------

Again, texutils.cc provides functions for reading in .bib files and
resolving references. The function OutputBibItem gives a generic way of
outputting bibliography items by 'faking' calls to OnMacro and
OnArgument, allowing the existing low-level client code to take care of
formatting.

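A sketch of the 'faking' idea, assuming hypothetical BibClient callbacks and macro identifiers; OutputBibItemSketch is an illustration, not the real OutputBibItem. The entry is emitted by replaying the OnMacro/OnArgument calls that a traversal of \bf{author} and \it{title} would have made, so whatever formatting the client already applies to those macros is reused.

```cpp
#include <functional>
#include <string>

// Hypothetical callbacks with the same shape as a client's OnMacro/OnArgument.
struct BibClient {
    std::function<void(int macroId, bool start)> onMacro;
    std::function<void(int macroId, int argNo, bool start)> onArgument;
    std::function<void(const std::string&)> output;
};

// Hypothetical identifiers the client already knows how to format.
constexpr int kMacroBold = 1;
constexpr int kMacroItalic = 2;

// Emits "author, title." by replaying the calls a traversal would make.
void OutputBibItemSketch(const std::string& author, const std::string& title,
                         const BibClient& c) {
    auto emit = [&](int id, const std::string& text) {
        c.onMacro(id, true);
        c.onArgument(id, 1, true);
        c.output(text);
        c.onArgument(id, 1, false);
        c.onMacro(id, false);
    };
    emit(kMacroBold, author);
    c.output(", ");
    emit(kMacroItalic, title);
    c.output(".");
}
```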
Units
-----

Unit parsing code is in texutils.cc as ParseUnitArgument. It converts
a dimension argument (e.g. 2cm) into points.

Common errors
-------------

1) Macro not found: \end{center} ...

Rewrite:

\begin{center}
{\large{\underline{A}}}
\end{center}

as:

\begin{center}
{\large \underline{A}}
\end{center}

2) Tables crash RTF. Set 'compatibility' to TRUE in the .ini file; also
check for \\ end-of-row markers on a line by themselves, and insert the
correct number of ampersands (one fewer than the number of columns).
E.g.

hello & world\\
\\

becomes

hello & world\\
&\\

3) If list items indent erratically, try increasing
listItemIndent to give more space between the label and the following text.
A global replace of '\item [' with '\item[' may also help by removing
unnecessary space before the item label.

4) Missing figure or section references: ensure all labels _directly_ follow captions
or sections (no intervening white space).
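The textual fixes in items 2 to 4 can be automated before conversion. The helpers below are a hypothetical sketch (PadEmptyTableRow, RemoveItemLabelSpace and AttachLabels are not part of Tex2RTF); PadEmptyTableRow needs the column count to be supplied by the caller, for example from the tabular column specification.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Item 2: a row consisting only of "\\" gets (columns - 1) ampersands.
std::string PadEmptyTableRow(const std::string& line, int columns) {
    std::size_t first = line.find_first_not_of(" \t");
    std::size_t last = line.find_last_not_of(" \t");
    if (first == std::string::npos || line.substr(first, last - first + 1) != "\\\\")
        return line;  // not a bare end-of-row line
    std::size_t amps = columns > 1 ? static_cast<std::size_t>(columns - 1) : 0;
    return std::string(amps, '&') + "\\\\";
}

// Item 3: replace every "\item [" with "\item[".
std::string RemoveItemLabelSpace(std::string text) {
    const std::string from = "\\item [", to = "\\item[";
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size()))
        text.replace(pos, from.size(), to);
    return text;
}

// Item 4: remove white space between a closing brace (the end of a
// \caption{...} or \section{...}) and a following \label.
std::string AttachLabels(std::string text) {
    const std::string label = "\\label";
    for (std::size_t pos = text.find(label); pos != std::string::npos;
         pos = text.find(label, pos + label.size())) {
        std::size_t start = pos;
        while (start > 0 && std::isspace(static_cast<unsigned char>(text[start - 1])))
            --start;
        if (start < pos && start > 0 && text[start - 1] == '}') {
            text.erase(start, pos - start);
            pos = start;
        }
    }
    return text;
}
```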