From ec482247094681b2da3524ef212796a0c145200b Mon Sep 17 00:00:00 2001 From: Ove Kaaven Date: Sat, 25 Mar 2000 14:41:21 +0000 Subject: [PATCH] Topic overview for wxMBConv classes (more or less). Describes how it is *supposed* to work, not how it actually works, but I'll begin implementing the missing pieces as soon as I get my tree to compile in Unicode mode again... git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/branches/WX_2_2_BRANCH@6931 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775 --- docs/latex/wx/tmbconv.tex | 155 ++++++++++++++++++++++++++++++++++++++ docs/latex/wx/topics.tex | 1 + 2 files changed, 156 insertions(+) create mode 100644 docs/latex/wx/tmbconv.tex diff --git a/docs/latex/wx/tmbconv.tex b/docs/latex/wx/tmbconv.tex new file mode 100644 index 0000000000..bbdc846221 --- /dev/null +++ b/docs/latex/wx/tmbconv.tex @@ -0,0 +1,155 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% Name: tmbconv.tex +%% Purpose: Overview of the wxMBConv classes in wxWindows +%% Author: Ove Kaaven +%% Modified by: +%% Created: 25.03.00 +%% RCS-ID: $Id$ +%% Copyright: (c) 2000 Ove Kaaven +%% Licence: wxWindows license +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section{wxMBConv classes}\label{mbconvclasses} + +Classes: \helpref{wxMBConv}{wxmbconv}, \helpref{wxMBConvFile}{wxmbconvfile}, +\helpref{wxMBConvUTF7}{wxmbconvutf7}, \helpref{wxMBConvUTF8}{wxmbconvutf8}, +\helpref{wxCSConv}{wxcsconv} + +The wxMBConv classes in wxWindows enables an Unicode-aware application to +easily convert between Unicode and the variety of 8-bit encoding systems still +in use. + +\subsection{Background: The need for conversion} + +As programs are becoming more and more globalized, and user exchange documents +across country boundaries as never before, applications increasingly need to +take into account all the different character sets in use around the world. It +is no longer enough to just depend on the default byte-sized character set that +computers have traditionally used. + +A few years ago, a solution was proposed: the Unicode standard. Able to contain +the complete set of characters in use in one unified global coding system, +it would resolve the character set problems once and for all. + +But it hasn't happened yet, and the migration towards Unicode has created new +challenges, resulting in "compatibility encodings" such as UTF-8. A large +amount of systems out there still depends on the old 8-bit encodings, hampered +by the huge amounts of legacy code still widely deployed. Even sending +Unicode data from one Unicode-aware system to another may need encoding to an +8-bit multibyte encoding (UTF-7 or UTF-8 is typically used for this purpose), to +pass unhindered through any traditional transport channels. + +\subsection{Background: The wxString class} + +If you have compiled wxWindows in Unicode mode, the wxChar type will become +identical to wchar\_t rather than char, and a wxString stores wxChars. Hence, +all wxString manipulation in your application will then operate on Unicode +strings, and almost as easily as working with ordinary char strings (you +just need to remember to use the wxT() macro to encapsulate any string +literals). + +But often, your environment doesn't want Unicode strings. You could be sending +data over a network, or processing a text file for some other application. You +need a way to quickly convert your easily-handled Unicode data to and from a +traditional 8-bit-encoding. And this is what the wxMBConv classes does. + +\subsection{wxMBConv classes} + +The base class for all these conversions is the wxMBConv class (which itself +implements standard libc locale conversion). Derived classes include +wxMBConvFile, wxMBConvUTF7, wxMBConvUTF8, and wxCSConv, which implement +different kinds of conversions. You can also derive your own class for your +own custom encoding and use it, should you need it. All you need to do is +override the MB2WC and WC2MB methods. + +\subsection{wxMBConv objects} + +In C++, for a class to be useful and possible to pass around, it needs to be +instantiated. All of the wxWindows-provided wxMBConv classes have predefined +instances (wxConvLibc, wxConvFile, wxConvUTF7, wxConvUTF8, wxConvLocal). +You can use these predefined objects directly, or you can instantiate your own +objects. + +A variable, wxConvCurrent, points to the conversion object that the user interface +is supposed to use, in the case that the user interface is not Unicode-based (like +with GTK+ 1.2). By default, it points to wxConvLibc or wxConvLocal, depending on +which works best on the current platform. + +\subsection{wxCSConv} + +The wxCSConv class is special because when it's instantiated, you can tell it +which character set it should use, which makes it meaningful to keep many +instances of them around, each with a different character set (or you can +create a wxCSConv instance on the fly). + +The predefined wxCSConv instance, wxConvLocal, is preset to use the +default user character set, but you should rarely need to use it directly, +it is better to go through wxConvCurrent. + +\subsection{Converting strings} + +Once you have chosen which object you want to use to convert your text, +here is how you would use them with wxString. These examples all assume +that you are using a Unicode build of wxWindows, although they will still +compile in a non-Unicode build (they just won't convert anything). + +Example 1: Constructing a wxString from input in current encoding. + +\begin{verbatim} +wxString str(input_data, *wxConvCurrent); +\end{verbatim} + +Example 2: Input in UTF-8 encoding. + +\begin{verbatim} +wxString str(input_data, wxConvUTF8); +\end{verbatim} + +Example 3: Input in KOI8-R. Construction of wxCSConv instance on the fly. + +\begin{verbatim} +wxString str(input_data, wxCSConv(wxT("koi8-r"))); +\end{verbatim} + +Example 4: Printing a wxString to stdout in UTF-8 encoding. + +\begin{verbatim} +puts(str.mb_str(wxConvUTF8)); +\end{verbatim} + +Example 5: Printing a wxString to stdout in custom encoding. +Using preconstructed wxCSConv instance. + +\begin{verbatim} +wxCSConv cust(user_encoding); +printf("Data: %s\\n", (const char*) str.mb_str(cust)); +\end{verbatim} + +Note: Since mb_str() returns a temporary wxCharBuffer to hold the result +of the conversion, you need to explicitly cast it to const char* if you use +it in a vararg context (like with printf). + +\subsection{Converting buffers} + +If you have specialized needs, or just don't want to use wxString, you +can also use the conversion methods of the conversion objects directly. +This can even be useful if you need to do conversion in a non-Unicode +build of wxWindows; converting a string from UTF-8 to the current +encoding should be possible by doing this: + +\begin{verbatim} +wxString str(wxConvUTF8.cMB2WC(input_data), *wxConvCurrent); +\end{verbatim} + +Here, cMB2WC of the UTF8 object returns a wxWCharBuffer containing a Unicode +string. The wxString constructor then converts it back to an 8-bit character +set using the passed conversion object, *wxConvCurrent. (In a Unicode build +of wxWindows, the constructor ignores the passed conversion object and +retains the Unicode data.) + +To print a wxChar buffer to a non-Unicode stdout: + +\begin{verbatim} +printf("Data: %s\\n", (const char*) wxConvCurrent->cWX2MB(unicode_data)); +\end{verbatim} + diff --git a/docs/latex/wx/topics.tex b/docs/latex/wx/topics.tex index c232ebbef4..5b104537eb 100644 --- a/docs/latex/wx/topics.tex +++ b/docs/latex/wx/topics.tex @@ -12,6 +12,7 @@ This chapter contains a selection of topic overviews, first things first: \input truntime.tex \input tstring.tex \input tunicode.tex +\input tmbconv.tex \input ti18n.tex \input tnoneng.tex \input tcontain.tex