\section{wxString overview}\label{wxstringoverview}

Classes: \helpref{wxString}{wxstring}, \helpref{wxArrayString}{wxarraystring}, \helpref{wxStringTokenizer}{wxstringtokenizer}

\subsection{Introduction}

wxString is a class which represents a character string of arbitrary length (limited by
{\it INT\_MAX} which is usually 2147483647 on 32 bit machines) and containing
arbitrary characters. The ASCII NUL character is allowed, but be aware that
in the current string implementation some methods might not work correctly
in this case.

wxString works with both ASCII (traditional, 7 or 8 bit, characters) and
Unicode (wide characters) strings.

This class has all the standard operations you can expect to find in a string class:
dynamic memory management (the string grows to accommodate new characters),
construction from other strings, C strings and characters, assignment operators,
access to individual characters, string concatenation and comparison, substring
extraction, case conversion, trimming and padding (with spaces), searching and
replacing, both C-like \helpref{Printf()}{wxstringprintf} and stream-like
insertion functions, and much more. See \helpref{wxString}{wxstring}
for a list of all functions.

\subsection{Comparison of wxString to other string classes}

The advantages of using a special string class instead of working directly with
C strings are so obvious that there is a huge number of such classes available.
The most important advantage is that there is no need to remember to
allocate/free memory for C strings; working with fixed size buffers almost
inevitably leads to buffer overflows. Moreover, C++ now has a standard string class
(std::string). So why the need for wxString?

There are several advantages:

\begin{enumerate}\itemsep=0pt
\item {\bf Efficiency} This class was made to be as efficient as possible, both
in terms of size (each wxString object takes exactly the same space as a {\it
char *} pointer, using \helpref{reference counting}{wxstringrefcount}) and speed.
It also provides performance \helpref{statistics gathering code}{wxstringtuning}
which may be enabled to fine tune the memory allocation strategy for your
particular application, and the gain might be quite big.
\item {\bf Compatibility} This class tries to combine almost full compatibility
with the old wxWindows 1.xx wxString class, some resemblance to the MFC CString
class and 90\% of the functionality of the std::string class.
\item {\bf Rich set of functions} Some of the functions present in wxString are
very useful but don't exist in most other string classes: for example,
\helpref{AfterFirst}{wxstringafterfirst},
\helpref{BeforeLast}{wxstringbeforelast}, \helpref{operator<<}{wxstringoperatorout}
or \helpref{Printf}{wxstringprintf}. Of course, all the standard string
operations are supported as well.
\item {\bf Unicode} wxString is Unicode friendly: it allows you to easily convert
to and from ANSI and Unicode strings in any build mode (see the
\helpref{Unicode overview}{unicode} for more details) and maps to either
{\tt string} or {\tt wstring} transparently depending on the current mode.
\item {\bf Used by wxWindows} And, of course, this class is used everywhere
inside wxWindows, so there is no performance loss of the kind which would result
from converting objects of any other string class (including std::string) to
wxString internally in wxWindows.
\end{enumerate}

However, there are several problems as well. The most important one is probably
that there are often several functions which do exactly the same thing: for
example, to get the length of the string any one of
length(), \helpref{Len()}{wxstringlen} or
\helpref{Length()}{wxstringlength} may be used. The first function, like almost
all the other functions in lowercase, is std::string compatible. The second one
is the ``native'' wxString version and the last one is the wxWindows 1.xx way. So the
question is: which one is better to use? And the answer is:

{\bf The usage of std::string compatible functions is strongly advised!} It will
make your code more familiar to other C++ programmers (who are supposed to
have knowledge of std::string but not of wxString), let you reuse the same code
in both wxWindows and other programs (by just typedefing wxString as std::string
when used outside wxWindows) and keep it compatible with future versions of
wxWindows, which will probably start using std::string sooner or later too.

In the situations where there is no corresponding std::string function, please
try to use the new wxString methods and not the old wxWindows 1.xx variants,
which are deprecated and may disappear in future versions.

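The advice above can be illustrated with a minimal sketch. It assumes that the code is compiled outside wxWindows with wxString typedefed as std::string, as suggested above; the helper names {\tt BaseName} and {\tt StartsWith} are hypothetical and exist only for this example. Whether every lowercase method used here is available in a given wxString version should be checked against the \helpref{wxString}{wxstring} reference.

```cpp
#include <string>

// Assumption for this sketch only: outside wxWindows, wxString is simply
// an alias for std::string, as the text above suggests.
typedef std::string wxString;

// Hypothetical helper: returns the part of `path` after the last '/',
// using only std::string compatible (lowercase) methods.
wxString BaseName(const wxString& path)
{
    const wxString::size_type pos = path.rfind('/');
    if ( pos == wxString::npos )
        return path;                // no separator: return the whole string

    return path.substr(pos + 1);
}

// Hypothetical helper: returns true if `s` begins with `prefix`.
bool StartsWith(const wxString& s, const wxString& prefix)
{
    return s.length() >= prefix.length() &&
           s.substr(0, prefix.length()) == prefix;
}
```

Because only length(), rfind() and substr() are used, the same two functions should not need to change when the typedef is removed and the real wxString class is used instead.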
\subsection{Some advice about using wxString}\label{wxstringadvices}

Probably the main trap when using this class is the implicit conversion operator to
{\it const char *}. It is advised that you use \helpref{c\_str()}{wxstringcstr}
instead to clearly indicate when the conversion is done. Specifically, the
danger of this implicit conversion may be seen in the following code fragment:

\begin{verbatim}
// this function converts the input string to uppercase, outputs it to the
// screen and returns the result
const char *SayHELLO(const wxString& input)
{
    wxString output = input.Upper();

    printf("Hello, %s!\n", output);

    return output;
}
\end{verbatim}

There are two nasty bugs in these three lines. The first of them is in the call to the
{\it printf()} function. Although the implicit conversion to C strings is applied
automatically by the compiler in the case of

\begin{verbatim}
    puts(output);
\end{verbatim}

because the argument of {\it puts()} is known to be of the type {\it const char *},
this is {\bf not} done for {\it printf()} which is a function with a variable
number of arguments (and whose arguments are of unknown types). So this call may
do anything at all (including displaying the correct string on screen), although
the most likely result is a program crash. The solution is to use
\helpref{c\_str()}{wxstringcstr}: just replace this line with

\begin{verbatim}
    printf("Hello, %s!\n", output.c_str());
\end{verbatim}

The second bug is that returning {\it output} doesn't work. The implicit cast is
used again, so the code compiles, but as it returns a pointer to a buffer
belonging to a local variable which is destroyed as soon as the function exits,
its contents are totally arbitrary. The solution to this problem is also easy:
just make the function return wxString instead of a C string.

This leads us to the following general advice: all functions taking string
arguments should take {\it const wxString\&} (this makes assignment to the
strings inside the function faster because of
\helpref{reference counting}{wxstringrefcount}) and all functions returning
strings should return {\it wxString}, which makes it safe to return local
variables.

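With both fixes applied, the function might look like the sketch below. So that it compiles without wxWindows, it assumes wxString is typedefed as std::string; since std::string has no {\tt Upper()} method, a hypothetical helper {\tt ToUpper()} stands in for it. With the real wxString class, {\tt input.Upper()} would be used as in the original fragment.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Assumption for this sketch only: std::string stands in for wxString.
typedef std::string wxString;

// Hypothetical stand-in for wxString::Upper(): returns an uppercase copy.
wxString ToUpper(const wxString& s)
{
    wxString result(s);
    for ( wxString::size_type n = 0; n < result.length(); n++ )
    {
        // cast to unsigned char first: toupper() is undefined for
        // negative values other than EOF
        result[n] = static_cast<char>(
                        std::toupper(static_cast<unsigned char>(result[n])));
    }

    return result;
}

// Takes the argument by const reference and returns the string by value,
// so no pointer into a destroyed local variable ever escapes.
wxString SayHELLO(const wxString& input)
{
    wxString output = ToUpper(input);

    // explicit c_str(): printf() cannot apply any implicit conversion
    std::printf("Hello, %s!\n", output.c_str());

    return output;
}
```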
\subsection{Other string related functions and classes}

As most programs use character strings, the standard C library provides quite
a few functions to work with them. Unfortunately, some of them have rather
counter-intuitive behaviour (like strncpy() which doesn't always terminate the
resulting string with a NUL character) and are in general not very safe (passing
a NULL pointer to them will probably lead to a program crash). Moreover, some
very useful functions are not standard at all. This is why, in addition to all
wxString functions, there are also a few global string functions which try to
correct these problems: \helpref{wxIsEmpty()}{wxisempty} verifies whether the string
is empty (returning {\tt true} for {\tt NULL} pointers),
\helpref{wxStrlen()}{wxstrlen} also handles NULLs correctly and returns 0 for
them and \helpref{wxStricmp()}{wxstricmp} is just a platform-independent
version of the case-insensitive string comparison function known either as
stricmp() or strcasecmp() on different platforms.

The {\tt <wx/string.h>} header also defines the \helpref{wxSnprintf}{wxsnprintf}
and \helpref{wxVsnprintf}{wxvsnprintf} functions which should be used instead
of the inherently dangerous standard {\tt sprintf()}. Whenever possible, they
use {\tt snprintf()}, which does buffer size checks. Of
course, you may also use \helpref{wxString::Printf}{wxstringprintf} which is
also safe.

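The two pitfalls mentioned above, the missing terminator after strncpy() and unchecked formatting with sprintf(), can be sketched with the standard library alone. The function names {\tt CopyTruncated} and {\tt FormatGreeting} are hypothetical. The sketch calls the standard {\tt snprintf()} directly and assumes a C++11 (or later) compiler which provides it; it does not show how wxSnprintf itself is implemented.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: copies `src` into `dst` (capacity `dstSize`).
// strncpy() does not write a terminating NUL when `src` is too long,
// so the terminator is always added by hand.
void CopyTruncated(char *dst, std::size_t dstSize, const char *src)
{
    if ( dstSize == 0 )
        return;

    std::strncpy(dst, src, dstSize - 1);
    dst[dstSize - 1] = '\0';
}

// Hypothetical helper: formats a greeting into `dst` without ever writing
// past `dstSize` bytes. Returns true if the whole text fitted.
bool FormatGreeting(char *dst, std::size_t dstSize, const char *name)
{
    // snprintf() returns the length the full output would have had
    const int needed = std::snprintf(dst, dstSize, "Hello, %s!", name);

    return needed >= 0 && static_cast<std::size_t>(needed) < dstSize;
}
```

When the output does not fit, {\tt FormatGreeting()} still leaves a NUL terminated (but truncated) string in the buffer, which is the behaviour that makes the bounded functions preferable to {\tt sprintf()}.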
There is another class which might be useful when working with wxString:
\helpref{wxStringTokenizer}{wxstringtokenizer}. It is helpful when a string must
be broken into tokens and replaces the standard C library {\it
strtok()} function.

And the very last string-related class is \helpref{wxArrayString}{wxarraystring}: it
is just a version of the ``template'' dynamic array class which is specialized to work
with strings. Please note that this class is specially optimized (using its
knowledge of the internal structure of wxString) for storing strings and so it is
vastly better from a performance point of view than a wxObjectArray of wxStrings.

\subsection{Reference counting and why you shouldn't care about it}\label{wxstringrefcount}

wxString objects use a technique known as {\it copy on write} (COW). This means
that when a string is assigned to another, no copying really takes place: only
the reference count on the shared string data is incremented and both strings
share the same data.

But as soon as one of the two (or more) strings is modified, the data has to be
copied because the changes to one of the strings shouldn't be seen in the
others. As data copying only happens when the string is written to, this is
known as COW.

What is important to understand is that all this happens absolutely
transparently to the class users and that whether a string is shared or not is
not visible from outside the class: in any case, the result of any
operation on it is the same.

Probably the only case when you might want to think about reference
counting is when a character is read from a string which is not a
constant (or a constant reference). In this case, due to C++ rules, the
``read-only'' {\it operator[]} (which is the same as
\helpref{GetChar()}{wxstringgetchar}) cannot be chosen and the ``read/write''
{\it operator[]} (the same as
\helpref{GetWritableChar()}{wxstringgetwritablechar}) is used instead. As the
call to this operator may modify the string, its data is unshared (COW is done)
and so, if the string was really shared, there is some performance loss (both in
terms of speed and memory consumption). In the rare cases when this may be
important, you might prefer using \helpref{GetChar()}{wxstringgetchar} instead
of the array subscript operator for this reason. Please note that the
\helpref{at()}{wxstringat} method has the same problem as the subscript operator in
this situation and so using it is not really better. Also note that if all
string arguments to your functions are passed as {\it const wxString\&} (see the
section \helpref{Some advice}{wxstringadvices}) this situation will almost
never arise because for constant references the correct operator is called automatically.

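The copy on write idea described in this section can be sketched with a small toy class. This is not how wxString is actually implemented: the class {\tt CowString} and its method {\tt SharesDataWith()} are hypothetical, and the reference count is borrowed from {\tt std::shared\_ptr} (C++11) only to keep the example short. The names {\tt GetChar()} and {\tt GetWritableChar()} are reused from the text to show which access unshares the data.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy class illustrating copy on write; it is NOT the
// wxString implementation.
class CowString
{
public:
    explicit CowString(const std::string& s)
        : m_data(std::make_shared<std::string>(s))
    {
    }

    // The implicitly generated copy constructor and assignment operator
    // copy only the pointer: the character data becomes shared.

    // Read-only access never unshares the data.
    char GetChar(std::size_t n) const { return (*m_data)[n]; }

    // Writable access may modify the string, so a private copy of the
    // data is made first if it is currently shared (this is the "COW").
    char& GetWritableChar(std::size_t n)
    {
        if ( m_data.use_count() > 1 )
            m_data = std::make_shared<std::string>(*m_data);

        return (*m_data)[n];
    }

    // Returns true if both objects point to the same character data.
    bool SharesDataWith(const CowString& other) const
    {
        return m_data == other.m_data;
    }

    const std::string& str() const { return *m_data; }

private:
    std::shared_ptr<std::string> m_data;
};
```

After {\tt CowString b = a;} both objects share one buffer. Calling {\tt b.GetChar(0)} keeps it shared, while {\tt b.GetWritableChar(0)} gives {\tt b} its own copy even if nothing is then written, which is the performance loss the paragraph above warns about.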
\subsection{Tuning wxString for your application}\label{wxstringtuning}

\normalbox{{\bf Note:} this section is strictly about performance issues and
you do not need to read it in order to use the wxString class. Please skip it unless
you feel familiar with profilers and related tools. If you do read it, please
also read the preceding section about
\helpref{reference counting}{wxstringrefcount}.}

For performance reasons wxString doesn't allocate exactly the amount of
memory needed for each string. Instead, it adds a small amount of space to each
allocated block, which allows it to avoid reallocating memory (a relatively
expensive operation) too often when, for example, a string is constructed by
successively appending one character at a time to it, as in:

\begin{verbatim}
// delete all vowels from the string
wxString DeleteAllVowels(const wxString& original)
{
    wxString result;

    size_t len = original.length();
    for ( size_t n = 0; n < len; n++ )
    {
        if ( strchr("aeuio", tolower(original[n])) == NULL )
            result += original[n];
    }

    return result;
}
\end{verbatim}

This is quite a common situation and not allocating extra memory at all would
lead to very bad performance in this case because there would be as many memory
(re)allocations as there are consonants in the original string. Allocating a lot
of extra memory would improve the speed in this situation but, because of
the great number of wxString objects typically used in a program, would also
increase the memory consumption too much.

The very best solution in precisely this case would be to use the
\helpref{Alloc()}{wxstringalloc} function to preallocate, for example, len bytes
from the beginning: this will lead to exactly one memory allocation being
performed (because the result is at most as long as the original string).

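A minimal sketch of this preallocation follows. It assumes, for the sake of compiling without wxWindows, that wxString is typedefed as std::string, and therefore calls {\tt reserve()}, the std::string counterpart; with the real wxString class the text above suggests calling \helpref{Alloc()}{wxstringalloc} with the same argument. The explicit check for the NUL character is an addition of this sketch.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Assumption for this sketch only: std::string stands in for wxString.
typedef std::string wxString;

// delete all vowels from the string, preallocating the result
wxString DeleteAllVowels(const wxString& original)
{
    wxString result;

    // The result is at most as long as the original, so reserving its
    // length up front means no reallocation is needed while appending.
    // (With the real wxString this would be result.Alloc(len).)
    const wxString::size_type len = original.length();
    result.reserve(len);

    for ( wxString::size_type n = 0; n < len; n++ )
    {
        const unsigned char ch = static_cast<unsigned char>(original[n]);

        // strchr() would also match the terminating NUL of "aeiou",
        // so an embedded NUL character is kept explicitly.
        if ( ch == '\0' || std::strchr("aeiou", std::tolower(ch)) == NULL )
            result += original[n];
    }

    return result;
}
```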
However, using Alloc() is tedious and so wxString tries to do its best. The
default algorithm assumes that memory allocation is done with a granularity of at
least 16 bytes (which is the case on almost all widespread platforms) and so
nothing is lost if the amount of memory to allocate is rounded up to the next
multiple of 16. This way, no memory is wasted and 15 iterations out of 16 in the
example above won't allocate memory but will use the already allocated block.

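The effect of rounding up to a multiple of 16 can be checked with a small model. This is only a sketch of the arithmetic described above, not the code from string.cpp: the functions {\tt RoundUpTo16} and {\tt CountAllocations} are hypothetical, and the model ignores the terminating NUL and any bookkeeping data stored with the string.

```cpp
#include <cstddef>

// Hypothetical helper: rounds `n` up to the next multiple of 16.
std::size_t RoundUpTo16(std::size_t n)
{
    return (n + 15) & ~static_cast<std::size_t>(15);
}

// Hypothetical model: counts the (re)allocations needed to build a string
// of `len` characters by appending one character at a time, when every
// allocation is rounded up to the next multiple of 16.
std::size_t CountAllocations(std::size_t len)
{
    std::size_t capacity = 0;
    std::size_t allocations = 0;

    for ( std::size_t size = 1; size <= len; size++ )
    {
        if ( size > capacity )
        {
            // the current block is full: allocate a rounded up one
            capacity = RoundUpTo16(size);
            allocations++;
        }
    }

    return allocations;
}
```

In this model a string of 160 characters built one character at a time needs 10 allocations instead of 160, which corresponds to the ``15 iterations out of 16'' figure given above.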
The default approach is quite conservative. Allocating more memory may bring
important performance benefits for programs using (relatively) few very long
strings. The amount of memory allocated is configured by the setting of {\it
EXTRA\_ALLOC} in the file string.cpp during compilation (be sure to understand
why its default value is what it is before modifying it!). You may try setting
it to a greater amount (say twice nLen) or to 0 (to see the performance degradation
which will follow) and analyse its impact on your program. If you do this,
you will probably find it helpful to also define the WXSTRING\_STATISTICS symbol,
which tells the wxString class to collect performance statistics and to show
them on stderr on program termination. This will show you the average length of
the strings your program manipulates, their average initial length and also the
percentage of string concatenations for which memory wasn't reallocated because
the already preallocated memory was used (this value should be about
98\% for the default allocation policy; if it is less than 90\% you should
really consider fine tuning wxString for your application).

It goes without saying that a profiler should be used to measure the precise
difference the change to EXTRA\_ALLOC makes to your program.