No real changes, but just get rid of two functions doing the same thing
but using (semantically) different API, this was just too confusing.
Change all the code to use wxDecodeSurrogate() that encapsulates
decoding the surrogate and advancing the input pointer as needed and so
is less error-prone.
More generally, change the code to use end pointers instead of
decrementing the length to check for the end condition: this is more
clear, simpler and probably even more efficient.
Pass length value to decode_utf16() and end pointer to
wxDecodeSurrogate() to ensure that we never read beyond the end of the
buffer when decoding UTF-16 when the last (complete) 16 bit value in the
buffer is the first half of a surrogate.
This had been previously partially addressed by ad hoc changes, e.g.
f72aa7b1c9 did it for wxMBConvUTF16swap,
but the problem still remained for wxMBConvUTF16straight. Ensure that
this bug is fixed everywhere now but making it impossible to even try
decoding a surrogate without providing the buffer length.
Under Unix systems, this is the same thing, but under MSW, where
sizeof(wchar_t) == 2, this allows to pass wchar_t pointers to this
function without casts.
It also makes it consistent with wxDecodeSurrogate() and allows to get
rid of another ugly cast there.
No real changes.
Calling wxMBConvUTF7::ToWChar(..., "+", 1) resulted in reading
uninitialized memory as the decoding code didn't check that there were
any bytes left when switching to the "shifted" mode.
Fix this by explicitly checking for this and returning an error if
nothing is left.
This commit refactors the overloads of cMB2WC() and cWC2MB() methods
taking raw pointers and buffers to reuse the same code and fixes the
wrong length of the buffer returned by cWC2MB(wchar_t*) overload for
conversions using multiple bytes to represent the NUL terminator
character (it previously was wrong for UTF-16 and UTF-32 conversions due
to wrongly subtracting 1 from the length when creating it instead of
correctly subtracting GetMBNulLen()) and the wrong length of the buffer
returned from cMB2WC(char*) overload where no adjustment for the
trailing NUL was done at all.
Also return simple default-constructed buffers from these methods in
case of failure instead of using wxScopedCharBuffer::CreateNonOwned()
which is less obvious and less efficient (even if the latter probably
doesn't matter here because it's only done in case of an error).
Finally, add tests checking that using WC2MB() or either of cWC2MB()
overloads returns the buffers of the same length and with the same
contents.
The Darwin linking problem mentioned in the comment doesn't exist in any
of the still supported macOS versions, so it doesn't make sense to
continue working around it.
Don't optimize the required length as this is a tiny gain resulting in big
problems with the strings containing surrogates for which the actual result is
shorter than the length returned, resulting in extra NUL bytes at the end of
the converted buffer.
This is similar to 3410aa372f (see #16298) but
for UTF-32 and not UTF-16.
Closes#17070.
These functions were almost but not quite identical to it:
wxSafeConvertMB2WX() tried the current locale encoding before UTF-8 while
wxConvWhateverWorks tries UTF-8 first and then the current locale encoding.
The latter behaviour is more correct as valid UTF-8 could be misinterpreted as
some legacy multibyte encoding otherwise, so get rid of this difference and
just forward these functions to wxConvWhateverWorks.
This ensures that we can create output files with Unicode names even when
they're not representable in the current locale encoding, notably when the
current locale has never been changed and is still the default "C" one, not
supporting anything else other than 7 bit ASCII.
Credits for the new class name go to Woody Allen.
It doesn't make sense to use any fallbacks when converting to/from UTF-8 and
this wasn't even done consistently as only wxSafeConvertWX2MB() used
MAP_INVALID_UTF8_TO_OCTAL, but not wxSafeConvertMB2WX().
More importantly, UTF-8 conversion can never fail for a valid Unicode string,
so there is no need for any fall backs.
Correctly account for the second half of the surrogate in
wxMBConvUTF8::FromWChar() implementation, this makes it actually work for the
strings containing surrogates on the platforms using UTF-16 encoding for
wchar_t (such as MSW).
See #17070.
UTF-32 conversions only estimate, from above, the size of the output buffer
needed, so the value returned from the first call to FromWChar(NULL) in
cWC2MB() can be inexact for them and we need to return the value returned by
the second call to FromWChar() doing the real conversion from cWC2MB() itself
to ensure that we return the correct output length.
See #17070.
Correctly fail if the wide string being converted is UTF-16 encoded (which can
only happen on platforms using 16 bit wchar_t, i.e. MSW) and ends in the
middle of a surrogate pair.
Notice that other conversions still wrongly encode invalid wchar_t sequences
such as 0xd800 not followed by anything, this will need to be fixed in the
future, but for now at least make it work for the most commonly used
conversion.
See #17070.
Windows CE doesn't seem to be supported by Microsoft any longer. Last CE
release was in early 2013 and the PocketPC and Smartphone targets supported by
wxWidgets are long gone.
The build files where already removed in an earlier cleanup this commit
removes all files, every #ifdef and all documentation regarding the Windows CE
support.
Closes https://github.com/wxWidgets/wxWidgets/pull/81
Unless the table is faulty, this comparison can never fail. It is thus redundant and not needed. As optimizing compilers aren't smart enough yet to detect this, this commit removes the redundant check.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@78264 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Don't stop converting subsequent chunks just because the length of one of them
was 0: this can happen if the first character of a string is a NUL or if there
are two (or more) NULs in it later.
Simply remove the check for this and continue as usual even in this case.
Also add a unit test verifying that we do translate NULs in input into NULs in
output.
Closes#16620.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@78021 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Don't optimize the returned length for surrogate-less case, this does save a
pass of the string but at the price of returning a wrong result, which is not
worth it, just compute the really required length exactly.
Closes#16298.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@76622 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Most importantly, this allows us to remove all MSLU-related stuff.
Some functions which were previously loaded dynamically can now be just used
directly, too.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@76535 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
We don't support systems predating Windows 2000 SP4 any more, so there is no
need to check for them. This also allows to get rid of the code checking for
conversion correctness.
Also remove the broken URLs from the comments, they didn't contain any
particularly useful information anyhow.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@76406 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Make overriding virtual methods more explicit and enable additional checks
provided by C++11 compilers when "override" is used.
Closes#16100.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@76173 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
We read beyond the provided maximal length as we didn't update the remaining
length while parsing the remaining bytes of an UTF-8-encoded code point.
Fix this and add a test for it.
Closes#15901.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@75733 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
This keyword is not expanded by Git which means it's not replaced with the
correct revision value in the releases made using git-based scripts and it's
confusing to have lines with unexpanded "$Id$" in the released files. As
expanding them with Git is not that simple (it could be done with git archive
and export-subst attribute) and there are not many benefits in having them in
the first place, just remove all these lines.
If nothing else, this will make an eventual transition to Git simpler.
Closes#14487.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@74602 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
wxCharBuffer already initializes the last byte of the buffer it allocates to 0
so there is no need to do it explicitly.
Also don't allocate an extra byte, wxCharBuffer already adds one to the length
passed to it for the trailing NUL.
See #13885.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@73141 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Apply the same fix as was done in r68694 for ToWChar() to FromWChar(): it also
returned an off by 1 value when not using MAP_INVALID_UTF8_NOT.
Closes#13400.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@70462 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
wxMBConvStrictUTF8::FromWChar() didn't update the input length correctly when
encountering a surrogate while decoding UTF-16 and could read beyond the end
of the input buffer in this case.
Fix this by simply adjusting the input length when a surrogate is read.
Closes#13614.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@69676 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
wxMBConvUTF8::ToWChar() was off by 1 when the input length was explicitly
specified, the extra NUL should only be added in the implicit length case.
This bug didn't occur for the default wxMBConvUTF8 object as it simply
forwarded to the base class wxMBConvStrictUTF8 implementation but it happened
when MAP_INVALID_UTF8_TO_OCTAL or MAP_INVALID_UTF8_TO_PUA was used.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@68694 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
In optimized build g++ warned about the second element of two-element array
passed to encode_utf16() being possibly uninitialized. This wasn't really the
case but change the code just to avoid the warnings.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@68112 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Deferred initialization code was not MT-safe and just wasn't that useful
anyhow because it is rare to create a wxCSConv object and not use it
afterwards.
Remove the deferred initialization logic and create the real conversion used
by wxCSConv immediately in its ctor.
Also improve the comments by clearly explaining the possible values of
wxCSConv::m_name and m_encoding.
Closes#12630.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@66119 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Use just "char *" for wxMBConv_iconv::m_name to avoid MT-safety problems
related to using a wxString (which is not always MT-safe) from multiple
threads.
See #12630.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@65968 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Replace many comments indicating that the C cast used was really a
const_cast<> with the proper cast itself. There is no reason to not use it any
longer, all the supported compilers understand it.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@65861 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Returning invalid buffer for empty input is unexpected and resulted in e.g.
wxString::utf8_str() returning NULL and not "" in ANSI build for empty strings
(which, in turn, resulted in crashes in the test suite and undoubtedly not
only) as well as crashes when calling GTK+ functions (see #12432). Other uses
of cMB2WC() also show that NULL return value from it is unexpected as it is
often passed to CRT functions not accepting NULL.
So return empty buffer instead for empty input to avoid all these problems.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@65836 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Use wxDELETE[A]() functions which automatically NULL out their arguments after
deleting them instead of doing it manually.
Closes#9685.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@64656 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
wxWidgets requires wchar_t for some time now; wx/chartype.h has a check
to fail complation without it. Simplify code by removing now-dead code
for the !wxUSE_WCHAR_T case.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@63991 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
The variable "lenChunk" was incorrectly used as the length of the wide string
chunk which could result in wrong output.
Worse, the output buffer could be overflown for the final chunk because it
didn't have to have enough space for the trailing NUL(s) in it.
Fix both bugs and added unit tests for them.
Based on patch by Kuang-che Wu.
Closes#11486.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@62793 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
When converting a fixed number of characters we need to take any NULs inside
the buffer being converted into account for our return value -- but this
wasn't done and converting 2 characters 'x' and '\0' returned only 1, even if
the length 2 was explicitly specified.
Fix this bug and add a unit test checking for it.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@62141 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
These overloads allow not to worry about buffer lengths and just convert
between wxCharBuffer and wxWCharBuffer directly in a convenient way.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@61896 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775