Contrary to what a comment in wxTextInputStream::GetChar() said, it is
actually possible to get more than one wide character from a call to
wxMBConv::ToWChar(len+1) even if a previous call to ToWChar(len) failed
to decode anything at all. This happens with wxConvAuto because it keeps
returning an error while it doesn't have enough data to determine if the
input contains a BOM or not, but then returns all the characters
examined so far at once if it turns out that there was no BOM, after
all.
The simplest case in which this created problems was just input starting
with a NUL byte as it as this could be a start of UTF-32BE BOM.
The fix consists in keeping all the bytes read but not yet decoded in
the m_lastBytes buffer and retrying to decode them during the next
GetChar() call. This implies keeping track of how much valid data is
there in m_lastBytes exactly, as we can't discard the already decoded
data immediately, but need to keep it in the buffer too, in order to
allow implementing UngetLast(). Incidentally, UngetLast() was totally
broken for UTF-16/32 input (containing NUL bytes in the middle of the
characters) before and this change fixes this as a side effect.
Also add test cases for previously failing inputs.
Having NextChar() returning wxEOT only for GetChar() to turn it back to
NUL didn't make any sense, just return NUL directly and get rid of
GetChar/NextChar() distinction.
No real changes, just simplify the code.
On the platforms using UTF-16 for wchar_t we can't read nor write Unicode data
one wchar_t at a time as a single half of a surrogate character can't be
converted to or from the encoding of the stream.
To fix this, we may need to store the last wchar_t already read from the
stream but not returned yet in wxTextInputStream::NextChar() and store,
without writing it, the wchar_t passed to wxTextOutputStream::PutChar() until
the second half of the surrogate is written.
See #17070.
This port is not used and is not being worked on, so remove it to reduce the
amount of the code which needs to be updated for every global change.
Also remove tests for VisualAge compiler which isn't used since ages.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@76533 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
This keyword is not expanded by Git which means it's not replaced with the
correct revision value in the releases made using git-based scripts and it's
confusing to have lines with unexpanded "$Id$" in the released files. As
expanding them with Git is not that simple (it could be done with git archive
and export-subst attribute) and there are not many benefits in having them in
the first place, just remove all these lines.
If nothing else, this will make an eventual transition to Git simpler.
Closes#14487.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@74602 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
This change prepares the way for using wxGTK under Windows as this would
still define __WINDOWS__ but use __WXGTK__ instead of __WXMSW__.
Closes#14064.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@70796 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
The logic in this function depends on ToWChar() working correctly so check
that it doesn't return obviously wrong results, e.g. 0 output length for
non-empty input. This was mostly done to detect the problem in wxConvAuto
currently but it could also be useful with user-defined conversions and
shouldn't have a big performance effect on wxTextInputStream so leave these
checks in to facilitate debugging in the future.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@63244 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
wxConvAuto implicitly supposed that the chunk of data passed to it for
translation was big enough to allow it to at least detect the BOM from it.
However this isn't necessarily the case and never is with wxTextInputStream
which reads the bytes one by one.
Fix this by waiting until we have enough data to be able to detect the BOM.
This still doesn't fix the problem with streams without BOM and the
corresponding unit test still fails -- it will need to be fixed at the level
of wxTextInputStream itself later but handling correctly the cases when a BOM
is present is already better than before.
See #11570.
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@63064 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
2. this allows to use wxConvAuto() instead of wxConvUTF8 as default value
for this parameter in the classes which read text from the file: wxConvAuto
automatically recognizes the BOM at the start of file and uses the correct
conversion
3. don't use Windows for UTF-7 conversions as there is no way to make it
fail on invalid UTF-7 strings; use our own wxMBConvUtf7 instead
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@38570 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775