Fail to convert wide string with incomplete surrogates to UTF-8

Correctly fail if the wide string being converted is UTF-16 encoded (which can only happen on platforms using 16-bit wchar_t, i.e. MSW) and ends in the middle of a surrogate pair. Note that other conversions still wrongly encode invalid wchar_t sequences, such as 0xd800 not followed by anything. This will need to be fixed in the future, but for now at least make it work for the most commonly used conversion. See #17070.
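
For illustration only, the kind of check described above can be sketched in standalone C++. This is not the code changed by this commit: the helper name HasOnlyCompleteSurrogates is hypothetical, and the sketch uses char16_t rather than wchar_t so that it behaves the same on all platforms. It rejects a high surrogate (0xd800..0xdbff) that is not immediately followed by a low one (0xdc00..0xdfff), as well as a stray low surrogate.

```cpp
#include <cstddef>

// Hypothetical helper (not part of wxWidgets): returns true if every
// surrogate in the UTF-16 sequence [s, s + len) belongs to a complete
// high/low pair, false otherwise.
bool HasOnlyCompleteSurrogates(const char16_t* s, std::size_t len)
{
    for ( std::size_t i = 0; i < len; ++i )
    {
        const char16_t c = s[i];

        if ( c >= 0xd800 && c <= 0xdbff )
        {
            // High (lead) surrogate: a low (trail) surrogate must follow.
            if ( i + 1 == len || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff )
                return false;

            ++i; // skip the trail surrogate we have just validated
        }
        else if ( c >= 0xdc00 && c <= 0xdfff )
        {
            // Low surrogate without a preceding high one.
            return false;
        }
    }

    return true;
}
```

A converter built along these lines would presumably report failure whenever this check returns false, which is the behaviour the new test expects from the UTF-8 conversion for a lone 0xd800.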
2015-11-12 02:39:36 +01:00
parent 6602eb3384
commit 048ba4b509
2 changed files with 35 additions and 6 deletions
--- a/tests/mbconv/mbconvtest.cpp
+++ b/tests/mbconv/mbconvtest.cpp
@@ -203,6 +203,12 @@ private:
    void UTF8PUA_f4_80_82_a5() { UTF8PUA("\xf4\x80\x82\xa5", u1000a5); }
    void UTF8Octal_backslash245() { UTF8Octal("\\245", L"\\245"); }

+    // Test that converting string with incomplete surrogates in them fails
+    // (surrogates are only used in UTF-16, i.e. when wchar_t is 16 bits).
+#if SIZEOF_WCHAR_T == 2
+    void UTF8_fail_broken_surrogates();
+#endif // SIZEOF_WCHAR_T == 2
+
    // implementation for the utf-8 tests (see comments below)
    void UTF8(const char *charSequence, const wchar_t *wideSequence);
    void UTF8PUA(const char *charSequence, const wchar_t *wideSequence);
@@ -461,6 +467,12 @@ void MBConvTestCase::UTF8Tests()
        wxConvUTF8,
        1
        );
+
+#if SIZEOF_WCHAR_T == 2
+    // Can't use \ud800 as it's an invalid Unicode character.
+    const wchar_t wc = 0xd800;
+    CPPUNIT_ASSERT_EQUAL(wxCONV_FAILED, wxConvUTF8.FromWChar(NULL, 0, &wc, 1));
+#endif // SIZEOF_WCHAR_T == 2
 }

 void MBConvTestCase::UTF16LETests()