31 Commits

Author SHA1 Message Date
87c41c0947 html: add simple tokenizer and parser
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-11-17 15:14:54 +01:00
424f297c7b sgml: sgml2wstr→sgml2str, wstr2sgml→str2sgml 🧨
This is analogous to string.hpp's strlen, strcpy, strcat, which use C++
polymorphism rather than function name decorations for char/wchar_t
flavors.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-11-17 15:14:53 +01:00
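
A minimal sketch of the naming idea in this commit: one function name with char and wchar_t overloads instead of decorated names. The names example_strlen and example_duplicate are hypothetical and only illustrate the pattern; they are not the library's sgml2str/str2sgml signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not from the library): the same name is overloaded
// for both character flavors, so callers need no "w" prefix or suffix.
inline std::size_t example_strlen(const char* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n]) ++n;
    return n;
}

inline std::size_t example_strlen(const wchar_t* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n]) ++n;
    return n;
}

// Generic code can call the overloaded name without knowing the flavor;
// overload resolution picks the matching version from the argument type.
template <class T>
std::basic_string<T> example_duplicate(const T* s)
{
    return std::basic_string<T>(s, example_strlen(s));
}
```

With decorated names, the template above would need a separate dispatch helper per character type.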
52d9956891 Cleanup
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-11-17 15:14:53 +01:00
ee8f54ee5f Fix to compile for Linux
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-11-08 13:48:41 +01:00
856be3a0d8 Revise #include to make each .hpp individually compilable
Mind that the min/max Windows.h mess is Microsoft's problem, not ours.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-10-18 09:12:06 +02:00
ab8d37ee75 Turn assert() into _Analysis_assume_ on Release builds
Runtime asserts also serve as MSVC Code Analysis hints. Release builds
compile the asserts out, so Code Analysis gets no hints and raises a
lot of warnings.

Maybe I should learn how to use SAL to annotate <ptr, len> parameter
pairs to allow ptr==nullptr when and only when len==0? 😇

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-10-10 16:43:07 +02:00
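
A rough sketch of the approach this commit describes, assuming MSVC's _Analysis_assume_ macro from <sal.h>. The macro name EXAMPLE_ASSERT and the function example_count_nonzero are hypothetical; the actual commit may redefine assert() itself or use a different mechanism.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: on MSVC Release builds (NDEBUG defined), forward the condition
// to Code Analysis as a hint. Everywhere else, keep the ordinary assert().
#if defined(_MSC_VER) && defined(NDEBUG)
#include <sal.h>
#define EXAMPLE_ASSERT(e) _Analysis_assume_(e)
#else
#define EXAMPLE_ASSERT(e) assert(e)
#endif

// Hypothetical function taking a <ptr, len> pair: the pointer may be null
// when and only when the length is zero.
inline std::size_t example_count_nonzero(const int* data, std::size_t len)
{
    EXAMPLE_ASSERT(data != nullptr || len == 0);
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] != 0) ++n;
    return n;
}
```

The hint costs nothing at runtime; it only tells the analyzer what the Debug-build assert would have checked.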
6f19e5250d parser: weasel winsock2.h support
This is a royal PITA to get compiled under various combinations of
WIN32_LEAN_AND_MEAN and _WINSOCKAPI_.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-10-10 16:42:04 +02:00
41d764eeef parser: fix compilation for macOS
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-23 17:58:40 +02:00
50fea81f83 parser: cleanup
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-20 08:07:50 +02:00
613bba9e05 parser: refine IBAN checking
Allow arbitrary spacing, minor optimizations...

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-19 21:40:36 +02:00
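
For context, a simplified sketch of an IBAN check that tolerates arbitrary spacing, using the standard mod-97 checksum. The function example_iban_checksum_ok is hypothetical and is not the library's parser class; it skips country-specific length rules and only shows the checksum step.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-alone check (not the library's IBAN parser).
inline bool example_iban_checksum_ok(const std::string& text)
{
    // Drop all whitespace so that arbitrary spacing is accepted.
    std::string s;
    for (unsigned char c : text)
        if (!std::isspace(c))
            s += static_cast<char>(std::toupper(c));
    // An IBAN has at most 34 characters; country code plus check digits take 4.
    if (s.size() < 5 || s.size() > 34)
        return false;
    // Move the country code and check digits to the end.
    s = s.substr(4) + s.substr(0, 4);
    // Compute the remainder mod 97 digit by digit; letters map to 10..35.
    unsigned rem = 0;
    for (unsigned char c : s) {
        if (c >= '0' && c <= '9')
            rem = (rem * 10 + (c - '0')) % 97;
        else if (c >= 'A' && c <= 'Z')
            rem = (rem * 100 + (c - 'A' + 10)) % 97;
        else
            return false;
    }
    return rem == 1;
}
```

Computing the remainder incrementally avoids building the full numeric string, which would not fit into any built-in integer type.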
b5984ea8f2 Port to macOS
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-19 18:02:18 +02:00
27afd7afa5 parser: add IBAN, RF and SI support
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-19 16:53:16 +02:00
edd480d64b macOS fixes
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-14 15:57:55 +02:00
66f8a6c3b7 Re-add UTF-8 BOM that Xcode is removing
Visual Studio IDE really needs it on non-UTF-8 PCs.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-14 09:13:04 +02:00
83d7fd844d Port to macOS
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-09-12 16:55:16 +02:00
2c2680dfb3 Resolve _WINSOCKAPI_ and WIN32_LEAN_AND_MEAN hell. Hopefully!
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-08-25 03:56:27 +02:00
6bb4027553 parser::date_format_t: make classic enum
With a scoped enum, bitwise operations in C++ require an insane amount
of type-casting.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-08-22 17:03:24 +02:00
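
To illustrate the type-casting complaint, a small comparison of a scoped and a classic enum used as bit flags. The enumerator names and values are hypothetical, not the real date_format_t definition.

```cpp
#include <cassert>

// Hypothetical flags (not the real date_format_t values).
enum class example_scoped : unsigned { none = 0, dmy = 1, mdy = 2, ymd = 4 };
enum example_classic { example_none = 0, example_dmy = 1, example_mdy = 2, example_ymd = 4 };

// Scoped enum: every bitwise operation needs explicit casts to the
// underlying type (or a set of hand-written operator overloads).
inline bool example_accepts_scoped(example_scoped mask, example_scoped f)
{
    return (static_cast<unsigned>(mask) & static_cast<unsigned>(f)) != 0;
}

// Classic enum: enumerators convert implicitly to int, so masks combine
// with plain | and &.
inline bool example_accepts_classic(int mask, example_classic f)
{
    return (mask & f) != 0;
}
```

The trade-off is that the classic enum gives up the name scoping and the protection against accidental integer conversions.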
72ce0a03e5 system: add
Windows is very peculiar about #include <windows.h>. Besides, we need
some wrappers for OS-specific primitives.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-08-18 15:05:18 +02:00
3e69770585 parser: Fix basic_scientific_numeral detection
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-07-24 09:46:18 +02:00
58caa542ac Add missing UTF-8 BOM
Many, many Windows installations out there still use Windows-1252 and
similar ancient encodings.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-07-23 14:02:58 +02:00
82b25cc24a parser: add missing #include
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-07-21 11:53:48 +02:00
c5f972971e parser: revise
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-07-17 17:05:31 +02:00
aedb0921f2 parser: adopt changes from string
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-07-17 12:57:23 +02:00
6cdcb08365 sgml: rename str -> wstr
sgml.hpp is actually about converting between SGML and UTF-16/Unicode.
The "wstr" naming aligns better with std::wstring, wchar_t etc.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-07-17 12:51:41 +02:00
1fb78a78f2 parser: Stabilize HTTP suite
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-16 12:35:52 +01:00
b028c8772e parser: Cleanup
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-16 12:35:08 +01:00
a59163733a parser: Add missing constructors to allow locale propagation
Classes using m_locale must allow locale configuration in their
constructors. Otherwise, m_locale was always set to the default by the
basic_parser<> constructor.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-16 11:21:53 +01:00
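
A minimal sketch of the constructor pattern described in this commit. The class names example_base_parser and example_space_parser are hypothetical stand-ins for basic_parser<> and its derived classes; only m_locale is taken from the commit message.

```cpp
#include <cassert>
#include <locale>

// Hypothetical base class standing in for basic_parser<>.
class example_base_parser
{
public:
    explicit example_base_parser(const std::locale& locale = std::locale())
        : m_locale(locale)
    {}
    virtual ~example_base_parser() = default;

    const std::locale& get_locale() const { return m_locale; }

protected:
    std::locale m_locale;
};

// Hypothetical derived parser: without a constructor that forwards the
// locale, the base would always be constructed with the default one.
class example_space_parser : public example_base_parser
{
public:
    explicit example_space_parser(const std::locale& locale = std::locale())
        : example_base_parser(locale)
    {}

    bool match(char c) const { return std::isspace(c, m_locale); }
};
```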
38fac2837f parser: Duplicate locale
Release testing revealed that the compiler may destroy temporary
std::locale instances sooner than we thought, exposing a use-after-free
(UaF).

On 64-bit arch, a reference takes 8 bytes, a std::locale copy takes 16
bytes. So duplicating the locale in each parser instance costs too
little to justify risking a UaF.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-16 11:02:23 +01:00
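
The lifetime issue can be sketched as follows, assuming the earlier code kept a const std::locale& member. The class example_parser is hypothetical, not the library's basic_parser<>.

```cpp
#include <cassert>
#include <locale>

// Hypothetical parser that owns a copy of the locale.
class example_parser
{
public:
    // The temporary std::locale() bound to the default argument lives only
    // until the end of the full expression that constructs the parser.
    // Storing a reference to it would dangle; copying it here is safe.
    explicit example_parser(const std::locale& locale = std::locale())
        : m_locale(locale)
    {}

    bool is_space(char c) const { return std::isspace(c, m_locale); }

private:
    std::locale m_locale; // owned copy, valid for the parser's lifetime
};
```

Copying a std::locale is cheap because the object is a reference-counted handle to shared facet data.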
127704d2d8 parser: Rename "tester" to "parser"
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-16 10:58:15 +01:00
33012e1513 parser: Use ranged for loops where appropriate
Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-16 09:38:53 +01:00
308f63490c Rename .h to .hpp
These files are C++ only. They should either have no extension like
standard C++ headers (which is cumbersome on Windows environments), or
.hpp.

.h is used for C and hybrid C/C++ headers.

Signed-off-by: Simon Rozman <simon@rozman.si>
2023-03-15 21:49:41 +01:00