Minimum allowed coverage is Generated by 🐒 cobertura-action against 01c0a19 |
|
LGTM |
|
Thanks for the review, Mike. I plan to squash a few more of the static analysis issues before merging. |
|
I am also toying with the idea of adding a build system flag for the UTF-8 feature: enabled by default, but allowing a build without it for a leaner library, e.g. meson setup build -Denable-utf8=false |
Force-pushed from cd14867 to 83372fd.
|
I'm happy with the static analysis situation now. What remains are cognitive complexity and nested control structure warnings, plus a handful of spurious errors caused by SonarQube not understanding libcheck's START_TEST macros. I refactored a few more of the tangled error-handling paths that used goto between blocks. This uncovered a couple of corner-case bugs, which are now fixed and covered by new unit tests. |
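To illustrate the kind of refactor meant here, a minimal sketch follows. It is not code from this PR: `dup_prefix` and `join_prefixes` are hypothetical names invented for the example. The idea is to replace jumps between nested blocks with early returns while nothing is held, and a single straight-line release path once resources have been acquired.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of bstring): copy the first n bytes of src
 * into a NUL-terminated heap buffer. Returns NULL on any failure. */
static char *dup_prefix(const char *src, size_t n)
{
    char *out;

    if (src == NULL || n > strlen(src)) {
        return NULL;              /* early return: nothing acquired yet */
    }
    out = malloc(n + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, src, n);
    out[n] = '\0';
    return out;
}

/* Hypothetical example: join the first na bytes of a with the first nb
 * bytes of b. Every exit path releases exactly what was acquired, without
 * any goto between blocks. The caller frees the result. */
static char *join_prefixes(const char *a, size_t na, const char *b, size_t nb)
{
    char *pa;
    char *pb;
    char *joined = NULL;

    pa = dup_prefix(a, na);
    if (pa == NULL) {
        return NULL;
    }
    pb = dup_prefix(b, nb);
    if (pb != NULL) {
        joined = malloc(na + nb + 1);
        if (joined != NULL) {
            memcpy(joined, pa, na);
            memcpy(joined + na, pb, nb + 1);  /* also copies the terminator */
        }
    }
    free(pb);                     /* free(NULL) is a no-op */
    free(pa);
    return joined;
}
```

Because each failure path is explicit, corner cases such as a failed second allocation are easy to exercise directly from a unit test.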
Force-pushed from 127e1fd to 24f9f6d.
|
Port the UTF-8 string manipulation modules from bstrlib to this fork. Credits to Paul Hsieh.
utf8util is a standalone low-level module providing a forward iterator over UTF-8 byte sequences (utf8IteratorInit, utf8IteratorGetNextCodePoint, utf8IteratorGetCurrCodePoint, utf8ScanBackwardsForCodePoint) along with the cpUcs4/cpUcs2 type definitions and the isLegalUnicodeCodePoint macro.
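For readers new to the concept, here is a rough, self-contained sketch of what forward iteration over UTF-8 with an error substitute involves. It is not the utf8util implementation and does not use its API: `sketch_cp`, `sketch_is_legal`, and `sketch_next` are hypothetical names, and both the legality rule and the "skip one byte on error" recovery are assumptions of this sketch that may differ from the real module.

```c
#include <stddef.h>

typedef long sketch_cp;  /* hypothetical stand-in for a UCS-4 code point type */

/* Assumed legality rule: within the Unicode range and not a surrogate. */
static int sketch_is_legal(sketch_cp v)
{
    return v >= 0 && v <= 0x10FFFFL && !(v >= 0xD800L && v <= 0xDFFFL);
}

/* Decode the code point starting at s[*pos] and advance *pos past it.
 * On malformed input, return errCh and advance by one byte so the caller
 * can keep iterating. If the arguments are unusable, return errCh and
 * leave *pos alone. */
static sketch_cp sketch_next(const unsigned char *s, int len, int *pos,
                             sketch_cp errCh)
{
    /* Smallest value that legitimately needs n bytes (rejects overlongs). */
    static const sketch_cp min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    int i, n, k;
    sketch_cp v;
    unsigned char c;

    if (s == NULL || pos == NULL || *pos < 0 || *pos >= len) {
        return errCh;
    }
    i = *pos;
    c = s[i];
    if (c < 0x80) {
        n = 1; v = c;
    } else if ((c & 0xE0) == 0xC0) {
        n = 2; v = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
        n = 3; v = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
        n = 4; v = c & 0x07;
    } else {
        *pos = i + 1;             /* stray continuation or invalid lead byte */
        return errCh;
    }
    if (i + n > len) {
        *pos = i + 1;             /* sequence truncated by end of buffer */
        return errCh;
    }
    for (k = 1; k < n; k++) {
        if ((s[i + k] & 0xC0) != 0x80) {
            *pos = i + 1;         /* expected a continuation byte */
            return errCh;
        }
        v = (v << 6) | (s[i + k] & 0x3F);
    }
    if (v < min_for_len[n] || !sketch_is_legal(v)) {
        *pos = i + 1;             /* overlong encoding or illegal code point */
        return errCh;
    }
    *pos = i + n;
    return v;
}
```

A caller would loop while `*pos < len`, passing a substitute such as `'?'` or U+FFFD as `errCh`.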
buniutil builds on top of it and bstrlib to provide four higher-level functions: buIsUTF8Content, buAppendBlkUcs4, buGetBlkUTF16, and buAppendBlkUTF16. Both modules are compiled into the main libbstring binary, enabled by default and controlled by the new enable-utf8 build option.
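As background for the UTF-16 functions, the surrogate pair arithmetic can be sketched as below. This is a standalone illustration of the standard UTF-16 encoding rule, not the buniutil code: `sketch_to_utf16` and `sketch_from_surrogates` are hypothetical names, and the choice to return 0 for unencodable input is an assumption of the sketch.

```c
/* Encode one code point as UTF-16. Writes 1 or 2 units into out and returns
 * the count, or returns 0 if cp is a surrogate or beyond U+10FFFF. */
static int sketch_to_utf16(unsigned long cp, unsigned short out[2])
{
    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL)) {
        return 0;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned short)cp;      /* BMP: one unit, value unchanged */
        return 1;
    }
    cp -= 0x10000UL;                      /* 20 bits remain */
    out[0] = (unsigned short)(0xD800UL | (cp >> 10));     /* high surrogate */
    out[1] = (unsigned short)(0xDC00UL | (cp & 0x3FFUL)); /* low surrogate */
    return 2;
}

/* Combine a high and a low surrogate back into a code point. The caller is
 * expected to have checked that hi is in D800..DBFF and lo in DC00..DFFF. */
static unsigned long sketch_from_surrogates(unsigned short hi, unsigned short lo)
{
    return 0x10000UL + (((unsigned long)(hi - 0xD800) << 10)
                        | (unsigned long)(lo - 0xDC00));
}
```

For example, U+1F600 encodes to the pair D83D DE00, and decoding that pair yields U+1F600 again.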
Two adaptations were made to fit bstring's conventions: const_bstring was replaced with const bstring throughout (bstring dropped that typedef), and BSTR_PUBLIC visibility attributes were added to all public declarations.
A new test module tests/testutf8.c was written from scratch, covering the full API surface including ASCII and multi-byte iteration, error recovery, surrogate pair encoding/decoding, BOM handling, and null/invalid-argument guards.
Compared to the original code by Paul Hsieh, the following additional improvements have been made: