Bug 219847 - mbsnrtowcs(3) man page behavior with glibc incorrect (and POSIX.1-2024 incompatible)
Status: NEW
Alias: None
Product: Documentation
Classification: Unclassified
Component: man-pages
Hardware: All Linux
Importance: P3 normal
Assignee: documentation_man-pages@kernel-bugs.osdl.org
Reported: 2025-03-06 11:14 UTC by explorer09
Modified: 2025-03-09 18:46 UTC

Regression: No

Description explorer09 2025-03-06 11:14:31 UTC
The mbsnrtowcs(3) man page contains this passage:

"According to POSIX.1, if the input buffer ends with an incomplete
character, it is unspecified whether conversion stops at the end
of the previous character (if any), or at the end of the input
buffer. The glibc implementation adopts the former behavior."

(https://man7.org/linux/man-pages/man3/mbsnrtowcs.3.html)
(Source: https://web.git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man3/mbsnrtowcs.3)

The problem:

Only POSIX.1-2008 and POSIX.1-2017 leave it unspecified where the conversion stops.

POSIX.1-2024 now requires the _latter_ behavior, and the rationale cited for the change is, oddly enough, glibc's behavior. Yet this man page says that glibc adopts the former behavior.

(https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbsrtowcs.html)
(https://www.austingroupbugs.net/view.php?id=616)
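A caller that keeps the same `mbstate_t` and resumes from wherever `*src` was left should get the same result under either behavior, since the unfinished bytes are either left unread or remembered in the state. The following is only a rough sketch of that idea, not anything taken from glibc or POSIX. The helper names `use_utf8_locale` and `convert_in_two_chunks` are made up for illustration, and the input is assumed to contain no null bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a couple of common UTF-8 locale names.
 * Returns 1 if one of them could be selected, 0 otherwise. */
static int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "en_US.UTF-8") != NULL
        || setlocale(LC_CTYPE, "C.UTF-8") != NULL;
}

/* Hypothetical helper: convert the first `split` bytes of `mbs`, then
 * the remaining bytes, reusing one mbstate_t and resuming from *src.
 * `len` is the input length without the terminating null byte.
 * Returns the total number of wide characters, or (size_t)-1 on error. */
static size_t convert_in_two_chunks(wchar_t *out, size_t outlen,
                                    const char *mbs, size_t len,
                                    size_t split)
{
    mbstate_t state;
    const char *s = mbs;

    memset(&state, 0, sizeof state);

    size_t n1 = mbsnrtowcs(out, &s, split, outlen, &state);
    if (n1 == (size_t)-1)
        return n1;
    if (s == NULL)          /* a null byte ended the conversion early */
        return n1;

    /* s is now either at a character boundary (former behavior) or at
     * mbs + split with the partial character held in `state` (latter
     * behavior). Either way, the rest of the input starts at s. */
    size_t rest = (size_t)(mbs + len - s);
    size_t n2 = mbsnrtowcs(out + n1, &s, rest, outlen - n1, &state);
    if (n2 == (size_t)-1)
        return n2;

    return n1 + n2;
}
```

For example, once `use_utf8_locale()` succeeds, calling `convert_in_two_chunks(out, 8, "\xe7\x95\x8c\xe7\xb7\x9a", 6, 5)` splits the input in the middle of the second character, which is the situation the two behaviors disagree on.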

Out of curiosity, I tested this with the code from the Austin Group issue report (pasted below with my own modifications) on Devuan GNU/Linux 5 (glibc 2.36-9+deb12u9).

glibc's behavior is close to the latter, but I would describe it more precisely as follows:

"If the input buffer (up to the `nmc` limit) ends with an incomplete character, conversion stops at byte index `nmc` of the input buffer. However, if a null byte ('\0') is encountered in the input buffer before the `nmc` limit, the incomplete sequence is treated as invalid instead, and `*src` points to the start of that invalid byte sequence."

(Treating an incomplete sequence before '\0' as invalid makes `mbsnrtowcs(dest, src, SIZE_MAX, size, ps)` behave identically to `mbsrtowcs(dest, src, size, ps)`, so mbsrtowcs(3) can be implemented directly on top of mbsnrtowcs(3).)
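The equivalence described above can be checked with a small sketch like the one below. It is an illustration only, not glibc's actual implementation. `my_mbsrtowcs` and `wrapper_matches` are hypothetical names, and the comparison assumes inputs shorter than 64 characters.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper: mbsrtowcs() expressed through mbsnrtowcs() with
 * no effective byte limit, so only the terminating null byte (or the
 * output size) ends the conversion. */
static size_t my_mbsrtowcs(wchar_t *dest, const char **src, size_t size,
                           mbstate_t *ps)
{
    return mbsnrtowcs(dest, src, SIZE_MAX, size, ps);
}

/* Returns 1 if the wrapper and the real mbsrtowcs() agree on the return
 * value, the final *src, and the converted wide characters. */
static int wrapper_matches(const char *input)
{
    wchar_t a[64], b[64];
    mbstate_t sa, sb;
    const char *pa = input, *pb = input;

    memset(&sa, 0, sizeof sa);
    memset(&sb, 0, sizeof sb);
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);

    size_t ra = my_mbsrtowcs(a, &pa, 64, &sa);
    size_t rb = mbsrtowcs(b, &pb, 64, &sb);

    if (ra != rb || pa != pb)
        return 0;
    if (ra == (size_t)-1)
        return 1;   /* both failed and stopped at the same position */
    return wmemcmp(a, b, ra) == 0;
}
```

Calling `wrapper_matches()` on strings that end in a truncated multibyte sequence (after selecting a UTF-8 locale) is one way to see whether both functions report the sequence as invalid at the same offset.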

My wording isn't great, so please revise it as you see fit.

```c
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>

wchar_t wcs[100];
char mbs[100];

int main(void)
{
        mbstate_t state;
        const char *s;

        // The test needs a UTF-8 locale; bail out if it is unavailable.
        if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL)
                return 1;

        // Case 1: U+754C U+7DDA, with nmc = 5 cutting the second
        // character short by one byte.
        memset(&state, 0, sizeof(state));
        memcpy(mbs, "\xe7\x95\x8c\xe7\xb7\x9a", 7);
        s = mbs;
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 5, 100, &state));
        printf("%u\n", (unsigned)(s - mbs));
        // Output: "1 5"
        // (If conversion stops at character boundary, the output would be "1 3".)

        // Case 2: the incomplete second character is followed by '\0'
        // within the nmc = 6 limit.
        memset(&state, 0, sizeof(state));
        memcpy(mbs, "\xe7\x95\x8c\xe7\xb7", 6);
        s = mbs;
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 6, 100, &state));
        printf("%u\n", (unsigned)(s - mbs));
        // Output: "4294967295 3"
        return 0;
}
```
Comment 1 Alejandro Colomar 2025-03-09 18:46:00 UTC
Would you mind sending a patch to the mailing list, and CC libc-help@?

See also:
<https://web.git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING.d>
<https://web.git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING>
