Bug 217059

Summary: Please document behaviour of iconv(3) when input is untranslatable
Product: Documentation
Reporter: Reuben Thomas (rrt)
Component: man-pages
Assignee: documentation_man-pages (documentation_man-pages)
Status: NEEDINFO ---
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description Reuben Thomas 2023-02-19 10:31:37 UTC
See https://sourceware.org/bugzilla/show_bug.cgi?id=29913

The issue is that the man page does not fully reflect the behaviour of glibc's iconv. The man page says:

The conversion can stop for four reasons:

       1. An invalid multibyte sequence is encountered in the input.  In this case, it sets errno to EILSEQ and
          returns (size_t) -1.  *inbuf is left pointing to the beginning of the invalid multibyte sequence.

The phrase "An invalid multibyte sequence is encountered in the input" is confusing, because it suggests to me (and other readers, see the bug above!) that it refers only to the validity of the input per se (e.g. a non-UTF-8 sequence in input purporting to be UTF-8).

However, according to the original author of the man page, Bruno Haible (see the bug above), it also refers to input that cannot be translated to the desired output encoding; and indeed, glibc's iconv returns EILSEQ when the input cannot be translated, even though it is valid.
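
The behaviour described above can be checked with a small sketch like the one below. It assumes glibc's iconv and the encoding names "UTF-8" and "ASCII"; convert_status is a hypothetical helper written for this illustration, not part of any library. It reports the errno value with which iconv() stopped and how many input bytes had been consumed at that point.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any library: converts `input` from
   encoding `from` to encoding `to` and reports how iconv(3) stopped.
   Returns 0 on success, the errno value set by iconv() on failure, or
   -1 if iconv_open() itself failed.  *consumed receives the number of
   input bytes consumed before the conversion stopped (it is left
   untouched when iconv_open() fails). */
static int
convert_status(const char *from, const char *to,
               const char *input, size_t *consumed)
{
    char out[64];                 /* small buffer; long inputs would give E2BIG */
    char *inp = (char *) input;   /* iconv() takes char **, but only reads the input */
    char *outp = out;
    size_t inleft = strlen(input);
    size_t outleft = sizeof out;
    int status = 0;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t) -1)
        return -1;

    /* Save errno immediately, before any later call can change it. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
        status = errno;

    /* On EILSEQ, inp points at the start of the offending sequence. */
    *consumed = (size_t) (inp - input);
    iconv_close(cd);
    return status;
}
```

With glibc, calling convert_status("UTF-8", "ASCII", "a\xe2\x82\xac", &n) on the valid UTF-8 string "a€" should return EILSEQ with n == 1, the same result as for the genuinely invalid input "a\xff". Other iconv implementations may behave differently for the untranslatable case.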

Please clarify the man page to reflect this behaviour. On the one hand, it is confusing and surprising when compared to the POSIX standard (again, for this reader and others); on the other hand, it is useful (because it enables untranslatable input to be detected).

Comment 1 Alejandro Colomar 2023-05-19 12:09:31 UTC
Would you mind sending a patch according to the ./CONTRIBUTING file in
the repo?

Comment 2 Reuben Thomas 2023-05-20 11:18:19 UTC
I've sent a patch.