Bug 215737 - uri.7: DESCRIPTION: Character encoding: Reference to obsolete IETF RFCs 2718 and 2279
Status: RESOLVED DOCUMENTED
Alias: None
Product: Documentation
Classification: Unclassified
Component: man-pages
Hardware: All Linux
Importance: P1 normal
Assignee: documentation_man-pages@kernel-bugs.osdl.org
Reported: 2022-03-24 12:01 UTC by Alejandro Colomar
Modified: 2022-03-28 18:57 UTC (History)
Regression: No

Description Alejandro Colomar 2022-03-24 12:01:37 UTC
[uri(7)::DESCRIPTION::Character encoding] reads as:

```
   Character encoding
       URIs use a limited number of characters so that they  can
       be typed in and used in a variety of situations.

       The  following characters are reserved, that is, they may
       appear in a URI but their use is  limited  to  their  re-
       served  purpose  (conflicting data must be escaped before
       forming the URI):

                  ; / ? : @ & = + $ ,

       Unreserved characters may be included in  a  URI.   Unre-
       served  characters  include uppercase and lowercase Latin
       letters, decimal digits, and the following limited set of
       punctuation marks and symbols:

                  - _ . ! ~ * ' ( )

       All other characters must be escaped.  An  escaped  octet
       is encoded as a character triplet, consisting of the per-
       cent character "%" followed by the two hexadecimal digits
       representing  the  octet  code  (you can use uppercase or
       lowercase letters for the hexadecimal digits).  For exam-
       ple, a blank space must be escaped as "%20", a tab  char-
       acter  as  "%09", and the "&" as "%26".  Because the per-
       cent "%" character always has the reserved purpose of be-
       ing the escape indicator, it must be  escaped  as  "%25".
       It  is  common practice to escape space characters as the
       plus symbol (+) in query text; this practice  isn't  uni-
       formly  defined in the relevant RFCs (which recommend %20
       instead) but any tool  accepting  URIs  with  query  text
       should  be  prepared  for them.  A URI is always shown in
       its "escaped" form.

       Unreserved characters can be escaped without changing the
       semantics of the URI, but this should not be done  unless
       the  URI  is  being used in a context that does not allow
       the unescaped character to appear.  For example, "%7e" is
       sometimes used instead of "~" in an HTTP  URL  path,  but
       the two are equivalent for an HTTP URL.

       For  URIs  which  must  handle  characters outside the US
       ASCII character set, the HTML 4.01 specification (section
       B.2) and IETF RFC 2718 (section 2.2.5) recommend the fol-
       lowing approach:

       1.  translate the character sequences  into  UTF-8  (IETF
           RFC 2279)--see utf-8(7)--and then

       2.  use  the URI escaping mechanism, that is, use the %HH
           encoding for unsafe octets.
```

The quoted text refers to the obsolete RFCs 2718 [1] and 2279 [2].  We should update those references.


[1]: <https://www.rfc-editor.org/rfc/rfc2718>
[2]: <https://www.rfc-editor.org/rfc/rfc2279>
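
For reference, the escaping procedure described in the quoted text can be sketched in Python. This is only an illustration, not part of the manual page: it assumes the unreserved set exactly as listed in the quoted text, treats its input as data (so reserved characters such as "&" are escaped too), and the names `UNRESERVED` and `escape_uri_text` are hypothetical.

```python
import string
from urllib.parse import unquote

# Unreserved characters as listed in the quoted uri(7) text (assumption:
# newer specifications may define a different set).
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.!~*'()")


def escape_uri_text(text: str) -> str:
    """Escape data for use in a URI, following the two quoted steps."""
    # Step 1: translate the character sequence into UTF-8 octets.
    octets = text.encode("utf-8")
    # Step 2: write every octet outside the unreserved set as %HH.
    return "".join(
        chr(octet) if chr(octet) in UNRESERVED else f"%{octet:02X}"
        for octet in octets
    )


if __name__ == "__main__":
    # Samples taken from the quoted text, plus one non-ASCII character.
    for sample in (" ", "\t", "&", "%", "~", "\u00e9"):
        print(repr(sample), "->", escape_uri_text(sample))
    # "%7e" and "~" decode to the same character.
    print(unquote("%7e") == "~")
```

The sketch emits uppercase hexadecimal digits; per the quoted text, lowercase digits are equally valid.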
Comment 1 Alejandro Colomar 2022-03-24 12:16:12 UTC
We should check the following RFCs, which replace them:

2279:
  <https://www.rfc-editor.org/rfc/rfc3629>

2718:
  <https://www.rfc-editor.org/rfc/rfc7595>
    <https://www.rfc-editor.org/rfc/rfc8615> (updates 7595)
Comment 2 Alejandro Colomar 2022-03-28 18:57:46 UTC
2279:
  3629:
    - It doesn't introduce changes relevant for the manual page.
      Update the reference number with no other changes.

2718:
  7595:
    - It deprecates URL in favor of URI.
      Replace the terms in the manual page,
      which were (sometimes) ambiguously used.
    - It stays the same regarding character encoding.
      Update the reference number with no other changes.
  8615:
    - It doesn't introduce changes relevant for the manual page.
