Ticket Change Details
Overview

Artifact ID: a38c4b4972a53f22a8090720686682070e8c7904
Ticket: 67aa9a207037ae67f9014b544c3db34fa732f2dc
Security: Invalid UTF-8 can inject unexpected characters
User & Date: jan.nijtmans 2017-05-31 11:44:32
Changes

  1. assignee changed to: "nobody"
  2. closer changed to: "nobody"
  3. cmimetype changed to: "text/x-fossil-wiki"
  4. comment changed to:
    Example:
    <pre>
       encoding convertfrom utf-8 \x3c\xc0\xbc
       <<
    </pre>
    So the byte sequence \xc0\xbc produces the same character as \x3c. This is known as an overlong UTF-8 sequence, and it is dangerous. For example, an HTML file can be constructed containing the sequence "\xc0\xbcscript ...". When Tcl reads this file and writes it out again in UTF-8, the sequence becomes "<script ...", which can actually run something!
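To see why \xc0\xbc collapses to \x3c, the two-byte decoding arithmetic can be sketched as below. This is an illustrative Python sketch, not Tcl's actual decoder; the function names `naive_decode_2byte` and `strict_decode_2byte` are hypothetical, and the lead/continuation bit patterns are assumed to have been checked already.

```python
def naive_decode_2byte(lead: int, cont: int) -> int:
    """Decode a two-byte UTF-8 sequence WITHOUT rejecting overlong forms.

    Layout: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
    Assumes the caller already verified the 110/10 prefixes.
    """
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def strict_decode_2byte(lead: int, cont: int) -> int:
    """Same arithmetic, but refuse code points that fit in a single byte."""
    code_point = naive_decode_2byte(lead, cont)
    if code_point < 0x80:
        # Anything below 0x80 must be encoded as one byte; two bytes is overlong.
        raise ValueError("overlong UTF-8 sequence")
    return code_point


# The sequence from the ticket: payload bits are 00000 and 111100.
print(hex(naive_decode_2byte(0xC0, 0xBC)))  # 0x3c, i.e. '<'
```

A decoder that applies only the naive arithmetic maps \xc0\xbc to "<", while the strict variant rejects it and still accepts a legitimate pair such as \xc3\x80 (U+00C0).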
    
    Most UTF-8 decoders handle this the same way as any other invalid UTF-8 sequence: each offending byte is output as the character whose code point equals the byte value. The original example then becomes:
    <pre>
       encoding convertfrom utf-8 \x3c\xc0\xbc
       <À¼
    </pre>
    
    The characters "À¼" are 'safe' in HTML, since no character above \x7f has a special meaning there.
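The fallback described above (emit each invalid byte as the character with the same code point) can be sketched in Python as follows. This is only an illustration of the idea, not the proposed Tcl fix; the error-handler name "bytes-as-latin1" and the helper `decode_lenient` are hypothetical names chosen for this example.

```python
import codecs


def _bytes_as_latin1(exc):
    """Error handler: map each undecodable byte to the code point of equal value."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Latin-1 maps byte 0xNN to U+00NN, which is the per-byte fallback.
    return bad.decode("latin-1"), exc.end


# Register under a hypothetical name so bytes.decode() can refer to it.
codecs.register_error("bytes-as-latin1", _bytes_as_latin1)


def decode_lenient(data: bytes) -> str:
    """Strict UTF-8 decoding, with invalid bytes passed through one by one."""
    return data.decode("utf-8", errors="bytes-as-latin1")


print(decode_lenient(b"\x3c\xc0\xbc"))  # <À¼
```

Python's UTF-8 decoder already treats \xc0 as an invalid start byte, so the overlong pair never turns into "<"; the handler only decides what to output in its place.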
    
  5. foundin changed to: "8.6"
  6. is_private changed to: "0"
  7. login: "jan.nijtmans"
  8. priority changed to: "5 Medium"
  9. resolution changed to: "None"
  10. severity changed to: "Important"
  11. status changed to: "Open"
  12. submitter changed to: "jan.nijtmans"
  13. subsystem changed to: "44. UTF-8 Strings"
  14. title changed to:
    Security: Invalid UTF-8 can inject unexpected characters
    
  15. type changed to: "Bug"