Tcl Source Code: Ticket Change Details

Overview

Artifact ID:	a38c4b4972a53f22a8090720686682070e8c7904
Ticket:	67aa9a207037ae67f9014b544c3db34fa732f2dc Security: Invalid UTF-8 can inject unexpected characters
User & Date:	jan.nijtmans 2017-05-31 11:44:32

Changes

assignee changed to: "nobody"
closer changed to: "nobody"
cmimetype changed to: "text/x-fossil-wiki"

comment changed to:

Example:
<pre>
   encoding convertfrom utf-8 \x3c\xc0\xbc
   <<
</pre>
So the byte sequence \xc0\xbc produces the same character as \x3c. This is known as an overlong UTF-8 sequence, and it is dangerous. For example, an HTML file can be constructed containing the sequence "\xc0\xbcscript ...". When Tcl reads this file and writes it out again as UTF-8, the sequence becomes "<script ...", which a browser can then actually execute!
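To make the arithmetic concrete: \xc0 is 110 00000 and \xbc is 10 111100. A decoder that simply joins the payload bits gets 00000 111100 = 0x3C, which is "<". The minimal sketch below shows that computation and the check a strict decoder needs. It is written in Python for brevity, is not Tcl's C implementation, and the helper names are hypothetical. A two-byte form must encode at least U+0080, so the lead bytes \xc0 and \xc1 can only ever start an overlong sequence.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Join the payload bits of a 110xxxxx 10xxxxxx pair with no range check.

    This is the flaw: nothing verifies that the result needed two bytes.
    """
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def is_overlong_2byte(b0: int) -> bool:
    """Two-byte forms must encode U+0080..U+07FF.

    Lead bytes 0xC0 and 0xC1 can only produce U+0000..U+007F, so any
    sequence starting with them is overlong and must be rejected.
    """
    return b0 in (0xC0, 0xC1)


cp = naive_decode_2byte(0xC0, 0xBC)
print(hex(cp), repr(chr(cp)), is_overlong_2byte(0xC0))
# prints: 0x3c '<' True
```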

Many UTF-8 decoders handle an overlong sequence the same way as any other invalid UTF-8 sequence: each byte is decoded on its own, as the character with the same numeric value, and is written out again as valid UTF-8. The original example then becomes:
<pre>
   encoding convertfrom utf-8 \x3c\xc0\xbc
   <À¼
</pre>

The characters "À¼" are 'safe' in HTML, since no character above \x7f has a special meaning there.
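The byte-by-byte fallback described above can be sketched as follows. `lenient_utf8_decode` is a hypothetical Python helper written for illustration, not Tcl's actual decoder. It accepts only shortest-form sequences (and also rejects UTF-16 surrogates and values above U+10FFFF). Any byte that does not begin such a sequence is decoded as the code point with the same value, so an overlong form can never yield an ASCII character such as "<".

```python
def lenient_utf8_decode(data: bytes) -> str:
    """Decode UTF-8, mapping each byte that does not start a valid,
    shortest-form sequence to the code point with the same value
    (e.g. 0xC0 -> U+00C0), then continuing with the next byte."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:  # one-byte ASCII
            out.append(chr(b0))
            i += 1
            continue
        # Sequence length and the smallest code point that length may encode.
        if 0xC2 <= b0 <= 0xDF:
            length, minimum = 2, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, minimum = 3, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, minimum = 4, 0x10000
        else:  # 0x80-0xC1 and 0xF5-0xFF are never valid lead bytes
            length, minimum = 0, 0
        cp = None
        if length and i + length <= n:
            tail = data[i + 1:i + length]
            if all(0x80 <= b <= 0xBF for b in tail):
                cp = b0 & (0x7F >> length)  # payload bits of the lead byte
                for b in tail:
                    cp = (cp << 6) | (b & 0x3F)
                # Reject overlong forms, UTF-16 surrogates and > U+10FFFF.
                if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                    cp = None
        if cp is None:
            out.append(chr(b0))  # fallback: byte value used as code point
            i += 1
        else:
            out.append(chr(cp))
            i += length
    return "".join(out)


decoded = lenient_utf8_decode(b"\x3c\xc0\xbc")
print(decoded == "<\u00c0\u00bc")  # True, i.e. "<À¼"
print(decoded.encode("utf-8"))     # b'<\xc3\x80\xc2\xbc'
```

Re-encoding the result produces \xc3\x80\xc2\xbc for the two suspicious bytes, so no "<" appears. For comparison, Python's built-in decoder takes the stricter route: `b"\xc0\xbc".decode("utf-8")` raises `UnicodeDecodeError` instead of returning either "<" or "À¼".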

foundin changed to: "8.6"
is_private changed to: "0"
login: "jan.nijtmans"
priority changed to: "5 Medium"
resolution changed to: "None"
severity changed to: "Important"
status changed to: "Open"
submitter changed to: "jan.nijtmans"
subsystem changed to: "44. UTF-8 Strings"

title changed to:

Security: Invalid UTF-8 can inject unexpected characters

type changed to: "Bug"