Overview
| Artifact ID: | a38c4b4972a53f22a8090720686682070e8c7904 |
|---|---|
| Ticket: | 67aa9a207037ae67f9014b544c3db34fa732f2dc Security: Invalid UTF-8 can inject unexpected characters |
| User & Date: | jan.nijtmans 2017-05-31 11:44:32 |
Changes
- assignee changed to: "nobody"
- closer changed to: "nobody"
- cmimetype changed to: "text/x-fossil-wiki"
- comment changed to:
Example:
<pre>
encoding convertfrom utf-8 \x3c\xc0\xbc
<<
</pre>
So the byte sequence \xc0\xbc produces the same character as \x3c. This is known as an overlong UTF-8 sequence, and it is dangerous. For example, an HTML file can be constructed containing the sequence "\xc0\xbcscript ...". When Tcl reads this file and writes it out again as UTF-8, the sequence becomes "<script ...", which can make a browser execute injected script!
Most UTF-8 decoders handle overlong sequences the same way as other invalid UTF-8 sequences: they simply output the valid UTF-8 character corresponding to each individual byte. The original example then becomes:
<pre>
encoding convertfrom utf-8 \x3c\xc0\xbc
<À¼
</pre>
The characters "À¼" are safe in HTML, since no character above \x7f has a special meaning there.
- foundin changed to: "8.6"
- is_private changed to: "0"
- login: "jan.nijtmans"
- priority changed to: "5 Medium"
- resolution changed to: "None"
- severity changed to: "Important"
- status changed to: "Open"
- submitter changed to: "jan.nijtmans"
- subsystem changed to: "44. UTF-8 Strings"
- title changed to:
Security: Invalid UTF-8 can inject unexpected characters
- type changed to: "Bug"
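The decoding rule proposed in the comment above can be sketched as follows. This is a minimal illustration, not Tcl's actual C implementation; the function name `decode_utf8_bytewise_fallback` is hypothetical. It assumes the fallback maps each rejected byte to the code point of the same value, which is what the `<À¼` output implies (\xc0 becomes U+00C0 "À", \xbc becomes U+00BC "¼"). Rejecting surrogates and code points above U+10FFFF is extra strictness not discussed in the ticket. Note that Tcl's internal string representation encodes U+0000 as the overlong pair \xc0\x80, so a real fix would likely need to treat that pair specially; the sketch ignores this.

```python
def decode_utf8_bytewise_fallback(data: bytes) -> str:
    """Decode UTF-8, rejecting overlong forms (hypothetical helper).

    Any byte that does not start a well-formed, shortest-form sequence
    is emitted as the character with the same code point (Latin-1 style).
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                      # plain ASCII
            out.append(chr(b0))
            i += 1
            continue
        # Lead byte -> sequence length, payload bits, smallest legal code point.
        if 0xC0 <= b0 <= 0xDF:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:                              # stray continuation byte or 0xF8-0xFF
            out.append(chr(b0))
            i += 1
            continue
        ok = i + length <= n
        if ok:
            for k in range(1, length):
                c = data[i + k]
                if c & 0xC0 != 0x80:       # not a continuation byte
                    ok = False
                    break
                cp = (cp << 6) | (c & 0x3F)
        # \xc0\xbc: payload bits 00000 + 111100 give 0x3C ('<'), but 0x3C < 0x80,
        # so the two-byte form is overlong and is rejected here.
        if ok and min_cp <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF:
            out.append(chr(cp))
            i += length
        else:                              # fall back: emit only the lead byte
            out.append(chr(b0))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(repr(decode_utf8_bytewise_fallback(b"\x3c\xc0\xbc")))  # prints '<À¼'
```

Only the lead byte is consumed on failure, so the following bytes get their own chance to decode; here \xbc, a stray continuation byte, becomes "¼" rather than being merged back into "<".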