Ticket UUID: 0265750233183103c98714f6005a9b842bdfee9c
Title: invalid read in cmdAH-4.3.13.C1.solo.utf-8.tcl8.a
Type: Bug
Version: core-8-6-branch
Submitter: dgp
Created on: 2023-03-20 18:03:45
Subsystem: 16. Commands A-H
Assigned To: jan.nijtmans
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2023-03-22 18:03:18
Resolution: Fixed
Closed By: jan.nijtmans
Closed on: 2023-03-22 18:03:18
Description:
---- cmdAH-4.3.13.C1.solo.utf-8.tcl8.a start
==13304== Invalid read of size 1
==13304==    at 0x5A9234: Tcl_UtfToUniChar (tclUtf.c:467)
==13304==    by 0x501B29: UtfToUtfProc (tclEncoding.c:2617)
==13304==    by 0x4FF867: Tcl_ExternalToUtfDStringEx (tclEncoding.c:1320)
==13304==    by 0x448C3C: EncodingConvertfromObjCmd (tclCmdAH.c:693)
==13304==    by 0x4359E2: Dispatch (tclBasic.c:4997)
==13304==    by 0x435A6A: TclNRRunCallbacks (tclBasic.c:5037)
==13304==    by 0x435336: Tcl_EvalObjv (tclBasic.c:4756)
==13304==    by 0x437998: TclEvalEx (tclBasic.c:5913)
==13304==    by 0x55AF73: Tcl_FSEvalFileEx (tclIOUtil.c:1782)
==13304==    by 0x56799C: Tcl_MainEx (tclMain.c:405)
==13304==    by 0x410188: main (tclAppInit.c:94)
==13304==  Address 0xc754bbd is 0 bytes after a block of size 13 alloc'd
==13304==    at 0x4C2C291: realloc (vg_replace_malloc.c:836)
==13304==    by 0x60C36F: TclpRealloc (tclAlloc.c:747)
==13304==    by 0x444E22: Tcl_Realloc (tclCkalloc.c:1113)
==13304==    by 0x43F49C: Tcl_SetByteArrayLength (tclBinary.c:620)
==13304==    by 0x440498: BinaryFormatCmd (tclBinary.c:1202)
==13304==    by 0x4359E2: Dispatch (tclBasic.c:4997)
==13304==    by 0x435A6A: TclNRRunCallbacks (tclBasic.c:5037)
==13304==    by 0x435336: Tcl_EvalObjv (tclBasic.c:4756)
==13304==    by 0x437998: TclEvalEx (tclBasic.c:5913)
==13304==    by 0x55AF73: Tcl_FSEvalFileEx (tclIOUtil.c:1782)
==13304==    by 0x56799C: Tcl_MainEx (tclMain.c:405)
==13304==    by 0x410188: main (tclAppInit.c:94)
==13304== 
++++ cmdAH-4.3.13.C1.solo.utf-8.tcl8.a PASSED
User Comments: jan.nijtmans added on 2023-03-22 18:03:18:

Fixed [2ffcb8bcf4e7d0d0|here]. The same bug was in core-8-6-branch too, and also in the function Tcl_UtfToChar16().

Should all be fixed now.

> I have to trust you know best how to coordinate the needs of all the players in the arena of reformed "utf" routines.

Thanks!


dgp added on 2023-03-22 16:56:45:
I can see that change will also solve the immediate problem.

I have to trust you know best how to coordinate the needs of
all the players in the arena of reformed "utf" routines.

Thanks for the fix.

jan.nijtmans added on 2023-03-22 16:41:09:

Good catch! Actually, Tcl_UtfCharComplete() is correct here. If the first byte is \xC1, Tcl_UtfToUniChar() shouldn't read the second byte at all: whatever its value, calculating the Unicode value will result in an underflow later.

Proposed solution [4055888f8f267de4|here]
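
One way to read the "underflow" remark, assuming the standard two-byte UTF-8 bit layout (5 payload bits from the lead byte, 6 from the trail byte): a \xC1 lead contributes only the value 0x40, so any trail byte yields a result in 0x40..0x7F, below the 0x80 minimum for a two-byte sequence. The sketch below checks that arithmetic. The helpers two_byte_value() and c1_always_overlong() are hypothetical names for illustration, not Tcl routines, and this is not the code from the proposed check-in.

```c
#include <assert.h>

/*
 * Illustrative only: combine a two-byte lead and a trail byte using the
 * usual UTF-8 bit layout. Hypothetical helper, not part of Tcl.
 */
static unsigned two_byte_value(unsigned char lead, unsigned char trail)
{
    assert((lead & 0xE0) == 0xC0);   /* lead must look like 110xxxxx */
    return ((unsigned)(lead & 0x1F) << 6) | (unsigned)(trail & 0x3F);
}

/*
 * Returns 1 if every possible trail byte (0x80..0xBF) following a 0xC1
 * lead produces a value below 0x80, i.e. an overlong encoding.
 */
static int c1_always_overlong(void)
{
    unsigned trail;

    for (trail = 0x80; trail <= 0xBF; trail++) {
        if (two_byte_value(0xC1, (unsigned char)trail) >= 0x80) {
            return 0;
        }
    }
    return 1;
}
```

Since no second byte can make \xC1 the start of a valid character, a decoder can treat it as a lone byte without looking further, which is consistent with a completeness check that reports 1 for it.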


dgp added on 2023-03-22 16:15:59:
The problem here drills down to an error in Tcl_UtfCharComplete().

  Tcl_UtfCharComplete("\xC1", 1)

returns true, indicating that the single byte C1 is a complete UTF-8
character.  It isn't.  Also, acting on that answer is incompatible with
how Tcl_UtfToUniChar() is coded, and protecting calls to that routine
is the primary reason Tcl_UtfCharComplete() exists.

The fix is just to change the value of complete[0xC1] from 1 to 2.

The value of totalBytes[0xC1] is also 1 instead of 2, which hints at
the possibility of more problems.

I have not yet drilled into the history to see when these values arrived
and what competing needs they are meant to serve.
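
The mismatch described in this comment can be modelled in a few lines of C. This is a simplified sketch, not Tcl's source: bytes_needed(), char_complete() and overread_if_trusted() are hypothetical stand-ins, the c1_needs parameter exists only to compare the two candidate table values, and only the two-byte decoder case is modelled. It assumes a decoder that reads a second byte for every 110xxxxx lead byte, \xC1 included.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical "bytes needed" lookup keyed on the lead byte.
 * c1_needs is the value under discussion for the 0xC1 entry (1 or 2).
 */
static int bytes_needed(unsigned char lead, int c1_needs)
{
    if (lead < 0xC0) return 1;           /* ASCII or stray trail byte */
    if (lead == 0xC1) return c1_needs;   /* the entry being debated */
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF8) return 4;
    return 1;                            /* invalid lead, stands alone */
}

/*
 * Models the contract described above: true when len bytes at src are
 * enough for the decoder to be called safely.
 */
static int char_complete(const unsigned char *src, size_t len, int c1_needs)
{
    assert(len >= 1);
    return len >= (size_t)bytes_needed(src[0], c1_needs);
}

/*
 * Models a decoder that touches src[1] for every 110xxxxx lead byte.
 * Returns how many bytes beyond len it would read if the caller trusted
 * char_complete(); it never performs the out-of-bounds read itself.
 */
static size_t overread_if_trusted(const unsigned char *src, size_t len,
                                  int c1_needs)
{
    size_t reads = ((src[0] & 0xE0) == 0xC0) ? 2 : 1;

    if (!char_complete(src, len, c1_needs)) {
        return 0;   /* caller would wait for more input instead */
    }
    return reads > len ? reads - len : 0;
}
```

With a one-byte buffer holding \xC1, a table value of 1 lets the modelled decoder run and read one byte past the end, matching the "0 bytes after a block" report in the description; a table value of 2 makes the check fail first. Changing the decoder so it does not read a second byte after \xC1 removes the overread without touching the table.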