|
2022-07-29
| ||
| 10:55 | amend for [4eb3a155ac] SF-fix: 8.7 specific implementation and warnings silencing check-in: ae45d3af73 user: sebres tags: core-8-branch | |
| 09:55 | • Closed ticket [4eb3a155ac]: apply causes memory corruption / crash plus 5 other changes artifact: c4d436886c user: sebres | |
| 09:51 | merge branch bug-4eb3a155ac--procptr-bytecode, fixes SF [4eb3a155ac] and similar corner cases: reset... check-in: 73a6af23a5 user: sebres tags: core-8-6-branch | |
|
2022-07-28
| ||
| 08:56 | • Ticket [4eb3a155ac] apply causes memory corruption / crash status still Pending with 3 other changes artifact: 5283a0aa77 user: jan.nijtmans | |
|
2022-07-27
| ||
| 16:09 | • Pending ticket [4eb3a155ac]. artifact: 64302ef8f4 user: sebres | |
| 15:56 | fixes [4eb3a155ac] and similar segfaults: reset corresponding bodyPtr->procPtr if procPtr gets relea... Closed-Leaf check-in: d717a34f4c user: sebres tags: bug-4eb3a155ac--procptr-bytecode | |
| 15:29 | • Ticket [4eb3a155ac] apply causes memory corruption / crash status still Open with 3 other changes artifact: 3fa66da561 user: sebres | |
|
2022-07-25
| ||
| 21:19 | • Ticket [4eb3a155ac]: 3 changes artifact: 24c744e61a user: jan.nijtmans | |
|
2022-07-23
| ||
| 04:30 | • Ticket [4eb3a155ac]: 3 changes artifact: 4441a7cbdc user: apnadkarni | |
| 04:24 | • Open ticket [4eb3a155ac]. artifact: e4b85930b3 user: apnadkarni | |
|
2022-07-20
| ||
| 17:42 | • Deleted ticket [4eb3a155ac]. artifact: cb86ea26bd user: apnadkarni | |
| 17:18 | • Ticket [4eb3a155ac]: 3 changes artifact: cf31d63047 user: apnadkarni | |
| 17:17 | • New ticket [4eb3a155ac]. artifact: 918eb0a5b2 user: apnadkarni | |
| Ticket UUID: | 4eb3a155ac6f73f48b270a19e543841d26bf2e6 | |||
| Title: | apply causes memory corruption / crash | |||
| Type: | Bug | Version: | 8.6 | |
| Submitter: | apnadkarni | Created on: | 2022-07-20 17:17:42 | |
| Subsystem: | 47. Bytecode Compiler | Assigned To: | nobody | |
| Priority: | 7 High | Severity: | Important | |
| Status: | Closed | Last Modified: | 2022-07-29 09:55:09 | |
| Resolution: | Fixed | Closed By: | sebres | |
| Closed on: | 2022-07-29 09:55:09 | |||
| Description: |
The [apn-apply-bug](https://core.tcl-lang.org/tcl/timeline?r=apn-apply-bug) branch contains a demonstration and proposed fix for the bug.
The bug is triggered when the apply command is passed a "lambda" argument which is is a two element list (i.e. not yet shimmered to the `lambdaType` internal representation) but the second element of the list (the lambda body) is unshared **and** already has a internal `tclByteCodeType` representation.
The test command [testapplylambda](https://core.tcl-lang.org/tcl/artifact?name=f0202c540752a8dc&ln=8127) is an example of the above.
Analysis:
When `TclNRApplyObjCmd` is called with an argument that is a list of two strings, it first [converts](https://core.tcl-lang.org/tcl/artifact?name=37dbfa89bf70d3f9&ln=2683) the argument to the internal `lambdaType` in `procPtr`. It then compiles the second element of the argument, the body, by calling [TclPushProcCallFrame](https://core.tcl-lang.org/tcl/artifact?name=37dbfa89bf70d3f9&ln=2720). Crucially, this compilation updates the `procPtr->numCompiledLocals` field with the number of locals used in the body.
The next time `TclNRApplyObjCmd` is called with the same argument, the argument object is already shimmered to `lambdaType` so no conversion is needed. It also already contains the right value of `procPtr->numCompiledLocals` from the previous conversion. So all is well.
The bug occurs when the lambda argument is a list but the body element already has a `tclByteCodeType` internal type. In this case, again `TclNRApplyObjCmd` calls `TclGetLambdaFromObj` to shimmer the argument to a `lambdaType` and get a new `procPtr`. However, when `TclPushProcCallFrame` is called, it finds the body element is already of the `tclByteCodeType` type and [does not recompile](https://core.tcl-lang.org/tcl/artifact?name=37dbfa89bf70d3f9&ln=1579) the body script. **Consequently `procPtr->numCompiledLocals` in the newly allocated `procPtr` does not get updated.** Subsequently, when the byte code is executed it tries to access the expected locations for local variables which has not actually been allocated and crashes.
Proposed fix:
The proposed [fix](https://core.tcl-lang.org/tcl/artifact?name=37dbfa89bf70d3f9&ln=2461) is to always free the internal representation of the body element `Tcl_Obj` of a lambda argument when the latter is shimmered to a `lambdaType`. This ensures recompilation.
This is the simplest and cleanest fix I can come up with given my limited knowledge of this area. An alternative more efficient alternative might be to "remember" the value of numCompiledLocals and stash it away in the `tclByteCodeType` internal rep to be used to reinitialize `procPtr->numCompiledLocals` when the body itself does not need recompilation. I am reluctant to explore that being unsure of what other compilation side-effects also would need to be accounted for. Someone with more expertise might explore that path.
Why this has not shown up:
Reproducing this at the script level is hard (impossible?) in the current Tcl implementation because it requires certain conditions - the lambda itself must not already have an internal lambda rep AND the body element must already have byteCode internal rep AND the body element must be unshared (shared Tcl_Obj are always recompiled). The shimmering and reference counting that happens in script level commands means these conditions are rare (never occur?). Nevertheless this seems to be a genuine bug as illustrated in the C code.
Why this fix has more relevance now:
With TIP 625 list optimizations, this can be reproduced at the script level with the following code.
```
set lambda [list {} {set x 42}]
apply [lrange $lambda 0 end]
apply [lrange $lambda 0 end]
```
With TIP 625, the `lrange` commands return a new Tcl_Obj (because the string representation must be canonical) but share the same internal list representation since the entire list is being returned. Consequently the second apply produces the conditions required for the crash.
Pre-TIP 625, the `lrange` commands would not only return a new Tcl_Obj but would also return an entirely new internal list representation even when the entire list was covered by the range (this is sub-optimal). The conditions for the bug are thus not met and the bug is not triggered.
| |||
| User Comments: |
sebres added on 2022-07-29 09:55:09:
merged in [73a6af23a5e23f40] jan.nijtmans added on 2022-07-28 08:56:39: @sebres. According to Github ACTIONS, your bug-fix works. Thanks! sebres added on 2022-07-27 16:09:06: Branch bug-4eb3a155ac--procptr-bytecode must fix it. sebres added on 2022-07-27 15:29:13: As discussed in tclchat (IRC), the fix of Ashok covers only one case, as well as basically too heavy because would always reset the body of lambda, thus could unintentionally lost large internal representation of some object if body is evaluated as a list, e. g. basically it wouldn't be able anymore evaluate lambdas as pure-list (using Tcl_EvalObjv/TclNREvalObjv), because will always shimmer to string. Anyway, I could find the real reason of the issue and it looks much more grave as assumed - the procPtr object (of type tclProcBodyType) is stored (but not referenced) in its bytecode (in object of type tclByteCodeType), but if lambda/proc/whatever becomes free, codePtr->procPtr remains unchanged, although procPtr doesn't exists anymore, because released in TclProcCleanupProc. And since ByteCode would not be recompiled later, with that constellation I see dozen cases were it can segfault later, Ashok found only one case. I rebased the branch apn-apply-bug to 8.6 (where it is also pretty reproducible) and will provide a common solution for the issue (reset codePtr->procPtr in TclProcCleanupProc). jan.nijtmans added on 2022-07-25 21:19:12: I think the apn-apply-bug is OK to be merge to core-8-6-branch. Thanks, @Ashhok, for analysing and fixing this bug! It might be that a better fix is possible, but since this bug is so rare it doesn't really matter; low priority. I leave it to you, further. apnadkarni added on 2022-07-23 04:30:17: *Comments from sebres on the chat:* <sebres>: apn: nice catch... just as for the fix [072cffce6f] - it is a bit too heavy, I mean always to shimmer to pure string body (for instance it'd never use a list eval, if body is list), also could lost some large object internal representations (apply [list args [list ... $largeObj]]), and would need to recreate it by the evaluation e. g. shimmer back to expected type) <sebres>: apn: isn't possible to correct procPtr->numCompiledLocals or even force that recompilation only in case where it is wrong (we assuming invocation of byteCode vs. lambda/proc byteCode)? *My response on the chat:* @sebres, regarding [072cffce6f], I was just glad I was not completely off-track a few days ago when I logged a bogus ticket about bug in apply! Regarding the fix, my first thought had been that it might be potentially heavyweight as you say because of the internal rep being discarded. But then rethinking, I concluded any internal rep is going to get shimmered to a ByteCode object anyways when the script (in bodyPtr) is compiled by TclCompileScript. So it is only being discarded a little earlier. This was the cleanest solution I could find but this is my foray into the byte code world so I could be wrong about that. It is also a rare case that the lambda object does not have a internal Lambda representation but the body object has an internal ByteCode representation. "apn: isn't possible to correct procPtr->numCompiledLocals or even force that recompilation only in case where it is wrong (we assuming invocation of byteCode vs. lambda/proc byteCode)?" - I went down that path too (have body store that value and update procPtr in the case the body is not recompiled) but beyond my abilities at this time. I'm not sure what other side effects the body compilation has and what else might need to be fixed/restored. | |||