Tcl Source Code: View Ticket

Ticket UUID:	c6897e6e6a8a65a995ec35c9c297b926bf1b6783
Title:	select notifier crashes/misbehaves on fd >= FD_SETSIZE
Type:	Bug	Version:	8.6.9
Submitter:	obj	Created on:	2019-11-05 04:31:45
Subsystem:	01. Notifier	Assigned To:	nobody
Priority:	5 Medium	Severity:	Important
Status:	Open	Last Modified:	2019-11-11 22:03:15
Resolution:	None	Closed By:	nobody
		Closed on:
Description:	The select notifier crashes or misbehaves if given a file descriptor that's >= FD_SETSIZE. This is because it's using the FD_* macros blindly without checking whether the descriptor is in range, which trashes the stack. This has security implications. The overrun doesn't seem directly exploitable, but it is at a minimum a cheap DoS attack against a typical server that's accepting connections on a socket - establish ~1000 connections and the process will crash unavoidably. To reproduce on Linux (FD_SETSIZE is 1024): $ ulimit -n 4096 $ ./crashme.tcl 600 # 6002 > 1024 [...] buffer overflow detected *: tclsh terminated Aborted (core dumped) On FreeBSD, there's no immediate crash but the notifier starts spinning without making progress. From inspection it looks like the select notifier code in 8.7 still has the same problem. --- 8< --- #!/usr/bin/env tclsh proc dummy {args} {} proc accept_one {sock addr port} { fileevent $sock readable dummy puts -nonewline stderr "+" } proc crashme {count} { puts -nonewline stderr "connecting $count sockets " set listener [socket -server accept_one 0] lassign [chan configure $listener -sockname] listenIp listenHost listenPort for {set i 0} {$i < $count} {incr i} { set client [socket -async $listenIp $listenPort] fileevent $client readable dummy puts -nonewline stderr "." update } puts stderr " done." } if {!$tcl_interactive} { crashme [lindex $::argv 0] } --- 8< ---
User Comments:	chw added on 2019-11-11 22:03:15: Please see in https://www.androwish.org/home/artifact?udc=1&ln=1290-1300&name=0b9ce4d422fe17e2 that the added panic is for the error case in the initialization of the notifier thread, i.e. catching the condition that the IPC pipe of the notifier does not fit into the projected size of the select masks of the notifier thread. Otherwise problems occur later in the main loop of the notifier thread since the file descriptor of the receive end of the pipe is beyond the upper bound of the select bitmask for reading. So yes, the panic is unavoidable here. sebres added on 2019-11-11 16:28:38: @chw: as for your PoC (androwish implementation), I don't think panic is that what is expected if it reached NOFILES/FD_SETSIZE limits - it is possible to handle issues like this without immediate hard exit from a program (what panic basically does). As already said, current implementation of branch bug-c6897e6e6a is enough to handle `fd >= FD_SETSIZE` situation properly (generates an error as long as some fds get free later), exactly the same manner as in case of NOFILES limit is reached (so in `EMFILE` state), and in both cases regardless odd or even fds produced in listener/acceptor or on simple connector side. And if we get async-handling/notifier for `fd >= FD_SETSIZE` implementing properly (e. g. as suggested with poll/epoll or some other workaround), it'd be enough to delete the code-blocks actually enclosed in `ifdef CHECK_FD_SETSIZE_LIMIT` to fix it ultimately. In any case (unless it's not completelly fixed) I'd prefer the script/API generates an error - panic is simply too hard for such purposes. chw added on 2019-11-10 19:17:54: The OP's script needs some small mods due to startup conditions and due to two problems which can occur: 1. fd >= FD_SETSIZE and 2. hit of errno == EMFILE. Since each iteration requires two fds (one for the client and one for the accepted socket) it depends if an even or odd number of fds was in use when the script performs its very first step. --- 8< --- #!/usr/bin/env tclsh # uncomment for odd/even start condition. # open /dev/null proc dummy_server {sock} { puts -nonewline stderr "S" close $sock } proc dummy_client {sock} { puts -nonewline stderr "C" close $sock # consume one file descriptor for a while after 1 [list close [open /dev/null]] } proc accept_one {sock addr port} { fileevent $sock readable [list dummy_server $sock] puts -nonewline stderr "+" } proc crashme {count} { puts -nonewline stderr "connecting $count sockets " set listener [socket -server accept_one 0] # non-blocking listen socket, else accept can block chan configure $listener -blocking 0 lassign [chan configure $listener -sockname] listenIp listenHost listenPort for {set i 0} {$i < $count} {incr i} { set client [socket -async $listenIp $listenPort] fileevent $client readable [list dummy_client $client] puts -nonewline stderr "." update } puts stderr " done." } if {!$tcl_interactive} { crashme [lindex $::argv 0] } --- 8< --- chw added on 2019-11-08 23:25:32: I think it mainly boils down to this https://www.androwish.org/home/artifact?udc=1&name=e337a3b2239dcc19&ln=1727,1737 which seems to be a design flaw in the socket layer, i.e. listen() allows for a backlog which accept() cannot satisfy due to other process/system constraints. In the shiny modern world of multi threads/cores and a single table of file descriptors per process this introduces a race which properly could be resolved only by kernel support in the spirit of dup2(), i.e. atomically swap an old file descriptor with the new one obtained through accept(). Thus, I blatantly propose to name this new system call either dupcept() or accdupt(). I bet, the lack of it already burned legions of software engineers. chw added on 2019-11-08 23:03:13: Forgot the latest: https://www.androwish.org/home/info/56ed33b76ae2c937 chw added on 2019-11-08 22:52:27: Another proof of concept can be found in these AndroWish check-ins (for the 8.6.9 select based notifier only): https://www.androwish.org/home/info/9ba84a8a765c4da6 https://www.androwish.org/home/info/5362b4849e8f8673 sebres added on 2019-11-07 11:51:04: a) This is just PoC, not a fully correct solution (also note it still doesn't work for macOS/Xcode, see travis-ci/608613091, EBADFD is undeclared there). b&c) Although I think fixing of known issue (also if it firstly looks like a workaround) is better as to blame on system restriction and even such simple stupid reject is better as a busy wait or even crash, but as already said better would be extend notifier with poll for FD >= FD_SETSIZE and then deactivate CHECK_FD_SETSIZE_LIMIT (see [c2d030bb31a977a4])... So if ultimately it'd work for sockets (for FD exceeding FD_SETSIZE), it'd basically work for every other descriptor (like files, whatever). > Can we maybe synthesize errors for channels that exceed FD_SETSIZE at the point where the notifier actually gets them (and not add them to the fd_set at all)? I guess the emphasis is on "at the point"? Well sure it is possible (somehow together with many other places where FD_SET/FD_CLR/FD_COPY/FD_whatever are called), just I'd like to fix it ultimately and not to improve the workaround. obj added on 2019-11-07 01:48:08: a) There are other non-socket paths that can introduce fds to the notifier. Notably, I believe external code can call Tcl_CreateFileHandler(). How are you going to handle that? b) In 8.7, AIUI, there are several possible notifiers so restricting the valid FD range at FD creation time seems like it will artificially restrict other notifier types that aren't limited by FD_SETSIZE c) what about code that uses many FDs but does not run the event loop? This is probably an uncommon case but it would be surprising for that to break I tend to agree that this needs to be dealt with in the notifier somehow. Can we maybe synthesize errors for channels that exceed FD_SETSIZE at the point where the notifier actually gets them (and not add them to the fd_set at all)? sebres added on 2019-11-06 17:41:13: I prefer epoll/poll here, because the applying of subset like `FD_SET(fd%FD_SETSIZE, &fds_array[fd/FD_SETSIZE])` is hardly possible at the moment (for example for all asynchronous TSD with states/handlers across threads). For example if FD_SETSIZE is 1024 and we have some fds containing 2 "parallel" (in sense of subset) fd's like `[10, 1034)`, such shifting to every subset (if applied directly over TSD-array) will always intermix states of descriptors there, so some threads awaiting for both could get wrong (confused) results. This way it is almost impossible to implement without large change of notifier handling or even its architecture. sebres added on 2019-11-06 15:27:13: Sure, it's correct and I can follow your explanation (just the crash was even not reproducible to me)... Anyway, the branch is extended now and [001f178818] contains a first shot of proper fix for this - now it retards accepting in case of out of FD but continue to accept if some descriptors arrive again (gets closed later). As wewll as test cases covering both situations: - 2.14 - no free FD available - 2.15 - FD limit violates (exceeds) FD_SETSIZE Both tests cover continuation of accept in server-listener, if free descriptors become avaialable later. I see pretty well a busy wait for unpatched 8.6 (it hangs there) and 4 tests passed with my branch: `$ ./tcltest ../tests/all.tcl -file socket.test -match '*-2.1[45]' ... all.tcl: Total 169 Passed 4 Skipped 165 Failed 0` Now why it is a first shot: although it properly fixes the issue (error on limit & continue to accept later), but in case of NOFILE-limit exceeding FD_SETSIZE it simply rejects the connection (what is not optimal). Better would be to extend notifier or async event handling to use epoll/poll (for all fd's `fd >= FD_SETSIZE`) or to rewrite waiting select to a bit hackish solution from SO/a/7977082/7161854 (e. g. `FD_SET(fd%FD_SETSIZE, &fds_array[fd/FD_SETSIZE])`) and then if it works remove all 3 extra prevention codes like cb2b27ad5d7eff89, line 1216-1222 that I used at the moment against violation of the set-size limit. obj added on 2019-11-06 12:47:35: With your branch, it still crashes for me here: #7 0x00007f887d51df3e in Tcl_CreateFileHandler (fd=1024, mask=2, proc=proc@entry=0x7f887d51a3ac <WrapNotify>, clientData=0x5565df082ca0) at /home/oliver/tclthing/tcl-56620b4bd5/unix/tclUnixNotfy.c:634 634 FD_SET(fd, &tsdPtr->checkMasks.readable); .. which is entirely unsurprising, because your branch doesn't actually change anything in the problem case, which is where accept() succeeds but returns a fd >= FD_SETSIZE. The crash vs busy wait thing is likely to be a build issue. Try building without --enable-symbols, or set CFLAGS=-O. I get a busy-wait instead of a crash if I build with --enable-symbols, I suspect because the bounds-checking code gets disabled in that case. sebres added on 2019-11-06 11:38:12: Sure, I have limited maximum number of fd... As already said no crash, only busy wait. WiP. obj added on 2019-11-06 00:56:03: You're looking at the wrong problem. I was going to raise a separate ticket about server sockets misbehaving when they get EMFILE, but that's a different thing. Ensure that you have a RLIMIT_NOFILE (note the "ulimit -n" in my reproduction steps) that's large enough to handle the number of sockets created by the test (so that you exceed FD_SETSIZE before you get EMFILE) and you should see the crash. sebres added on 2019-11-05 19:57:06: Well, I think I know what could cause a crash: the test-script is a bit weird, because even if noone socket gets accepted, it can explode endless "readable" events (accept-events of listener), but in state of descriptor exhaustion accept can't create new sockets. Suggested fix of me can indeed "solve" that, but unfortunately it marks the listener as useless, because it would never accept anymore, also if some free descriptors would arrive (sockets or files get closed). Proper fix should retard readable acceptor-handler of listener in situation of ENOBUFS/EMFILE (to avoid endless busy-wait), so it were able to continue later, if new descriptors arrive (accept succeeds). Unfortunately we don't really have any socket state that can be used for such purposes to recognize out of limit at script level or to notify connecting async socket about "too many open files", because normally this socket would connect to remote listener (other thread/process) and "accept" knows only counterpart of connection (which is not really built). In normal use-case the accept would fail if listener process exceeds the limit of max file descriptors, so for example if TCP connection sending a FIN to other side, at other connection end we'll get readable event with reject. Anyway, I have to rewrite my fix (to avoid perdition of listener). I'll try to use something other in order to avoid busy-wait, as well as provide test cases covering both situation (errors on EMFILE and work continuation after close of some descriptors). sebres added on 2019-11-05 14:18:25: Hmm... I could repoduce a busy-wait loop, but not the crash (may be it is platform related)... The branch bug-c6897e6e6a should fix it (at least busy-wait is away, and I see correctly produced "couldn't open socket: too many open files" after 1024 used descriptors)... Can you try this, whether the crash is solved too?