Tcl Source Code

View Ticket
Login
Ticket UUID: c6897e6e6a8a65a995ec35c9c297b926bf1b6783
Title: select notifier crashes/misbehaves on fd >= FD_SETSIZE
Type: Bug Version: 8.6.9
Submitter: obj Created on: 2019-11-05 04:31:45
Subsystem: 01. Notifier Assigned To: nobody
Priority: 5 Medium Severity: Important
Status: Open Last Modified: 2019-11-11 22:03:15
Resolution: None Closed By: nobody
    Closed on:
Description:
The select notifier crashes or misbehaves if given a file descriptor that's >=
FD_SETSIZE. This is because it's using the FD_* macros blindly without checking
whether the descriptor is in range, which trashes the stack.

This has security implications. The overrun doesn't seem directly exploitable,
but it is at a minimum a cheap DoS attack against a typical server that's
accepting connections on a socket - establish ~1000 connections and the process
will crash unavoidably.

To reproduce on Linux (FD_SETSIZE is 1024):

$ ulimit -n 4096
$ ./crashme.tcl 600      # 600*2 > 1024
[...]
*** buffer overflow detected ***: tclsh terminated
Aborted (core dumped)

On FreeBSD, there's no immediate crash but the notifier starts spinning without
making progress.

From inspection it looks like the select notifier code in 8.7 still has the
same problem.

--- 8< ---

#!/usr/bin/env tclsh

proc dummy {args} {}

proc accept_one {sock addr port} {
    fileevent $sock readable dummy
    puts -nonewline stderr "+"
}

proc crashme {count} {
    puts -nonewline stderr "connecting $count sockets "

    set listener [socket -server accept_one 0]
    lassign [chan configure $listener -sockname] listenIp listenHost listenPort

    for {set i 0} {$i < $count} {incr i} {
        set client [socket -async $listenIp $listenPort]
        fileevent $client readable dummy
        puts -nonewline stderr "."
        update
    }

    puts stderr " done."
}

if {!$tcl_interactive} {
    crashme [lindex $::argv 0]
}

--- 8< ---
User Comments: chw added on 2019-11-11 22:03:15:
Please see in

  https://www.androwish.org/home/artifact?udc=1&ln=1290-1300&name=0b9ce4d422fe17e2

that the added panic is for the error case in the initialization
of the notifier thread, i.e. catching the condition that the IPC
pipe of the notifier does not fit into the projected size of the
select masks of the notifier thread. Otherwise problems occur
later in the main loop of the notifier thread since the file
descriptor of the receive end of the pipe is beyond the upper
bound of the select bitmask for reading. So yes, the panic is
unavoidable here.

sebres added on 2019-11-11 16:28:38:

@chw: as for your PoC (androwish implementation), I don't think panic is that what is expected if it reached NOFILES/FD_SETSIZE limits - it is possible to handle issues like this without immediate hard exit from a program (what panic basically does).

As already said, current implementation of branch bug-c6897e6e6a is enough to handle fd >= FD_SETSIZE situation properly (generates an error as long as some fds get free later), exactly the same manner as in case of NOFILES limit is reached (so in EMFILE state), and in both cases regardless odd or even fds produced in listener/acceptor or on simple connector side.
And if we get async-handling/notifier for fd >= FD_SETSIZE implementing properly (e. g. as suggested with poll/epoll or some other workaround), it'd be enough to delete the code-blocks actually enclosed in ifdef CHECK_FD_SETSIZE_LIMIT to fix it ultimately.

In any case (unless it's not completelly fixed) I'd prefer the script/API generates an error - panic is simply too hard for such purposes.


chw added on 2019-11-10 19:17:54:
The OP's script needs some small mods due to startup conditions
and due to two problems which can occur: 1. fd >= FD_SETSIZE and
2. hit of errno == EMFILE. Since each iteration requires two fds
(one for the client and one for the accepted socket) it depends
if an even or odd number of fds was in use when the script performs
its very first step.

--- 8< ---

#!/usr/bin/env tclsh

# uncomment for odd/even start condition.
# open /dev/null

proc dummy_server {sock} {
    puts -nonewline stderr "S"
    close $sock
}

proc dummy_client {sock} {
    puts -nonewline stderr "C"
    close $sock
    # consume one file descriptor for a while
    after 1 [list close [open /dev/null]]
}

proc accept_one {sock addr port} {
    fileevent $sock readable [list dummy_server $sock]
    puts -nonewline stderr "+"
}

proc crashme {count} {
    puts -nonewline stderr "connecting $count sockets "

    set listener [socket -server accept_one 0]
    # non-blocking listen socket, else accept can block
    chan configure $listener -blocking 0
    lassign [chan configure $listener -sockname] listenIp listenHost listenPort

    for {set i 0} {$i < $count} {incr i} {
        set client [socket -async $listenIp $listenPort]
        fileevent $client readable [list dummy_client $client]
        puts -nonewline stderr "."
        update
    }

    puts stderr " done."
}

if {!$tcl_interactive} {
    crashme [lindex $::argv 0]
}

--- 8< ---

chw added on 2019-11-08 23:25:32:
I think it mainly boils down to this

  https://www.androwish.org/home/artifact?udc=1&name=e337a3b2239dcc19&ln=1727,1737

which seems to be a design flaw in the socket layer, i.e.
listen() allows for a backlog which accept() cannot satisfy
due to other process/system constraints. In the shiny modern
world of multi threads/cores and a single table of file
descriptors per process this introduces a race which properly
could be resolved only by kernel support in the spirit of dup2(),
i.e. atomically swap an old file descriptor with the new one
obtained through accept(). Thus, I blatantly propose to name
this new system call either dupcept() or accdupt(). I bet, the
lack of it already burned legions of software engineers.

chw added on 2019-11-08 23:03:13:
Forgot the latest:

  https://www.androwish.org/home/info/56ed33b76ae2c937

chw added on 2019-11-08 22:52:27:
Another proof of concept can be found in these AndroWish check-ins (for the
8.6.9 select based notifier only):

  https://www.androwish.org/home/info/9ba84a8a765c4da6
  https://www.androwish.org/home/info/5362b4849e8f8673

sebres added on 2019-11-07 11:51:04:

a) This is just PoC, not a fully correct solution (also note it still doesn't work for macOS/Xcode, see travis-ci/608613091, EBADFD is undeclared there).

b&c) Although I think fixing of known issue (also if it firstly looks like a workaround) is better as to blame on system restriction and even such simple stupid reject is better as a busy wait or even crash, but as already said better would be extend notifier with poll for FD >= FD_SETSIZE and then deactivate CHECK_FD_SETSIZE_LIMIT (see [c2d030bb31a977a4])...
So if ultimately it'd work for sockets (for FD exceeding FD_SETSIZE), it'd basically work for every other descriptor (like files, whatever).

> Can we maybe synthesize errors for channels that exceed FD_SETSIZE at the point where the notifier actually gets them (and not add them to the fd_set at all)?

I guess the emphasis is on "at the point"?
Well sure it is possible (somehow together with many other places where FD_SET/FD_CLR/FD_COPY/FD_whatever are called), just I'd like to fix it ultimately and not to improve the workaround.


obj added on 2019-11-07 01:48:08:

a) There are other non-socket paths that can introduce fds to the notifier. Notably, I believe external code can call Tcl_CreateFileHandler(). How are you going to handle that?

b) In 8.7, AIUI, there are several possible notifiers so restricting the valid FD range at FD creation time seems like it will artificially restrict other notifier types that aren't limited by FD_SETSIZE

c) what about code that uses many FDs but does not run the event loop? This is probably an uncommon case but it would be surprising for that to break

I tend to agree that this needs to be dealt with in the notifier somehow. Can we maybe synthesize errors for channels that exceed FD_SETSIZE at the point where the notifier actually gets them (and not add them to the fd_set at all)?


sebres added on 2019-11-06 17:41:13:

I prefer epoll/poll here, because the applying of subset like
FD_SET(fd%FD_SETSIZE, &fds_array[fd/FD_SETSIZE])
is hardly possible at the moment (for example for all asynchronous TSD with states/handlers across threads). For example if FD_SETSIZE is 1024 and we have some fds containing 2 "parallel" (in sense of subset) fd's like [10, 1034), such shifting to every subset (if applied directly over TSD-array) will always intermix states of descriptors there, so some threads awaiting for both could get wrong (confused) results.
This way it is almost impossible to implement without large change of notifier handling or even its architecture.


sebres added on 2019-11-06 15:27:13:

Sure, it's correct and I can follow your explanation (just the crash was even not reproducible to me)...

Anyway, the branch is extended now and [001f178818] contains a first shot of proper fix for this - now it retards accepting in case of out of FD but continue to accept if some descriptors arrive again (gets closed later).

As wewll as test cases covering both situations:

- 2.14 - no free FD available
- 2.15 - FD limit violates (exceeds) FD_SETSIZE

Both tests cover continuation of accept in server-listener, if free descriptors become avaialable later.

I see pretty well a busy wait for unpatched 8.6 (it hangs there) and 4 tests passed with my branch:

    $ ./tcltest ../tests/all.tcl -file socket.test -match '*-2.1[45]'
    ...
    all.tcl:        Total   169     Passed  4       Skipped 165     Failed  0
Now why it is a first shot: although it properly fixes the issue (error on limit & continue to accept later), but in case of NOFILE-limit exceeding FD_SETSIZE it simply rejects the connection (what is not optimal).
Better would be to extend notifier or async event handling to use epoll/poll (for all fd's fd >= FD_SETSIZE) or to rewrite waiting select to a bit hackish solution from SO/a/7977082/7161854 (e. g. FD_SET(fd%FD_SETSIZE, &fds_array[fd/FD_SETSIZE])) and then if it works remove all 3 extra prevention codes like cb2b27ad5d7eff89, line 1216-1222 that I used at the moment against violation of the set-size limit.


obj added on 2019-11-06 12:47:35:
With your branch, it still crashes for me here:

#7  0x00007f887d51df3e in Tcl_CreateFileHandler (fd=1024, mask=2, proc=proc@entry=0x7f887d51a3ac <WrapNotify>, clientData=0x5565df082ca0) at /home/oliver/tclthing/tcl-56620b4bd5/unix/tclUnixNotfy.c:634
634		    FD_SET(fd, &tsdPtr->checkMasks.readable);

.. which is entirely unsurprising, because your branch doesn't actually change anything in the problem case, which is where accept() succeeds but returns a fd >= FD_SETSIZE.

The crash vs busy wait thing is likely to be a build issue. Try building without --enable-symbols, or set CFLAGS=-O. I get a busy-wait instead of a crash if I build with --enable-symbols, I suspect because the bounds-checking code gets disabled in that case.

sebres added on 2019-11-06 11:38:12:
Sure, I have limited maximum number of fd... As already said no crash, only busy wait.
WiP.

obj added on 2019-11-06 00:56:03:

You're looking at the wrong problem. I was going to raise a separate ticket about server sockets misbehaving when they get EMFILE, but that's a different thing.

Ensure that you have a RLIMIT_NOFILE (note the "ulimit -n" in my reproduction steps) that's large enough to handle the number of sockets created by the test (so that you exceed FD_SETSIZE before you get EMFILE) and you should see the crash.


sebres added on 2019-11-05 19:57:06:

Well, I think I know what could cause a crash: the test-script is a bit weird, because even if noone socket gets accepted, it can explode endless "readable" events (accept-events of listener), but in state of descriptor exhaustion accept can't create new sockets.

Suggested fix of me can indeed "solve" that, but unfortunately it marks the listener as useless, because it would never accept anymore, also if some free descriptors would arrive (sockets or files get closed).

Proper fix should retard readable acceptor-handler of listener in situation of ENOBUFS/EMFILE (to avoid endless busy-wait), so it were able to continue later, if new descriptors arrive (accept succeeds). Unfortunately we don't really have any socket state that can be used for such purposes to recognize out of limit at script level or to notify connecting async socket about "too many open files", because normally this socket would connect to remote listener (other thread/process) and "accept" knows only counterpart of connection (which is not really built).

In normal use-case the accept would fail if listener process exceeds the limit of max file descriptors, so for example if TCP connection sending a FIN to other side, at other connection end we'll get readable event with reject.

Anyway, I have to rewrite my fix (to avoid perdition of listener). I'll try to use something other in order to avoid busy-wait, as well as provide test cases covering both situation (errors on EMFILE and work continuation after close of some descriptors).


sebres added on 2019-11-05 14:18:25:

Hmm... I could repoduce a busy-wait loop, but not the crash (may be it is platform related)...

The branch bug-c6897e6e6a should fix it (at least busy-wait is away, and I see correctly produced "couldn't open socket: too many open files" after 1024 used descriptors)...
Can you try this, whether the crash is solved too?