In this chapter we’re taking a look at the select() function. This is a function that looks at a whole set of sockets and lets you know which ones have sent you data. That is, which ones are ready to call recv() on.
This enables us to wait for data on a large number of sockets at the same time.
Let’s say the server is connected to three clients. It wants to recv() data from whichever client sends it next.
But the server has no way of knowing which client will send data next.
In addition, when the server calls recv() on a socket with no data ready to read, the recv() call blocks, preventing anything else from running.
To block means that the process stops executing here and goes to sleep until some condition is met. In the case of recv(), the process goes to sleep until there’s some data to receive.
So we have this problem where if we do something like this:
data1 = s1.recv(4096)
data2 = s2.recv(4096)
data3 = s3.recv(4096)
but there’s no data ready on s1, then the process will block there and not call recv() on s2 or s3 even if there’s data to be received on those sockets.
We need a way to monitor s1, s2, and s3 at the same time, determine which of them have data ready to receive, and then call recv() only on those sockets.
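To see the problem in action, here’s a minimal sketch. It assumes a few things that aren’t in the text above: it uses socket.socketpair() to fake three client connections without a real network, the names c1, c2, and c3 are made up for the "client" ends, and it sets a short timeout on s1 only so the demo doesn’t hang forever the way a plain blocking recv() would.

```python
import socket

# Three connected pairs stand in for three client connections.
# (socketpair() is an assumption made so this runs without a network.)
s1, c1 = socket.socketpair()
s2, c2 = socket.socketpair()
s3, c3 = socket.socketpair()

c3.sendall(b"hello from client 3")  # only the third client sends anything

# A plain s1.recv(4096) would sleep here indefinitely, since c1 never sends.
# The timeout turns that endless block into an exception we can observe.
s1.settimeout(0.2)
try:
    data1 = s1.recv(4096)
except socket.timeout:
    data1 = None  # nothing arrived on s1 while we sat there waiting

data3 = s3.recv(4096)  # this data was ready the whole time

for s in (s1, c1, s2, c2, s3, c3):
    s.close()
```

The data on s3 was sitting there ready while the process was stuck waiting on s1. That wasted wait is what we want to get rid of.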
The select() function does this. Calling select() on a set of sockets will block until one or more of those sockets is ready-to-read. Then it returns the sockets that are ready, and you can call recv() specifically on those.
select()
First of all, you need the select module.
import select
If you have a bunch of connected sockets you want to test for being ready to recv(), you can add those to a set() and pass that to select(). It will block until one is ready to read.
This set can be used as your canonical list of connected sockets. You need to keep track of them all somewhere, and this set is a good place. As you get new connections, you add them to the set, and as the connections hang up, you remove them from the set. In this way, it always holds the sockets of all the current connections.
Here’s an example. select() takes three arguments and returns three values. We’ll just look at the first of each of these for now, ignoring the other ones.
read_set = {s1, s2, s3}

ready_to_read, _, _ = select.select(read_set, {}, {})
At this point, we can go through the sockets that are ready and receive data.
for s in ready_to_read:
    data = s.recv(4096)
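If you want to try this without writing a whole server, here’s a small self-contained sketch. As an assumption for the demo, it uses socket.socketpair() to stand in for three connected clients (the c1, c2, c3 names are hypothetical), and only one of them sends anything.

```python
import select
import socket

# Fake three client connections; s1..s3 are the "server side" ends.
s1, c1 = socket.socketpair()
s2, c2 = socket.socketpair()
s3, c3 = socket.socketpair()

read_set = {s1, s2, s3}

c2.sendall(b"from client 2")  # only the second client has sent data

# Blocks until at least one socket in read_set has data waiting.
ready_to_read, _, _ = select.select(read_set, {}, {})

received = {}
for s in ready_to_read:
    received[s] = s.recv(4096)  # safe: select() said data is waiting here

for s in (s1, c1, s2, c2, s3, c3):
    s.close()
```

Here select() hands back a list containing just s2, so recv() is never called on the two idle sockets and nothing blocks.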
select() with Listening Sockets
If you’ve been looking closely, you might have the following question: if the server is blocked on a select() call waiting for incoming data, how can it also call accept() to accept incoming connections? Won’t the incoming connections have to wait? Furthermore, accept() blocks… how will we get back to the select() if we’re blocked on that?
Fortunately, select() provides us with an answer: you can add a listening socket to the set! When the listening socket shows up as “ready-to-read”, it means there’s a new incoming connection to accept().
Putting it all together, we get the core of any main loop that uses select():
add the listener socket to the set

main loop:
    call select() and get the sockets that are ready to read

    for all sockets that are ready to read:
        if the socket is the listener socket:
            accept() a new connection
            add the new socket to our set!

        else the socket is a regular socket:
            recv() the data from the socket

            if you receive zero bytes
                the client hung up
                remove the socket from the set!
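Here’s one way that outline might look in Python. Treat it as a sketch, not the definitive version: the function names make_listener() and run_one_pass() are made up for this example, binding to 127.0.0.1 is an assumption, and error handling (such as a connection reset during recv()) is left out to keep the shape of the loop visible.

```python
import select
import socket

def make_listener(port=0):
    """Create a listening socket (hypothetical helper; port 0 = any free port)."""
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    return listener

def run_one_pass(listener, read_set):
    """One trip through the main loop. Returns a list of (socket, data)."""
    received = []

    # Blocks until the listener or any connected socket is ready to read.
    ready_to_read, _, _ = select.select(read_set, {}, {})

    for s in ready_to_read:
        if s is listener:
            # Listener is "ready to read": a new connection is waiting.
            new_sock, _ = listener.accept()
            read_set.add(new_sock)
        else:
            data = s.recv(4096)
            if len(data) == 0:
                # Zero bytes means the client hung up.
                read_set.remove(s)
                s.close()
            else:
                received.append((s, data))

    return received
```

To use it, start with a set holding only the listener, then call run_one_pass(listener, read_set) inside a while True loop and do whatever you like with the returned data. Changing read_set inside the for loop is fine here because the loop walks the list that select() returned, not the set itself.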
select()?
select() actually takes three arguments. (Though for this project we only need to use the first one, so this section is purely informational.)
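They correspond to, in order: the sockets you want to check for being ready to read, the sockets you want to check for being ready to write, and the sockets you want to check for an exceptional condition.
And the return values map to these, as well.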
read, write, exc = select.select(read_set, write_set, exc_set)
But again, for this project, we just use the first and ignore the rest.
read, _, _ = select.select(read_set, {}, {})
I told a bit of a lie. There’s an optional fourth argument, the timeout. It’s a floating point number of seconds to wait for an event to occur; if nothing occurs in that timeframe, select() returns anyway and all three of the returned lists are empty.
You can also specify a timeout of 0 if you want to just poll the sockets.
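Here’s a quick sketch of the timeout in use. As before, socket.socketpair() and the name c1 are assumptions made so the demo runs on its own, and the particular timeout values are arbitrary.

```python
import select
import socket

s1, c1 = socket.socketpair()
read_set = {s1}

# Timeout of 0: check the current state and return immediately (a poll).
idle_poll, _, _ = select.select(read_set, {}, {}, 0)

# Nothing has been sent, so this waits about half a second, then gives up.
waited, _, _ = select.select(read_set, {}, {}, 0.5)

# Once data arrives, select() returns as soon as the socket is ready,
# well before the 5 second timeout runs out.
c1.sendall(b"ping")
ready, _, _ = select.select(read_set, {}, {}, 5)

s1.close()
c1.close()
```

The first two calls come back with empty lists, and the last one comes back with s1 in it.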
Why can’t we just call recv() on all the connected sockets? What does select() buy us?
When select() shows a socket “ready-to-read”, what does it mean if the socket is a listening socket versus a non-listening socket?
Why do we have to add the listener socket to the set, anyway? Why not just call accept() and then call select()?
select() with send()
If your computer tries to send too much too fast, the call to send() might block. That is, the OS will have it sleep while it processes the backlog of data to be sent.
But let’s say you really don’t want to have the call block and want to keep processing.
You can query with select() to make sure a socket won’t block with a send() call by passing a set containing the socket descriptor as the second argument.
And it’ll work in much the same way as the “ready to read” set works, above.
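As a rough sketch, checking for “ready to write” might look like this. The socketpair() setup and the name c1 are assumptions for the demo, and the timeout of 0 is just there to make it a quick poll.

```python
import select
import socket

s1, c1 = socket.socketpair()

# Second argument: the sockets we would like to send() on.
# Second return value: the ones that have room to accept data right now.
_, ready_to_write, _ = select.select({}, {s1}, {}, 0)

sent = 0
if s1 in ready_to_write:
    sent = s1.send(b"hello")  # small message, so it fits in the free space

echoed = c1.recv(4096)

s1.close()
c1.close()
```

One caveat: “ready to write” only says there’s some room in the outgoing buffer. With a very large payload, send() might still block or send only part of the data, so check its return value.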