WSARecv & IOCP...when exactly is the notification sent???

Discussion:

George

2007-11-22 22:20:00 UTC

I have searched newsgroups and the web but can't figure out what criteria
trigger the recv notification to be posted by IOCP.

For example, suppose I post a WSARecv with an 8K buffer and my server
is connected to a client over a slow connection, so it will take the
client a while to send 8K. How does Winsock handle this situation? Would
it wait for the 8K buffer to be filled? Would it just receive whatever bytes
have arrived and post a notification? Or does it wait for a particular time
period, filling the buffer until then, before posting a notification to IOCP?

I also read about posting zero-byte buffers on WSARecv to avoid memory being
locked into the non-paged pool. Same question: when is the notification for
zero-byte buffers sent? Let's assume the client sends 8K of data. Would
the notification be sent as soon as any data arrives on the socket, even 100
bytes, or would it wait for the full 8K to arrive or the socket buffer to fill
before posting a notification?

I would really appreciate it if anyone with detailed know-how of Windows
Sockets could reply to this thread!

Vladimir Petter [MSFT]

2007-11-23 21:29:34 UTC

- A WSARecv with an 8K buffer completes with whatever number of bytes is
available. It does not wait for all 8K to arrive.
- A zero-byte WSARecv completes as soon as a single byte is available.
Vladimir
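
The completion rule above can be tried out without Winsock. What follows is a minimal sketch in Python, chosen only for brevity. It uses ordinary sockets and select(), not WSARecv or IOCP, and the helper name wait_then_read is made up for illustration. The readiness wait is only a rough stand-in for a zero-byte receive: it fires as soon as any data is present, and the read that follows returns whatever has arrived rather than waiting to fill the 8K buffer.

```python
import select
import socket


def wait_then_read(sock, max_bytes, timeout=1.0):
    """Wait until the socket is readable, then read what is there.

    Returns None if nothing arrived within the timeout.
    """
    # Step 1: readiness wait (rough analogue of a zero-byte receive).
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    # Step 2: the real read. It returns the bytes currently available,
    # up to max_bytes (empty if the peer closed); it never waits to fill.
    return sock.recv(max_bytes)


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    sender.sendall(b"x" * 100)
    data = wait_then_read(receiver, 8192)
    print(len(data))  # at most 100, although the buffer is 8192 bytes
    sender.close()
    receiver.close()
```

Note the two steps per chunk of data: one wait and one read. That is the extra I/O operation a zero-byte receive costs compared with posting a real buffer up front.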

George

2007-11-24 14:07:00 UTC

My understanding is that TCP data arrives in packets, with an MTU of around 1500
bytes. Here is what I'm confused about: when the remote socket calls send() with
4KB of data, would a zero-byte recv return as soon as the first packet
(transmission unit) is received? Would it be a wise choice to use small
4K buffers with zero-byte recvs?

On the first point, you mentioned that WSARecv will return with whatever bytes
are available. So is it exactly like a zero-byte recv, with the only
advantage being that the data is copied into the provided buffer?

Vladimir Petter [MSFT]

2007-11-24 15:20:26 UTC

A 0-byte read has the advantage of saving on locked pages and the disadvantage
that you might double the number of I/O operations. IMO whether you
should use 0-byte receives depends entirely on your scenario. It sounds
like a great idea if data is sent rarely, but it could be a bad idea if
data arrives often at a constant rate (streaming video). You should
experiment and see which works better for a particular protocol. When
all your sends are 4K in size, issuing a 4K read sounds like a smart idea.
In any case, even if you issue a read of 1 byte, a full 4K page will be
locked. You should not assume that the entire 4K will become
available in one read. The data may arrive fragmented, so your application
needs to handle this.
Vladimir
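
To illustrate handling fragmented data, here is a minimal sketch in Python of the reassembly logic, kept independent of any socket API. The class name FixedSizeFramer and its feed method are hypothetical, and the sketch assumes a protocol in which every message is exactly 4K. The application keeps appending whatever each receive delivers and only hands on complete messages.

```python
class FixedSizeFramer:
    """Reassemble fixed-size messages from arbitrarily sized fragments."""

    def __init__(self, message_size):
        self._size = message_size
        self._buf = bytearray()

    def feed(self, data):
        """Add one received fragment; return the list of completed messages."""
        self._buf += data
        messages = []
        # A single fragment may finish zero, one, or several messages.
        while len(self._buf) >= self._size:
            messages.append(bytes(self._buf[:self._size]))
            del self._buf[:self._size]
        return messages


if __name__ == "__main__":
    framer = FixedSizeFramer(4096)
    # Three fragments of 1460 bytes: only the third completes a message.
    for fragment in (b"a" * 1460, b"a" * 1460, b"a" * 1460):
        print(len(framer.feed(fragment)))
```

The same loop works whether a fragment is 1 byte or several messages long, which is the property the receive path needs.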

m

2007-11-24 21:02:02 UTC

What is confusing you is that you are forgetting about protocol
processing in the network stack. When a packet comes in off the wire (or
through the air), it is processed by each protocol driver bound to that
interface. Each protocol discards any packets that it doesn't
understand (because they belong to another protocol, etc.), which is how it
is possible to run NetBIOS, IP, IPX, etc. on the same network. Look at
NDIS in the DDK if you want more information on what a protocol driver does
and how it interfaces with the hardware.

In the specific case of TCP / IP sockets on Windows, Microsoft has provided
an NDIS protocol driver, and a bunch of other drivers and interfaces, to
process the IP and TCP protocols in kernel mode. What you get out of
WSARecv is access to the TCP stream as it is assembled by the protocol
driver. This is an oversimplification, but is a good way of thinking about
the process.

When deciding how to interface with the Winsock API, remember this:

- A call to WSARecv completes when some new piece of the TCP stream
becomes ready for the application.

- The size of the data returned by a call to WSARecv is arbitrarily
determined by the protocol processing and ranges from 1 byte to the full
size of the supplied buffer.

- There is no correlation between the size of the data returned by
WSARecv and the size of the buffer supplied to WSASend.

- There is no direct correlation between the arrival of a packet and
the completion of a WSARecv. A call to WSARecv could be satisfied from data
that has already arrived and been assembled; a single packet could provide
data for multiple WSARecv calls, or a single WSARecv could return data from
multiple packets.

- The protocol driver uses buffer space in kernel mode (usually
non-paged memory) to perform the protocol processing. If you have a pending
call to WSARecv, the driver uses the buffer that you supplied directly; if
there are no pending calls to WSARecv, the driver allocates its own
buffer and later copies the data into the one you supply in WSARecv. As
the driver's buffer fills, it shrinks the advertised TCP window and the data
transmission rate slows (again an oversimplification, but you get the
point; you can look up the details in the RFC).
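
The lack of correlation between send sizes and receive sizes can be observed with a small experiment. This is a sketch in Python over a local socket pair, not Winsock code, and the function name demo_stream_boundaries is made up for illustration. It writes chunks of the given sizes into one end and records how many bytes each read on the other end returned. The only things that can be relied on are that every read returns between 1 byte and the buffer size, and that the totals match.

```python
import socket


def demo_stream_boundaries(send_sizes, recv_size):
    """Send chunks of the given sizes; return the byte count of each recv()."""
    sender, receiver = socket.socketpair()
    try:
        for n in send_sizes:
            sender.sendall(b"x" * n)
        # Half-close so the reader sees end-of-stream after the last byte.
        sender.shutdown(socket.SHUT_WR)
        counts = []
        while True:
            chunk = receiver.recv(recv_size)
            if not chunk:  # empty read: the peer finished sending
                break
            counts.append(len(chunk))
        return counts
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    # Three sends of 100, 200 and 300 bytes, read back 256 bytes at a time.
    print(demo_stream_boundaries([100, 200, 300], 256))
```

The exact split printed depends on the platform and timing, which is the point: the receiver sees a byte stream, not the sender's message boundaries.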

So, all of this is to say that, unless you have an extremely large number
of connections (10,000+), keeping a 4KB read pending on each will improve
performance (by eliminating a memory copy and preventing network stalls) for
most data rates on most hardware.

If this scheme doesn't meet your performance requirements, I recommend
posting multiple buffers (with multiple calls to WSARecv) rather
than posting larger buffers, even though it uses more system resources,
because it allows the driver to continue to use a user buffer to assemble
the TCP stream while the results from a completed WSARecv call are being
processed by the application. The complication with this design is that,
while the buffers are filled FIFO in the order the driver 'sees' them, if
the I/O is issued from multiple threads (as in most IOCP-based designs),
careful synchronization is required to ensure that the buffers are put back
together in the same order as they were filled by the driver.
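
One common way to do that synchronization is to number each receive when it is posted and release completed buffers strictly in that order. The following is a minimal sketch of the idea in Python, with no socket or IOCP calls; the class ReceiveSequencer and its methods are hypothetical names, not part of any API. It assumes the sequence number is taken under the same lock that serializes the actual posting of the receive, so that the numbers match the order in which the driver sees the buffers.

```python
import threading


class ReceiveSequencer:
    """Hand completed buffers to the application in posting order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_to_post = 0
        self._next_to_deliver = 0
        self._completed = {}  # sequence number -> data, waiting for its turn

    def next_sequence(self):
        """Take the sequence number for the receive about to be posted."""
        with self._lock:
            seq = self._next_to_post
            self._next_to_post += 1
            return seq

    def complete(self, seq, data):
        """Record a completion; return the buffers that are now in order."""
        with self._lock:
            self._completed[seq] = data
            ready = []
            # Release every buffer whose predecessors have all completed.
            while self._next_to_deliver in self._completed:
                ready.append(self._completed.pop(self._next_to_deliver))
                self._next_to_deliver += 1
            return ready


if __name__ == "__main__":
    s = ReceiveSequencer()
    first, second = s.next_sequence(), s.next_sequence()
    print(s.complete(second, b"world"))  # held back: first is still pending
    print(s.complete(first, b"hello "))  # releases both, in posting order
```

A worker thread that dequeues a completion out of order simply gets an empty list back and moves on; whichever thread completes the missing buffer delivers the backlog.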

George

2007-11-26 16:40:01 UTC

Thanks Vladimir & m, your posts were really helpful!
