August 27, 2009, 1:27 pm
Doug Eadline looked at this question yesterday in a post at Cluster Connection. In it he gives a lucid description of the difference between user space and kernel space communication, and of why that difference affects performance:
When interconnects are used in HPC the best performance comes from a “user space” mode. Communication over a network normally takes place through the kernel (i.e., the kernel manages, and in a sense guarantees, that data will get to where it is supposed to go). This communication path, however, requires memory to be copied from the user's program space to a kernel buffer. The kernel then manages the communication. On the receiver node, the kernel will accept the data and place it in a kernel buffer. The buffer is then copied to the user's program space. The excess copying often adds to the latency for a given network. In addition, the kernel must process the TCP/IP stack for each communication. For applications that require low latency, the extra copying from user program space to kernel buffer on the sending node and then from kernel buffer to user program space on the receiving node can be very inefficient…
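To make the two copies in that description concrete, here is a minimal Python sketch of the kernel-mediated path. It is an illustration under stated assumptions, not anything from Eadline's post: the function name `kernel_path_roundtrip` is hypothetical, and it uses a local `socket.socketpair()` on a single machine rather than TCP/IP between two nodes, so it shows the buffer copies but not the protocol stack processing.

```python
import socket


def kernel_path_roundtrip(payload: bytes) -> bytes:
    """Send payload through a local socket pair and return what arrives.

    Hypothetical helper for illustration only. Keep payloads small: nothing
    reads while sendall() runs, so the data must fit in the kernel's socket
    buffer or the call would block.
    """
    sender, receiver = socket.socketpair()
    try:
        src = bytearray(payload)

        # Copy 1: user-space buffer -> kernel socket buffer.
        sender.sendall(src)

        # Overwrite the user buffer. The data in flight is unaffected,
        # because the kernel now holds its own copy.
        src[:] = bytes(len(src))

        # Copy 2: kernel socket buffer -> a separate user-space buffer.
        dst = bytearray(len(payload))
        view = memoryview(dst)
        received = 0
        while received < len(payload):
            n = receiver.recv_into(view[received:])
            if n == 0:  # peer closed before everything arrived
                break
            received += n
        return bytes(dst[:received])
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(kernel_path_roundtrip(b"halo exchange"))
```

The receiver still sees the original bytes even though the sender's buffer was zeroed right after the send, which is the kernel buffering the quote describes. A user space interconnect mode aims to skip both of those copies.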