Myrinet 10G Performance Information
Myri-10G
10-Gigabit Ethernet
Performance Measurements
We report performance measurements for Myri-10G NICs using our 10-Gigabit Ethernet driver, Myri10GE, on Linux, Windows, Solaris, MacOSX, and FreeBSD.
Linux | Windows | Solaris GLDv2 | Solaris GLDv3 | MacOSX | FreeBSD
| Benchmark: | netperf version 2.4.5 |
| OS: | Centos5 x86_64 2.6.18-128.1.16.el5 kernel |
| NICs: | Myri-10G 10G-PCIE-8B |
| Driver: | Myri10GE version 1.5.0 |
| Interrupt Coalescing: | 75 µs |
| TCP Segmentation Offload (TSO): | enabled |
| Large Receive Offload (LRO): | enabled |
| Hosts: | Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: | point-to-point (switchless) |
For these Linux tests, TCP buffer sizes were increased and TCP timestamps were disabled as recommended in the Performance Tuning section of the Linux Myri10GE README, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Netperf Results, MTU 9000
Commands:
$ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5
$ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 4M -S 4M
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9910.33 4.52 2.84
TCP_SENDFILE 9000 9910.32 2.71 2.82
UDP_STREAM_TX 9000 9924.70 5.73 0.00
UDP_STREAM_RX 9000 9924.70 0.00 3.66
Netperf Results, MTU 1500
Commands:
$ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5
$ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 4M -S 4M
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 9477.10 4.62 5.57
TCP_SENDFILE 1500 9452.54 2.56 5.63
UDP_STREAM_TX 1500 9249.00 12.51 0.00
UDP_STREAM_RX 1500 9249.00 0.00 11.59
Notes:
- If you are unable to reproduce these performance results, refer to this FAQ entry for Linux performance tuning suggestions, as well as the Test Results with Myri-10G NICs and PCI-Express Motherboards web page for comparative results with different chipsets and motherboards.
- CPU utilization (CPU %) is the percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means that all CPUs are fully utilized.
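As a sanity check on the figures above, the measured bandwidths can be compared with the payload rate a 10-Gigabit Ethernet link can carry at each MTU. The sketch below is not part of the original measurements. It assumes standard Ethernet framing overhead (8-byte preamble, 14-byte header, 4-byte FCS, 12-byte inter-frame gap), IPv4 without options, and TCP headers without options (timestamps were disabled for these tests). The function and constant names are illustrative.

```python
# Per-frame overhead on the wire, in bytes (standard Ethernet, no VLAN tag).
ETH_OVERHEAD = 8 + 14 + 4 + 12      # preamble + header + FCS + inter-frame gap
IPV4_HEADER = 20                    # assumes no IP options
L4_HEADER = {"tcp": 20, "udp": 8}   # TCP figure assumes no TCP options


def max_payload_mbps(mtu, proto, line_rate_mbps=10_000):
    """Upper bound on payload throughput when every frame is full-sized."""
    payload = mtu - IPV4_HEADER - L4_HEADER[proto]
    wire_bytes = mtu + ETH_OVERHEAD
    return line_rate_mbps * payload / wire_bytes


for mtu in (9000, 1500):
    for proto in ("tcp", "udp"):
        bound = max_payload_mbps(mtu, proto)
        print(f"MTU {mtu} {proto.upper()}: {bound:.2f} Mbit/s")
```

Under these assumptions the bound is roughly 9914 Mbit/s for TCP and 9927 Mbit/s for UDP at MTU 9000, so the jumbo-frame results above are within a fraction of a percent of line rate. The same header arithmetic explains the UDP message sizes in the commands: 9000 - 28 = 8972 bytes and 1500 - 28 = 1472 bytes each fill exactly one frame.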
| Benchmark: | ntttcps and ntttcpr (from the Windows 2003 DDK) |
| OS: | Windows Server 2003 x64 SP1 Edition |
| NICs: | Myri-10G 10G-PCIE-8A |
| Driver: | Myri10GE AMD64 version 1.0.1 |
| Interrupt Coalescing: | 25 µs |
| TCP Segmentation Offload (TSO): | enabled |
| Checksum Offload: | enabled |
| Flow Control: | enabled |
| Hosts: | Sender: Tyan S2895 motherboard with AMD single-core dual-processor 2.6GHz Opteron; Receiver: Dell PowerEdge 2950 |
| Topology: | point-to-point (switchless) |
For these Windows tests, no registry entries were added to the Windows 2003-based machines. Bandwidth (BW) is measured in Megabits/second.
One ntttcps process was run on one Windows host connected to one Windows host running one ntttcpr process.
Ntttcp Results, MTU 9000
Commands:
Sender: ntttcps -m 1,1,10.0.130.50 -l 1048576 -n 100000 -w -v -a 8
Receiver: ntttcpr -m 1,1,10.0.130.50 -l 1048576 -rb 2097152 -n 1000000 -w -v -a 8
Results on the Sender:
-----------------------------------------------------------------
| Estimated Time to Complete Test at line speed (seconds) |
-----------------------------------------------------------------
1000 Base-T 622 OC-12(ATM) 155 OC-3(ATM) 100 Base-T 10 Base-T
=========== ============== ============= ========== =========
419 369 1408 2128 25000
------------------------------------------------------
| Output Summary |
------------------------------------------------------
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================
0 85.500 1226404.678 9811.237
Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
104857.600000 85.500 60667.263 9811.237
Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
100000.000 1169.591 1 23467.10 0.5
Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
1728405 281845 2 0 10.70
Results on the Receiver:
-----------------------------------------------------------------
| Estimated Time to Complete Test at line speed (seconds) |
-----------------------------------------------------------------
1000 Base-T 622 OC-12(ATM) 155 OC-3(ATM) 100 Base-T 10 Base-T
=========== ============== ============= ========== =========
419 369 1408 2128 25000
------------------------------------------------------
| Output Summary |
------------------------------------------------------
Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================
0 85.735 1223043.098 9784.345
Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================
104857.600000 85.735 8959.587 9784.345
Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========
100000.000 1166.385 29 4610.68 2.7
Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========
281837 11703396 0 0 27.27
Notes:
- NTTTCP is a closed-source benchmark available from Microsoft at http://www.microsoft.com/whdc/device/network/TCP_tool.mspx. It is based on the original ttcp benchmark. Windows OSes benefit from overlapping socket communication using Winsock2.
- Performance results vary with the CPU type and the Windows operating system version. Results can be tuned, for example, by changing the message size (-l) and the receive-side socket buffer size (-rb). When using version 2.5, the optional -fr argument can also improve performance.
- A "Frame" in the ntttcps output refers to a unit passed to the socket (1MB), and a "Packet" refers to a unit passed from the TCP stack to the Ethernet driver (64KB, since TSO is enabled).
- If you're using a version of the Myri10GE driver prior to 1.0.3, it is possible to achieve higher throughput and lower CPU utilization by using a driver that configures the PCI Express chipset in a mode better suited to Myri-10G NICs. However, this kind of re-configuration is not allowed in a WHQL-certified driver. Contact help@myri.com for details.
If you're using Windows 2000, XP, or 2003, you will need to add the following two registry entries:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters:
- Tcp1323Opts, type REG_DWORD, value set to 1.
- TcpWindowSize, type REG_DWORD, value set to 512K.
- For a detailed list of Performance Tuning Guidelines for Windows Server 2003 and 2008, refer to this FAQ entry.
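The summary fields in the ntttcp output above appear to follow from a few raw counters. The sketch below is one reading of the numbers, not documented NTTTCP behavior: it assumes that throughput is total bytes over elapsed time in decimal megabits, and that "Average Frame Size" is total bytes divided by the packet count on that side. The helper names are illustrative.

```python
# Raw counters copied from the ntttcps/ntttcpr summaries above.
BUFFERS = 100_000           # "Total Buffers"
BUFFER_BYTES = 1_048_576    # message size given with -l
total_bytes = BUFFERS * BUFFER_BYTES


def throughput_mbps(nbytes, seconds):
    """Megabits per second, taking 1 Mbit = 10**6 bits (assumed)."""
    return nbytes * 8 / seconds / 1e6


def avg_frame_size(nbytes, packets):
    """Bytes carried per packet counted by the tool (assumed definition)."""
    return nbytes / packets


sender = {"seconds": 85.500, "packets": 1_728_405}      # "Packets Sent"
receiver = {"seconds": 85.735, "packets": 11_703_396}   # "Packets Received"

for name, side in (("sender", sender), ("receiver", receiver)):
    mbps = throughput_mbps(total_bytes, side["seconds"])
    frame = avg_frame_size(total_bytes, side["packets"])
    print(f"{name}: {mbps:.3f} Mbit/s, {frame:.3f} bytes per packet")
```

With these assumptions the computed values agree with the reported 9811.237 and 9784.345 Mbit/s. The sender's average of about 60 KB per packet is consistent with TSO handing near-64KB segments to the driver, while the receiver's average of about 8960 bytes matches the TCP payload of a 9000-byte frame.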
| Benchmark: | netperf version 2.4.5 |
| OS: | OpenSolaris 2008.11 (snv_101b_rc2) |
| NICs: | Myri-10G 10G-PCIE-8B |
| Driver: | Myri10GE version AMD64 1.0.4 |
| Interrupt Coalescing: | 30 µs |
| Large Receive Offload (LRO): | enabled |
| Hosts: | Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: | point-to-point (switchless) |
For these Solaris GLDv2 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Netperf Results, MTU 9000
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -T loc,remote -- -s 512K -S 512K
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -T loc,remote -- -s 512K -S 512K
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -T loc,remote -- -m 8972 -s 512K -S 512K
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9877.62 9.91 10.08
TCP_SENDFILE 9000 9887.49 11.83 10.34
UDP_STREAM_TX 9000 9880.90 17.51 00.00
UDP_STREAM_RX 9000 9880.90 00.00 17.93
Netperf Results, MTU 1500
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -T loc,remote -- -s 1M -S 1M
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -T loc,remote -- -s 1M -S 1M
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -T loc,remote -- -m 1472 -s 1M -S 1M
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 7787.70 17.59 19.51
TCP_SENDFILE 1500 5775.41 24.65 17.16
UDP_STREAM_TX 1500 5291.70 15.05 00.00
UDP_STREAM_RX 1500 5165.40 00.00 28.63
Notes:
- Solaris's GLDv2 driver ABI does not support TCP Segmentation Offload (TSO).
- CPU utilization (CPU %) is the percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means that all CPUs are fully utilized.
- Netperf's CPU binding (-Tlocal,remote) feature was used to bind the netserver and the netperf processes to all combinations of local and remote CPUs. The results from the best combination of local and remote CPU binding are presented.
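The sweep over CPU bindings described in the last note can be scripted. The following is a minimal sketch that only generates the command lines for every local/remote pair. It assumes the CPUs are numbered 0 through 7 on both 8-core hosts, and the helper name `binding_commands` is hypothetical. Running the commands and selecting the best result is left out.

```python
from itertools import product


def binding_commands(host, test, n_local, n_remote, extra=""):
    """Yield one netperf command line per (local CPU, remote CPU) pair."""
    for loc, rem in product(range(n_local), range(n_remote)):
        cmd = f"netperf -H {host} -t {test} -C -c -l 60 -T {loc},{rem}"
        if extra:
            # Test-specific options follow the "--" separator.
            cmd += f" -- {extra}"
        yield cmd


# CPU numbering 0..7 on each host is an assumption.
cmds = list(binding_commands("asus2-m", "TCP_STREAM", 8, 8, "-s 512K -S 512K"))
print(len(cmds), "commands")
print(cmds[0])
```

At 60 seconds per run, the 64 combinations for a single test take a little over an hour, which is worth keeping in mind when reproducing the full set of results.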
| Benchmark: | netperf version 2.4.5 |
| OS: | OpenSolaris 2008.11 (snv_101b_rc2) |
| NICs: | Myri-10G 10G-PCIE-8B |
| Driver: | Myri10GE version AMD64 1.4.5gldv3 |
| Interrupt Coalescing: | 125 µs |
| TCP Segmentation Offload (TSO): | enabled |
| Large Receive Offload (LRO): | enabled |
| Hosts: | Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: | point-to-point (switchless) |
For these Solaris GLDv3 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Netperf Results, MTU 9000
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 512K -S 512K
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9868.72 9.29 8.91
TCP_SENDFILE 9000 9866.15 11.96 8.94
UDP_STREAM_TX 9000 9925.20 9.33 00.00
UDP_STREAM_RX 9000 9925.20 00.00 9.04
Netperf Results, MTU 1500
Commands:
$ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -- -s 512K -S 512K
$ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 512K -S 512K
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 9345.75 7.75 20.32
TCP_SENDFILE 1500 9285.96 9.15 20.98
UDP_STREAM_TX 1500 5978.60 12.55 00.00
UDP_STREAM_RX 1500 5978.60 00.00 24.20
Notes:
- The Solaris GLDv3 ABI is an unstable ABI, meaning that a Solaris OS upgrade may render the driver inoperable; hence, we do not offer this driver on our download page. We are working with Sun to get this driver integrated into Solaris. The driver was integrated into OpenSolaris build snv_121 and will appear in a future release of OpenSolaris. In the interim, please send mail to help@myri.com to request the driver.
- CPU utilization (CPU %) is the percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means that all CPUs are fully utilized.
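Because CPU % is expressed relative to all CPUs, it can help to convert a figure into the equivalent number of fully busy cores. The sketch below is an illustration only. It assumes netperf counted 8 CPUs on these hosts (that is, that Hyper-Threading did not double the count), and the helper names are hypothetical.

```python
def core_equivalents(cpu_percent, n_cpus):
    """Convert a percent-of-all-CPUs figure into fully busy cores."""
    return cpu_percent / 100.0 * n_cpus


def mbps_per_core(bw_mbps, cpu_percent, n_cpus):
    """Throughput delivered per fully busy core."""
    return bw_mbps / core_equivalents(cpu_percent, n_cpus)


N_CPUS = 8  # assumed CPU count seen by netperf on the hosts above

# (test, MTU, BW in Mbit/s, RX_CPU %) taken from the GLDv3 results above.
rows = [
    ("TCP_STREAM", 9000, 9868.72, 8.91),
    ("TCP_STREAM", 1500, 9345.75, 20.32),
]
for test, mtu, bw, rx in rows:
    cores = core_equivalents(rx, N_CPUS)
    per_core = mbps_per_core(bw, rx, N_CPUS)
    print(f"{test} MTU {mtu}: {cores:.2f} cores, {per_core:.0f} Mbit/s per core")
```

Read this way, the receive side of the MTU 1500 TCP_STREAM test corresponds to roughly 1.6 busy cores, compared with about 0.7 at MTU 9000.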
| Benchmark: | netperf version 2.4.3, iperf version 2.0.2 |
| OS: | MacOSX 10.5 |
| NICs: | Myri-10G 10G-PCIE-8A |
| Driver: | Myri10GE version 1.1.0 |
| Interrupt Coalescing: | 75 µs |
| Large Receive Offload (LRO): | enabled |
| Hosts: | MacPro with Intel dual-core dual-processor 2.6GHz Xeons |
| Topology: | point-to-point (switchless) |
For these MacOSX tests, LRO was enabled as recommended in the Performance Tuning section of the MacOSX Myri10GE README, and the netserver was run without options. The iperf server was run with the same window (-w) and buffer length (-l) arguments as the client. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Netperf Results, MTU 9000
Commands:
$ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
$ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
$ iperf -c macpro01-m -w 768k -l 256k -P 2 -f m -t 60
Single-Stream Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9661.82 41.38 36.74
UDP_STREAM_TX 9000 6867.00 28.08 00.00
UDP_STREAM_RX 9000 6867.00 00.00 39.26
Dual-Stream TCP Results (2 netperf processes):
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9692.00 54.72 47.36
Dual-Stream TCP Results (2 iperf threads):
Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
iperf 9000 9825.00 65 58
Netperf Results, MTU 1500
Commands:
$ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
$ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S 512K
$ iperf -c macpro01-m -w 512k -l 256k -P 2 -f m -t 60
Single-Stream Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 4782.15 41.70 39.15
UDP_STREAM_TX 1500 3310.40 27.85 00.00
UDP_STREAM_RX 1500 3310.40 00.00 39.24
Dual-Stream TCP Results (2 netperf processes):
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 4367.00 42.29 43.75
Dual-Stream TCP Results (2 iperf threads):
Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
iperf 1500 6417.00 76 65
Notes:
- MacOSX does not support TCP Segmentation Offload (TSO).
- The CPU usage reported for the iperf runs is the sum of the user and system times as reported by iostat. Iperf itself does not report CPU usage.
| Benchmark: | netperf version 2.4.5 |
| OS: | FreeBSD/amd64 7.2-RELEASE |
| NICs: | Myri-10G 10G-PCIE-8B |
| Driver: | if_mxge |
| Interrupt Coalescing: | 30 µs |
| TCP Segmentation Offload (TSO): | enabled |
| Large Receive Offload (LRO): | enabled |
| Hosts: | Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores) |
| Topology: | point-to-point (switchless) |
For these FreeBSD tests, the kern.ipc.maxsockbuf tunable was increased to 16777216, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.
Netperf Results, MTU 9000
Commands:
$ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
$ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 128K -S 128K
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 9000 9887.91 8.22 7.73
TCP_SENDFILE 9000 9887.31 6.33 7.50
UDP_STREAM_TX 9000 9926.00 13.85 0.00
UDP_STREAM_RX 9000 9926.00 0.00 6.77
Netperf Results, MTU 1500
Commands:
$ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
$ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
$ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 16256 -s 128K -S 128K
Results:
Netperf Test MTU BW TX_CPU % RX_CPU %
------------ ---- ------- -------- --------
TCP_STREAM 1500 9361.92 8.26 10.07
TCP_SENDFILE 1500 9390.04 5.90 10.21
UDP_STREAM_TX 1500 9243.90 14.18 0.00
UDP_STREAM_RX 1500 9243.90 0.00 14.69
Notes:
- CPU utilization (CPU %) is the percentage of all CPUs, as reported by netperf with the -c and -C options. That is, a value of 100 means that all CPUs are fully utilized.



