Myrinet 10G Performance Information

Myri-10G
10-Gigabit Ethernet
Performance Measurements

We report performance measurements for Myri-10G NICs using our 10-Gigabit Ethernet driver, Myri10GE, on Linux, Windows, Solaris, MacOSX, and FreeBSD.


Linux

Benchmark: netperf version 2.4.5
OS: Centos5 x86_64 2.6.18-128.1.16.el5 kernel
NICs: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version 1.5.0
Interrupt Coalescing: 75 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)

For these Linux tests, TCP buffer sizes were increased and TCP timestamps were disabled as recommended in the Performance Tuning section of the Linux Myri10GE README, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 4M -S 4M
Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9910.33   4.52       2.84
     TCP_SENDFILE   9000   9910.32   2.71       2.82
     UDP_STREAM_TX  9000   9924.70   5.73       0.00
     UDP_STREAM_RX  9000   9924.70   0.00       3.66

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/vmlinuz-2.6.18-128.1.16.el5
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 4M -S 4M

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9477.10   4.62       5.57
     TCP_SENDFILE   1500   9452.54   2.56       5.63
     UDP_STREAM_TX  1500   9249.00  12.51       0.00
     UDP_STREAM_RX  1500   9249.00   0.00      11.59

Notes:

  • If you are unable to reproduce these performance results, refer to this FAQ entry for Linux performance tuning suggestions, as well as the Test Results with Myri-10G NICs and PCI-Express Motherboards web page for comparative results with different chipsets and motherboards.
  • CPU utilization (CPU %) is shown as the percentage of all CPUs as reported by netperf with the -c and -C options; that is, a value of 100 would mean that all CPUs are fully utilized.
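
As a sanity check on the bandwidth figures above, the following sketch estimates the highest payload rate a 10-Gigabit Ethernet link can carry at each MTU. It assumes standard Ethernet per-frame overhead (8-byte preamble, 14-byte header, 4-byte FCS, 12-byte inter-frame gap), IPv4 without options, and TCP without options (timestamps were disabled for these tests). The function names and constants are illustrative and are not part of netperf or the Myri10GE driver.

```python
# Bytes on the wire per frame that are not part of the IP packet:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
IPV4_HEADER = 20          # assumption: no IP options
TCP_HEADER = 20           # assumption: no TCP options (timestamps disabled)
UDP_HEADER = 8
LINE_RATE_MBPS = 10_000.0


def max_udp_payload(mtu: int) -> int:
    """Largest UDP message that fits in one IP packet of size `mtu`."""
    return mtu - IPV4_HEADER - UDP_HEADER


def max_goodput_mbps(mtu: int, l4_header: int) -> float:
    """Upper bound on payload throughput when every frame is full-sized."""
    payload = mtu - IPV4_HEADER - l4_header
    return LINE_RATE_MBPS * payload / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    for mtu in (9000, 1500):
        print(mtu,
              max_udp_payload(mtu),
              round(max_goodput_mbps(mtu, TCP_HEADER), 2),
              round(max_goodput_mbps(mtu, UDP_HEADER), 2))
```

Under these assumptions the TCP bound is roughly 9913.7 Mbit/s at MTU 9000 and 9492.8 Mbit/s at MTU 1500, so the measured TCP_STREAM results (9910.33 and 9477.10) are within about 0.2% of it. The same arithmetic yields the 8972-byte and 1472-byte message sizes passed to -m in the UDP_STREAM commands.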


Windows

Benchmark: ntttcps and ntttcpr (from the Windows 2003 DDK)
OS: Windows Server 2003 x64 SP1 Edition
NICs: Myri-10G 10G-PCIE-8A
Driver: Myri10GE AMD64 version 1.0.1
Interrupt Coalescing: 25 µs
TCP Segmentation Offload (TSO): enabled
Checksum Offload: enabled
Flow Control: enabled
Hosts:
Sender: Tyan S2895 motherboard with AMD single-core dual-processor 2.6GHz Opteron
Receiver: Dell PowerEdge 2950
Topology: point-to-point (switchless)

For these Windows tests, no registry entries were added to the Windows 2003-based machines. Bandwidth (BW) is measured in Megabits/second.

A single ntttcps process on one Windows host sent data to a single ntttcpr process on the other Windows host.

Ntttcp Results, MTU 9000

Commands:
    Sender: ntttcps -m 1,1,10.0.130.50 -l 1048576 -n 100000 -w -v -a 8
    Receiver: ntttcpr -m 1,1,10.0.130.50 -l 1048576 -rb 2097152 -n 1000000 -w -v -a 8

Results on the Sender:

-----------------------------------------------------------------
|     Estimated Time to Complete Test at line speed (seconds)   |
-----------------------------------------------------------------

1000 Base-T  622 OC-12(ATM)  155 OC-3(ATM)  100 Base-T  10 Base-T
===========  ==============  =============  ==========  =========

        419             369           1408        2128       25000



------------------------------------------------------
|                   Output Summary                   |
------------------------------------------------------

Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================

     0      85.500        1226404.678           9811.237


Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================

   104857.600000      85.500          60667.263                 9811.237


Total Buffers Throughput(Buffers/s) Pkts(sent/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========

   100000.000              1169.591               1      23467.10         0.5


Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========

     1728405           281845                 2            0      10.70

Results on the Receiver:

-----------------------------------------------------------------
|     Estimated Time to Complete Test at line speed (seconds)   |
-----------------------------------------------------------------

1000 Base-T  622 OC-12(ATM)  155 OC-3(ATM)  100 Base-T  10 Base-T
===========  ==============  =============  ==========  =========

        419             369           1408        2128       25000



------------------------------------------------------
|                   Output Summary                   |
------------------------------------------------------

Thread Realtime(s) Throughput(KB/s) Throughput(Mbit/s)
====== =========== ================ ==================

     0      85.735        1223043.098           9784.345


Total Bytes(MEG) Realtime(s) Average Frame Size Total Throughput(Mbit/s)
================ =========== ================== ========================

   104857.600000      85.735           8959.587                 9784.345


Total Buffers Throughput(Buffers/s) Pkts(recv/intr) Intr(count/s) Cycles/Byte
============= ===================== =============== ============= ===========

   100000.000              1166.385              29       4610.68         2.7


Packets Sent Packets Received Total Retransmits Total Errors Avg. CPU %
============ ================ ================= ============ ==========

      281837         11703396                 0            0      27.27
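
The summary figures above can be cross-checked against each other. The sketch below recomputes throughput and average frame size from the buffer count, buffer length, elapsed time, and packet counts. The relationships it uses (throughput in units of 10^6 bits per second, and average frame size as total bytes divided by packets sent or received) are inferred from the numbers themselves, not taken from NTTTCP documentation, and the helper names are illustrative.

```python
# Values copied from the sender and receiver summaries above.
BUFFERS = 100_000            # "Total Buffers"
BUFFER_BYTES = 1_048_576     # -l 1048576
TOTAL_BYTES = BUFFERS * BUFFER_BYTES


def throughput_mbit_s(total_bytes: int, seconds: float) -> float:
    """Throughput in units of 10^6 bits per second."""
    return total_bytes * 8 / seconds / 1e6


def bytes_per_packet(total_bytes: int, packets: int) -> float:
    """Average number of bytes carried per packet."""
    return total_bytes / packets


if __name__ == "__main__":
    print(round(throughput_mbit_s(TOTAL_BYTES, 85.500), 3))     # sender
    print(round(throughput_mbit_s(TOTAL_BYTES, 85.735), 3))     # receiver
    print(round(bytes_per_packet(TOTAL_BYTES, 1_728_405), 3))   # sender, packets sent
    print(round(bytes_per_packet(TOTAL_BYTES, 11_703_396), 3))  # receiver, packets received
```

The recomputed values agree with the reported 9811.237 and 9784.345 Mbit/s and with the reported average frame sizes of 60667.263 and 8959.587 bytes. The sender's average is close to 64KB, and the receiver's is close to the 9000-byte MTU.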

Notes:

  • NTTTCP is a closed-source benchmark available from Microsoft at http://www.microsoft.com/whdc/device/network/TCP_tool.mspx. It is based on the original ttcp benchmark. Windows OSes benefit from overlapping socket communication using Winsock2.
  • Performance results vary with the CPU type and the Windows operating system version. They can be tuned, for example, by changing the message size (-l) and the receive-side socket buffer size (-rb). When using version 2.5, an optional -fr argument can also improve performance.
  • A "Frame" in the ntttcps output refers to a unit passed to the socket (1MB), and a "Packet" refers to a unit passed from the TCP stack to the Ethernet driver (64KB, since TSO is enabled).
  • If you're using a version of the Myri10GE driver prior to 1.0.3, it is possible to achieve higher throughput and lower CPU utilization by using a driver that configures the PCI Express chipset in a mode better suited to Myri-10G NICs. However, this kind of re-configuration is not allowed in a WHQL-certified driver. Contact help@myri.com for details.
  • If you're using Windows 2000, XP, or 2003, you will need to add the following two registry entries:

    HKLM\System\CurrentControlSet\Services\Tcpip\Parameters:

    • Tcp1323Opts, type REG_DWORD, value set to 1.
    • TcpWindowSize, type REG_DWORD, value set to 512K.
  • For a detailed list of Performance Tuning Guidelines for Windows Server 2003 and 2008 refer to this FAQ entry.
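
The two registry entries above relate to the TCP receive window. A single TCP connection cannot have more than one window of unacknowledged data in flight, so its throughput is bounded by the window size divided by the round-trip time (RTT). The sketch below illustrates that bound. It assumes that "512K" means 512 KiB and that, without window scaling, the window is capped at 65,535 bytes by the 16-bit TCP header field. The helper names are hypothetical, and the 100 µs RTT in the usage note is an illustrative figure, not a measurement from these tests.

```python
MAX_UNSCALED_WINDOW = 65_535   # largest window a 16-bit TCP header field can carry
WINDOW_512K = 512 * 1024       # assumption: "512K" means 512 KiB


def window_limited_mbit_s(window_bytes: int, rtt_seconds: float) -> float:
    """Throughput ceiling (10^6 bits/s) of one connection: window / RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


def max_rtt_us(window_bytes: int, rate_mbit_s: float) -> float:
    """Largest RTT in microseconds at which the window still sustains the rate."""
    # bits divided by (10^6 bits per second) gives microseconds
    return window_bytes * 8 / rate_mbit_s


if __name__ == "__main__":
    for window in (MAX_UNSCALED_WINDOW, WINDOW_512K):
        print(window, round(max_rtt_us(window, 10_000.0), 1))
```

With these assumptions, a 65,535-byte window can sustain 10 Gbit/s only while the RTT stays below roughly 52 µs, whereas a 512 KiB window allows roughly 419 µs. At a hypothetical 100 µs RTT, window_limited_mbit_s(MAX_UNSCALED_WINDOW, 100e-6) gives a ceiling of about 5243 Mbit/s.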



Solaris GLDv2

Benchmark: netperf version 2.4.5
OS: OpenSolaris 2008.11 (snv_101b_rc2)
NICs: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version AMD64 1.0.4
Interrupt Coalescing: 30 µs
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)

For these Solaris GLDv2 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -T loc,remote -- -s 512K -S 512K
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -T loc,remote -- -s 512K -S 512K
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -T loc,remote -- -m 8972 -s 512K -S 512K

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9877.62   9.91      10.08
     TCP_SENDFILE   9000   9887.49  11.83      10.34
     UDP_STREAM_TX  9000   9880.90  17.51      00.00
     UDP_STREAM_RX  9000   9880.90  00.00      17.93

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -T loc,remote -- -s 1M -S 1M
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -T loc,remote -- -s 1M -S 1M
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -T loc,remote -- -m 1472 -s 1M -S 1M

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   7787.70  17.59      19.51
     TCP_SENDFILE   1500   5775.41  24.65      17.16
     UDP_STREAM_TX  1500   5291.70  15.05      00.00
     UDP_STREAM_RX  1500   5165.40  00.00      28.63

Notes:

  • Solaris's GLDv2 driver ABI does not support TCP Segmentation Offload (TSO).
  • CPU utilization (CPU %) is shown as the percentage of all CPUs as reported by netperf with the -c and -C options; that is, a value of 100 would mean that all CPUs are fully utilized.
  • Netperf's CPU binding (-Tlocal,remote) feature was used to bind the netserver and the netperf processes to all combinations of local and remote CPUs. The results from the best combination of local and remote CPU binding are presented.
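
The sweep over CPU bindings described in the last note can be sketched as follows. The code only builds the netperf command lines, one per (local, remote) pair, and picks the best result from a table of measured bandwidths. It assumes each host exposes 8 CPUs numbered 0 to 7, which may not match the CPU IDs Solaris actually reports. The helper names are hypothetical, and the bandwidth values in the example call are placeholders, not measurements.

```python
from itertools import product


def binding_commands(host: str, n_local: int, n_remote: int,
                     test: str = "TCP_STREAM",
                     test_args: str = "-s 512K -S 512K") -> list[str]:
    """One netperf command line per (local CPU, remote CPU) pair."""
    return [
        f"netperf -H {host} -t {test} -C -c -l 60 -T {loc},{rem} -- {test_args}"
        for loc, rem in product(range(n_local), range(n_remote))
    ]


def best_binding(results: dict[tuple[int, int], float]) -> tuple[int, int]:
    """Return the (local, remote) pair with the highest measured bandwidth."""
    return max(results, key=results.get)


if __name__ == "__main__":
    commands = binding_commands("asus2-m", 8, 8)   # assumption: 8 CPUs per host
    print(len(commands))
    print(commands[0])
    # Placeholder bandwidths, for illustration only.
    print(best_binding({(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0}))
```

With 8 CPUs per host this produces 64 runs per test, at 60 seconds each, so a full sweep of one test takes a little over an hour.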


Solaris GLDv3

Benchmark: netperf version 2.4.5
OS: OpenSolaris 2008.11 (snv_101b_rc2)
NICs: Myri-10G 10G-PCIE-8B
Driver: Myri10GE version AMD64 1.4.5gldv3
Interrupt Coalescing: 125 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)

For these Solaris GLDv3 tests, the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -- -s 512K -S 512K
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -- -s 512K -S 512K
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 512K -S 512K

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9868.72   9.29       8.91
     TCP_SENDFILE   9000   9866.15  11.96       8.94
     UDP_STREAM_TX  9000   9925.20   9.33      00.00
     UDP_STREAM_RX  9000   9925.20  00.00       9.04

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus2-m -t TCP_STREAM -C -c -l 60 -- -s 512K -S 512K
     $ netperf -H asus2-m -t TCP_SENDFILE -F/var/tmp/scratch -C -c -l 60 -- -s 512K -S 512K
     $ netperf -H asus2-m -t UDP_STREAM -l 60 -C -c -- -m 1472 -s 512K -S 512K

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9345.75   7.75      20.32
     TCP_SENDFILE   1500   9285.96   9.15      20.98
     UDP_STREAM_TX  1500   5978.60  12.55      00.00
     UDP_STREAM_RX  1500   5978.60  00.00      24.20

Notes:

  • The Solaris GLDv3 ABI is an unstable ABI, meaning that a Solaris OS upgrade may render the driver inoperable; hence, we do not offer this driver on our download page. We are working with Sun to get this driver integrated into Solaris. The driver was integrated into OpenSolaris build snv_121 and will appear in a future release of OpenSolaris. In the interim, please send mail to help@myri.com to request the driver.
  • CPU utilization (CPU %) is shown as the percentage of all CPUs as reported by netperf with the -c and -C options; that is, a value of 100 would mean that all CPUs are fully utilized.


MacOSX

Benchmark: netperf version 2.4.3 and iperf version 2.0.2
OS: MacOSX 10.5
NICs: Myri-10G 10G-PCIE-8A
Driver: Myri10GE version 1.1.0
Interrupt Coalescing: 75 µs
Large Receive Offload (LRO): enabled
Hosts: MacPro with Intel dual-core dual-processor 2.6GHz Xeons
Topology: point-to-point (switchless)

For these MacOSX tests, LRO was enabled as recommended in the Performance Tuning section of the MacOSX Myri10GE README, and the netserver was run without options. The iperf server was run with the same window (-w) and buffer length (-l) arguments as the client. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
     $ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S512K
     $ iperf -c macpro01-m -w 768k -l 256k -P 2 -f m -t 60
Single-Stream Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9661.82  41.38      36.74
     UDP_STREAM_TX  9000   6867.00  28.08      00.00
     UDP_STREAM_RX  9000   6867.00  00.00      39.26

Dual-Stream TCP Results (2 netperf processes):
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9692.00  54.72      47.36

Dual-Stream TCP Results (2 iperf threads):
     Test           MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     iperf          9000   9825.00  65         58

Netperf Results, MTU 1500

Commands:
     $ netperf -H macpro01-m -t TCP_STREAM -C -c -l 60 -- -s 768K -S 768K -m 256K
     $ netperf -H macpro01-m -t UDP_STREAM -l 60 -C -c -- -m 32K -s 512K -S512K
     $ iperf -c macpro01-m -w 512k -l 256k -P 2 -f m -t 60

Single-Stream Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   4782.15  41.70      39.15
     UDP_STREAM_TX  1500   3310.40  27.85      00.00
     UDP_STREAM_RX  1500   3310.40  00.00      39.24

Dual-Stream TCP Results (2 netperf processes):
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   4367.00  42.29      43.75
  
Dual-Stream TCP Results (2 iperf threads):
     Test           MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     iperf          1500   6417.00  76         65

Notes:

  • MacOSX does not support TCP Segmentation Offload (TSO).
  • The CPU usage reported for the iperf runs is the sum of the user and system times as reported by iostat. Iperf itself does not report CPU usage.


FreeBSD

Benchmark: netperf version 2.4.5
OS: FreeBSD/amd64 7.2-RELEASE
NICs: Myri-10G 10G-PCIE-8B
Driver: if_mxge
Interrupt Coalescing: 30 µs
TCP Segmentation Offload (TSO): enabled
Large Receive Offload (LRO): enabled
Hosts: Asus RS500-E6-PS4 systems with dual Intel quad-core 2.93GHz Xeon X5570s (8 2.93GHz Nehalem cores)
Topology: point-to-point (switchless)

For these FreeBSD tests, the kern.ipc.maxsockbuf tunable was increased to 16777216, and the netserver was run without options. Performance is measured using 9000-byte (jumbo) frames and 1500-byte (standard) frames, and bandwidth (BW) is measured in Megabits/second.

Netperf Results, MTU 9000

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60 
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel 
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 8972 -s 128K -S 128K

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     9000   9887.91   8.22       7.73
     TCP_SENDFILE   9000   9887.31   6.33       7.50
     UDP_STREAM_TX  9000   9926.00  13.85	0.00
     UDP_STREAM_RX  9000   9926.00   0.00	6.77

Netperf Results, MTU 1500

Commands:
     $ netperf -H asus02-m -t TCP_STREAM -C -c -l 60
     $ netperf -H asus02-m -t TCP_SENDFILE -l 60 -C -c -F /boot/kernel/kernel
     $ netperf -H asus02-m -t UDP_STREAM -l 60 -C -c -- -m 16256 -s 128K -S 128K

Results:
     Netperf Test   MTU    BW       TX_CPU %   RX_CPU %
     ------------   ----   -------  --------   --------
     TCP_STREAM     1500   9361.92   8.26      10.07
     TCP_SENDFILE   1500   9390.04   5.90      10.21
     UDP_STREAM_TX  1500   9243.90  14.18       0.00
     UDP_STREAM_RX  1500   9243.90   0.00      14.69

Notes:

  • CPU utilization (CPU %) is shown as the percentage of all CPUs as reported by netperf with the -c and -C options; that is, a value of 100 would mean that all CPUs are fully utilized.
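
Because CPU % is a percentage of all CPUs, it can be converted into an equivalent number of fully busy cores. The sketch below does this for the FreeBSD TCP_STREAM figures at MTU 9000. It assumes netperf counted 8 CPUs per host, matching the 8 cores listed above. If hyperthreading exposed more logical CPUs, the conversion factor would differ. The helper name is illustrative.

```python
N_CPUS = 8   # assumption: netperf saw the 8 cores listed in the host description


def busy_cores(cpu_percent: float, n_cpus: int = N_CPUS) -> float:
    """Convert 'percent of all CPUs' into an equivalent count of busy cores."""
    return cpu_percent / 100.0 * n_cpus


if __name__ == "__main__":
    # FreeBSD TCP_STREAM, MTU 9000: TX_CPU % = 8.22, RX_CPU % = 7.73
    print(round(busy_cores(8.22), 2), round(busy_cores(7.73), 2))
```

Under that assumption, the 8.22% sender load corresponds to about two thirds of one core, and a reported value of 12.5 would correspond to exactly one saturated core.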