Monday, June 27, 2011

Speeding up FreeBSD's NFS on ZFS for ESX clients

My life revolves around four 3-letter acronyms: BSD, ESX, ZFS, and NFS.

However, these four do not get along well, at least not on FreeBSD.

The problem is between ESX's NFSv3 client and ZFS's ZIL.

You can read a bit about this from one of the ZFS programmers here - although I don't agree that it's as much of a non-issue as that writer suggests.

ESX uses an NFSv3 client, and when it connects to the server, it always asks for a sync connection. It doesn't matter what you set your server to; the O_SYNC requests coming from ESX will force the server to sync every write.

By itself, this isn't a bad thing, but when you add ZFS to the equation, we now have an unnecessary NFS sync on top of ZFS's ZIL. It's best to leave ZFS alone and let it write to disk when it's ready, instead of instructing it to flush the ZIL all the time. Once ZFS has the data, you can forget about it (assuming you haven't turned off the ZIL).

Even if your ZIL is on hardware RAM-drives, you're going to notice a slow-down. The effect is magnified with an HD-based ZIL (which is what you have if you don't have a separate log device on SSD/RAM). For my tests, I was using a hardware RAM device for my ZIL.
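As an aside, if you're not sure whether a pool has a separate log device, zpool status lists it under a "logs" section. Here's a trimmed sketch of what that looks like; the pool and device names are just placeholders:

zpool status tank

  pool: tank
 state: ONLINE
config:

        NAME          STATE
        tank          ONLINE
          mirror-0    ONLINE
            da0       ONLINE
            da1       ONLINE
        logs
          da8         ONLINE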

Some ZFS implementations let you disable the ZIL outright. We can't in FreeBSD if you're running ZFS v28.

Here are two quick iometer tests to show the difference between a standard FreeBSD NFS server and my modified FreeBSD NFS server.

Test setup: iometer 1.1-devel running on a Windows 7 SP1 machine, connected to the test drives via NFS. iometer configured with 128 writers, full random data pattern, 4k transfer size, 50% write / 50% read, 100% sequential access, an 8GB test file, and a 15 minute run time. The machine was rebooted after each test, and each test was run twice to make sure we were receiving a sane result. Server: FreeBSD 9-CURRENT as of 2011.05.28.15.00.00.

Standard NFS
Test 1    1086 IOPS    4.45 MB/s    117 ms avg I/O
Test 2    1020 IOPS    4.18 MB/s    125 ms avg I/O

Modified NFS
Test 3    2309 IOPS    9.45 MB/s    55 ms avg I/O
Test 4    2243 IOPS    9.19 MB/s    57 ms avg I/O

I feel the results speak for themselves, but in case they don't: with the modified NFS server code we see an increase in IOPS and MB/s, and a decrease in the average time to access the data. For this particular test, that's nearly a doubling in performance. Other tests show closer to a 10% increase in speed, but that's still a welcome gain.

You'll see these gains whether you're using the old NFS server (v2 and v3 only) or the new NFS server (v2/v3/v4) that became the default in FreeBSD 9 about a month ago.

I've used this hack for over 6 months now on my SANs without any issue or corruption, on both 8.1 and various 9-CURRENT builds, so I believe it's fairly safe to use.

I'm too lazy to make a proper patch, but manually editing the source is very easy:

- The file is /usr/src/sys/fs/nfsserver/nfs_nfsdport.c
- Go to line 704 and you'll see code like this (Edit: now line 727 in FreeBSD-9.0-RC3):

if (stable == NFSWRITE_UNSTABLE)
  ioflags = IO_NODELOCKED;
else
  ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

- Change the code to look like the following. We're commenting out the logic that would otherwise make this an IO_SYNC write:

// if (stable == NFSWRITE_UNSTABLE)
ioflags = IO_NODELOCKED;
// else
// ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

- Recompile your kernel, install it, and reboot - you're now free from NFS O_SYNC writes under ESX.
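For reference, the stock rebuild sequence looks something like this (I'm assuming the GENERIC kernel config here; substitute your own KERNCONF if you run a custom kernel):

cd /usr/src
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now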


If you are running the older NFS server (the default on 8.2 and earlier), the file to modify is /usr/src/sys/nfsserver/nfs_serv.c - Go to line 1162 and comment out the sync logic as shown in this example:

// if (stable == NFSV3WRITE_UNSTABLE)
ioflags = IO_NODELOCKED;
// else if (stable == NFSV3WRITE_DATASYNC)
// ioflags = (IO_SYNC | IO_NODELOCKED);
// else
// ioflags = (IO_METASYNC | IO_SYNC | IO_NODELOCKED);

If you try this, let me know what kind of before-and-after speed results you're receiving.

Wednesday, June 22, 2011

FreeBSD 9.0 and EKOPath (Path64) Compiler

You may be aware that PathScale announced they are making their EKOPath compiler suite open source. This is great news, even if it's under a GPLv3 license. I'd be happier with a BSD license so we could bundle it with FreeBSD more easily, but regardless, having free access to this excellent compiler is just what we need to increase FreeBSD's performance and start catching up with other OSes.

Phoronix covered the story here.

Compilers do make a difference in the speed of our OS; don't think for a second that the old gcc 4.2.2 is getting you full performance out of FreeBSD on modern hardware.

I've recently done some tests with clang/llvm vs gcc using various optimization switches, and there is a definite increase over the default generic builds of FreeBSD in my standard NFS-ZFS-ESX environment. I'll post about that shortly, once I've finished my tests.

For now, Martin Matuska has some info about compiling FreeBSD with a newer version of gcc, and also links to some statistically valid data showing the speed increase from doing this. You can start here, and find his performance data link at the bottom.  

clang/llvm is a great step forward; I wish it would beat gcc on the run speed of compiled binaries, but it can't. In my tests, the only time clang/llvm was faster than gcc was in compression-based tests, and I assume that's because the newer clang/llvm can take advantage of more modern processor extensions than the older gcc 4.2.2.

FreeBSD is committed to clang/llvm, as it's BSD-licensed, and we need that to make the entire FreeBSD distribution GPL-free. It will get better, but FreeBSD isn't the focus of that project.

Anyway, I digress: This is about Path64. Start here:

Sources to download Path64 compiler:
You'll need to install two packages for this to work:

pkg_add -r libdwarf
pkg_add -r cmake
rehash
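As a quick sanity check (this is just my habit, not part of the readme), you can confirm both packages registered and that cmake is in your path:

pkg_info -x libdwarf
pkg_info -x cmake
cmake --version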

Follow the instructions in the readme, or check Marcello's page for more info.

I followed his instructions; the only difference is that I'm on FreeBSD 9-CURRENT (a build from 2011.05.28.15.00.00), and I used this cmake command:

set MYLIBPATH=/usr/lib

cmake ~/work/path64 \
-DPATH64_ENABLE_TARGETS=x86_64 \
-DPATH64_ENABLE_MATHLIBS=ON \
-DPATH64_ENABLE_HUGEPAGES=OFF \
-DPATH64_ENABLE_FORTRAN=OFF \
-DPSC_CRT_PATH_x86_64=$MYLIBPATH \
-DPSC_DYNAMIC_LINKER_x86_64=/libexec/ld-elf.so.1 \
-DPSC_LIBSUPCPP_PATH_x86_64=$MYLIBPATH \
-DPSC_LIBSTDCPP_PATH_x86_64=$MYLIBPATH \
-DPSC_LIBGCC_PATH_x86_64=$MYLIBPATH \
-DPSC_LIBGCC_EH_PATH_x86_64=$MYLIBPATH \
-DPSC_LIBGCC_S_PATH_x86_64=$MYLIBPATH \
-DCMAKE_BUILD_TYPE=Debug
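Once cmake finishes generating the build files, the build itself is the usual cmake-style routine; something like the following (the -j value is arbitrary, and check the Path64 readme for the exact install target before running it):

make -j4
make install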

There is discussion about this on the FreeBSD mailing lists here.

That's as far as I've made it - The tests with the compiler start now. I dream of a buildworld with this, but I know that's not going to be easy. For now, I'm going to start with some "Hello World" type programs.
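If you want to follow along, a first smoke test can be as simple as the classic hello world (I'm assuming here that the Path64 C driver ends up installed as pathcc and is in your path; adjust the name and prefix if your install differs). Save this as hello.c:

#include <stdio.h>

int main(void)
{
    printf("Hello from Path64 on FreeBSD\n");
    return 0;
}

Then compile and run it:

pathcc -O2 hello.c -o hello
./hello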

:-)

Sunday, June 19, 2011

Windows 7 SP1 brings an increase in Realtek RTL8102/8103 networking performance

One of the items on my to-do list has been bringing the lab's file transfer speed up to snuff. Samba has never been a very willing component of a fast network, seemingly requiring different tweaking on every system to get its speed close to Windows' SMB performance. Adding ZFS to the mix adds difficulty, as anyone who's tried to benchmark ZFS knows.

I did find a combination that makes Samba fly in my environment; I'll share that shortly in a separate post.

Whilst performing iometer tests across the network, I noticed one of our ASUS-based systems was slower than the others, giving 80 MB/s transfers compared to the 100-114 MB/s I was receiving from identical machines. These boards use an on-board Realtek 8102/8103 PCIe network chip.

That's when I noticed this machine hadn't applied Windows 7 SP1 yet.

After installation, I'm now receiving the same speed as the rest of the systems in the lab. I'm not sure if it was an updated driver, changes to the network subsystem, or something else; unfortunately I don't have time to investigate.

However, if you're not receiving full network speed from your Win 7 machine, and you haven't applied SP1 yet, try that first.

Here's my quick test results:

Using iometer 1.1-devel, 128 writers, and the 4k sequential read test, with a 5 minute ramp-up time and a 15 minute run. I used a small 4 MB test file to remove ZFS from the equation (such a small file is quickly cached in RAM, taking the server's disk speed out of the test).

Before SP1
19617 IOPS    80 MB/s    6.5 ms avg I/O

After SP1
25177 IOPS    103 MB/s    5.1 ms avg I/O

(I'm receiving nearly the same speeds for write as well)

That's quite the difference. With some further tweaking, I'm getting my Win 7 SP1 machines to saturate the network link to 99% utilization using a FreeBSD9/Samba35/ZFS28 SAN.

Since we move a lot of data across this network each day (system images, data recovery, etc) it makes for happier technicians.