Tuesday, May 8, 2012

ZFS raidz - Make Sure To Select the Correct Number of Disks

I've read that selecting the correct number of drives when building a raidz ZFS array is important if you are trying to maximize performance.


Of course the best way to maximize performance is to use a straight stripe or a mirror instead of a raidz, but that isn't always an option when you have price, power, or physical space restrictions.

Since I'm perpetually curious, I set up tests to compare the write speeds of 3-, 4-, 5-, 6-, and 7-drive raidz arrays.

I used my saturate.c program to put the arrays under heavy write load, repeated each test six times, and took the average and standard deviation.

My results aren't clean enough to post, but they do suggest that the number of drives matters. Follow the rules below:

RAIDZ1 vdevs should have 3, 5, or 9 devices in each vdev
RAIDZ2 vdevs should have 4, 6, or 10 devices in each vdev
RAIDZ3 vdevs should have 5, 7, or 11 devices in each vdev
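The widths above all keep the number of data disks at a power of two (a 3-disk raidz1 is 2 data + 1 parity, a 6-disk raidz2 is 4 + 2, and so on), so records divide evenly across the data disks. A minimal sketch of building a pool at one of those widths, with hypothetical FreeBSD device names:

```shell
# Hypothetical device names (da0-da5); substitute your own.
# A 6-disk raidz2 = 4 data disks (a power of two) + 2 parity disks.
zpool create tank raidz2 da0 da1 da2 da3 da4 da5

# Confirm the vdev layout:
zpool status tank
```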

With some luck, I hope to have the time to revisit the tests and obtain results that are postable.

Friday, May 4, 2012

Supermicro USB Boot Issues, H8SGL

Thought I'd pass on a small tidbit:

A SuperMicro H8SGL(-F) won't boot from USB properly when there are more than 12 bootable items in the server.

In my case, filling my enclosure with drives pushed me past this limit. Quite the surprise: the server was working great with 8 drives installed, but at 16 drives, a routine maintenance reboot left it unable to start again.

The symptom: the server simply acts as if there is no boot sector on the USB drive.

I'm unsure at this stage whether this affects only USB, or any bootable device.

I'm guessing that the motherboard's BIOS keeps a very short table of bootable items, and as it discovers more, the earliest entries scroll off and are no longer bootable, even though they still appear in the F11 boot-selection menu.

I fixed it by disabling the boot option inside my SAS adapter's BIOS, which kept those drives out of the boot list, and everything was fine.

It goes to show that you _ALWAYS_ need to reboot a server after any hardware or software change, just to make sure it still comes up. If I had discovered this problem in some sort of urgent situation where I needed to quickly power-cycle this server, it would have been a very bad day.


Thursday, May 3, 2012

ZFS Testing - Saturate.c

Benchmarking ZFS is hard.

ZFS is such a complex filesystem, with multiple levels of caching, that getting a good solid reading on its performance is always difficult.

I was recently tasked with building a FreeBSD ZFS SAN for a client: 90 1 TB Seagate ST1000DM003 6 Gbps drives in two SuperMicro SC847 enclosures. The heads were two Dell PowerEdge T710s, each with 96 GB of DDR3 and dual Xeon 5620 CPUs. I connected each head to its enclosure with an LSI2008-based SAS card.

As always, budget was tight, but performance HAD to be there. I needed to find out for sure what the write speed of this array was going to be under ZFS.

The throttling effect of using a single SAS card was one of the items I needed to check.

Since the SuperMicro SC847s use an LSI2x36 backplane, I don't have a full SAS channel available to every drive at all times. I wanted to start collecting data on what effect this had on array performance, and whether we needed two SAS cards to achieve acceptable throughput. This wasn't a "performance at all costs" scenario; it was a real-world situation with real-world budgets and needs.

This client had a lot of users and a lot of big databases that were very active during the day. I needed to know that the array could quickly write down as much data as possible to satisfy their needs.

Knowing that the array's average write load wouldn't saturate the single SAS card or the drives, but that a heavy burst could, I needed to put the array under heavy writes for a sustained period to compare various hardware and software configurations.

Thus began my search for a benchmark program that could really load down this gear.
To simplify things, I decided to concern myself only with a write-saturation event: data flowing to the drives faster than they can absorb it, as close to 100% utilization as possible.

I didn't have much luck with the standard FreeBSD benchmarks (bonnie, bonnie++, iozone, etc.). Either they were too hard to lock into saturated writes, or they didn't spawn enough writers to really load down the system.

In the end I coded a quick-and-dirty program called saturate.c, which I'm passing on here. It's not very pretty, but it works.

It's hard-coded to spawn 4 files to write to for each forked child process. Tweak the counts as needed to really load your system.

You can simply run it under time to check how long it takes:

time ./saturate

I created a simple script file that then executed the saturate program in different zpool configurations so I could confirm what was working best for us.  I've run through various raidz configurations, compression, etc. I'm still sifting through all of the data.
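I didn't post that script, but the idea looks roughly like this sketch, assuming a scratch pool named "tank" and placeholder device names (destroying and rebuilding the pool between runs, so point it only at disposable disks):

```shell
#!/bin/sh
# Hypothetical harness: rebuild the pool in each layout, run saturate,
# and log the wall-clock time for each configuration.
for layout in "raidz1 da0 da1 da2" "raidz1 da0 da1 da2 da3 da4"; do
    zpool destroy -f tank 2>/dev/null
    zpool create tank $layout
    echo "=== $layout ===" >> results.log
    ( cd /tank && { time /root/saturate; } ) 2>> results.log
done
```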

Warning: this creates a very large set of test files. In its current state, I believe it writes 40,000 MB (~40 GB).

Oh, and with FreeBSD 9.0, my best average time was 1.2 minutes. Not bad for a free operating system.

I'll post some of my results of 2 months of various ZFS tests over the next while as I have time.

Code on PasteBin: http://pastebin.com/4SexdvLq