Tuesday, March 13, 2012

Blank NFS Datastore under ESXi - Check your MTU

If you've used ESX/ESXi over NFS long enough, you have probably encountered a situation where you have a connected NFS datastore, but when you browse the datastore, it's blank, or only a few files are showing.

There are many reasons for this situation, and there are plenty of VMware KBs about troubleshooting such a problem.

Turning on NFS logging is a good idea.

This is covered in the standard VMware troubleshooting document.

One thing that you may not be thinking of is an MTU mismatch.

I had this exact problem tonight, and it unfortunately took a long time to figure out because I was suspecting FreeBSD as being the culprit due to some struggles I've had with it as of late.

I came across this by issuing this command;

vmkping -s 8500 san0

This tells the ping command to send a much larger payload than usual. My 10GbE network is set up with an MTU of 9000, something I verified many times over, so I really didn't think this was the issue. I only tried it for completeness after realizing this was a serious problem that wasn't going to be solved quickly, and that I needed to start a proper documentation trail of what I was doing if I hoped to solve the issue.
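As an aside, the payload size matters because of header overhead: the largest ICMP payload that fits in a single frame is the MTU minus 20 bytes of IP header and 8 bytes of ICMP header. A quick sketch of the arithmetic (the vmkping line is left commented out since it only exists on an ESXi host; san0 is my SAN's hostname):

```shell
# Largest single-frame ICMP payload = MTU - 20 (IP header) - 8 (ICMP header)
mtu=9000
payload=$((mtu - 20 - 8))
echo "$payload"    # 8972 for a 9000-byte MTU
# On the ESXi host, something like this probes the path without fragmenting:
#   vmkping -d -s $payload san0
```

If a full-size unfragmented ping fails while a 1472-byte one succeeds, something along the path is still at 1500.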

I won't bore you with the details of my troubleshooting; I'll get to the point.

Turns out my new Dell M8024-k 10GbE blade switch requires you to set the MTU on the LAG channel group separately from the member ports. That's not uncommon, but as far as I can tell this setting isn't in the M8024-k GUI anywhere; it can only be done via the CLI.
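For reference, on Dell PowerConnect-style CLIs the LAG MTU is set on the port-channel interface itself. The exact commands below are my recollection of that CLI family, not a transcript from the M8024-k, so verify against the switch's own manual:

```
console# configure
console(config)# interface port-channel 1
console(config-if-ch1)# mtu 9216
console(config-if-ch1)# exit
```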

I guess I've been spoiled by LAGs that set their MTU based on the members' MTU, and when I didn't see a separate LAG MTU setting in the GUI anywhere, I assumed that everything was working fine. After all, pings were working, and that didn't work in the past when I had the wrong MTU applied on different adapters (although that could also be an oddity with the FreeBSD lagg driver).

By default the Dell M8024-k creates LAG groups with an MTU of 1500.

What's odd is that some connectivity is maintained despite this mismatch. You can browse and list some directories, and even see some information. I believe the point where it really breaks is when a reply needs to carry more data than the 1500-byte limit allows.

Further oddness: FreeBSD didn't have problems mounting and browsing the directories over NFS that ESXi couldn't browse or list properly. This could be due to FreeBSD connecting via UDP instead of TCP; at this point I'm not sure.

I thought I'd pass this along in case anyone else forgets to check every point in the data transmission chain where an MTU mismatch could be hiding.

Thursday, March 8, 2012

FreeBSD SCSI Sense Errors - Did You Check Your Cables?

I've been suffering from a lot of SCSI Sense errors under FreeBSD 9.0-STABLE with a recent ZFS based SAN build that have been driving me mad.

They mostly show up when ZFS does a wide scan of the available drives, via a 'zpool import', a 'zpool scrub', or similar operation.

My drives are Seagate ST1000DM003 SATA3 1TB drives, contained in a SuperMicro SC847 chassis (LSI SAS2x36 SAS2 expander, so the whole thing can run at 6Gbps).

These errors also show up after a few hours of running a custom script to saturate writes to my ZFS pool. SCSI Sense errors build until we end up in a flurry of errors on the console and a hang of the SAS expanders, sometimes taking ZFS down with it.

Mar  7 13:50:06 Test kernel: (da33:mps1:0:27:0): CAM status: SCSI Status Error
Mar  7 13:50:06 Test kernel: (da33:mps1:0:27:0): SCSI status: Check Condition
Mar  7 13:50:06 Test kernel: (da33:mps1:0:27:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Mar  7 13:50:11 Test kernel: (da1:mps1:0:11:0): WRITE(10). CDB: 2a 0 0 6e 45 bf 0 1 0 0 length 131072 SMID 303 terminated ioc 804b s
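For anyone curious, the saturation script itself isn't anything special. A hypothetical minimal stand-in (the target path, writer count, and file sizes here are placeholders, not my actual script) just keeps several dd writers streaming at the pool in parallel:

```shell
# Hypothetical write-saturation sketch; point TARGET at a ZFS dataset
# (e.g. /tank/stress) and raise count= for a real soak test.
target=${TARGET:-$(mktemp -d)}
for i in 1 2 3 4; do
    dd if=/dev/zero of="$target/stress.$i" bs=1048576 count=8 2>/dev/null &
done
wait
ls -l "$target"
```

Run it in a loop for a few hours and watch the console; the errors above started appearing well before anything actually hung.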

Doing a bit of research, I found that I'm not the only one with SCSI Sense error problems, and naturally I started suspecting the recent mps driver committed from LSI, or some sort of firmware interaction issue with it (check my recent post on the ixgbe driver issue with LACP).

I've used the exact same hardware in the past for FreeBSD 9.0-BETA builds without this trouble, so I was sure it was software or expander related.

My HBAs are mostly Supermicro AOC-USAS2-L8i internal SAS2 cards for my ZFS builds. It's a good value card, and gives me what I want at a decent price.

The firmware on these cards was v7.21, and I knew Supermicro had newer BIOS and firmware, so I flashed them up to BIOS v11 and FW v7.23. The documentation with the flash files shows a lot of advancement in the FW and BIOS that corrects errors, and I thought it would be a good try. Maybe the new mps driver made better use of the card and needed newer firmware.

However, after the upgrade, my problems continued. The system did seem to boot a little faster, however.

I then applied the LSI direct FW (v7.23) and BIOS (v12) that you can find on LSI's support site for the 9211-8i card (the same thing as SuperMicro's). The flashing process was the same (although the SuperMicro download was a bit more automated with its batch file).

Same problems.

I then started testing one item at a time, ending with replacing my brand new Amphenol SAS2 external Mini-SAS cables with some older ones I had in use on a different server.

And it fixed it!

Damn! I then confirmed, by swapping between the new and old cables, that it was the two new cables I had purchased that were causing all of this. Very frustrating, as they're high quality, expensive Amphenol cables, not no-name thin OEM thingies.

What was more puzzling was that a nearly identical hardware array was using the same cables (from the same order as the bad cables), and it wasn't having issues.

However, that in-use array is using SATA2 1.5Gbps drives, not 6Gbps SATA3. When I checked the logs carefully, I saw it was running into a few SCSI Sense errors too, but nothing was really wrong with the servers, and ZFS was generally happy.

I'm fairly confident that if I put that array under the strain of my saturation script, it would throw errors as well, possibly not as fast as the 6Gbps array does.

The Amphenol supplier has been most kind, and is offering to RMA the 4 cables for me. I'll get a new batch of 4, and will check them out carefully under heavy load before I deploy them.

It makes me wonder what other cables I have that will fail under heavy load. Data speeds are quickly increasing, and I'm sure it puts a strain on the cable manufacturers to keep up with the latest spec'ed cable, connectors, and techniques to ensure that the cables they make are indeed reliable at any speed, not to mention having to buy new cable testing gear. 

Wednesday, March 7, 2012

FreeBSD 9.0 and Intel X520's - ixgbe 2.3.11 won't allow LACP to function, but 2.4.4 does

Working with the Intel x520 adapters can be a little bit challenging if you're not used to them. I'll expand on what I've learned over the last year or so in a separate post.

For now, I hope to save someone a weekend of wasted time: if you're attempting to use a FreeBSD 9.0 system with an Intel x520 (ix/ixgbe driver) and you want to set up a LACP link, it won't work with the in-tree 2.3.11 driver.

Here's what you'll see:

lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        ether d4:ae:52:5c:5d:af
        inet netmask 0xffffff00 broadcast
        inet6 fe80::d6ae:52ff:fe5c:5daf%lagg1 prefixlen 64 scopeid 0xe
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp
        laggport: ix1 flags=18<COLLECTING,DISTRIBUTING>
        laggport: ix0 flags=18<COLLECTING,DISTRIBUTING>

You'll notice the ports are just COLLECTING,DISTRIBUTING, where a properly functioning link should show ACTIVE,COLLECTING,DISTRIBUTING.
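The flag values come from FreeBSD's net/if_lagg.h; if I'm reading the 9.x header right, ACTIVE is 0x04, COLLECTING is 0x08, and DISTRIBUTING is 0x10, which is why a healthy port prints flags=1c while a broken one prints flags=18. A quick decode:

```shell
# Decode laggport flags (bit values assumed from FreeBSD 9.x net/if_lagg.h)
flags=0x18
state=""
[ $((flags & 0x04)) -ne 0 ] && state="$state ACTIVE"
[ $((flags & 0x08)) -ne 0 ] && state="$state COLLECTING"
[ $((flags & 0x10)) -ne 0 ] && state="$state DISTRIBUTING"
echo "flags=$flags:$state"
```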

Check your driver version like so:

# sysctl dev.ix

dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.3.11
dev.ix.0.%driver: ix
dev.ix.0.%location: slot=0 function=0
dev.ix.0.%pnpinfo: vendor=0x8086 device=0x10f8 subvendor=0x8086 subdevice=0x000c class=0x020000

Older ixgbe drivers (like 2.3.10) seem to work on slightly older FreeBSD 9-BETA systems.

The eventual fix was to download the latest FreeBSD driver from Intel's website (v2.4.4), compile it as a module, and then add it to my /boot/loader.conf file. Note, you can rename the produced ixgbe.ko to ixgbe2.4.4.ko before you copy it to /boot/kernel/, allowing you more control over which version of ixgbe you load. My kernel had ixgbe compiled in, but it still accepts the new 2.4.4 loaded via loader.conf:
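The loader.conf entry itself is just the usual module-load knob. Assuming you kept the default module name, it would look like this; if you renamed the file, you can kldload the full path first to test it before wiring it into loader.conf:

```
# /boot/loader.conf
ixgbe_load="YES"
# or, if you renamed the module (assumed to work; the variable
# prefix has to match the .ko filename):
# ixgbe2.4.4_load="YES"
```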


If you try loading a .ko over your existing ixgbe built into the kernel, make sure you check your driver version with a "sysctl dev.ix" command. I found that an older FreeBSD-9.0-RELEASE wouldn't load my new 2.4.4, but systems closer to the build time of my test FreeBSD-9-STABLE machine would.

This brings up the question of whether we really want a lot of device drivers built into the kernel, as it can make troubleshooting harder, but that's another post as well.

You'll have to do a quick edit of the code to get it to compile, as it's complaining about the bool typedef. Basically you need to remove the typedef and change the 4-6 instances of boolean_t to bool (you'll see the errors).
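The rename is mechanical, so sed can do most of it. A sketch on a sample declaration (the real edit is against the driver source, and you should still read each compiler error rather than blindly substituting):

```shell
# Demonstrate the boolean_t -> bool rename on a sample line of source
fixed=$(printf 'static boolean_t link_up;\n' | sed 's/boolean_t/bool/g')
echo "$fixed"
# Against the real source it might look like:
#   sed -i.bak 's/boolean_t/bool/g' ixgbe.h
```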

If that's too complex, hopefully Jack will have 2.4.4 ported to FreeBSD-9-STABLE shortly; I made a post to the stable list, and he's aware of it.