Tuesday, December 18, 2012

Passthrough PCIe devices from ESXi to FreeBSD - Leave MSI Enabled to Avoid Interrupt Storms With More than One vCPU

If you want to pass a PCIe device (such as a LSI SAS controller) through ESXi to a FreeBSD VM, you're probably aware that you may need to disable MSI/MSI-X interrupts to make it work.

However, I'm finding that while this setting will make the system boot, if you add a second vCPU you will quickly start to see "Interrupt Storm" messages on your console, and one of your CPUs will be nearly 100% busy servicing interrupts.

The easy fix for me? Only disable MSI-X, not MSI.

Here are the relevant lines from my /boot/loader.conf:
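
A minimal sketch of what such settings look like, assuming the stock FreeBSD loader tunables hw.pci.enable_msix and hw.pci.enable_msi (both default to 1); treat the exact lines as an illustration and check them against your own system:

```
# /boot/loader.conf
# Disable MSI-X only; MSI stays enabled
hw.pci.enable_msix="0"
# hw.pci.enable_msi is deliberately left at its default of 1
```

A reboot is needed for loader tunables to take effect.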


All of my interrupt storm issues are now gone. 

Wednesday, December 12, 2012

VMware Enhanced vMotion Compatibility (EVC) Not Needed between AMD Bulldozer 6234 and AMD Piledriver 6348 on VMware ESXi 5.1

I recently upgraded one of our datacenter servers from an AMD Bulldozer 6234 to an AMD Piledriver 6348.

I did find the new chip was faster for memory/CPU operations in our VMs, but I'll post about that in a different article.

What surprised me was that I didn't need to turn on EVC to allow VMs to migrate between hosts running the different CPUs.

Either VMware 5.1 doesn't recognize the new instructions, or they don't matter in terms of migration.

Either way, I can migrate to the newer chips without worry, and since the 63xx are socket compatible with the 62xx and 61xx series, I have a fairly easy upgrade path for our servers if I wish.

Thursday, November 22, 2012

ESXi 5.1 running FreeBSD 9.0 / 9.1 and VMware Tools

Note - I have another article for ESXi 5.0. These instructions won't work for a 5.0 box.

ESXi 5.1 includes support for the official VMware Tools on FreeBSD 9.0 (yay!).

I'm somewhere between 9.0 and 9.1 RC3, but I feel these instructions will work for either. 

I'll be testing the throughput of the vmxnet3 driver under FreeBSD shortly. For now, I can just confirm that both the vmxnet2 and vmxnet3 drivers show up in FreeBSD, and can ping.

Here's a quick script to install it for you.

You will need to start a VMware Tools install from the guest before running this.

echo "Make sure you have started the vmware tools install"
mount -t cd9660 /dev/cd0 /media
cp /media/vmware-freebsd-tools.tar.gz ~
cd ~
tar xvf vmware-freebsd-tools.tar.gz
cd vmware-tools-distrib
# Run the installer shipped in the tarball (assumed step: vmware-install.pl)
./vmware-install.pl
echo "# DONE. Remember that the vmxnet3 driver is called vmxnet3f0"

ESXi 5.0 running FreeBSD 9.0 / 9.1 and VMware Tools

Note - I have another article for ESXi 5.1. If you are running 5.1, these instructions won't work for you.

Installing the proper VMware Tools for FreeBSD 9.0 / 9.1 can be a pain.

The hard work to figure this out has already been done by others (http://ogris.de/vmware/), so here is a quick script that will install the official VMware Tools using Dru's patch.

Make sure to start a VMware Tools install from your vSphere console so ESXi will make the CD available to you.

If you run into problems with a missing library when you try to execute /usr/local/bin/vmware-toolbox-cmd, try looking here (http://lists.freebsd.org/pipermail/freebsd-questions/2010-June/217718.html); it solved the problem for me.

(Cut-n-Paste into a terminal window with root access)

echo "Make sure you have started the vmware tools install"
mount -t cd9660 /dev/cd0 /media
cp /media/vmware-freebsd-tools.tar.gz ~
cd ~
tar xvf vmware-freebsd-tools.tar.gz
cd vmware-tools-distrib
cd lib/modules/source
tar xvf vmblock.tar
tar xvf vmmemctl.tar
tar xvf vmxnet.tar
tar xvf vmxnet3.tar

fetch http://ogris.de/vmware/vmxnet.diff
fetch http://ogris.de/vmware/vmxnet3.diff

echo "#"
echo "# If patch thinks you have a previously applied patch (-R), say yes."
echo "#"

patch -p1 < vmxnet.diff
patch -p0 < vmxnet3.diff

cd vmblock-only
make && make install
cd ..

cd vmmemctl-only
make && make install
cd ..

cd vmxnet-only
make && make install
cd ..

cd vmxnet3-only
make && make install
cd ..

cd ~/vmware-tools-distrib
# Run the installer shipped in the tarball (assumed step: vmware-install.pl)
./vmware-install.pl

echo "# DONE. Remember that the vmxnet3 driver is called vmxnet3f0"

Sunday, November 4, 2012

Blogger Code Syntax Highlighting

I really like WordPress.org's Syntax Highlighter, particularly the button to copy the raw code without the line numbers.

Unfortunately, this is not easy to implement on Blogger.

The closest I have managed is from this blog:


It does work, but we're missing the important buttons that WordPress has. Here's a demo of it in action on my saturate.c code. Note that Blogger still completely mangles my code after a few lines.

Time to switch to WordPress, or does anyone have any bright ideas?

(Update - Pastebin seems to work well for me, but it's an external link, not quite the same thing.  http://pastebin.com/4SexdvLq )


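#define _FILE_OFFSET_BITS 64  /* enable large file support  */

#include <stdio.h>

float nCM=0; /* count of 100's of megs we've written */

/* Report how much data has been written, tagged with the caller's name */
void finish(const char *where) {
        printf("%s: wrote %.1f gig file\n",where,0.1*nCM);
}

/* Print the system error for 'where', then report progress so far */
void hitError(const char *where) { perror(where); finish(where); }

void writer(int id)
{
        char c=0; /* The byte we're writing */
        int cm=100*1024*1024; /* 100 megs, in bytes */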
        FILE *f[50];

        int     r;
        int     block[262114];
        int     i;
        char    fName[40];
        int     numFiles=4;

        printf("This is the %i writer, going to create %i files.\nOpened : ", id, numFiles);

        for(i=0; i
(end Code) 

Tuesday, June 26, 2012

Fixing "Open File - Security Warning" prompts in Windows 7 / 2008

If you enable Folder Redirection from a GPO, or even if you are trying to run a program from a non-local source, you'll run into this dialogue box:

For me, it was particularly annoying as it would happen when accessing any item from the start menu on our remote desktop server, as we were redirecting the AppData folder as well.

The fix is through GPO. I assume you know how to make a new GPO, and so to save time, here is the location.

User Configuration - Administrative Templates - Windows Components - Internet Explorer - Internet Control Panel - Security Page  ... then it's Site to Zone Assignment List

I set both values for my file server here. You could also use a wildcard like *.yourdomain.com if you liked.

2 is the value for Trusted Sites. Check the help in the GPO Management box for other options.

The only downside to this is that now the user can't add their own trusted domains, as once it's specified in GPO, it can't be altered by the user.

Wednesday, June 6, 2012

Running Windows 2012 Server RC on ESXi 5.0 U1

If you want to play with the upcoming Windows 2012 RC trial under ESXi 5.0 U1, you may find that it hangs if you're running your Virtual Machine as "Windows 8 (64 Bit)".

You need to set your Virtual Machine to "Windows 2008 R2 (64 bit)", not "Windows 8 (64 Bit)". Once you make this change, it boots, and looks to run smoothly.

VMware Tools looks to install correctly, I can power up and down, and I can use the VMXNET3 network adapter in Windows. I also have the PVSCSI adapter enabled, and it looks to be running well.

I'm curious to test out its built-in iSCSI Target / clone abilities for running diskless (iSCSI Boot) Windows 7 Pro workstations.

Oh, and if you want to be able to use 2012 / Windows 8, check this site for where to find simple things like shutdown and control panel.  I'm not sure it's wise to change such a staple of Windows usage, but it does make for a cleaner interface.

So far, I like the new interface.. we'll see how useful it is after extended use.

Tuesday, May 8, 2012

ZFS raidz - Make Sure To Select the Correct Number of Disks

I've read that selecting the correct number of drives when building a raidz ZFS array is important if you are trying to maximize performance.


Of course the best way to maximize performance is to use a straight stripe or a mirror instead of a raidz, but that isn't always an option when you have price, power, or physical space restrictions.

Since I'm perpetually curious, I set up tests to compare the write speed of a 3, 4, 5, 6, and 7 drive raidz array.

I used my saturate.c program to put the arrays under heavy write load, repeated each test 6 times, and took the average and standard deviation.

My results are not clean enough to post, but they do suggest that the number of drives is important. Follow the rules below:

RAIDZ1 vdevs should have 3, 5, or 9 devices in each vdev
RAIDZ2 vdevs should have 4, 6, or 10 devices in each vdev
RAIDZ3 vdevs should have 5, 7, or 11 devices in each vdev
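
Each count in the list above works out to a power-of-two number of data disks (2, 4, or 8) plus the parity disks for that RAIDZ level. Here is a small sketch of that check in shell. The function name raidz_width_ok is made up for illustration, and the power-of-two reading is my interpretation of the list, not an official ZFS rule:

```shell
# raidz_width_ok PARITY DISKS
# Returns success (0) when DISKS minus PARITY is a power of two
# and at least 2, which matches every count in the list above.
raidz_width_ok() {
    parity=$1
    disks=$2
    data=$((disks - parity))

    # Need at least two data disks for the rule to apply
    [ "$data" -ge 2 ] || return 1

    # Power-of-two check: halve while even, and see if we land on 1
    while [ $((data % 2)) -eq 0 ]; do
        data=$((data / 2))
    done
    [ "$data" -eq 1 ]
}

# Example: check a 6-disk and a 7-disk RAIDZ2
for n in 6 7; do
    if raidz_width_ok 2 "$n"; then
        echo "raidz2 with $n disks: fits the rule"
    else
        echo "raidz2 with $n disks: does not fit the rule"
    fi
done
```

The first argument is the parity level (1, 2, or 3) and the second is the total number of disks in the vdev.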

With some luck, I hope to have the time to revisit the tests and obtain results that are postable.

Friday, May 4, 2012

Supermicro USB Boot Issues, H8SGL

Thought I'd pass on a small tidbit:

A SuperMicro H8SGL(-F) won't boot from USB properly when there are more than 12 bootable items in the server.

In my case, filling my enclosure with drives surpassed this limit.  Quite the surprise, as it was working great with 8 drives in, but at 16 drives, a routine maintenance reboot left it unable to start again.

The symptom: it will just act like there isn't a boot sector on the USB drive.

I'm unsure at this stage if it's just USB, or if it's any bootable device.

I'm guessing that the motherboard's BIOS keeps a very short table of bootable items, and as it discovers more, the first ones discovered scroll off and are no longer available, even though they are still listed in the F11 Boot Selection menu.

I fixed it by turning off the BIOS boot option inside my SAS adapter's BIOS, and everything was fine.

It goes to show that you _ALWAYS_ need to reboot a server after any hardware/software changes, just to make sure. If I had learned about this problem when we were in some sort of urgent situation where I needed to quickly power cycle this server, well, that would have been a bad thing.


Thursday, May 3, 2012

ZFS Testing - Saturate.c

Benchmarking ZFS is hard.

ZFS is such a complex FS, with multiple levels of cache, that getting a good solid reading on its performance is always difficult.

I was recently tasked with building a FreeBSD ZFS SAN for a client that consisted of 90 1TB Seagate ST1000DM003 6Gbps HDs in 2 SuperMicro SC847 enclosures. The heads were 2 Dell PowerEdge T710s with 96 Gig of DDR-3 and dual Xeon 5620 CPUs. I connected the head to the enclosure with an LSI2008-based SAS card.

As always, budget was tight, but performance HAD to be there. I needed to find out for sure what the write speed of this array was going to be under ZFS.

The throttling effect of using a single SAS card was one of the items I needed to check.

Since the SuperMicro SC847s use an LSI2x36 backplane, I don't have a full SAS channel available to all drives at all times.  I wanted to start collecting data on what effect this would have on the performance of the arrays, and on whether we needed 2 SAS cards to achieve higher performance. This wasn't a "performance at all costs" scenario, rather a real-world situation with real-world budgets and needs.

This client had a lot of users, and a lot of big databases that are very active during the day. I needed to know that the array would quickly write down as much data as possible to satisfy their needs.

Knowing that the average-write of the array wouldn't saturate the single SAS card or the drives, but a heavy-write could, I needed to put the array into heavy-writes for a sustained period of time to compare various hardware and software configurations.

Thus began my search for a benchmark program that could really load down this gear.

To simplify things, I decided that I would only concern myself with a write saturation event: when the data is flowing to the drives so quickly that they never catch up, as close to 100% utilization as possible.

I didn't have much luck with the standard FreeBSD benchmarks (bonnie, bonnie++, iozone, etc.). Either they were too hard to lock into a saturating write, or they didn't spawn enough writers to really load down the system.

In the end I coded a quick and dirty program called Saturate.c that I'm passing on here. It's not very pretty, but it works.

It's hard-coded to spawn 4 files to write to for each forked child process. Tweak it as you need to for your system to really drive the files.

You can simply run the program under time to check how long it takes, like this:

time ./saturate

I created a simple script file that then executed the saturate program in different zpool configurations so I could confirm what was working best for us.  I've run through various raidz configurations, compression, etc. I'm still sifting through all of the data.

Warning: This will create a very large set of test files - I believe in its current state it writes 40,000 MegaBytes (~40 Gigs).

Oh, with FreeBSD 9.0, my best average time was 1.2 Minutes.  Not bad for a Free Operating System.
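
As a rough sanity check on those numbers, the implied sustained write rate can be worked out with a little shell arithmetic. This assumes the run really does write about 40,000 MB and that 1.2 minutes means 72 seconds of wall-clock time. The function name mb_per_sec is just for illustration:

```shell
# mb_per_sec MEGABYTES SECONDS
# Prints the average throughput in whole MB/s (integer division).
mb_per_sec() {
    echo $(( $1 / $2 ))
}

# 40,000 MB written in 1.2 minutes (72 seconds)
mb_per_sec 40000 72   # prints 555
```

That puts the array at roughly 555 MB/s of sustained writes under those assumptions.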

I'll post some of my results of 2 months of various ZFS tests over the next while as I have time.

Code on PasteBin: http://pastebin.com/4SexdvLq

Thursday, April 19, 2012

PowerShell Script to Set Public Folder Replication Schedule Recursively in Exchange 2007

I've recently been tasked to replicate a lot of Exchange 2007 Public Folder data to another server.  I'll blog the steps here shortly, but first, I had to write a quick PowerShell script to do it, as there were over 100 public folders in the root public folder, and they were all set to not replicate.

I had already used the MS script AddReplicaToPFrecursive.ps1, but it doesn't set the replication schedule if it wasn't set before.

This script will walk through all of your public folders and set the replication schedule to "always".

It logs to C:\set_pf_log.txt and you can see two work files it also makes in C:\ - modify as you wish.

It's my first PowerShell script, so be gentle. :-) I don't know how to handle wrapping of long folder names in the output of the Get-PublicFolder command, so it won't process a long PF name.

# Set Public Folder Replication Schedules Recursively
# NOTE: If you have very long public folder names, they may wrap, and will not be handled correctly here.
# Define Variables

$filename_log = "C:\set_pf_log.txt"
$filename_pf_long = "C:\set_pf_long.txt"
$filename_pf_short = "C:\set_pf_short.txt"

# First, get a list of all the public folders and write them to $filename_pf_long
write-host "Fetching Public Folders and writing to $filename_pf_long"
# write-host only prints to the console, so send the string itself to the log
"Fetching Public Folders and writing to $filename_pf_long" | out-file $filename_log -append

get-publicfolder "\" -recurse | fl identity > $filename_pf_long

# Now we need to trim the file, as it also outputs 'Identity : ' which we don't want

write-host "Trimming $filename_pf_long into $filename_pf_short"
"Trimming $filename_pf_long into $filename_pf_short" | out-file $filename_log -append

get-content $filename_pf_long | foreach-object {$_ -replace 'Identity : ',''} > $filename_pf_short

# Now loop through the file, executing our action for each public folder
# NOTE - The action is commented out, using write-host to output to the console instead. If you want to make this actually run, remove the # in front of set-publicfolder

foreach($line in get-content $filename_pf_short)
{
    # Skip the blank lines that fl leaves in the output
    if ($line -ne "")
    {
        write-host "Processing $line"
        "Processing $line" | out-file $filename_log -append
#         set-publicfolder $line -ReplicationSchedule Always
    }
}

write-host "Complete!"

Wednesday, April 18, 2012

A Call to Revive the Uniform Driver Interface (UDI)

(UDI’s Home - http://www.projectudi.org/ )

Over the years, I have wished for some sort of universal driver, so that we didn’t need one driver for FreeBSD, one for Linux, one for Windows XP, one for Windows 7, etc.

I started searching the internet to see if anything was in development on such a framework, and was surprised to see that it did indeed exist, and was already pretty much dead before I even knew it existed.

The Uniform Driver Interface (UDI) was created through the cooperation of some big industry players (Intel, SCO, Adaptec, Sun, IBM, DEC, Compaq/HP, etc.).
However, I notice Microsoft wasn’t part of it, which is one of the reasons it is currently languishing.

Its specification was released in 2001 under a BSD-style license, so it’s very Open-Source.

During my 'autopsy-research', I found some muttering about it being as complex and bloated as CORBA, but I can’t find any proof of that.  

I think the real damage to it was being blasted by Richard Stallman http://www.linuxtoday.com/developer/1998100500205OP

I understand where Richard Stallman’s stance on UDI is coming from, but I do not agree. I’ll get into why shortly.

First, let me state this:

The _BEST_ thing for Free Software would be UDI, because the largest thing holding back Free Software Operating Systems is drivers.


 - Try running quality accelerated X-Windows on FreeBSD with a recent ATI card.
 - Try booting/using OpenIndiana on a cheap desktop-quality board with a generic Marvell or other SATA controller and network card.
 - How’s your Intel GMA500 video working under Ubuntu?

This list is endless. Linux is in the best position for hardware drivers as it has the larger installed base, and so hardware vendors do have decent support. Solaris and the (x)BSDs seem to be next, and it quickly drops off the face of a cliff after that.

If you wanted to revive the interesting Plan-9 OS, or if you wanted to branch from an existing Free OS and make some large changes to the kernel/scheduler, or you wanted to try and design your own OS, you’re now going to need to write or port drivers.

This is a huge amount of time wasted on matters that should be trivial. I believe in reusable code, and in not doing the same thing twice. We need to understand that skilled programmers are a resource like anything else, and we shouldn’t be squandering their time on repeating the same work for different OSes.

Sure, it’s a rite of passage to code your own drivers – But when you have to code 27 drivers just to obtain basic functionality for your new OS, so you can work on what is really driving you to create that new OS, you’re wasting time.

Some Open-Source OS drivers come from the hardware vendor directly. A large percentage of the drivers in Open-Source OSes are reverse engineered.  While some are very stable, fast, and feature-rich, many are not. 

Why should the Free-OS crowd have to make do with lower quality drivers than those who run Windows?

Why should the programmers on the freebsd-scsi list have to deal with bug-stomping on reverse engineered drivers for a product that they didn’t make, when they could be working on the CAM structure (a product they are making)?

At the risk of over-simplifying Richard Stallman’s position, this is how I see his stance against it:

     There is a chance that a UDI driver made by/for an Open-Source OS team would be used in a commercial product and possibly distributed by the hardware vendor.

And you know what? Yup, there is a chance in the very early stages of a switch-over to UDI drivers that something like that may happen.  If you’re concerned, slap a GPL v3 license on the driver you code and be done with it.

However, no hardware manufacturer will ever take drivers from the Open-Source community and use them as their own for any considerable period of time. They will develop their own drivers, so they can properly support and control the quality and performance of their hardware.

If the hardware vendors want to keep them proprietary, so be it. This is another key point (I think) in Richard Stallman’s stance against UDI:

-         If the big hardware vendors could release closed source UDI drivers, then the pressure on the hardware vendors to release open-source drivers would cease.

That’s very true, and possibly a concern. However, I look at this as a game of chess with big-picture goals, and short-term compromises.  

Yes, you may be taking a step backwards in the Open-Source drivers category, but you’d be _vaulting_ Open-Source OSes miles ahead.

Just imagine a copy of FreeBSD, Ubuntu Linux, OpenBSD, or OpenIndiana with the same driver support as Windows. 

As a user, think of what you could do with that system.

As a programmer, think of what you could concentrate on instead of tracing down bugs in the reverse-engineered drivers that so many people use with your software.

You do realize that with technology like WINE and ThinApp (which works under WINE nicely), plus a good cross-OS Firefox implementation, we're closer than ever to properly challenging Microsoft's dominance of the PC market? If an Open-Source OS could run on anything that Windows runs on, what options would that give you?

UDI is something that the hardware vendors want (it’s why they all joined the working group). It’s something that any OS designer would want (less work for me? Great!), and it’s something that the end user will want (you mean all my hardware now works with FreeBSD?)

How do we get UDI started again? It’s a bit of a chicken-and-egg scenario:

1 - We need pressure on the Hardware Vendors to start releasing real UDI drivers for current hardware that people want.

2 - We need pressure on the Open-Source OS developers to take the ProjectUDI sample implementations of UDI for their OS and integrate them into the recent releases of each OS.

With drivers available, and OSes to use said drivers, UDI will start gaining some ground.

If there are bloat or performance problems with UDI, they will be exposed, and worked around or repaired. Are there other hurdles to UDI acceptance? Let's expose and conquer them.

I’m very interested in any comments or suggestions about UDI, how it died, and anything we can do to get this project started again.

Tuesday, March 13, 2012

Blank NFS Datastore under ESXi - Check your MTU

If you've used ESX/ESXi over NFS long enough, you have probably encountered a situation where you have a connected NFS datastore, but when you browse the datastore, it's blank, or only a few files are showing.

There are many reasons for this situation, and there are plenty of VMware KB articles about troubleshooting such a problem.

Turning on NFS logging is a good idea.

This is the standard VMware Troubleshooting document.

One thing that you may not be thinking of is an MTU mismatch.

I had this exact problem tonight, and it unfortunately took a long time to figure out because I was suspecting FreeBSD as being the culprit due to some struggles I've had with it as of late.

I came across this by issuing this command:

vmkping -s 8500 san0

This tells the ping command to send a much larger payload than usual. My 10GbE network is set up with an MTU of 9000, something I had verified many times over, so I really didn't think this was the issue. I only ran it for completeness after realizing this was a serious problem that wasn't going to be solved quickly, and that I needed to start a proper documentation trail of what I was doing if I hoped to solve the issue.
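
If you want to test right at the edge of the MTU rather than picking a round number like 8500, the largest ICMP payload that fits in one packet is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. Here is a small sketch of that arithmetic in shell. The function name max_ping_payload is made up for illustration, the 20-byte figure assumes an IPv4 header without options, and the -d (don't fragment) flag is an assumption you should confirm against your own vmkping build:

```shell
# max_ping_payload MTU
# Prints the largest ICMP echo payload that fits in a single IPv4
# packet of size MTU: MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Build (but do not run) the test command for a 9000-byte MTU.
# "san0" is the host name used earlier in this post.
echo "vmkping -d -s $(max_ping_payload 9000) san0"
```

For an MTU of 9000 this gives a payload of 8972 bytes, and for a standard 1500 MTU it gives 1472.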

I won't bore you with the details of my troubleshooting, and get to the point.

Turns out my new Dell M8024-k blade 10GbE switch requires you to set the MTU on the LAG channel group separately from the ports. That's not uncommon, but as far as I can tell, this isn't in the M8024-k GUI anywhere; it can only be done via the CLI.

I guess I've been spoiled by LAGs that set their MTU based on their members' MTU, and when I didn't see a separate LAG MTU setting in the GUI anywhere, I assumed that everything was working fine. After all, pings were working, and they didn't work in the past when I had the wrong MTU applied on different adapters (although that could also be an oddity with the FreeBSD lagg driver).

By default the Dell M8024-k makes LAG groups with an MTU of 1500.

What's odd is that some connectivity is maintained with this mismatch. You can browse and list some directories, and even see some information. I believe the point where it really breaks is when it needs to transfer more than the 1500 limit allows.

Further oddness: FreeBSD didn't have problems mounting and browsing the directories over NFS that ESXi couldn't browse or list properly. This could be due to FreeBSD connecting via UDP instead of TCP; at this point I'm not sure.

I thought I'd pass this along in case anyone else is forgetting that an MTU mismatch can creep in at any point in the data transmission chain.