Tuesday, March 12, 2013

(FIXED) Network mbuf Leak / Exhaustion in FreeBSD 9.0 / 9.1

UPDATE: Patch included below

There is a rather annoying bug in FreeBSD 9.0 / 9.1 in which network mbufs leak until the system reaches mbuf exhaustion.

There is a problem report (PR) about this from last year, but it appears to have been abandoned.

http://www.freebsd.org/cgi/query-pr.cgi?pr=165903&cat=



 I am experiencing the same mbuf leak on fresh 9.1-RELEASE machines (AMD64). Most of my machines are ESXi 5.1 VMs running the e1000 (em0) NIC. This VM is stock: just one freebsd-update run, nothing custom.

 I have also experienced this condition on an older 9.0-STABLE from Jul 1st 2012. I did not notice it much before that date, but I can't tell for sure. I have a few machines on that build that I still use, so confirmation was easy.

 I do not experience the error if I load VMware Tools and use the vmx3f0 adapter; it happens only with em0.

 I have set the mbuf limit very high via sysctl kern.ipc.nmbclusters=322144 to buy more time between lockups/crashes. Most often the systems stay functional and just need a reboot or a higher mbuf limit. Sometimes the servers lock up or crash when I ifconfig the adapter down/up or try to raise the mbuf limit via sysctl.
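
 For anyone who wants to track the growth between reboots, here is a minimal sketch in C that pulls the three numbers out of the "bytes allocated to network" line printed by netstat -m. The line format is assumed from the 9.x output quoted in the comments below; the struct and function names are made up for illustration and are not part of any FreeBSD API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the three values on the netstat -m line. */
struct mbuf_bytes {
    long current_kb;
    long cache_kb;
    long total_kb;
};

/*
 * Parse a line such as
 *   "559467K/4059K/563526K bytes allocated to network (current/cache/total)"
 * Returns 1 if the line matched, 0 otherwise.
 */
int parse_network_bytes(const char *line, struct mbuf_bytes *out)
{
    int end = 0;
    int n;

    assert(line != NULL && out != NULL);

    /* %n is only reached if the literal text after the numbers matched. */
    n = sscanf(line, " %ldK/%ldK/%ldK bytes allocated to network%n",
               &out->current_kb, &out->cache_kb, &out->total_kb, &end);
    return n == 3 && end > 0;
}
```

 Feeding it that line from two runs taken some hours apart and comparing current_kb would show whether the allocation keeps climbing on an otherwise idle machine.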

Is anyone else able to reproduce this?

I have attempted to update the PR or post to the list, but the freebsd.org server and my mail server no longer seem to get along. I'll have to troubleshoot that later this week.

UPDATE Apr 19th 2013:

Gleb Smirnoff was kind enough to quickly forward me a patch that fixed the problem for me. You will need to apply this to /usr/src/sys/netinet/if_ether.c

I've now run for 2 days, and my mbuf usage has not increased at all. Thanks to the FreeBSD-Stable list for the quick response.

Index: if_ether.c
===================================================================
--- if_ether.c    (revision 249327)
+++ if_ether.c    (working copy)
@@ -558,13 +558,13 @@ in_arpinput(struct mbuf *m)
     if (ah->ar_pln != sizeof(struct in_addr)) {
         log(LOG_NOTICE, "in_arp: requested protocol length != %zu\n",
             sizeof(struct in_addr));
-        return;
+        goto drop;
     }

     if (allow_multicast == 0 && ETHER_IS_MULTICAST(ar_sha(ah))) {
         log(LOG_NOTICE, "arp: %*D is multicast\n",
             ifp->if_addrlen, (u_char *)ar_sha(ah), ":");
-        return;
+        goto drop;
     }

     op = ntohs(ah->ar_op);
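
The reason two one-line changes are enough: the function owns the buffer it is handed, so every exit has to release it. A bare return on an error path skips the release, and each rejected packet then costs one buffer forever. Below is a simplified userland sketch of that pattern, not the kernel code. The fixed-size pool, the function names, and the assumption that the drop label frees the buffer (as m_freem would for an mbuf) are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Hypothetical stand-in for an mbuf and the mbuf pool. */
struct buf {
    int in_use;
};

static struct buf pool[POOL_SIZE];
static int live_buffers;

void pool_reset(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i].in_use = 0;
    live_buffers = 0;
}

/* Returns NULL once every buffer is in use (the "exhaustion" case). */
struct buf *buf_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            live_buffers++;
            return &pool[i];
        }
    }
    return NULL;
}

void buf_free(struct buf *b)
{
    assert(b != NULL && b->in_use);
    b->in_use = 0;
    live_buffers--;
}

int live_count(void)
{
    return live_buffers;
}

/* Before: the error path returns without releasing the buffer. */
void handle_leaky(struct buf *b, int valid)
{
    if (!valid)
        return;         /* leak: b stays marked in use */
    buf_free(b);
}

/* After: the error path jumps to the common cleanup label. */
void handle_fixed(struct buf *b, int valid)
{
    if (!valid)
        goto drop;
    /* normal processing would go here; this sketch simply falls through */
drop:
    buf_free(b);
}
```

With handle_leaky, POOL_SIZE invalid packets are enough to make buf_alloc return NULL; with handle_fixed the live count returns to zero after every call, whether the packet was valid or not.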


5 comments:

  1. Christopher - sorry to go off on a tangent...

    You posted a problem in the VMware support forums that is *exactly* the same as the one I am having, with DHCP broadcast packets not being passed from one host to another on the same dvPortGroup. I was just wondering if you ever found a solution. I don't believe it is a hardware problem. The URL is http://communities.vmware.com/message/2020958#2020958
    Please follow up if solved or e-mail me at maulermark@yahoo.com
    Thanks!

  2. Hi Mark,

    The problem for me was Dell hardware, and it was resolved.

    I've updated my original post on vmware.com to reflect that fact.

    Have fun tracking down the source.. it can take a while.. :-)

  3. [Update]
    The above two comments are not specifically about the problem I posted here.

    Concerning FreeBSD mbuf leak:

    I'm installing a new 9.1-STABLE system to see if the problem (with FreeBSD) persists.

    I should know in a few days, as the build process just completed.

  4. I've got the same problem on my firewall (pf + pfsync) on 9.0.
    Driver: em(4) with chipset 82571EB.

    netstat -m | grep allocated:

    559467K/4059K/563526K bytes allocated to network (current/cache/total)

  5. Olivier:

    Stay tuned, I'm testing a patch that I installed today that may do the trick.
