Wednesday, January 15, 2014

Using MessagePack as Transport Between a Python Tornado Server and a JavaScript Browser Client

First of all: Good Luck. :-)

MessagePack is a great binary serialization format, allowing you to move globs of variables across whatever transport you like. It's very similar to JSON, but leans towards a tighter packet size, as shown in the graphic I took from the site above.

MessagePack also has a great variety of languages supported. If you code it, you can probably find a msgpack.X file on their site.

Now the problems:

1) MessagePack has evolved over the years, and the specification for the protocol has been updated along the way. This isn't well documented, nor is there a "Version 2.1" or similar tag on the specification documents. If you look around, you can see that some links refer to this as v5, but how do I know that we're not at v8 and that linking site is stale? This makes it hard to tell which version of the specification a given msgpack library supports, which in turn makes it harder to choose compatible msgpack libraries when you're working across different languages.

2) No CHANGELOG, because there are no revisions/versions. Good luck seeing what msgpack was 3 years ago without detective work.

3) Because the specification has changed, some of the MessagePack libraries are up-to-date, while others are years behind. The pure Python implementation (u-msgpack) is up-to-date as of Dec 2013, but the JavaScript msgpack.js (v1.05) is not.

Through trial and error, I was able to get the JavaScript version working with the pure Python version (u-msgpack), but not with the standard Python one (which is a wrapper for the C library).

However, since u-msgpack is following the new specification, it can create msgpacks that the JavaScript version can't decode, particularly when Python is sending a long string.
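A rough sketch of why long strings are the sore spot: the updated specification added a "str 8" format (marker 0xd9) for strings of 32 to 255 bytes, while the older specification went straight from the 31-byte short form to the 16-bit form (marker 0xda). A decoder written against the old specification doesn't know what 0xd9 means. The helper below is only an illustration and isn't taken from either library; the name str_header and the old_spec flag are made up for the example.

```python
import struct

def str_header(n, old_spec=False):
    """Return the MessagePack header for a string of n encoded bytes.

    Hypothetical helper for illustration only. old_spec=True mimics the
    older format, which has no str 8 (0xd9) marker.
    """
    if n <= 31:
        return struct.pack("B", 0xa0 | n)
    if n <= 0xff and not old_spec:
        return b"\xd9" + struct.pack("B", n)
    if n <= 0xffff:
        return b"\xda" + struct.pack(">H", n)
    if n <= 0xffffffff:
        return b"\xdb" + struct.pack(">I", n)
    raise ValueError("string too long")

# A 40-byte string gets a different header under each version of the spec.
print(str_header(40).hex())                 # d928
print(str_header(40, old_spec=True).hex())  # da0028
```

Strings of 31 bytes or fewer get the same header either way, which would explain why short messages went through fine.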

I've patched msgpack.js, and have contacted the author (uupaa) to see if he has an updated version. If that doesn't pan out, I'll fork and update the code so others can have access to it. I'm also interested in patching it to work with JavaScript Blobs, so I can send binary data to it.

There is one small error in the u-msgpack file:

The problem is in _pack_string. It's calculating the length of the string before it's encoded to UTF-8.

I think you must encode the string before you find the length of it, as some characters need to encode as double-byte or longer.

An example would be the French name Allagbé, or the French word précédent, where the 'e' carries an acute accent (é).
Python's encoder makes this b'Allagb\xc3\xa9', which is one byte longer than the original 7-character string.

u-msgpack encodes this as b'\xa7Allagb\xc3\xa9'. Notice that the \xa7 header declares a length of only 7 bytes, while the encoded string is 8 bytes long, so a decoder reads 7 bytes and trims the \xa9 byte from the msgpack.

When you feed this trimmed string through a .decode('utf-8') method, you'll crash with a Python error: 'utf-8' codec can't decode byte 0xc3 in position 6: unexpected end of data
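Here is a small standalone reproduction of the problem, written for Python 3. It doesn't use u-msgpack at all; it just builds the header by hand from the character count, the way the unpatched code does, and then decodes the way a receiver would.

```python
name = "Allagb\u00e9"            # "Allagbé", 7 characters
encoded = name.encode("utf-8")   # 8 bytes: the é becomes \xc3\xa9

print(len(name), len(encoded))   # 7 8

# Header built from the character count, not the byte count.
bad_packet = bytes([0xa0 | len(name)]) + encoded

# A decoder trusts the header and reads only that many payload bytes.
declared = bad_packet[0] & 0x1f
payload = bad_packet[1:1 + declared]

try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    # The payload ends halfway through the two-byte é sequence.
    print(exc)
```

The payload the decoder ends up with is b'Allagb\xc3', which stops in the middle of the two-byte sequence for é, hence the "unexpected end of data" error.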

The solution is to encode to UTF-8 before calculating the string length, as detailed below:

def _pack_string(x):
    # Encode first, so that len(x) counts UTF-8 bytes rather than characters.
    x = x.encode('utf-8')
    if len(x) <= 31:
        return struct.pack("B", 0xa0 | len(x)) + x
    elif len(x) <= 2**8-1:
        return b"\xd9" + struct.pack("B", len(x)) + x
    elif len(x) <= 2**16-1:
        return b"\xda" + struct.pack(">H", len(x)) + x
    elif len(x) <= 2**32-1:
        return b"\xdb" + struct.pack(">I", len(x)) + x
    else:
        raise UnsupportedTypeException("huge string")

With this patch in place, I'm able to pass plenty of French and Italian names through to msgpack.js.
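If you want to sanity-check the idea without either library installed, a minimal round trip looks something like this. It only covers the short-string case (up to 31 bytes), and pack_fixstr and unpack_fixstr are names I made up for this sketch, not functions from u-msgpack or msgpack.js.

```python
import struct

def pack_fixstr(text):
    """Pack a short string, taking the length after UTF-8 encoding."""
    data = text.encode("utf-8")
    if len(data) > 31:
        raise ValueError("only short strings are covered in this sketch")
    return struct.pack("B", 0xa0 | len(data)) + data

def unpack_fixstr(packet):
    """Decode a short-string packet, reading as many bytes as the header declares."""
    length = packet[0] & 0x1f
    return packet[1:1 + length].decode("utf-8")

# Accented names survive the round trip once the byte count is used.
for name in ["Allagb\u00e9", "pr\u00e9c\u00e9dent", "Niccol\u00f2"]:
    packet = pack_fixstr(name)
    assert unpack_fixstr(packet) == name
    print(name, packet)
```

For Allagbé the header byte is now \xa8 (8 bytes) rather than \xa7, so nothing gets trimmed.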

Now, if you're trying to make all of this work across Tornado's WebSockets, keep in mind that what you get from Tornado will probably not fit into u-msgpack's .unpackb() function properly, but I'm out of time to detail that fun today.

In the end: I'm happy with MessagePack, but I wish it was a little easier to get into.