POST requests and character encoding

Friday 22 January, 2010 @ 14:38

While trying to eliminate all character encoding problems in my Rails application, I stumbled upon the problem of POST requests and their encoding. The problem with these requests is that, when a very basic HTML form is submitted, some browsers do not indicate the character encoding of the data in the request at all. I tested this on Firefox 3.6.

Most of the info I could find on this simply claims that the encoding of the POST request is the same as that of the page that contained the submitted form. Therefore, if you serve pages as UTF-8, any forms that are submitted back to you will also be in UTF-8.

That may be true, but that doesn’t really help you, if you’re an idealist who wants to treat the HTTP request like the stateless request that it really is. Such as yours truly.

Looking around in specs, there are two methods a browser can use to indicate the character encoding in a POST request:

  • By specifying it in the Content-Type header, such as “application/x-www-form-urlencoded; charset=UTF-8”. It looks like there’s a Mozilla bug from back in 1999, in which this was discussed. In the end, they decided against this method because it caused breakage on several HTTP server implementations at the time.

  • For forms that use the application/x-www-form-urlencoded encoding (most forms that don’t do file uploads), a hidden field named ‘_charset_’ can be included. Browsers will override its value on submission with the encoding used. This will be in HTML5, and you can find it in the current draft.

Neither of these methods is handled by Rails or Rack on Ruby 1.9, and all you get is strings with the #encoding set to US-ASCII, while the strings actually contain UTF-8. A nice contradiction, and a source of exceptions elsewhere deep in your application.

I set out to get this sorted in my app, and wrote a monkey-patch. The patch automatically adds the hidden field when using FormHelper, and tries to deal with both that field and the Content-Type header in requests. It has only been briefly tested, and only in Firefox 3.6. You can find it in a gist on GitHub.
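The lookup order described above can be sketched roughly as follows. This is not the patch from the gist: it is written in Python rather than Ruby, and the names charset_from_content_type and decode_form, as well as the UTF-8 fallback, are assumptions made up for illustration.

```python
from email.message import Message
from urllib.parse import parse_qsl


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_form(body, content_type, fallback="utf-8"):
    """Decode a urlencoded POST body (bytes) into a dict of strings.

    Hypothetical helper: the header charset is tried first, then the
    hidden _charset_ field, then the fallback.
    """
    # A urlencoded body is plain ASCII on the wire; the interesting
    # bytes are hidden behind %XX escapes.
    text = body.decode("ascii")
    charset = charset_from_content_type(content_type)
    if charset is None:
        # latin-1 maps every byte to a character, so this first pass
        # cannot fail and is only used to read the _charset_ field.
        raw = dict(parse_qsl(text, keep_blank_values=True, encoding="latin-1"))
        charset = raw.get("_charset_") or fallback
    pairs = parse_qsl(text, keep_blank_values=True,
                      encoding=charset, errors="strict")
    return dict(pairs)


plain = "application/x-www-form-urlencoded"
by_field = decode_form(b"name=Caf%C3%A9&_charset_=UTF-8", plain)
by_header = decode_form(b"name=Caf%E9", plain + "; charset=ISO-8859-1")
```

Letting the header win over the hidden field is a choice of this sketch. An unknown charset name raises LookupError, and bytes that do not match the claimed charset raise UnicodeDecodeError, which is arguably better than carrying a mislabeled string deeper into the application.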

Apache locale trickery in Ubuntu

Friday 22 January, 2010 @ 10:31

The default Apache install in Ubuntu, and probably Debian too, contains a config file /etc/apache2/envvars which I have consistently ignored. I can’t remember ever having to deal with environment variables in a web application before.

But now I had to, and not realizing it, I spent a good hour fighting a vague problem from various angles, before I finally made the breakthrough.

This config file contains a line “LANG=C” by default. This has many consequences, but one of them is that Ruby 1.9 picks US-ASCII as its default external encoding, so all file operations expect files to be in ASCII, while the rest of the system operates in UTF-8.

I ran into this with a Rails application hosted in Apache with Passenger, and a view containing non-ASCII characters. Ruby’s errors when it encounters incompatible encodings are… very terse.
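For a feel of what goes wrong, here is a minimal sketch of the same class of failure. It is in Python rather than Ruby, and it passes the encoding explicitly instead of deriving it from LANG, so it only imitates what happens once ASCII has ended up as the default. The file name view.html and the helper read_view are made up.

```python
import os
import tempfile


def read_view(path, encoding):
    """Read a template file, decoding it with the given encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "view.html")
    # The view on disk is UTF-8 and contains a non-ASCII character.
    with open(path, "wb") as handle:
        handle.write("<p>caf\u00e9</p>".encode("utf-8"))

    # Reading it as ASCII fails on the first byte above 0x7F.
    try:
        read_view(path, "ascii")
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # Reading it with the encoding it was written in works.
    utf8_text = read_view(path, "utf-8")
```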

But I imagine that particular line is in that particular config file because someone else banged his or her head against a wall for a good hour too, because LANG wasn’t C.

Of fighting piracy and spam

Saturday 24 October, 2009 @ 18:21

Here’s a thought.

If we applied the same ferocity the media industry has applied to fighting piracy, to fighting spam instead, how much better would our internet be?

If we were to make a list of every IP address of an SMTP server or rogue machine filling our inbox with spam, or of a machine probing and brute-forcing for SSH access to our systems, or attempting to exploit vulnerabilities in our HTTP servers, FTP servers, SMB servers, you name it.

And then we would go and send abuse mail to providers of every single one of them. With logs. Detailed. And thorough inquiries after the perpetrator’s identity.

And we would go all the way to take legal action against these basement crawlers or their shady providers.

Point is…

We take an awful lot of this crap as just an everyday chore for our systems to deal with, and put all of our services in virtual bunkers to prevent anything bad from happening because of it. Which is good sysadmin practice, of course, but…

Then there’s nobody putting these guys on the stand for what they are doing. And it piles up. This often isn’t even that far from our doorstep. Most of the attacks on the systems I manage are from elsewhere in continental Europe.

And furthermore, these things are not legally grey, like some of the things the media industry is pursuing. But pitch black, as far as I know.

So why does it take such incredible effort to get back at these people?

A strange concoction

Saturday 6 June, 2009 @ 23:22

Libvirtweb is a bland and unoriginal name for a spiffy new web interface to libvirt. Huzzah, I am releasing something!

This is an early prototype kind of thing. Here’s what it looks like:

[Screenshot: libvirtweb viewing the console of a domain over an SSH tunnel]

The code and some quick instructions are hosted on GitHub:
http://github.com/stephank/libvirtweb/

The rest of this post is some background.

Our office is a battlefield split in two

On the one side, there’s Linux on the desktop, Mac OS X on the desktop, Linux machines powering all primary services we sell. And then on the other, Microsoft Exchange, and Windows administrators. Sarcastic retorts and counter retorts go back and forth.

So far, this is holding up fairly okayish. There’s just one piece of iron powering our office, and a Windows virtual machine gobbling up about half of its resources.

We’ve tried a couple of alternative virtualization technologies, but surprisingly, an Ubuntu Server install with KVM was the most stable of all! Now comes the problem of colleagues wanting to manage the thing from their Windows desktops.

Turns out it was a good choice not to make Windows the host. When we doubled up the processor, we found our Windows license goes up to just 4 cores. Score one for the *nix side.

Face to face with libvirt

No doubt one of the nastiest obstacles in creating this was libvirt. When you look at its goals from a high level (a very very high level), it sounds nice: a single API to manage virtual machines of any kind.

In practice, that has become a single API to manage local and remote iron over various types of connections and tunnels, and running various virtualization technologies. All that in a blocking fashion. Hell, I can’t even imagine how to do all that in a nice asynchronous API.

It looks like the only useful thing the libvirt API can be used for is to script some things in Python. And the Python bindings are very limited. Scripting things in bash works just as well using virsh.

If I had a say in it, the libvirtd daemon would not be optional, and would simply expose a DBus API. Rather like NetworkManager.

Mixing Twisted and CherryPy

The above is written in Twisted and CherryPy. However, when dealing with SSH tunnels, I had to find a way to take the tunnel out of the equation for the browser. The approach I took is a proxy for the RFB protocol, which is what is used to view consoles in KVM and QEMU.

Adding an RFB server to a CherryPy server is not really one of CherryPy’s use cases. However, CherryPy can be treated like a WSGI application. This allowed me to host CherryPy in a twisted.web.wsgi.WSGIResource, and mix the rest of my Twisted components with the webapp.
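The reason this works is that a WSGI application is nothing more than a callable taking an environ dict and a start_response function, so any server that speaks WSGI can host it. A minimal sketch using only the standard library, where hello_app stands in for the cherrypy.Application object and call_wsgi plays the role of the hosting server (both names are hypothetical):

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI application: send headers, return body chunks."""
    body = b"Hello world!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_wsgi(app):
    """Call a WSGI app once, the way a hosting server would."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible GET / request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        # WSGI servers must call close() on the result if it has one.
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], body


status, body = call_wsgi(hello_app)
```

Twisted's WSGIResource does essentially this for every incoming request, except that it runs the call in a thread from the pool so the reactor is not blocked.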

Using CherryPy as a WSGI application, and specifically in Twisted, is not really documented well anywhere. So here’s a snippet of code to do it. This is based on CherryPy 3.1.2 and Twisted 8.2.0.

#!/usr/bin/env python

import cherrypy
from twisted.internet import reactor
from twisted.web import wsgi, server
from twisted.python import threadpool

# Here's our hello world CherryPy application
class Root(object):
    @cherrypy.expose
    def index(self):
        return "Hello world!"

# Use the 'embedded' configuration template
cherrypy.config.update({'environment': 'embedded'})
# We need to unsubscribe the CherryPy server to prevent a port conflict
cherrypy.server.unsubscribe()
# Start CherryPy internals
cherrypy.engine.start()
# Make sure we shut down CherryPy when we're done
reactor.addSystemEventTrigger('after', 'shutdown', cherrypy.engine.exit)

# Create a WSGI callable from our application
app = cherrypy.Application(Root())
# Twisted needs a threadpool to run WSGI applications
threads = threadpool.ThreadPool()
threads.start()
# Make sure we shut this down too
reactor.addSystemEventTrigger('after', 'shutdown', threads.stop)

# Setup the twisted.web factory, and listen on 8080
resource = wsgi.WSGIResource(reactor, threads, app)
factory = server.Site(resource, 'server.log')
reactor.listenTCP(8080, factory)

# The main loop
reactor.run()

Wii, Linux and OpenEmbedded

Tuesday 22 July, 2008 @ 20:15

In the spirit of “Don’t ever finish my last project and move on to the next”, I have been hacking away on Nintendo Wii related stuff again lately. I wanted to give kernel hacking a shot, and my first step was supposed to be setting up an easy to use build environment.

I went beyond that, and here is a bit of work I’d like to call Whiite-OE. It’s an integration of the Whiite Linux kernel patches for Nintendo Wii support with the OpenEmbedded cross-compile environment.

Find it at: http://stephan.kochen.nl/proj/wii-oe/

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2010 Shtééf | powered by WordPress with Barecity