POST requests and character encoding
While trying to eliminate all character encoding problems in my Rails application, I stumbled upon the problem of POST requests and their encoding. The problem with these requests is that, when a very basic HTML form is submitted, some browsers do not indicate the character encoding of the data in the request at all. I tested this on Firefox 3.6.
Most of the info I could find on this simply claims that the encoding of the POST request is the same as that of the page that contained the submitted form. Therefore, if you serve pages as UTF-8, any forms that are submitted back to you will also be in UTF-8.
That may be true, but it doesn’t help much if, like yours truly, you’re an idealist who wants to treat each HTTP request as the stateless request it really is. The request itself should say how it is encoded, without the server having to remember how it served the page.
Looking around in specs, there are two methods a browser can use to indicate the character encoding in a POST request:
- By specifying it in the Content-Type header, such as “application/x-www-form-urlencoded; charset=UTF-8”. There’s a Mozilla bug from back in 1999 in which this was discussed. In the end, they decided against this method because it broke several HTTP server implementations of the time.
- For forms that use the application/x-www-form-urlencoded encoding (most forms that don’t do file uploads), a hidden field named ‘_charset_’ can be included. Browsers will override its value on submission with the encoding used. This will be in HTML5, and you can find it in the current draft.
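To make the two methods concrete, here is a minimal sketch of how a server could look for either signal. The helper name `declared_charset` is my own invention for illustration; it is not part of Rails or Rack, and it assumes you have the raw Content-Type header and the raw urlencoded body at hand.

```ruby
require 'uri'

# Hypothetical helper (not part of Rails or Rack): returns the charset a
# form POST declares, or nil if the browser declared nothing.
def declared_charset(content_type, raw_body)
  # Method 1: a charset parameter on the Content-Type header.
  from_header = content_type.to_s[/;\s*charset="?([^";\s]+)"?/i, 1]
  return from_header if from_header

  # Method 2: the _charset_ form field, filled in by the browser.
  pair = URI.decode_www_form(raw_body.to_s).assoc('_charset_')
  pair && !pair[1].empty? ? pair[1] : nil
end
```

The header is checked first here, but that ordering is just a choice I made for the sketch. When neither signal is present the helper returns nil, and you are back to guessing.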
Neither of these methods is handled by Rails or Rack on Ruby 1.9. All you get is strings with the #encoding set to US-ASCII, while the bytes they contain are actually UTF-8. A nice contradiction, and a source of exceptions elsewhere, deep in your application.
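The contradiction is easy to reproduce in plain Ruby. The snippet below is only an illustration with a made-up value: it builds a string the way it arrives (UTF-8 bytes, US-ASCII label), shows where it breaks, and relabels it with force_encoding, which changes the label without touching the bytes.

```ruby
# The UTF-8 bytes of "café", labelled US-ASCII as described above.
raw = "caf\xC3\xA9".dup.force_encoding(Encoding::US_ASCII)

raw.valid_encoding?    # false: 0xC3 and 0xA9 are not US-ASCII bytes

# Mixing it with a genuine UTF-8 string is where things blow up.
error = begin
  raw + "\u00E9"
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Relabel the same bytes as UTF-8; no bytes are changed.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
fixed.valid_encoding?  # true
```

Relabelling is only safe if you actually know the bytes are UTF-8, which is exactly what the request fails to tell you.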
I set out to get this sorted in my app, and wrote a monkey-patch. The patch automatically adds the hidden field when using FormHelper, and tries to deal with both that field and the Content-Type header in requests. It has only been briefly tested, and only in Firefox 3.6. You can find it in a gist on GitHub.