Gravitonic
Andrei Zmievski

21-February-2007
PHP 6 and Request Decoding

It looks like we have finally settled on an approach for HTTP input (request) decoding in PHP 6. No fewer than four different proposals were floated before this one, but it combines flexibility, performance, intuitiveness, and minimal architectural changes, with only a couple of small drawbacks. Let's take a closer look.

As you probably know, correctly determining the encoding of HTTP requests is something of an unsolved problem. I know of no mainstream clients that send a charset specification along with the request. This means it is up to the server or the application to figure out the encoding. There are a number of ways to do this, including encoding detection, looking at the Accept-Charset header, checking whether the request contains a _charset_ field, and more. Unfortunately, none of them is completely reliable, and the best you can do is guess the encoding with some degree of confidence.
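To make these options concrete, here is a rough Python sketch (not PHP) of one way an application might combine two of the signals. It trusts a `_charset_` field in the raw query string if that field names a known encoding. Otherwise it falls back to the highest-weighted entry in `Accept-Charset`, and finally to a default. The function names, the UTF-8 default, and this order of precedence are assumptions made for illustration. None of these signals is fully reliable.

```python
import codecs
from typing import List, Optional, Tuple
from urllib.parse import unquote_to_bytes


def parse_raw_query(raw: bytes) -> List[Tuple[bytes, bytes]]:
    """Split form-encoded data into raw (undecoded) byte pairs."""
    pairs = []
    for part in raw.split(b"&"):
        if not part:
            continue
        name, _, value = part.partition(b"=")
        # '+' means space in form encoding; percent-escapes become raw bytes.
        pairs.append((unquote_to_bytes(name.replace(b"+", b" ")),
                      unquote_to_bytes(value.replace(b"+", b" "))))
    return pairs


def known_encoding(label: str) -> Optional[str]:
    """Return Python's canonical codec name for label, or None if unknown."""
    if not label.strip():
        return None
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


def guess_request_encoding(raw_query: bytes,
                           accept_charset: Optional[str],
                           default: str = "utf-8") -> str:
    # 1. An explicit _charset_ field wins if it names a real codec.
    for name, value in parse_raw_query(raw_query):
        if name == b"_charset_":
            found = known_encoding(value.decode("ascii", "ignore"))
            if found:
                return found
    # 2. Otherwise try Accept-Charset entries by descending q-value.
    if accept_charset:
        weighted = []
        for entry in accept_charset.split(","):
            label, *params = [p.strip() for p in entry.split(";")]
            q = 1.0
            for param in params:
                key, _, val = param.partition("=")
                if key.strip().lower() == "q":
                    try:
                        q = float(val)
                    except ValueError:
                        q = 0.0
            if label and label != "*":
                weighted.append((q, label))
        # sorted() is stable, so equal q-values keep their header order.
        for q, label in sorted(weighted, key=lambda item: -item[0]):
            found = known_encoding(label)
            if found and q > 0:
                return found
    # 3. Nothing usable: fall back to the application's default.
    return default
```

Returning Python's canonical codec name (for example `utf-8` or `cp1251`) keeps the result directly usable with `bytes.decode()`.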

The approach that we decided on is basically a lazy evaluation scheme. When PHP receives the request, it will simply store it internally as-is and not decode it at all. However, if your script happens to access the $_GET, $_POST, or $_REQUEST arrays, a runtime JIT (just-in-time) handler will kick in and convert the values in the array from binary (raw) strings to Unicode based on the current HTTP input encoding setting. This conversion is done for the whole array at once, not per element. The encoding setting can be changed at runtime via a function tentatively named http_input_encoding(). If the encoding is changed, the JIT handler is re-armed, and the next access to the arrays will re-convert the stored raw data to Unicode using the new setting.
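The lazy scheme can be modeled in a few lines. The Python sketch below is not PHP, and it does not reflect PHP's internals. It keeps the raw request bytes and decodes a whole array only on first access. It "re-arms" by discarding the decoded cache whenever the encoding changes. `set_input_encoding()` is a hypothetical stand-in for the tentatively named `http_input_encoding()`. The `decode_count` counter exists only to show when conversion actually happens.

```python
import codecs
from typing import Dict, List, Tuple
from urllib.parse import unquote_to_bytes


def parse_raw_query(raw: bytes) -> List[Tuple[bytes, bytes]]:
    """Split form-encoded data into raw byte pairs without decoding them."""
    pairs = []
    for part in raw.split(b"&"):
        if part:
            name, _, value = part.partition(b"=")
            pairs.append((unquote_to_bytes(name.replace(b"+", b" ")),
                          unquote_to_bytes(value.replace(b"+", b" "))))
    return pairs


class LazyRequestArrays:
    """Hypothetical model of lazily decoded $_GET/$_POST arrays."""

    def __init__(self, raw_get: bytes, raw_post: bytes, encoding: str = "utf-8"):
        # Request parsing stores the data as-is; no decoding happens here.
        self._raw = {"_GET": parse_raw_query(raw_get),
                     "_POST": parse_raw_query(raw_post)}
        self._encoding = encoding
        self._decoded: Dict[str, Dict[str, str]] = {}  # empty cache = handler armed
        self.decode_count = 0

    def set_input_encoding(self, encoding: str) -> None:
        codecs.lookup(encoding)   # reject unknown encodings up front (LookupError)
        self._encoding = encoding
        self._decoded.clear()     # re-arm: the next access re-converts the raw data

    def __getitem__(self, name: str) -> Dict[str, str]:
        if name not in self._decoded:
            # First access since (re-)arming: convert the whole array at once.
            self._decoded[name] = {
                key.decode(self._encoding): value.decode(self._encoding)
                for key, value in self._raw[name]
            }
            self.decode_count += 1
        return self._decoded[name]
```

The cache is keyed per array, so a script that only reads `_GET` never pays for decoding `_POST`. A script that reads neither pays nothing, which is the upfront-cost point made for the real design.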

The advantages of this approach are numerous. First, PHP is not forced to guess the encoding of the request during the request parsing stage, which happens before the script is executed. This allows the application to set the expected encoding explicitly or to query other sources for a likely encoding value. For example, there could be a function that performs encoding detection on the request and returns its guess along with a degree of confidence, or PHP could parse the request and provide the raw value of the _charset_ field. In either case, it is up to the application to set the encoding before accessing the request arrays. Second, PHP does not have to decode the request until it is necessary, which removes the upfront cost for scripts that do not use the request arrays. Third, any conversion errors are processed by the same mechanism that PHP employs for other encoding conversions, allowing the application to set a custom conversion error handler.
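PHP 6's own conversion error API is not shown here. As an analogy, Python's `codecs.register_error()` works on the same idea. The application registers a named handler, and every decode that asks for that name routes malformed bytes through it. The handler name `request-input` and the `bad_sequences` log in this sketch are made up for illustration. The handler records each offending byte sequence and substitutes U+FFFD so that decoding can continue.

```python
import codecs
from typing import List, Tuple

# Hypothetical log of malformed byte sequences seen while decoding request data.
bad_sequences: List[bytes] = []


def record_and_replace(exc: UnicodeError) -> Tuple[str, int]:
    """Custom error handler: log the bad bytes, emit U+FFFD, resume after them."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_sequences.append(exc.object[exc.start:exc.end])
    return ("\ufffd", exc.end)


# Register once; any decode called with errors="request-input" now uses it.
codecs.register_error("request-input", record_and_replace)


def decode_field(raw: bytes, encoding: str) -> str:
    return raw.decode(encoding, errors="request-input")
```

Substituting U+FFFD is only one possible policy. A handler could just as well raise an exception and let the application reject the request.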

Rasmus pointed out one possible problem with this approach. Someone could inject bogus data into the request, so that when the app accesses a request array for the first time, the bogus data triggers errors in the conversion process. I think we can deal with this issue in a sensible way, and that the pros of our approach outweigh the cons. Note that decoding the request has nothing to do with filtering. The job of the filter extension is to validate or sanitize the data, and it has to operate on the results of the request conversion, i.e. Unicode strings.

Hope this has been a useful preview of this very important part of PHP 6. Once this functionality is complete, we can finally make the Unicode preview release. Stay tuned.

Posted at 11:27
16-February-2007
99 Frameworks of Code Out There

People, enough with new frameworks already. I know you might be lusting after Rails for some reason and want the fame, the glory, and the dancing girls of DHH, but are we not going to be satisfied until SourceForge is filled with the object-oriented diarrheal remains of our overblown egos and delusions of grandeur? I counted no fewer than 3 separate announcements of new PHP frameworks today, just by scanning the front pages of phpdeveloper.org and planet-php.net. However well-intentioned and technically robust these efforts might be, do we really need yet another patterns-based, abstracted, MVC-driven, buzzword-filled concoction? Look at the list of existing PHP frameworks: are they really all that different? Why start another one? Why? How long are we going to suffer from NIH (not-invented-here) syndrome? Oh, the humanity...

If you have an itch that only frameworks can scratch, then my advice, should you choose to take it, is to find an existing mature framework that gets as close as possible to your requirements and work on it. Add features, fix bugs, write documentation, promote, contribute, and improve in general, but resist the urge to spew out a torrent of code into our environment simply because you thought of an oh-so-clever moniker and need to stick it onto something. Please, no more new frameworks.

Posted at 16:50
"VIM for (PHP) Programmers" slides and resources

By popular demand, I have uploaded the slides and the VIM script files from my VIM for (PHP) Programmers talk in Vancouver. You can find them on the Talks page.

This was the inaugural presentation of the talk, so I beg forgiveness for the rough edges and any inadvertent mistakes (of which there should be none, I hope).

It was nice to see that the talk was well attended and well received. With any luck, it might become a semi-regular one on the PHP conference circuit. Happy VIM'ing.

Posted at 0:18
15-February-2007
Unicode slides from Vancouver PHP Conf

The slides from my Unicoding with PHP 6 talk are now available on the Talks page. VIM slides and resources will be coming up shortly.

I want to thank Shane Caraveo, Audrey Foo, Peter, and the rest of the organizers for the excellent, well-run conference. I really enjoyed the variety and quality of the talks.

Posted at 10:41