Crawling UTF-8 pages using the Symfony2 DomCrawler component

Just a small gotcha for anyone using Symfony2's DomCrawler component. The standard behaviour of the class (from the current docs) is:

use Symfony\Component\DomCrawler\Crawler;

// $html holds the markup of the page you want to crawl
$crawler = new Crawler($html);
 
foreach ($crawler as $domElement) {
    print $domElement->nodeName;
}

However, a crawler built this way will assume the document is encoded as ISO-8859-1. If you want to crawl a UTF-8 page correctly, do it like so:

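use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler();
// Pass the character set explicitly as the second argument
$crawler->addHtmlContent(file_get_contents('http://www.columbia.edu/~fdc/utf8/'), 'UTF-8');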
 
foreach ($crawler as $domElement) {
    print $domElement->nodeName;
}

The second parameter to addHtmlContent is 'UTF-8' by default, so it could be omitted here, but I've included it to illustrate that you can pass other character sets too.
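If you don't know the character set up front, one option is to read it from the page's Content-Type header and hand it to the crawler. Here's a rough sketch of that idea. Note that charsetFromContentType is a hypothetical helper I've made up for illustration (it is not part of DomCrawler), and the header value and markup are placeholders:

```php
<?php
// Hypothetical helper (not part of DomCrawler): pull the charset out of a
// Content-Type header value, falling back to a default when none is declared.
function charsetFromContentType(string $contentType, string $default = 'UTF-8'): string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $matches)) {
        return strtoupper($matches[1]);
    }

    return $default;
}

// Placeholder header value; in practice this would come from the HTTP response.
$charset = charsetFromContentType('text/html; charset=iso-8859-15');

// Only touch the crawler when the component is actually installed and autoloaded.
if (class_exists('Symfony\Component\DomCrawler\Crawler')) {
    $crawler = new Symfony\Component\DomCrawler\Crawler();
    $crawler->addHtmlContent('<html><body><p>example</p></body></html>', $charset);
}
```

When the header carries no charset, the helper falls back to 'UTF-8', which matches the default of addHtmlContent.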

Migrating from apache to nginx (wordpress edition)

Today I migrated my whole site from apache to nginx. The main reason is that nginx seems to handle load better and use less memory on smaller boxes. It's also an opportunity for me to try something new.

I'll cut straight to the chase. There's some great information already available. At the time of writing, though, neither set of instructions worked for me. I'll come to why later.