How To Dynamically Instantiate Classes In PHP

Posted on March 10th, 2009 in Annabel, Work

For Annabel, I need to be able to dynamically instantiate classes based on options in a configuration file (Zend_Config_Ini). However, some of these classes have constructors (some with required parameters), while others have no constructor at all. I first tried to instantiate them with call_user_func_array, like this:

$datatype_options = explode(',', $global_config->customer->data->field->$key->type);
$datatype_class = array_shift($datatype_options);
$class = call_user_func_array(array($datatype_class, '__construct'), $datatype_options);

This didn’t work: some classes don’t have a constructor, and a constructor is not a static method, so it cannot be called on a class name this way. Then I came up with another solution: use a generic static method getInstance() to do it, like this:

class Test {
	public static function getInstance($classname)
	{
		return new $classname();
	}
}

$datatype_options = explode(',', $global_config->customer->data->field->$key->type);
$datatype_class = array_shift($datatype_options);
// Pass the class name itself; $datatype_options no longer contains it after array_shift().
$class = call_user_func(array('Test', 'getInstance'), $datatype_class);

This worked as long as I didn’t need any constructor parameters. But for some classes I do need them. After some thought, I concluded that I would need the Reflection API to do what I want. Finally, I implemented the following solution. It could probably be cleaned up a little, but it works:

class Annabel_Data
{
    /**
     * Creates instances of classes in the Annabel_Data package.
     *
     * To use this, provide arguments in the order you would need them
     * in the instantiated classes, with as the first argument the name
     * of the class to instantiate.
     *
     * @param array $arguments The class name, followed by its constructor arguments
     *
     * @return Annabel_Data_Abstract
     */
    public static function factory($arguments)
    {
        $class_name = array_shift($arguments);
        $class_name = 'Annabel_Data_' . ucfirst($class_name);
        Zend_Loader::loadClass($class_name);
        $reflector = new ReflectionClass($class_name);
        if ($reflector->isInstantiable()) {
            $constructor = $reflector->getConstructor();
            if (is_null($constructor)) {
                return $reflector->newInstance();
            }
            $params = $constructor->getNumberOfParameters();
            $req_params = $constructor->getNumberOfRequiredParameters();
            if (count($arguments) < $req_params) {
                throw new Annabel_Data_Exception('Please provide ' . $req_params . ' parameters to instantiate class ' . $class_name);
            }
            if (0 < $params) {
                return $reflector->newInstanceArgs($arguments);
            } else {
                return $reflector->newInstance();
            }
        } else {
            throw new Annabel_Data_Exception("Could not instantiate a class of type '" . $class_name . "'");
        }
    }
}

$datatype_options = explode(',', $global_config->customer->data->field->$key->type);
$class = Annabel_Data::factory($datatype_options);

This works for every case I need (and tested ;) ). I should add some more exception checking and things like that, but those are minor details.
Oh, and last but not least, some credit goes to the commenters over at the PHP Reflection API documentation.
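
To try the idea without Zend_Loader or the Annabel classes, here is a minimal standalone sketch. The classes Demo_Plain and Demo_Range and the function demoFactory() are made up for illustration and are not part of Annabel; only the ReflectionClass calls are the same ones used above.

```php
<?php
// Hypothetical stand-ins for real Annabel_Data_* classes.
class Demo_Plain
{
    public $label = 'plain';
}

class Demo_Range
{
    public $min;
    public $max;

    public function __construct($min, $max = 100)
    {
        $this->min = $min;
        $this->max = $max;
    }
}

/**
 * Instantiates $class_name with $arguments, or without arguments
 * when the class has no constructor at all.
 */
function demoFactory($class_name, array $arguments = array())
{
    $reflector = new ReflectionClass($class_name);
    if (!$reflector->isInstantiable()) {
        throw new InvalidArgumentException("Cannot instantiate '" . $class_name . "'");
    }

    $constructor = $reflector->getConstructor();
    if (null === $constructor) {
        return $reflector->newInstance();
    }

    if (count($arguments) < $constructor->getNumberOfRequiredParameters()) {
        throw new InvalidArgumentException('Too few arguments for ' . $class_name);
    }

    return $reflector->newInstanceArgs($arguments);
}

$plain = demoFactory('Demo_Plain');           // class without a constructor
$range = demoFactory('Demo_Range', array(5)); // one required, one optional parameter
```

On PHP 5.6 and later, argument unpacking (new $class_name(...$arguments)) should cover the same ground without Reflection, although you lose the explicit check on the number of required parameters.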

40x Speedup With iconv And PHP

Posted on March 3rd, 2009 in Annabel, Development, Personal, Work

For our product Annabel (Dutch), we have to clean up the data our customers provide us with. Because this is a fully automated process, we are unable to give feedback and have them fix their input. Therefore, I need a way to clean the data up so we can process it.

Since we don’t need to support anything beyond ASCII, we can stick with plain ASCII. That’s a very safe approach, which greatly reduces the chances of failure. To convert the UTF-8 (Unicode) input into ASCII data, we use the GNU C Library iconv in combination with PHP.

The default iconv has two caveats: it stops at a character it cannot convert, and it outputs a question mark when the destination charset has no equivalent (or transliterated) character. To work around this, I used to convert every single character separately with the PHP iconv function, which gave me a throughput of about 250KiB/sec, using the following code:

/**
 * Replaces special characters with their ASCII equivalents.
 *
 * This function uses iconv to replace each separate character with its
 * ASCII equivalent, using the ASCII//TRANSLIT option. However, this makes
 * the function very slow: max throughput is about 250KiB/sec.
 *
 * @param string $line
 * @return string
 */
protected function _convertSpecialChars($line) {
    // Initialise up front, so an empty $line returns '' instead of an undefined variable.
    $new_line = "";

    if (!empty($line)) {
        /*
         * This potentially could be a very long string, so don't split the line
         * in separate tokens, for that would take way too much memory.
         * Read whole UTF-8 characters (mbstring extension), not single bytes:
         * iconv cannot convert one byte out of a multi-byte character.
         */
        $line_length = mb_strlen($line, 'UTF-8');
        for ($x = 0; $x < $line_length; $x++) {
            $old_char = mb_substr($line, $x, 1, 'UTF-8');

            /*
             * Use iconv to replace the other special characters.
             * If iconv can't convert it (and so returns '?' or false), just
             * skip the character, for it probably is something malicious and
             * there's probably no need to keep it anyway.
             *
             * Beware of the edge case where the original character is a '?':
             * that one is kept.
             */
            $char = iconv('UTF-8', 'ASCII//TRANSLIT', $old_char);
            if (false === $char) continue;
            if ( ('?' !== $char) || ('?' === $old_char) ) $new_line .= $char;
        }
    }

    return $new_line;
}
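
If you want to check a throughput figure like the ones in this post on your own machine, a rough timing harness could look like the sketch below. The helper measureThroughputKiB() is made up for this example, and strtoupper is only a placeholder for whatever conversion function you want to time; the numbers will vary with hardware and PHP version.

```php
<?php
/**
 * Hypothetical helper: runs $converter once over $payload and
 * returns the throughput in KiB per second.
 */
function measureThroughputKiB($converter, $payload)
{
    $start = microtime(true);
    call_user_func($converter, $payload);
    $elapsed = microtime(true) - $start;

    // Guard against a zero interval on very fast runs.
    $elapsed = max($elapsed, 1e-9);

    return (strlen($payload) / 1024) / $elapsed;
}

// Roughly 1 MiB of plain ASCII text (45 bytes per line).
$payload = str_repeat("The quick brown fox jumps over the lazy dog.\n", 23302);

// strtoupper is a stand-in; pass your own conversion callback here.
$rate = measureThroughputKiB('strtoupper', $payload);
printf("%.0f KiB/sec\n", $rate);
```

A single run is noisy, so averaging a handful of runs over realistic input gives a more trustworthy number.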

However, I was not satisfied with this, so I looked up the man page of the GNU C Library’s iconv. I assumed PHP was using this one internally, so that seemed the natural thing to do. In that man page I found the IGNORE option, which simply skips any character that cannot be converted or transliterated. That was exactly what I wanted. So I tried it with the PHP function as well, and it worked. Instead of converting every single character, I can now convert a whole file at once, which gave me a throughput of 11MiB/sec. The caveat, of course, is that I have to use the GNU C Library iconv, at the same version as the current one or newer, to avoid compatibility problems. However, that’s a price I’m certainly willing to pay. The new code is this (with the proprietary Annabel-specific code removed):

/**
 * Replaces special characters with their ASCII equivalents.
 *
 * This function uses iconv to convert the whole string to its ASCII
 * equivalent in a single call, using the ASCII//TRANSLIT,IGNORE option.
 * Throughput is measured at about 11MiB/sec.
 *
 * WARNING: Using the extra IGNORE option only works with a recent
 * GNU libc iconv, so be very picky about which iconv to use! This is an
 * undocumented feature, which is not supported by default and is not
 * listed in the PHP manual!
 *
 * @param string $line
 * @return string
 */
protected function _convertSpecialChars($line) {
    /*
     * Check whether we have the right version of iconv
     */
    if ( ('glibc' !== ICONV_IMPL) || version_compare(ICONV_VERSION, '2.8.90', '<') ) {
        throw new Exception('Please use the glibc iconv, version 2.8.90 or higher');
    }

    /*
     * Use iconv for speed and glory
     * We use the ASCII//TRANSLIT,IGNORE option to replace the string
     * with its ASCII transliterated equivalent. If there's no ASCII
     * equivalent, the IGNORE option makes sure the character is just
     * thrown away, which is exactly what I want.
     */
    $new_line = iconv('UTF-8', 'ASCII//TRANSLIT,IGNORE', $line);

    return $new_line;
}

I guess I don’t need to comment on this code example ;) If you do have questions, just ask them in the comments, or via @jacobkiers on Twitter.
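
The function above throws on anything but a recent glibc iconv. If that is too strict for your setup, a gentler variant is to try the same conversion and strip whatever non-ASCII bytes are left. This is only a sketch: toAsciiWithFallback() is a made-up name, and it assumes that silently dropping unconvertible characters is acceptable, as it is for Annabel. What the transliteration produces for a given character still depends on the iconv implementation and the locale.

```php
<?php
/**
 * Hypothetical helper (not part of Annabel): converts UTF-8 text to
 * ASCII and falls back to plain byte stripping when iconv gives up.
 */
function toAsciiWithFallback($text)
{
    // @ silences the notice or warning iconv raises when it cannot convert.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT,IGNORE', $text);

    if (false === $converted) {
        // Assumption: losing every non-ASCII character is acceptable here.
        $converted = $text;
    }

    // Safety net: whatever iconv returned, keep 7-bit ASCII bytes only.
    return preg_replace('/[^\x00-\x7F]/', '', $converted);
}

echo toAsciiWithFallback("caf\xC3\xA9 au lait"), "\n";
```

The price is the extra preg_replace pass over the output, so it is worth measuring before swapping it in for the strict version.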