Not much to say here, but with absolutely minimal pain and suffering, I have 64-bit Linux virtual machines running on top of my 32-bit Windows XP install. This pleases me.

The recipe:

  1. A 64-bit capable CPU with VT-x/AMD-V enabled in the BIOS
  2. Innotek/Oracle/Sun VirtualBox (a current version) with hardware virtualization enabled
  3. Profit!

The one downside to this? 64-bit VMs running on a 32-bit host OS can’t see multiple CPUs. Boo. Hoo. I’ll just run more VMs!

[Screenshots: the 64-bit CentOS installer and the 64-bit Ubuntu live CD]

Caveman profiling with a side of “where were you at 9pm on the night in question?” As always, season to taste.

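<?php
$_profile_log = "/tmp/php-profile.log";

// Appends "[HH:MM:SS] file:line, Class->method" to the log, where
// Class->method is the function that called _profile() and file:line is
// the spot that function was itself called from.
function _profile() {
    static $fh;
    if( !isset($fh) ) {
        global $_profile_log;
        if( !file_exists($_profile_log) ) {
            @touch( $_profile_log );
            @chmod( $_profile_log, 0664 );
        }
        $fh = @fopen( $_profile_log, "a" );
    }
    if( !$fh )
        return false;

    // Frame 1 is the caller of _profile(); fall back to frame 0 when
    // _profile() is called from top-level code (avoids an undefined offset).
    $stack = debug_backtrace();
    $base = isset($stack[1]) ? $stack[1] : $stack[0];

    $buf = (isset($base['file']) ? $base['file'] : '?').":".
           (isset($base['line']) ? $base['line'] : '?').", ";
    if( !empty($base['class']) )
        $buf .= $base['class'].$base['type'];
    $buf .= $base['function'];

    $buf = sprintf("[%s] %s\n", date("H:i:s"), $buf);
    return @fwrite( $fh, $buf );
}
?>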
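The date("H:i:s") stamp only resolves to the second, which can hide where the time actually goes. One possible seasoning, sketched below as a hypothetical _profile_us() (not part of the helper above), stamps each line with microtime(true) and the seconds elapsed since the previous call. Unlike the original, it assumes the log path is passed in as a parameter rather than read from a global.

```php
<?php
// Hypothetical variant of _profile(): logs who called it, with a
// microsecond timestamp and the seconds elapsed since the previous call.
// The file is opened on the first call only; later $log values are ignored
// because the handle is cached in a static variable.
function _profile_us($log = "/tmp/php-profile.log") {
    static $fh = null, $last = null;
    if ($fh === null) {
        $fh = @fopen($log, "a");
    }
    if (!$fh) {
        return false;
    }

    $now   = microtime(true);
    $delta = ($last === null) ? 0.0 : $now - $last;
    $last  = $now;

    // Frame 1 is the function that called _profile_us(); fall back to
    // frame 0 when called from top-level code.
    $stack = debug_backtrace();
    $base  = isset($stack[1]) ? $stack[1] : $stack[0];
    $who   = (!empty($base['class']) ? $base['class'] . $base['type'] : '')
           . $base['function'];

    $line = sprintf("[%s.%06d +%.6fs] %s\n",
        date("H:i:s", (int)$now),
        (int)(($now - floor($now)) * 1000000),
        $delta,
        $who);
    $ok = fwrite($fh, $line);
    fflush($fh);
    return $ok;
}
```

Drop a `_profile_us();` call at the top of each function of interest; large gaps between consecutive deltas point at the slow stretches.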


Several months ago, I switched to Percona’s xtrabackup and innobackupex for all of my MySQL backup needs. I have successfully used these backups to restore and replicate databases across several systems. It is good stuff.

Last week, I needed to set up replication of an 80 GB database. This should have been routine by now, but when I tried to prepare the backup this time, it whined, complained, and failed. I was fairly frazzled by the time I gave up on it and wrote it off as a fluke of one sort or another.

Last night, I tried again from Sunday’s full backup, and it happened again:

ammon@amy:/var/lib/2009-08-16_04-02-17$ sudo xtrabackup --prepare --target-dir=.
xtrabackup  Ver 0.8.1rc Rev 78 for 5.0.83 unknown-linux-gnu (x86_64)
xtrabackup: cd to .
xtrabackup: This target seems to be not prepared yet.
xtrabackup: xtrabackup_logfile detected: size=75546624, start_lsn=(86 1293090752)
xtrabackup: Temporary instance for recovery is set as followings.
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:512M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 75546624
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Log scan progressed past the checkpoint lsn 86 1293090752
090818  2:54:34  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
090818  2:54:34  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: File name .//tmp/#sql6e1e_8cce1_0.ibd
InnoDB: File operation call: 'create'.
InnoDB: Cannot continue operation.

I gave up after poking a few things.

This morning’s fresh look turned up this bug report.

ammon@amy:/var/lib/2009-08-16_04-02-17$ sudo mkdir tmp
ammon@amy:/var/lib/2009-08-16_04-02-17$ sudo xtrabackup --prepare --target-dir=.
xtrabackup  Ver 0.8.1rc Rev 78 for 5.0.83 unknown-linux-gnu (x86_64)
xtrabackup: cd to .
xtrabackup: This target seems to be not prepared yet.
090818 12:27:41  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
xtrabackup: Warning: cannot open ./xtrabackup_logfile. will try to find.
xtrabackup: 'ib_logfile0' seems to be 'xtrabackup_logfile'. will retry.
xtrabackup: xtrabackup_logfile detected: size=84983808, start_lsn=(86 1293090752)
xtrabackup: Temporary instance for recovery is set as followings.
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:512M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 84983808
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Log scan progressed past the checkpoint lsn 86 1293090752
090818 12:27:41  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Doing recovery: scanned up to log sequence number 86 1298333184 (6 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1303576064 (13 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1308818944 (20 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1314061824 (27 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1319304704 (34 %)
090818 12:27:44  InnoDB: Starting an apply batch of log records to the database...
... snip ...

That’s right. There’s a bug in InnoDB recovery that treats the location of tmpdir (/tmp here; configurable in my.cnf) as relative instead of absolute.

So, if you run into this while restoring from an xtrabackup/ibbackup snapshot (or while recovering InnoDB after a crash), simply creating the missing tmp directory inside the directory being recovered appears to work.

This is a rudimentary template that I’ve been using for very quick and dirty /etc/init.d scripts recently.

It assumes that your server daemon has a unique name and only ever runs a single instance. Because the script finds the daemon by command name (ps -C), the binary and the init.d script also cannot share a name; otherwise strange things happen ;)

Actual invocation logic may need to be updated on a per-service basis, and chkconfig-style headers would have to be added manually, but it works well for what it is.

#!/bin/bash

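DIR=''	# path to the daemon executable
CMD=''	# name of the command itself
ARG=''	# optional. any arguments to pass when starting
NAM=''	# descriptive name of the daemon so it shows up pretty
SELF=$(basename "$0")	# used in the usage message

# List running processes whose command name matches ${CMD} exactly.
function get_ps {
	ps --no-header -C "${CMD}"
}

function do_start {
	echo -n "Starting ${NAM}... "
	cd "${DIR}" || { echo "[FAILURE]"; return 1; }
	# ARG is left unquoted on purpose so it can carry several arguments.
	nohup "./${CMD}" ${ARG} >> nohup.out 2>&1 &
	sleep 1	# give the daemon a moment to show up in the process table
	SUCC=$(get_ps | wc -l)
	if [ "$SUCC" -eq 1 ]; then
		echo "[SUCCESS]"
	else
		echo "[FAILURE]"
		return 1
	fi
}

function do_stop {
	echo -n "Stopping ${NAM}... "
	PID=$(get_ps | awk '{print $1}')
	if [ -n "$PID" ]; then
		kill $PID	# unquoted: may hold more than one PID
		# Wait up to ~10 seconds for the daemon to exit.
		for (( i = 0; i < 10; i++ )); do
			[ "$(get_ps | wc -l)" -eq 0 ] && break
			sleep 1
		done
	fi
	SUCC=$(get_ps | wc -l)
	if [ "$SUCC" -eq 0 ]; then
		echo "[SUCCESS]"
	else
		echo "[FAILURE]"
		return 1
	fi
}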

case "${1:-''}" in
	'start')
		do_start
		;;
	'stop')
		do_stop
		;;
	'restart')
		do_stop
		do_start
		;;
	*)
		#echo "Usage: $SELF start|stop|restart|reload|force-reload|status"
		echo "Usage: $SELF start|stop|restart"
		exit 1
		;;
esac

No real preamble to be made here. Gearman is a distributed job queuing system by the fine folks who brought us memcached. It is nicer than anything else I’ve looked at. I am attempting to switch one of my projects over to it (replacing a crufty curl + unix sockets + memcached monstrosity that attempted to do the same job).

The documentation is lacking, but if the discussion group is any indication, real docs are a high priority for the project team. Today, I visited the IRC channel to ask for a status update on docs for the PHP extension api (as opposed to the PEAR all-script api, whose auto-generated docs are broken). Turns out my suspicions were right. Documentation is a high priority and none currently exists for the api in question. However… I was informed that the classes support reflection… so :)

A quick grep of the source for the extension tells me that I am looking at four classes: GearmanClient, GearmanWorker, GearmanJob, and GearmanTask. A ridiculously short php script later…

<?php
echo new ReflectionClass('GearmanWorker');
echo new ReflectionClass('GearmanClient');
echo new ReflectionClass('GearmanJob');
echo new ReflectionClass('GearmanTask');
?>

And I can at least try to make a human readable list of available methods.

GearmanWorker

  • __construct()
  • clone()
  • error()
  • returnCode()
  • setOptions( $option, $data )
  • addServer( $host, $port ) – both args optional, examples say defaults are localhost on port 4730.
  • addFunction( $function_name, $function, $data, $timeout ) – data and timeout optional
  • work()

GearmanClient

  • __construct()
  • clone()
  • error()
  • setOptions( $option, $data )
  • addServer( $host, $port ) – reflection says REQUIRED, however the provided examples and personal experience say otherwise
  • do( $function_name, $workload, $unique ) – unique is optional
  • doHigh( $function_name, $workload, $unique ) – unique is optional
  • doLow( $function_name, $workload, $unique ) – unique is optional
  • doJobHandle()
  • doStatus()
  • doBackground( $function_name, $workload, $unique ) – unique is optional
  • doHighBackground( $function_name, $workload, $unique ) – unique is optional
  • doLowBackground( $function_name, $workload, $unique ) – unique is optional
  • jobStatus( $job_handle )
  • echo( $workload )
  • addTask( $function_name, $workload, $data, $unique ) – data and unique are optional
  • addTaskHigh( $function_name, $workload, $data, $unique ) – data and unique are optional
  • addTaskLow( $function_name, $workload, $data, $unique ) – data and unique are optional
  • addTaskBackground( $function_name, $workload, $data, $unique ) – data and unique are optional
  • addTaskHighBackground( $function_name, $workload, $data, $unique ) – data and unique are optional
  • addTaskLowBackground( $function_name, $workload, $data, $unique ) – data and unique are optional
  • addTaskStatus( $job_handle, $data ) – data is optional
  • setWorkloadCallback( $callback )
  • setCreatedCallback( $callback )
  • setClientCallback( $callback )
  • setWarningCallback( $callback )
  • setStatusCallback( $callback )
  • setCompleteCallback( $callback )
  • setExceptionCallback( $callback )
  • setFailCallback( $callback )
  • clearCallbacks()
  • data()
  • setData( $data )
  • runTasks()

GearmanJob

  • __construct()
  • returnCode()
  • workload()
  • workloadSize()
  • warning( $warning )
  • status( $numerator, $denominator )
  • handle()
  • unique()
  • data( $data )
  • complete( $result )
  • exception( $exception )
  • fail()
  • functionName()
  • setReturn( $gearman_return_t )

GearmanTask

  • __construct()
  • returnCode()
  • create()
  • free()
  • function()
  • uuid()
  • jobHandle()
  • isKnown()
  • isRunning()
  • taskNumerator()
  • taskDenominator()
  • data()
  • dataSize()
  • takeData( $task_object ) – optional
  • sendData( $data )
  • recvData( $data_len )

The extension also appears to expose all constants defined in the C api.
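The method lists above were typed up by hand from the reflection dump. A sketch of how reflection could generate a similar listing directly, flagging optional parameters, and also pull out an extension's constants. describe_methods() and describe_constants() are hypothetical helper names; ReflectionClass::getMethods(), getParameters(), ReflectionParameter::isOptional() and ReflectionExtension::getConstants() are standard PHP reflection calls. As the addServer() note shows, the output is only as accurate as the extension's own argument info, and the Gearman part assumes the gearman extension is loaded.

```php
<?php
// Build "name( $a, $b ) - optional: b" lines for every method of a class.
function describe_methods($class) {
    $out = array();
    $rc = new ReflectionClass($class);
    foreach ($rc->getMethods() as $method) {
        $params = array();
        $optional = array();
        foreach ($method->getParameters() as $p) {
            $params[] = '$' . $p->getName();
            if ($p->isOptional()) {
                $optional[] = $p->getName();
            }
        }
        $line = $method->getName() . '('
              . ($params ? ' ' . implode(', ', $params) . ' ' : '') . ')';
        if ($optional) {
            $line .= ' - optional: ' . implode(', ', $optional);
        }
        $out[] = $line;
    }
    return $out;
}

// Names of the constants an extension registers (e.g. 'gearman').
function describe_constants($extension) {
    $re = new ReflectionExtension($extension);
    return array_keys($re->getConstants());
}

// Example usage: only runs when the gearman extension is available.
if (extension_loaded('gearman')) {
    foreach (array('GearmanWorker', 'GearmanClient', 'GearmanJob', 'GearmanTask') as $c) {
        echo $c, "\n  ", implode("\n  ", describe_methods($c)), "\n";
    }
    echo implode("\n", describe_constants('gearman')), "\n";
}
```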

I have since added this to the official wiki – so there are at least SOME docs on the site now ;)

As of version 5.0, PHP has had the ability to dynamically include required classes as needed – without requiring the developer to manually include all possible dependencies beforehand. This means that in cases where your code execution never touches 39 of the 40 classes in the project, it loads, parses, and runs that much faster.

There is a performance hit for actually having to call the __autoload() method, but if you’re in a situation where the hit for executing a few extra comparison calls is unacceptable… you probably aren’t developing in PHP in the first place ;)

Almost all of the php I’ve written in the last 2-3 years uses autoloading, and it has probably saved me hundreds of hours of aggravation.

In most of my projects, the first line of any script or class usually looks something like this:

require_once "/var/www/common/lib.php";

Then lib.php usually reads something like this:

<?php
function __autoload( $class ) {
    include_once( "$class.php" );
}
?>
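__autoload() was deprecated in PHP 7.2 and removed in PHP 8.0. The same one-line loader can instead be handed to spl_autoload_register() (available since PHP 5.1.2; the anonymous function below needs PHP 5.3 or later), which also lets several loaders coexist instead of competing for the single __autoload() name. A minimal sketch, assuming the same ClassName.php naming and include_path lookup as above:

```php
<?php
// Equivalent of the __autoload() above, registered on the SPL autoload stack.
// PHP calls it with the class name the first time an unknown class is used.
spl_autoload_register(function ($class) {
    include_once "$class.php";   // searched for along include_path
});
```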

And that is all that is strictly required to make the magic happen. It is fast, it is easy to understand, it is easy to use. You can use require_once() or include_once() and there is very little meaningful difference.

I’ve looked around the net and found several other attempts at improving on this simple mechanism, but they invariably overcomplicate things. They recurse through source directories, cache class-to-filename mappings on the filesystem, and otherwise turn what should be a simple file lookup that PHP supports natively into a mess of exception handling and wheel reinvention.

There are obviously theoretical instances where you might want to have more than the one require_once/include_once line… but I’ve honestly never encountered one myself.

I mean, you could try to throw an exception if the file didn’t exist or otherwise failed to load… but before PHP 5.3 nothing useful would happen: an exception thrown from __autoload() could not be caught, and failing to find a class is a fatal error that PHP handles with or without your help. (PHP 5.3 and later do let a catch block see exceptions thrown from the autoloader.)

The only thing you can add is a bit of extra diagnostics or maybe logging to a separate location.

Assume that we have a file ‘test.php’:

<?php
require_once "autoload.php";
$frog = new Frog();
?>

If autoload.php contains a simple autoload function that uses require_once(), and Frog.php doesn’t exist anywhere in your include path, the results will look something like this:

ammon@kif:~$ php test.php 

Warning: require_once(Frog.php): failed to open stream: No such file or directory in /home/ammon/autoload.php on line 3

Fatal error: require_once(): Failed opening required 'Frog.php' (include_path='.:/usr/share/php:/usr/share/pear') in /home/ammon/autoload.php on line 3

If we had used an include_once() call, the output is similar, but slightly more informative:

ammon@kif:~$ php test.php 

Warning: include_once(Frog.php): failed to open stream: No such file or directory in /home/ammon/autoload.php on line 3

Warning: include_once(): Failed opening 'Frog.php' for inclusion (include_path='.:/usr/share/php:/usr/share/pear') in /home/ammon/autoload.php on line 3

Fatal error: Class 'Frog' not found in /home/ammon/test.php on line 4

So that’s probably a bit more useful in tracking down the error. A failed require aborts with a fatal error, so there is nothing left to inspect. A failed include, however, only raises a warning and returns FALSE; a successful include returns 1 (or whatever the included file returns), and include_once returns TRUE if the file has already been included. So you can check the return value of include_once() and write to a separate logfile (or to the output stream…) if you need more information than the fatal error already provides.
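A minimal sketch of that idea, assuming a hypothetical log file in the system temp directory and the same ClassName.php convention. It is registered with spl_autoload_register() rather than defined as __autoload() so it also runs on PHP 8, where __autoload() no longer exists. The usual warnings still appear and using the missing class still fails; the loader just leaves an extra trail.

```php
<?php
// Hypothetical log location for autoload failures.
define('AUTOLOAD_LOG', sys_get_temp_dir() . '/autoload-failures.log');

spl_autoload_register(function ($class) {
    // include_once emits its usual warnings and returns FALSE when the file
    // cannot be found or opened; on success it returns 1 (or the file's own
    // return value), or TRUE if the file was already included.
    if ((include_once "$class.php") === false) {
        $msg = sprintf("[%s] could not load class %s (include_path=%s)\n",
            date('Y-m-d H:i:s'), $class, get_include_path());
        error_log($msg, 3, AUTOLOAD_LOG);   // message type 3: append to a file
    }
});
```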

<rant>

To those who insist on giving your classes and their containing files different names… umm. Wow.

If I have a class called DatabaseConnection, I’m going to put it in a file called DatabaseConnection.php. If I’m working with strange people who somehow don’t think that is explicit enough, I might call it DatabaseConnection.class.php and tweak the autoload method ever so slightly to compensate. There’s no good reason to put it in a file called projx-database_connection.incl or something. No. There isn’t.
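For the DatabaseConnection.class.php convention, the “ever so slight” tweak might look like this minimal sketch. It is registered via spl_autoload_register() so it also runs on PHP 8; the .class.php suffix is the only change from the plain loader.

```php
<?php
// Loader for a ClassName.class.php file naming convention.
spl_autoload_register(function ($class) {
    include_once "$class.class.php";   // searched for along include_path
});
```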

If you want to organize your classes into a meaningful directory structure… good for you. Use PHP’s built-in include_path ini option. Don’t waste time trying to cascade down a directory structure searching for the classes – just make sure your includes are all in a set of reliable locations. You don’t actually have to edit the php.ini file and bounce Apache or your php-cgi processes, just define the additional include paths in the same file where you define your autoloader:

set_include_path(
    get_include_path() . PATH_SEPARATOR .
    "/var/www/includes" . PATH_SEPARATOR .
    "/var/www/includes/apple" . PATH_SEPARATOR .
    "/var/www/includes/banana"
);

Naturally, you could turn that into some function calls to dynamically register and unregister directories, etc… but at that point, you’re probably hurting yourself again. If your codebase is being reorganized enough to make maintenance of the list of include dirs onerous without full time intervention, something else has probably already gone very wrong. At best, the code probably doesn’t work anyway, so any brief delay in updating the list can’t hurt any more than whatever else is happening.

</rant>

But seriously. __autoload() is your friend. It will help clean up your code if you let it. It can help enforce naming conventions. It can even improve performance… so long as you refrain from using it to shoot yourself in the foot. ;)