Several months ago, I switched to using Percona’s xtrabackup & innobackupex for all of my MySQL backup needs. I have successfully used these backups to restore and replicate databases across several systems. It is good stuff.

Last week, I needed to set up new replication of an 80 GB database. This should have been routine by now, but when I attempted to prepare the backup this time, it whined and complained and failed. I was kind of frazzled by the time I gave up on the issue and declared it a fluke of one sort or another.

Last night, I tried again from Sunday’s full backup, and it happened again:

ammon@amy:/var/lib/2009-08-16_04-02-17$ sudo xtrabackup --prepare --target-dir=.
xtrabackup  Ver 0.8.1rc Rev 78 for 5.0.83 unknown-linux-gnu (x86_64)
xtrabackup: cd to .
xtrabackup: This target seems to be not prepared yet.
xtrabackup: xtrabackup_logfile detected: size=75546624, start_lsn=(86 1293090752)
xtrabackup: Temporary instance for recovery is set as followings.
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:512M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 75546624
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Log scan progressed past the checkpoint lsn 86 1293090752
090818  2:54:34  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
090818  2:54:34  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: File name .//tmp/#sql6e1e_8cce1_0.ibd
InnoDB: File operation call: 'create'.
InnoDB: Cannot continue operation.

I gave up after poking a few things.

This morning’s fresh look turned up this bug report.

ammon@amy:/var/lib/2009-08-16_04-02-17$ sudo mkdir tmp
ammon@amy:/var/lib/2009-08-16_04-02-17$ sudo xtrabackup --prepare --target-dir=.
xtrabackup  Ver 0.8.1rc Rev 78 for 5.0.83 unknown-linux-gnu (x86_64)
xtrabackup: cd to .
xtrabackup: This target seems to be not prepared yet.
090818 12:27:41  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
xtrabackup: Warning: cannot open ./xtrabackup_logfile. will try to find.
xtrabackup: 'ib_logfile0' seems to be 'xtrabackup_logfile'. will retry.
xtrabackup: xtrabackup_logfile detected: size=84983808, start_lsn=(86 1293090752)
xtrabackup: Temporary instance for recovery is set as followings.
xtrabackup:   innodb_data_home_dir = ./
xtrabackup:   innodb_data_file_path = ibdata1:512M:autoextend
xtrabackup:   innodb_log_group_home_dir = ./
xtrabackup:   innodb_log_files_in_group = 1
xtrabackup:   innodb_log_file_size = 84983808
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Log scan progressed past the checkpoint lsn 86 1293090752
090818 12:27:41  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Doing recovery: scanned up to log sequence number 86 1298333184 (6 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1303576064 (13 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1308818944 (20 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1314061824 (27 %)
InnoDB: Doing recovery: scanned up to log sequence number 86 1319304704 (34 %)
090818 12:27:44  InnoDB: Starting an apply batch of log records to the database...
... snip ...

That’s right. There’s a bug in InnoDB recovery that treats the location of the tmp directory (/tmp here, configurable in my.cnf) as relative to the backup directory instead of as an absolute path.

So, if you have problems while trying to restore from an xtrabackup/ibbackup snapshot (or if you’re trying to recover InnoDB after a crash), just creating the offending tmp directory inside the directory you are recovering from appears to work.
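
If you would rather script the workaround than remember it, here is a minimal sketch. The function name is made up, and it assumes the missing path is always the server's tmpdir glued onto the backup directory, which is what the `.//tmp/...` file name in the error above suggests but which I have only seen with /tmp.

```shell
#!/bin/sh
# Hypothetical helper: create, inside the backup directory, the relative
# copy of the server's tmpdir that InnoDB recovery tries to write into.
#   $1 = backup directory
#   $2 = tmpdir as configured in my.cnf (assumed to default to /tmp)
make_recovery_tmpdir() {
    backup_dir=$1
    tmpdir=${2:-/tmp}
    # Strip the leading slash so the path lands under the backup directory.
    mkdir -p "$backup_dir/${tmpdir#/}"
}

# Usage, with the paths from the session above:
#   make_recovery_tmpdir /var/lib/2009-08-16_04-02-17 /tmp
#   xtrabackup --prepare --target-dir=/var/lib/2009-08-16_04-02-17
```

Run it with the same privileges you use for the prepare step (sudo, in my case).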

So a fairly longstanding gripe of mine has been that PHP fails to execute registered signal handlers when it receives a signal in the middle of a blocking select call. Today, I finally bumped into a situation where I couldn’t just change the spec to avoid the situation… and I’ve finally figured out how to make it work.

The bug has been reported here, where it was ignored for a few months before being shot down and ignored some more as per php dev team regulations.

Sample code given by the reporter of the bug is markedly similar to the situations in which I’ve encountered the problem:

pcntl_signal(SIGINT, "sig_handler");
$sock = socket_create_listen($port);
$read_socks = array($sock);
$n = NULL;
$foo = socket_select($read_socks, $n, $n, NULL);

By filling in his blanks, my first test case looks something like this:

<?
// Handler that should run when SIGINT arrives.
function sig_handler($signo) {
        echo "received sig #$signo\n";
}
pcntl_signal( SIGINT, "sig_handler" );

$socket = socket_create_listen( 1234 );
$r = array( $socket );
$n = NULL;
while( true ) {
        // Block (NULL timeout) until the listening socket becomes readable.
        $foo = socket_select( $r, $n, $n, NULL );
        echo "select returned '$foo'\n";
}
?>

When executing the script and pressing ^C (which sends SIGINT), the following occurs:

ammon@morbo:~$ php sigtest.php
PHP Warning:  socket_select(): unable to select [4]: Interrupted system call in /home/ammon/sigtest.php on line 13
select returned ''

Ok, so the warning is to be expected, and we can easily squelch that.

The real problem is that the signal handler never runs.

However… for the first time in my life, a response to a php bug report proves enlightening. The dev who answered this ticket provides his sample code and says he can’t duplicate the bug. Upon looking at the differences between their code, only one difference stands out:

declare(ticks=1);

The declare(ticks) directive is deprecated as of php 5.3 and will not be with us in php 6.0. Ticks are an unreliable, unpredictable, and generally bad thing in php. I’ve neither successfully used them nor seen a successful and justified use.

That being said… turning the tick on but not telling it to do anything appears to address the problem of discarded interrupts:

<?
// Turn ticks on; no tick function is registered.
declare(ticks=1);

function sig_handler($signo) {
        echo "received sig #$signo\n";
}
pcntl_signal( SIGINT, "sig_handler" );

$socket = socket_create_listen( 1234 );
$r = array( $socket );
$n = NULL;
while( true ) {
        // The @ squelches the "Interrupted system call" warning.
        $foo = @socket_select( $r, $n, $n, NULL );
        echo "select returned '$foo'\n";
}
?>

And execution:

ammon@morbo:~$ php sigtest.php
received sig #2
select returned ''

Which is precisely the desired behavior.

I don’t know what the performance hit for turning ticks on is; I haven’t had time to research it. But I can confirm that declaring ticks globally also works in an OO environment:

<?
// Ticks are declared once, at file scope, outside the class.
declare(ticks=1);

class signal_tester {
    function __construct() {
        // Register a method of this object as the SIGINT handler.
        pcntl_signal( SIGINT, array(&$this,"sig_handler") );
        $this->start();
    }

    function sig_handler($signo) {
        echo "received sig #$signo\n";
    }

    function start() {
        $socket = socket_create_listen( 1234 );
        $r = array( $socket );
        $n = NULL;
        while( true ) {
            $foo = @socket_select( $r, $n, $n, NULL );
            echo "select returned '$foo'\n";
        }
    }
}

$test = new signal_tester();
?>

Executing and hitting ^C:

ammon@morbo:~$ php sigtest.php
received sig #2
select returned ''

After a few minutes of largely unscientific testing, it appears that turning ticks on globally costs a whopping 4 bytes of RAM and causes the script to occasionally consume more CPU than the top process I used to monitor it. So… at first glance the cost is pretty negligible, and all I can say is that if you ever need to handle signals (SIGTERM, SIGHUP, etc…) from within a blocking select call in PHP, it looks like declare(ticks) is the only option for now.

I did the initial tests in 5.1.6, but can confirm the same behavior in 5.2.5. I don’t know how the behavior is going to be in 5.3, since I don’t run alpha releases on my servers but my gut likes to think that it will continue to work the same for now… and will hopefully not break until 6.0 (when everything else will explode for a few years). Shrug.

This is something that has kept coming back to bite me recently.

When you are setting up public-key authentication on OpenSSH, you must be very careful of file ownerships and permissions. In many stock unix setups, this isn’t a problem. But in any environment where you are relying on a lot of group access to files, it is easy to slip up and earn yourself a system that will silently fail to authenticate (unless you turn on debug level verbosity).

  1. The private key must be readable only by the user initiating the connection.
  2. The authorized_keys file must be writable only by the account accepting the connection.

Sounds simple enough, ne?

The real trick is that group write permission anywhere up the directory tree can render these precautions meaningless. Who cares if I can’t see into .ssh in your home directory if I can manipulate your home dir itself?

  3. $HOME and $HOME/.ssh must be locked down on the destination host.

A general good rule of thumb for permissions might be something like this:

ammon@farnsworth:~$ chmod 755 .
ammon@farnsworth:~$ chmod 700 .ssh
ammon@farnsworth:~$ chmod 600 .ssh/authorized_keys
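
Since the failure is silent, it helps to have something that will point at the offending path. The following is a rough sketch, not sshd’s actual logic: the function names are made up, it only looks at the three paths listed above, it only checks for group/other write bits, and it ignores ownership entirely.

```shell
#!/bin/sh
# is_loose PATH: succeed if PATH is writable by group or others, judging by
# the mode string that `ls -ld` prints (e.g. "drwxrwxr-x").
is_loose() {
    mode=$(ls -ld "$1" | cut -c1-10)
    case "$mode" in
        ?????w????|????????w?) return 0 ;;   # group-write or other-write bit set
        *) return 1 ;;
    esac
}

# check_ssh_perms HOME_DIR: print every path that is too permissive and
# return non-zero if any were found.
check_ssh_perms() {
    home=$1
    rc=0
    for p in "$home" "$home/.ssh" "$home/.ssh/authorized_keys"; do
        if [ -e "$p" ] && is_loose "$p"; then
            echo "too permissive: $p"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   check_ssh_perms "$HOME"
#   check_ssh_perms /var/www
```

Run it against the home directory of the account accepting the connection. No output means none of the three paths is group- or world-writable.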

Obviously, this gets kind of tricksy if you want to do something like allow SCP file transfers to the Apache user on a system… and their home dir is /var/www… and your web developers have group write access to this dir.

In situations like that, you have two options. First, you could disable the permissions checks (by turning off StrictModes in the sshd_config), but that’s not advisable. Second, you could make a separate home dir for the apache user with the restrictions in a place where they won’t interfere with anyone’s work.

For the longest time, I have been suffering from whitespace changes rendering SVN diffs useless.

Sometimes it’s the spaces vs tabs issue. Sometimes it’s file line endings (silly Windows-only editors and their CRLF). And sometimes it’s just people adjusting whitespace arbitrarily on lines (like adding spaces around parens or leaving spaces at the end of lines, etc…).

Regardless of the individual manifestation, it’s a silly problem, but one that causes more than its share of tears among developers everywhere.

Perhaps the easiest and smartest solution is to browbeat your co-developers into compliance. Force people to use editors that preserve line endings, force them to strip trailing whitespace and conform to a universal standard of indentation, etc… but it’s not always the nicest or most reliable solution. People will make mistakes, even if it’s only once a month… going over that diff might cost you an hour to figure out what had actually changed.

There are a few other solutions out there. They’re not new, and they’re not for everyone… but they can be phenomenally helpful at times. I’ll go over the two simplest ones.

dos2unix

Ever gotten a diff that reads like this?

ammon@binky:~/test$ svn diff one
Index: one
===================================================================
--- one (revision 2)
+++ one (working copy)
@@ -1,11 +1,11 @@
-One is the loneliest number that you'll ever do
-Two can be as bad as one
-It's the loneliest number since the number one
-
-No is the saddest experience you'll ever know
-Yes, it's the saddest experience you'll ever know
-`Cause one is the loneliest number that you'll ever do
-One is the loneliest number, worse than two
-
-It's just no good anymore since she went away
-Now I spend my time just making rhymes of yesterday
+One is the loneliest number that you'll ever do
+Two can be as bad as one
+It's the loneliest number since the number one
+
+No is the saddest experience you'll ever know
+Yes, it's the saddest experience you'll ever know
+`cause one is the loneliest number that you'll ever do
+One is the loneliest number, worse than two
+
+It's just no good anymore since she went away
+Now I spend my time just making rhymes of yesterday

This is what happens when something changes the line endings of a file. In this case, the original file was created with LF endings and was then edited slightly by an application that converted them to CRLF.

Now… if this were a 1000 line perl script instead of an 11 line lyrics snippet… it would be soul-crushingly difficult to find the one actual change in the file.
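
Before converting anything, it can help to confirm that line endings really are the culprit. A minimal sketch, assuming a POSIX shell and grep; the function name has_crlf is made up for illustration:

```shell
#!/bin/sh
# has_crlf FILE: succeed if any line in FILE ends with a carriage return,
# i.e. the file contains at least one CRLF line ending.
has_crlf() {
    cr=$(printf '\r')      # a literal CR character, written portably
    grep -q "$cr\$" "$1"   # CR immediately before end of line
}

# Usage:
#   has_crlf one && echo "one has CRLF line endings"
```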

Most unix distros have at their disposal the dos2unix / unix2dos utilities. On Red Hat, you can yum install dos2unix to get them. On Debian/Ubuntu, you can apt-get install tofrodos. I don’t have any other unices lying around at present to check on, but you can always just get the source at http://www.thefreecountry.com/tofrodos.

ammon@binky:~/test$ dos2unix one
ammon@binky:~/test$ svn diff one
Index: one
===================================================================
--- one (revision 2)
+++ one (working copy)
@@ -4,7 +4,7 @@

 No is the saddest experience you'll ever know
 Yes, it's the saddest experience you'll ever know
-`Cause one is the loneliest number that you'll ever do
+`cause one is the loneliest number that you'll ever do
 One is the loneliest number, worse than two

 It's just no good anymore since she went away

Much easier to figure out what has changed this way.

For extra credit, look into the svn:eol-style property. Set this on files as you commit them – or just use autoprops to do the dirty work for you…
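
For a single file, that is `svn propset svn:eol-style native one` followed by a commit. For autoprops, a fragment along these lines in your runtime config (~/.subversion/config on unix) should do it. The file patterns here are only examples; pick the ones that match your project.

```ini
[miscellany]
enable-auto-props = yes

[auto-props]
# Example patterns only: newly added files matching these get the property.
*.pl = svn:eol-style=native
*.php = svn:eol-style=native
```

Note that autoprops only apply to files as they are added or imported; files already in the repository still need the property set by hand.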

diff-cmd

Of course, sometimes it’s not line endings. Sometimes the problem is random, meaningless whitespace changes. Maybe somebody used an editor that auto-indents with spaces when the file was already indented with tabs, etc…

Subversion allows you to specify an alternate command to use to generate your diffs (instead of relying on svn’s internal diff generation). The -x flag passes the options that follow it through to that command.

ammon@binky:~/test$ svn diff --diff-cmd /usr/bin/diff -x -w one
Index: one
===================================================================
7c7
< `Cause one is the loneliest number that you'll ever do
---
> `cause one is the loneliest number that you'll ever do

But what if (for some bizarre reason) you don’t care about the case of letters?

ammon@binky:~/test$ svn diff --diff-cmd /usr/bin/diff -x -iw one
Index: one
===================================================================

If you always want to use your custom diff utility you can set it in your runtime config to save yourself the hassle of having to type it manually each time.
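
One way to do that is a tiny wrapper that always adds -w, and then pointing the diff-cmd option in the [helpers] section of your runtime config at it. This is a sketch under assumptions: the name ws_diff and the script path are made up, and it relies on svn handing the external command its options, labels, and the two file names as ordinary arguments.

```shell
#!/bin/sh
# Hypothetical wrapper: run diff with -w (ignore all whitespace) ahead of
# whatever arguments the caller supplies.
ws_diff() {
    diff -w "$@"
}

# As a standalone script (say ~/bin/ws-diff, a made-up path), the body
# would simply be:
#   exec diff -w "$@"
# and the runtime config would then contain, under [helpers]:
#   diff-cmd = /home/you/bin/ws-diff
```

The wrapper keeps diff’s exit status (0 for no differences, 1 for differences), so whitespace-only changes come back as “no differences”.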

For those using TortoiseSVN, you can always just specify graphical diff/merge utils to use instead of Tortoise’s built-in ones. Personally, I’m a big fan of WinMerge, but there are several other good ones out there.

Last week, I noticed a strange problem with a project I am working on. The SWF worked fine on XP, Linux, Vista, and OSX. It worked under Firefox, Opera, and Safari. It… loaded under IE7, and then just sort of sat there and pretended that the web services it was trying to call were broken. After poking things for a bit, I sent an email to the Flashcoders mailing list:

I have a swf that is being loaded off of an https server. As it fires up, it attempts to call a simple authentication service that lives on the same host. This works fine under Firefox, Opera, and Safari.

However, under IE, it throws an exception:

[IOErrorEvent type="ioError" bubbles=false cancelable=false eventPhase=2 text="Error #2032: Stream Error. URL: https://host/path/script.php?username=allaryin&passwd=hash"]

Obviously, if I just load the url directly into IE, it loads fine. This problem only occurs when flash tries to load the url for me.

When I monitor the query with Charles, it shows that the request is being made correctly and that the information I am expecting is successfully being returned. However, Flash is apparently ignoring the response.

This behavior has been observed on different machines, running both XP and Vista.

Thoughts?

A few days after sending this email, I’d received no response other than an IM from a friend on the list who didn’t know the answer either. So… I resumed consultation of the Google. I dug through ML archives. I read IRC logs. Eventually, I stumbled across mention of a blog post back in 2005 that had addressed a similar problem under IE6. Unfortunately, the site hosting this old blog has ceased to exist/function. So, I found it on the Internet Archive:

http://web.archive.org/web/20070521185428/http://www.gmrweb.net/2005/08/18/flash-remoting-https-internet-explorer/

The post mentions a few potential solutions to the problem such as doing some http header management in Apache. I tried the suggested changes (in the PHP, I didn’t have the access/desire to tweak Apache at the time):

header("Expires: " . gmdate("D, d M Y H:i:s", 0) . " GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0");

This didn’t have any effect on my particular problem. Charles showed that the requested headers were being sent correctly, but Flash + IE7 + HTTPS still failed to talk to my web service.

So, I poked around the net some more without coming up with any helpful solutions to the problem. Returning to the archived blog post, I read the comments and saw another solution proposed. Someone said that simply sending an empty pragma header seemed to help in cases where IE was having difficulty dealing with PHP sessions over HTTPS. Specifically, the pragma header needed to be flushed after the session had started.

And we were using a session variable…

session_start();
header("Pragma: ");

And it works now.

So… yeah. Hooray for obscure comments on wayback machine archived blog posts addressing a similar problem ;)

Today I’ve discovered an interesting bug.

Big application we’re working on over here (a Flex/AS3 rewrite and feature upgrade for the CMYK Books layout application) works fine in every browser we’ve thrown at it. With one interesting caveat.

It doesn’t play nice with Opera 9 (haven’t tried older versions, come to think of it…).

The application loads just fine. If you point your browser directly at the SWF file, it does everything it’s supposed to.

But, if you load the SWF from within an HTML page (i.e., the normal way), it starts to be problematic. Specifically, it starts sending bogus HTTP headers. These headers confuse the web services (since IIS detects that it’s being sent an ill-formed request and refuses to pass it on for SOAP parsage). What you wind up with is a page that loads normally and takes you to a login form that doesn’t do anything when you click the button.

Specifically, what happens is the application sends two Referer headers. One containing the URL of the HTML file and one containing the URL of the SWF. If the SWF isn’t loaded from within an HTML file, it only sends the one referrer (the SWF’s location), and IIS recognizes the valid request and everything works.

So… how to fix it? Not entirely sure yet. None of the options that I’ve been able to see/change seem to have any effect. But this is a pretty big strike against Opera IMO. I’m a fan of the application, it’s always been good to me when I’ve used it. I’m excited to play with it on my Wii (if I ever get one… mumble). But for it to do such an odd thing… confuses me.