May 27th, 2008

svn get revision

One of the more annoying things about svn is that, to my knowledge, there is no single simple command for retrieving the revision number from a shell.

What I want:

ammon@hermes:~/repo$ svn info --get-revision .
1234

But of course, nothing like this exists.

Thankfully, svn info's output IS easy enough to parse. You just have to do it yourself.

ammon@hermes:~/repo$ svn info | grep Revision | awk -- '{print $2}'
1234

This gives you the revision of your current checkout without the network round trip that a call to svn log would need.

To get the latest revision of the repository itself (this does hit the network), add "-r HEAD" to the svn info call:

ammon@hermes:~/repo$ svn info -r HEAD | grep Revision | awk -- '{print $2}'
1280
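If you need the number inside a script rather than at the prompt, the same grep/awk extraction is easy to do on captured output. Here is a minimal PHP sketch; svn_info_revision() is a hypothetical helper name, and it assumes you already have the text of svn info (or svn info -r HEAD) in a string, for instance from shell_exec() where your environment allows it. It also assumes English output: svn localizes its messages, so the "Revision:" label may differ under other locales.

```php
<?php
// Hypothetical helper: extract the "Revision:" value from the plain-text
// output of `svn info`, like `grep Revision | awk '{print $2}'` does.
// Assumes English (untranslated) svn output.
function svn_info_revision($info)
{
    // /m makes ^ match at every line start, so only a line that begins
    // with "Revision:" counts.
    if( preg_match('/^Revision:\s*(\d+)/m', $info, $m) )
        return (int)$m[1];
    return null;  // no Revision line, e.g. not a working copy
}

// Sample text standing in for real `svn info` output.
$info = "Path: .\n"
      . "URL: http://example.com/svn/repo\n"
      . "Revision: 1234\n"
      . "Last Changed Rev: 1230\n";
echo svn_info_revision($info), "\n";  // prints 1234
```

Anchoring the pattern at the start of a line avoids false matches elsewhere in the output, such as a URL that happens to contain the word "Revision".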

Of course, svn info can also emit its output as XML (svn info --xml), so you could parse that instead in a more capable environment where you still aren't using the svn API bindings.
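A minimal sketch of the XML route in PHP, assuming the SimpleXML extension is available. svn_xml_revision() is a hypothetical name, and the sample document below is hand-written to resemble svn's layout (an entry element with a revision attribute, plus a nested commit element carrying the last-changed revision), so check it against what your svn version actually emits.

```php
<?php
// Hypothetical helper: read the working-copy revision from the output of
// `svn info --xml` using SimpleXML (assumes that extension is loaded).
function svn_xml_revision($xml)
{
    $doc = @simplexml_load_string($xml);
    if( $doc === false || !isset($doc->entry) )
        return null;
    // The revision attribute on <entry> is the checkout's revision; the
    // nested <commit> element's revision is the last-changed one.
    return (int)(string)$doc->entry['revision'];
}

// Hand-written sample shaped like svn's XML (not captured from a real run).
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<info>
<entry kind="dir" path="." revision="1234">
<url>http://example.com/svn/repo</url>
<commit revision="1230">
<author>ammon</author>
</commit>
</entry>
</info>';
echo svn_xml_revision($xml), "\n";  // prints 1234
```

Going through an XML parser means the code doesn't care how svn lays out or translates its human-readable labels.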

Posted by Ammon as play at 12:57 PM EDT


php tail

I have a php script that frequently needs to email me the last few lines of a log file. I can't afford to exec() a binary tail process, so the solution has to be in pure php.

Originally, the files in question never exceeded a few thousand lines. Unfortunately, I'm now encountering cases where the files occasionally run to 50,000 lines or more. This causes PHP's memory consumption to explode.

Note: the code snippets provided here are not fully functional standalone scripts. The scripts I ran to benchmark the algorithms contain some rudimentary setup logic that isn't important here, so it has not been included.

My original method:

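// tail-file.php
// Load the whole file into an array (one element per line), keep the last $lines.
$arr = @file( $fname, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
if( $arr === false )
    $arr = array();   // unreadable file: nothing to tail
$arr = array_slice($arr, -$lines);
$buf = implode("\n", $arr);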

This is easy to understand and pretty fast, all things considered. Unfortunately, the memory footprint of loading a file into an array is obscene: loading a 4,400-line log file this way could consume more than 17 MB of RAM, and 50,000-line files easily strained the 256 MB limit I can give the process.
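To check figures like these on your own logs, PHP's memory_get_usage() can report roughly how much the array built by file() occupies. This is a sketch, not the script behind the numbers above; tail_file_measured() is a hypothetical name, and the results depend on PHP version, build, and line lengths.

```php
<?php
// Sketch: the file()-based tail, plus a rough measurement of how much
// memory the line array holds while it is alive.
function tail_file_measured($fname, $lines)
{
    $before = memory_get_usage();
    $arr = @file( $fname, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES );
    if( $arr === false )
        $arr = array();
    // Measured while $arr still exists, so this approximates the cost of
    // holding every line of the file at once.
    $held = memory_get_usage() - $before;
    $buf = implode("\n", array_slice($arr, -$lines));
    return array($buf, $held);
}

// Synthetic 4,400-line file purely for demonstration.
$tmp = tempnam(sys_get_temp_dir(), "tail");
file_put_contents($tmp, str_repeat("a log line of moderate length\n", 4400));
list($buf, $held) = tail_file_measured($tmp, 20);
printf("array held about %.1f KB\n", $held / 1024);
unlink($tmp);
```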

So, the obvious solution to the memory consumption is to avoid loading the entire file at once. What if we kept only a rotating list of the most recent lines?

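// tail-array.php
// Sliding window of the $lines most recent lines. Padding with "" (not "\n")
// means files shorter than $lines produce no spurious blank lines.
$arr = array_fill( 0, $lines, "" );

$fp = fopen($fname, "r");
if( $fp !== false ) {
    // Loop on fgets() itself: testing feof() first would append a trailing
    // `false` after the last line. No length limit, so long lines stay whole.
    while( ($line = fgets($fp)) !== false ) {
        $arr[] = $line;     // faster than array_push()
        array_shift($arr);  // drop the oldest line
    }
    fclose($fp);
}
$buf = implode("", $arr);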

This method works by keeping only the $lines most recent lines of the file in an array. Memory consumption stays sane, but the cost of performing so many array pushes and shifts is bad. Really bad. With small files, I can't notice any difference between this method and the file() method... but with longer files, it adds up quickly.

Given a 51-line, 4 KB file, a typical run ($lines = 20) might look like this:

ammon@zap:~$ time ./tail-file.php a.log>/dev/null

real    0m0.015s
user    0m0.009s
sys     0m0.007s

ammon@zap:~$ time ./tail-array.php a.log>/dev/null

real    0m0.016s
user    0m0.010s
sys     0m0.006s

Comparable enough. But given a 50,004-line (3.3 MB) log file:

ammon@zap:~$ time ./tail-file.php b.log >/dev/null                  

real    0m0.079s
user    0m0.058s
sys     0m0.021s

ammon@zap:~$ time ./tail-array.php b.log >/dev/null                 

real    0m0.119s
user    0m0.112s
sys     0m0.007s

The difference becomes quite clear. But what if my log file grows obscenely large? I've got a 9-million-line (1.6 GB) log file lying around to test with...

ammon@zap:~$ time ./tail-file.php c.log >/dev/null

real    0m0.015s
user    0m0.008s
sys     0m0.008s

ammon@zap:~$ time ./tail-array.php c.log >/dev/null                 

real    0m19.351s
user    0m18.545s
sys     0m0.803s

The file() method crashes almost immediately (hence its deceptively short time) because it can't allocate enough RAM to hold a 9-million-element array, while the array method takes almost 20 seconds to execute. It's slow... but at least it works.
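One variation on the array method, not benchmarked here: array_shift() reindexes the array on every call, so a fixed-size ring buffer that simply overwrites the oldest slot avoids that work. It still has to read the whole file, so it can't compete with a method that skips the bulk of the file, but memory stays bounded the same way. A minimal sketch; tail_ring() is a hypothetical name.

```php
<?php
// Hypothetical variation on tail-array.php using a fixed-size ring buffer:
// each new line overwrites the slot of the oldest one, so there is no
// array_shift() and no reindexing. It still reads the whole file.
function tail_ring($fname, $lines)
{
    if( $lines < 1 )
        return "";
    $fp = fopen($fname, "r");
    if( $fp === false )
        return "";
    $ring = array();
    $n = 0;                              // total lines read so far
    while( ($line = fgets($fp)) !== false ) {
        $ring[$n % $lines] = $line;      // overwrite the oldest slot
        $n++;
    }
    fclose($fp);
    // The last min($n, $lines) lines are lines $n-$count .. $n-1; none of
    // them has been overwritten yet, so emit them oldest first.
    $count = min($n, $lines);
    $buf = "";
    for( $i = $n - $count; $i < $n; $i++ )
        $buf .= $ring[$i % $lines];
    return $buf;
}

$tmp = tempnam(sys_get_temp_dir(), "tail");
file_put_contents($tmp, "one\ntwo\nthree\nfour\n");
echo tail_ring($tmp, 2);   // prints "three" and "four" on separate lines
unlink($tmp);
```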

Of course, there are other methods. The one I finally settled on is this:

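// tail-seek.php
// Scan backward from the end, one byte at a time, until we have passed
// $lines line endings, then read everything after that point.
$buf = "";
$fp = fopen($fname, "r");
if( $fp !== FALSE ) {
    fseek( $fp, 0, SEEK_END );
    $pos = $eof = ftell($fp);
    // A last line with no trailing newline still counts as a line.
    $lines_read = 0;
    if( $eof > 0 ) {
        fseek( $fp, -1, SEEK_END );
        if( fgetc($fp) !== "\n" )
            $lines_read = 1;
    }
    while( $pos > 0 && $lines_read <= $lines ) {
        --$pos;
        fseek($fp, $pos);
        if( fgetc($fp) === "\n" && ++$lines_read > $lines ) {
            $pos++;   // start just after the newline preceding our lines
            break;
        }
    }
    // If we ran out of file, $pos is 0 and the whole file is returned
    // (the original code skipped the first byte in that case).
    fseek($fp, $pos);
    if( $eof > $pos )
        $buf = fread($fp, $eof - $pos);
    fclose($fp);   // only close a handle we actually opened
}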

This method doesn't waste time reading the bulk of the file. It jumps to the end and scans backward, one byte at a time, until it has found enough newlines. The only drawback is that the average filesystem isn't optimized for reading backwards... but since we only read a small amount of data at the very end, it doesn't much matter.

ammon@zap:~$ time ./tail-seek.php a.log >/dev/null

real    0m0.017s
user    0m0.009s
sys     0m0.008s

ammon@zap:~$ time ./tail-seek.php b.log >/dev/null                  

real    0m0.017s
user    0m0.008s
sys     0m0.010s

ammon@zap:~$ time ./tail-seek.php c.log >/dev/null                  

real    0m0.023s
user    0m0.015s
sys     0m0.008s

Performance is a trifle slower on small files, but astronomically better on long ones. This is similar to the approach used by most Unix 'tail' commands, and it's the clear winner for actual use in my application.

Of course, it needs a bit of cleanup from the state I've presented it in, and it isn't appropriate for all environments... but it's a trifle better than requiring 20 seconds and 20 GB of RAM to execute ;)
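One possible direction for that cleanup, offered as a sketch rather than the code benchmarked above: the backward scan costs an fseek() and an fgetc() for every byte it examines. Reading backward in fixed-size blocks and counting newlines with substr_count() covers the same bytes in far fewer calls. tail_chunked() is a hypothetical name, the 4096-byte default block size is an arbitrary assumption, and this version has not been timed against the numbers above.

```php
<?php
// Hypothetical block-wise variant of the backward scan in tail-seek.php:
// read fixed-size blocks from the end and count "\n" per block, instead
// of one fseek()/fgetc() pair per byte. Block size is an arbitrary guess.
function tail_chunked($fname, $lines, $chunk = 4096)
{
    if( $lines < 1 )
        return "";
    $fp = fopen($fname, "r");
    if( $fp === false )
        return "";
    fseek($fp, 0, SEEK_END);
    $pos = ftell($fp);
    $buf = "";
    $newlines = 0;
    // $lines + 1 newlines guarantee we hold the start of the first wanted
    // line (the extra one may just be the file's trailing newline).
    while( $pos > 0 && $newlines <= $lines ) {
        $len = min($chunk, $pos);
        $pos -= $len;
        fseek($fp, $pos);
        $data = fread($fp, $len);
        $newlines += substr_count($data, "\n");
        $buf = $data . $buf;
    }
    fclose($fp);
    // Trim to exactly $lines lines, preserving a trailing newline if any.
    $trailing = (substr($buf, -1) === "\n");
    if( $trailing )
        $buf = substr($buf, 0, -1);
    $out = implode("\n", array_slice(explode("\n", $buf), -$lines));
    return $trailing ? $out . "\n" : $out;
}

$tmp = tempnam(sys_get_temp_dir(), "tail");
file_put_contents($tmp, "a\nb\nc\n");
echo tail_chunked($tmp, 2, 1);  // block size 1 forces several loop passes
unlink($tmp);
```

Passing a tiny block size is a handy way to exercise the loop on small test files; for real logs, a few kilobytes per read keeps the number of system calls low.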

Posted by Ammon as play at 12:08 PM EDT
