Warning: This post does not really contain a solution, only a puzzle I just encountered…
In the course of my work, I occasionally have to deal with corefiles. Like… a few gigs worth of corefiles at a time. And I have to transfer them between hosts for analysis and such.
This afternoon (while waiting for our shiny new bonded T1 to be installed) I decided to experiment with compressing the files to determine if I could save myself some time and bandwidth.
I started with a small sample set of cores:
ammon@nibbler:~/corefiles$ ls -l
total 31124
-rw-rw-r-- 1 apache apache 46485504 Apr 15 08:13 core-3614-9901.0
-rw-rw-r-- 1 apache apache 53022720 Apr 14 16:30 core-4079-9667.0
-rw-rw-r-- 1 apache apache 48463872 Apr 15 00:50 core-4079-9667.1
-rw-rw-r-- 1 apache apache 46034944 Apr 16 17:51 core-6161-10117.0
-rw-rw-r-- 1 apache apache 150757376 Apr 14 17:14 core-6202-9809.0
-rw-rw-r-- 1 apache apache 48033792 Apr 17 11:14 core-8188-10300.0
ammon@nibbler:~/corefiles$ du -hc *
3.6M core-3614-9901.0
11M core-4079-9667.0
5.4M core-4079-9667.1
2.7M core-6161-10117.0
3.2M core-6202-9809.0
5.6M core-8188-10300.0
31M total
Then, I gzipped them.
ammon@nibbler:~/corefiles$ gzip *
ammon@nibbler:~/corefiles$ du -hc *
860K core-3614-9901.0.gz
2.6M core-4079-9667.0.gz
1.4M core-4079-9667.1.gz
656K core-6161-10117.0.gz
872K core-6202-9809.0.gz
1.1M core-8188-10300.0.gz
7.3M total
ammon@nibbler:~/corefiles$ gunzip *
31MB to 7.3MB. That’s 76% savings. None too shabby. Perhaps bzip2 can do better? It usually does.
ammon@nibbler:~/corefiles$ bzip2 *
ammon@nibbler:~/corefiles$ du -hc *
680K core-3614-9901.0.bz2
1.9M core-4079-9667.0.bz2
1.1M core-4079-9667.1.bz2
536K core-6161-10117.0.bz2
592K core-6202-9809.0.bz2
824K core-8188-10300.0.bz2
5.5M total
ammon@nibbler:~/corefiles$ bunzip2 *
Awesome, well. I guess I can live with 82% compression for these files.
But wait… my twitch habit of typing ls while I’m thinking at a terminal fires off…
ammon@nibbler:~/corefiles$ ls -l
total 384000
-rw-rw-r-- 1 ammon users 46485504 Apr 15 08:13 core-3614-9901.0
-rw-rw-r-- 1 ammon users 53022720 Apr 14 16:30 core-4079-9667.0
-rw-rw-r-- 1 ammon users 48463872 Apr 15 00:50 core-4079-9667.1
-rw-rw-r-- 1 ammon users 46034944 Apr 16 17:51 core-6161-10117.0
-rw-rw-r-- 1 ammon users 150757376 Apr 14 17:14 core-6202-9809.0
-rw-rw-r-- 1 ammon users 48033792 Apr 17 11:14 core-8188-10300.0
Total 384,000?! When I started, it said 31,124 blocks used. Something fishy’s going on here.
ammon@nibbler:~/corefiles$ du -h
376M .
Meep. 376MB? Up from 31MB?
Unfortunately, I didn’t twitch ls between decompressing the first time and recompressing, so I can’t see if the change happened after the gzip, or after the bzip2.
But wait now. ls is showing these corefiles as weighing in at 45-144MB each… which matches the original ls output. So why did du originally report such small files? The only changes that occurred in the interim are that the files changed owner (the cores were originally dumped as apache:apache, a group my normal account is a member of)… and that I compressed and decompressed the files twice.
…
ammon@nibbler:~/corefiles$ gzip *
ammon@nibbler:~/corefiles$ du -h
7.3M .
ammon@nibbler:~/corefiles$ gunzip *
ammon@nibbler:~/corefiles$ du -h
376M .
ammon@nibbler:~/corefiles$ bzip2 *
ammon@nibbler:~/corefiles$ du -h
5.5M .
ammon@nibbler:~/corefiles$ bunzip2 *
ammon@nibbler:~/corefiles$ du -h
376M .
Ok, it’s consistent. So… the savings aren’t actually from 31MB to 5-7MB… they’re from 376MB?
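For what it's worth, here is a rough sketch of that arithmetic, taking the du totals above at face value. The savings helper is a made-up name for illustration, and the inputs are the rounded figures du printed, so the results are only approximate.

```shell
# Hypothetical helper: percentage saved going from size $1 to size $2
# (any unit, as long as both arguments use the same one).
savings() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

# Against the 31M that du reported at the start:
savings 31 7.3     # gzip  -> 76.5
savings 31 5.5     # bzip2 -> 82.3

# Against the 376M that du reports after a decompress:
savings 376 7.3    # gzip  -> 98.1
savings 376 5.5    # bzip2 -> 98.5
```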
Why would these files have originally been taking up that much space?
I generated a new corefile by launching and then sending SIGSEGV to the same binary that generated the original test files.
...snip...
[1]+ Stopped ~/bin/server_d
ammon@nibbler:~/newcore$ ps
PID TTY TIME CMD
2180 pts/1 00:00:00 bash
23114 pts/1 00:00:00 server_d
23153 pts/1 00:00:00 ps
ammon@nibbler:~/newcore$ kill -s SIGSEGV 23114
ammon@nibbler:~/newcore$ fg
~/bin/server_d
Segmentation fault (core dumped)
ammon@nibbler:~/newcore$ ls -l
total 2244
-rw------- 1 ammon users 44601344 Apr 17 13:22 core.23114
ammon@laxare-01-07:~/newcore$ du -h
2.2M .
Ok, so it happened again. The freshly dumped core is reporting WAY less size than it should. 44,601,344 bytes does not 2.2MB make.
ammon@nibbler:~/newcore$ touch core.23114
ammon@nibbler:~/newcore$ ls -l
total 2244
-rw------- 1 ammon areae 44601344 Apr 17 13:33 core.23114
ammon@nibbler:~/newcore$ du -h
2.2M .
And, just touching the file doesn’t do anything… But running gzip on it does.
ammon@nibbler:~/newcore$ gzip core.23114
ammon@nibbler:~/newcore$ ls -l
total 532
-rw------- 1 ammon areae 538074 Apr 17 13:33 core.23114.gz
ammon@nibbler:~/newcore$ du -h
536K .
ammon@nibbler:~/newcore$ gunzip core.23114.gz
ammon@nibbler:~/newcore$ ls -l
total 43604
-rw------- 1 ammon areae 44601344 Apr 17 13:33 core.23114
ammon@nibbler:~/newcore$ du -h
43M .
I grabbed another similar core, generated on another server, and experimented with it.
ammon@leela:~$ cp /var/server/corefiles/core-8224-9976.0 .
ammon@leela:~$ ls -l
total 2564
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:40 core-8224-9976.0
ammon@leela:~$ du -h core-8224-9976.0
2.6M core-8224-9976.0
So again, the same discrepancy. Obviously, copying the file isn’t enough to create the problem. Are the files actually growing? I made 3 copies of this core and checked what df had to say.
ammon@leela:~$ ls -l
total 10256
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:40 core-8224-9976.0
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:45 core-8224-9976.1
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:45 core-8224-9976.2
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:45 core-8224-9976.3
ammon@leela:~$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/md6 3.9G 83M 3.6G 3% /m1
Now to apply compressions to the files. Hey, I never compared plain old ‘compress’, let’s add that to the mix.
I doubt it’ll be better than gzip, but may as well check while I’m here.
ammon@leela:~$ gzip core-8224-9976.1
ammon@leela:~$ bzip2 core-8224-9976.2
ammon@leela:~$ compress core-8224-9976.3
ammon@leela:~$ du -h *
2.6M core-8224-9976.0
668K core-8224-9976.1.gz
512K core-8224-9976.2.bz2
888K core-8224-9976.3.Z
ammon@leela:~$ ls -l
total 4632
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:40 core-8224-9976.0
-rw-rw-r-- 1 ammon users 676842 Apr 17 13:45 core-8224-9976.1.gz
-rw-rw-r-- 1 ammon users 519869 Apr 17 13:45 core-8224-9976.2.bz2
-rw-rw-r-- 1 ammon users 901821 Apr 17 13:45 core-8224-9976.3.Z
ammon@leela:~$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/md6 3.9G 78M 3.6G 3% /m1
ammon@leela:~$ gunzip core-8224-9976.1.gz
ammon@leela:~$ bunzip2 core-8224-9976.2.bz2
ammon@leela:~$ uncompress core-8224-9976.3.Z
ammon@leela:~$ ls -l
total 135728
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:40 core-8224-9976.0
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:45 core-8224-9976.1
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:45 core-8224-9976.2
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:45 core-8224-9976.3
ammon@leela:~$ du -h *
2.6M core-8224-9976.0
44M core-8224-9976.1
44M core-8224-9976.2
44M core-8224-9976.3
ammon@leela:~$ cmp core-8224-9976.0 core-8224-9976.1
ammon@leela:~$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/md6 3.9G 206M 3.5G 6% /m1
Since cmp returned nothing, the files really are the same… but df says that I’m not imagining things: the files do take up “more” space after the compression-decompression cycle.
So there’s something strange about the way core files are written to the filesystem the first time.
Manually rewriting the data in the file byte-by-byte seems to update whatever information was being misinterpreted initially. I suspect this is why decompressing the file has the effect but copying does not.
ammon@leela:~$ dd if=core-8224-9976.0 of=core-8224-9976.4
88680+0 records in
88680+0 records out
45404160 bytes (45 MB) copied, 0.373917 seconds, 121 MB/s
ammon@leela:~$ du -h core-8224-9976.4
44M core-8224-9976.4
Might this be an artifact of the particular filesystem in question? Oh well.
I never discovered an answer to this puzzle and don’t really care enough to research/experiment any further. It’s obvious that the files really are occupying 45MB of disk, not 2.2MB… the real question is why du and df insist on misreporting until the file is rewritten.
They’re called sparse files — basically files with “blanks” in the middle which the filesystem does not bother to store.
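A minimal sketch of what that looks like from the shell, assuming GNU coreutils (truncate and stat) and a filesystem that supports holes. The file name sparse.demo is made up for illustration, and the allocated figure will vary by filesystem.

```shell
# Work in a scratch directory (left in place afterwards for inspection).
workdir=$(mktemp -d)
demo="$workdir/sparse.demo"

# truncate extends the file to 1 MiB without writing any data, so the
# whole file is one big hole on filesystems that support sparse files.
truncate -s 1048576 "$demo"

apparent=$(stat -c '%s' "$demo")    # length in bytes, the number ls -l shows
blocks=$(stat -c '%b' "$demo")      # blocks actually allocated on disk
blocksize=$(stat -c '%B' "$demo")   # bytes per block as counted by %b
allocated=$((blocks * blocksize))   # roughly what du and df charge for

echo "apparent size:  $apparent bytes"
echo "allocated size: $allocated bytes"
```

The gap between those two numbers is the same gap as between the size column of ls -l and the figure du reports.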
GNU Tar can handle sparse files, so you may have better luck if you tar the files first, then compress them. I don’t know if either of bzip2 or gzip support sparse files.
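Something along these lines should work with GNU tar, as a sketch only. The directory layout and the core.demo file are stand-ins invented for the example, not the real corefiles.

```shell
# Scratch area with a stand-in "core": 4 bytes of data, a large hole,
# then 4 more bytes of data.
workdir=$(mktemp -d)
mkdir "$workdir/cores" "$workdir/restored"
printf 'head' > "$workdir/cores/core.demo"
truncate -s 1048576 "$workdir/cores/core.demo"
printf 'tail' >> "$workdir/cores/core.demo"

# -S (--sparse) asks GNU tar to record holes instead of archiving long
# runs of zeros; -z pipes the archive through gzip.
tar -C "$workdir" -cSzf "$workdir/cores.tar.gz" cores

# On extraction GNU tar recreates the holes where the filesystem allows it.
tar -C "$workdir/restored" -xzf "$workdir/cores.tar.gz"

# Compare disk usage and confirm the contents survived the round trip.
du -h "$workdir/cores/core.demo" "$workdir/restored/cores/core.demo"
cmp "$workdir/cores/core.demo" "$workdir/restored/cores/core.demo" \
    && echo "contents match"
```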
The core dumps were written to disk as sparse files. GNU cp detects and recreates sparse files but bzip2 and gzip do not.
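If a core has already been through gzip or bzip2, one possible way to get the holes back is to copy it with GNU cp's --sparse=always option. This is a sketch assuming GNU coreutils and a filesystem with sparse-file support; core.demo is again a made-up stand-in.

```shell
# Build a stand-in sparse "core" in a scratch directory.
workdir=$(mktemp -d)
core="$workdir/core.demo"
printf 'head' > "$core"
truncate -s 1048576 "$core"
printf 'tail' >> "$core"

# Round trip through gzip: gunzip writes every zero byte back out,
# so the result is typically fully allocated on disk.
gzip "$core"
gunzip "$core.gz"

# --sparse=always makes GNU cp turn runs of zero bytes back into holes.
cp --sparse=always "$core" "$core.sparse"

# Same contents, but the copy should occupy far fewer blocks.
du -h "$core" "$core.sparse"
cmp "$core" "$core.sparse" && echo "contents match"
```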