Warning: This post does not really contain a solution, only a puzzle I just encountered…

In the course of my work, I occasionally have to deal with corefiles. Like… a few gigs worth of corefiles at a time. And I have to transfer them between hosts for analysis and such.

This afternoon (while waiting for our shiny new bonded T1 to be installed) I decided to experiment with compressing the files to determine if I could save myself some time and bandwidth.

I started with a small sample set of cores:

ammon@nibbler:~/corefiles$ ls -l
total 31124
-rw-rw-r-- 1 apache apache  46485504 Apr 15 08:13 core-3614-9901.0
-rw-rw-r-- 1 apache apache  53022720 Apr 14 16:30 core-4079-9667.0
-rw-rw-r-- 1 apache apache  48463872 Apr 15 00:50 core-4079-9667.1
-rw-rw-r-- 1 apache apache  46034944 Apr 16 17:51 core-6161-10117.0
-rw-rw-r-- 1 apache apache 150757376 Apr 14 17:14 core-6202-9809.0
-rw-rw-r-- 1 apache apache  48033792 Apr 17 11:14 core-8188-10300.0
ammon@nibbler:~/corefiles$ du -hc *
3.6M    core-3614-9901.0
11M     core-4079-9667.0
5.4M    core-4079-9667.1
2.7M    core-6161-10117.0
3.2M    core-6202-9809.0
5.6M    core-8188-10300.0
31M     total

Then, I gzipped them.

ammon@nibbler:~/corefiles$ gzip *
ammon@nibbler:~/corefiles$ du -hc *
860K    core-3614-9901.0.gz
2.6M    core-4079-9667.0.gz
1.4M    core-4079-9667.1.gz
656K    core-6161-10117.0.gz
872K    core-6202-9809.0.gz
1.1M    core-8188-10300.0.gz
7.3M    total
ammon@nibbler:~/corefiles$ gunzip *

31mb to 7.3mb. That’s 76% savings. None too shabby. Perhaps bzip2 can do better? It usually does.

ammon@nibbler:~/corefiles$ bzip2 *
ammon@nibbler:~/corefiles$ du -hc *
680K    core-3614-9901.0.bz2
1.9M    core-4079-9667.0.bz2
1.1M    core-4079-9667.1.bz2
536K    core-6161-10117.0.bz2
592K    core-6202-9809.0.bz2
824K    core-8188-10300.0.bz2
5.5M    total
ammon@nibbler:~/corefiles$ bunzip2 *

Awesome, well. I guess I can live with 82% compression for these files.

But wait… my twitch habit of typing ls while I’m thinking at a terminal fires off…

ammon@nibbler:~/corefiles$ ls -l
total 384000
-rw-rw-r-- 1 ammon users  46485504 Apr 15 08:13 core-3614-9901.0
-rw-rw-r-- 1 ammon users  53022720 Apr 14 16:30 core-4079-9667.0
-rw-rw-r-- 1 ammon users  48463872 Apr 15 00:50 core-4079-9667.1
-rw-rw-r-- 1 ammon users  46034944 Apr 16 17:51 core-6161-10117.0
-rw-rw-r-- 1 ammon users 150757376 Apr 14 17:14 core-6202-9809.0
-rw-rw-r-- 1 ammon users  48033792 Apr 17 11:14 core-8188-10300.0

Total 384,000?! When I started, it said 31,124 blocks used. Something fishy’s going on here.

ammon@nibbler:~/corefiles$ du -h
376M    .

Meep. 376mb? Up from 31mb?

Unfortunately, I didn’t twitch ls between decompressing the first time and recompressing, so I can’t see if the change happened after the gzip, or after the bzip2.

But wait now. It’s showing these corefiles as weighing in at 45-144mb each… which matches the byte sizes in the original ls -l output. So why did the original du (and the ls total) report such small files? The only changes in the interim are that the files changed owner (the cores were originally dumped as apache:apache, and my normal account is a member of the apache group)… and that I compressed and decompressed the files twice.
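
A quick sanity check on the numbers in the two listings. This is only arithmetic on figures copied from above, and it assumes the "total" line of ls -l counts 1 KiB blocks (the GNU ls default); the variable names are just for illustration.

```python
# Byte sizes copied from the ls -l listings above (identical in both).
sizes = [46485504, 53022720, 48463872, 46034944, 150757376, 48033792]

apparent_kib = sum(sizes) // 1024   # what the byte sizes add up to, in KiB
first_total_kib = 31124             # "total" line of the first ls -l
second_total_kib = 384000           # "total" line after the bunzip2

print(f"sum of byte sizes: {apparent_kib} KiB")
print(f"first ls total   : {first_total_kib} KiB")
print(f"second ls total  : {second_total_kib} KiB")
print(f"ratio sum/first  : {apparent_kib / first_total_kib:.1f}x")
```

The second total sits within a few hundred KiB of the summed byte sizes, while the first total is roughly a twelfth of it. So the second listing is the one that agrees with the per-file sizes.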

ammon@nibbler:~/corefiles$ gzip *
ammon@nibbler:~/corefiles$ du -h
7.3M    .

ammon@nibbler:~/corefiles$ gunzip *
ammon@nibbler:~/corefiles$ du -h
376M    .

ammon@nibbler:~/corefiles$ bzip2 *
ammon@nibbler:~/corefiles$ du -h
5.5M    .

ammon@nibbler:~/corefiles$ bunzip2 *
ammon@nibbler:~/corefiles$ du -h
376M    .

Ok, it’s consistent. So… the savings aren’t actually from 31mb to 5-7mb… they’re from 376mb?
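
Redoing the percentages against both baselines, using the du -h figures printed above (all in the M units du reported). The helper name is made up for this sketch.

```python
def savings(before, after):
    """Percentage of space saved going from `before` to `after` (same units)."""
    return 100.0 * (1.0 - after / before)

# (label, size before, size after), taken from the du output above
cases = [
    ("gzip  against 31M ", 31.0, 7.3),
    ("bzip2 against 31M ", 31.0, 5.5),
    ("gzip  against 376M", 376.0, 7.3),
    ("bzip2 against 376M", 376.0, 5.5),
]

for label, before, after in cases:
    print(f"{label}: {savings(before, after):.1f}% saved")
```

Measured against 376M, both compressors land above 98%, which is a very different picture from the 76% and 82% computed against 31M.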

Why would these files have originally been taking up that much space?

I generated a new corefile by launching and then sending SIGSEGV to the same binary that generated the original test files.

...snip...
[1]+  Stopped                 ~/bin/server_d

ammon@nibbler:~/newcore$ ps
  PID TTY          TIME CMD
 2180 pts/1    00:00:00 bash
23114 pts/1    00:00:00 server_d
23153 pts/1    00:00:00 ps

ammon@nibbler:~/newcore$ kill -s SIGSEGV 23114
ammon@nibbler:~/newcore$ fg
~/bin/server_d
Segmentation fault (core dumped)
ammon@nibbler:~/newcore$ ls -l
total 2244
-rw------- 1 ammon users 44601344 Apr 17 13:22 core.23114

ammon@nibbler:~/newcore$ du -h
2.2M    .

Ok, so it happened again. Freshly dumped core is reporting WAY less size than it should. 44,601,344 bytes does not 2.2mb make.
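
ls -l and du are reading two different fields of the same stat record, so it can help to print both side by side. A minimal sketch, assuming a Unix-like system where st_blocks is available; size_report is a hypothetical helper, and the demo file is a throwaway so the snippet runs anywhere.

```python
import os
import tempfile

def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for the file at path."""
    st = os.stat(path)
    # st_size is the length ls -l prints; st_blocks counts the 512-byte
    # units actually allocated, which is what du adds up.
    return st.st_size, st.st_blocks * 512

# Demo on a throwaway file; point size_report at a real core
# (e.g. size_report("core.23114")) to compare with ls and du.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    with open(demo, "wb") as f:
        f.write(b"x" * 4096)
    apparent, allocated = size_report(demo)
    print(f"apparent {apparent} bytes, allocated {allocated} bytes")
```

For the fresh core above, the first number should be the 44601344 that ls -l shows and the second should be in the neighbourhood of the 2.2M that du shows.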

ammon@nibbler:~/newcore$ touch core.23114
ammon@nibbler:~/newcore$ ls -l
total 2244
-rw------- 1 ammon areae 44601344 Apr 17 13:33 core.23114

ammon@nibbler:~/newcore$ du -h
2.2M    .

And, just touching the file doesn’t do anything… But running gzip on it does.

ammon@nibbler:~/newcore$ gzip core.23114
ammon@nibbler:~/newcore$ ls -l
total 532
-rw------- 1 ammon areae 538074 Apr 17 13:33 core.23114.gz

ammon@nibbler:~/newcore$ du -h
536K    .

ammon@nibbler:~/newcore$ gunzip core.23114.gz
ammon@nibbler:~/newcore$ ls -l
total 43604
-rw------- 1 ammon areae 44601344 Apr 17 13:33 core.23114

ammon@nibbler:~/newcore$ du -h
43M     .
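
One way to try reproducing this away from real cores, on the guess that it has to do with regions of the file that were never actually written. The sketch builds a hypothetical 8 MiB stand-in with a few bytes at each end, round-trips it through Python's gzip module, and prints length and allocation for both files. File names and sizes are made up, and whether the first file stays small on disk depends on the filesystem.

```python
import gzip
import os
import shutil
import tempfile

SIZE = 8 * 1024 * 1024  # hypothetical 8 MiB stand-in, not a real core

def allocated(path):
    """Bytes actually allocated on disk (st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512

def make_holey(path):
    """Write a few bytes at each end and seek over everything in between."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(SIZE - 4)
        f.write(b"tail")

def gzip_roundtrip(src, dst):
    """Compress src to src.gz, then decompress that into dst."""
    gz = src + ".gz"
    with open(src, "rb") as fin, gzip.open(gz, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Decompression writes every byte back out, zeros included.
    with gzip.open(gz, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "fake-core")
    restored = os.path.join(tmp, "fake-core.restored")
    make_holey(original)
    gzip_roundtrip(original, restored)
    for p in (original, restored):
        print(os.path.basename(p), os.path.getsize(p), allocated(p))
```

If the guess is right, both files report the same length but the restored one reports far more allocated bytes, which would mirror the 2.2M versus 43M above.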

I grabbed another similar core, generated on another server, and experimented with it.

ammon@leela:~$ cp /var/server/corefiles/core-8224-9976.0 .
ammon@leela:~$ ls -l
total 2564
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:40 core-8224-9976.0

ammon@leela:~$ du -h core-8224-9976.0
2.6M    core-8224-9976.0

So again, the same discrepancy. Obviously, copying the file isn’t enough to create the problem. Are the files actually growing? I made 3 copies of this core and checked what df had to say.

ammon@leela:~$ ls -l
total 10256
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:40 core-8224-9976.0
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:45 core-8224-9976.1
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:45 core-8224-9976.2
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:45 core-8224-9976.3

ammon@leela:~$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              3.9G   83M  3.6G   3% /m1

Now to apply compressions to the files. Hey, I never compared plain old ‘compress’, let’s add that to the mix. :P I doubt it’ll be better than gzip, but may as well check while I’m here.

ammon@leela:~$ gzip core-8224-9976.1
ammon@leela:~$ bzip2 core-8224-9976.2
ammon@leela:~$ compress core-8224-9976.3
ammon@leela:~$ du -h *
2.6M    core-8224-9976.0
668K    core-8224-9976.1.gz
512K    core-8224-9976.2.bz2
888K    core-8224-9976.3.Z

ammon@leela:~$ ls -l
total 4632
-rw-rw-r-- 1 ammon users 45404160 Apr 17 13:40 core-8224-9976.0
-rw-rw-r-- 1 ammon users   676842 Apr 17 13:45 core-8224-9976.1.gz
-rw-rw-r-- 1 ammon users   519869 Apr 17 13:45 core-8224-9976.2.bz2
-rw-rw-r-- 1 ammon users   901821 Apr 17 13:45 core-8224-9976.3.Z

ammon@leela:~$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              3.9G   78M  3.6G   3% /m1

ammon@leela:~$ gunzip core-8224-9976.1.gz
ammon@leela:~$ bunzip2 core-8224-9976.2.bz2
ammon@leela:~$ uncompress core-8224-9976.3.Z

ammon@leela:~$ ls -l
total 135728
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:40 core-8224-9976.0
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:45 core-8224-9976.1
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:45 core-8224-9976.2
-rw-rw-r-- 1 ammon areae 45404160 Apr 17 13:45 core-8224-9976.3

ammon@leela:~$ du -h *
2.6M    core-8224-9976.0
44M     core-8224-9976.1
44M     core-8224-9976.2
44M     core-8224-9976.3

ammon@leela:~$ cmp core-8224-9976.0 core-8224-9976.1

ammon@leela:~$ df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              3.9G  206M  3.5G   6% /m1

Since cmp returned nothing, the files really are identical… but df confirms that I’m not imagining things: the files do take up “more” space after the compression-decompression cycle.
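
Since the contents match, one thing that might be worth checking is how much of a core is nothing but zero bytes. A minimal sketch that counts all-zero blocks; the 4096-byte block size and the function name are my assumptions, not anything reported by the tools above.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size; not queried from the filesystem

def zero_block_census(path, block=BLOCK):
    """Return (zero_blocks, total_blocks) for the file at path."""
    zero = total = 0
    blank = bytes(block)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += 1
            # A short final chunk is compared against a blank of equal length.
            if chunk == blank[:len(chunk)]:
                zero += 1
    return zero, total

# Demo on a throwaway three-block file: data, an unwritten gap, data.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    with open(demo, "wb") as f:
        f.write(b"x" * BLOCK)   # one block of real data
        f.seek(2 * BLOCK)       # skip one block without writing it
        f.write(b"y" * BLOCK)   # another block of real data
    zero, total = zero_block_census(demo)
    print(f"{zero} of {total} blocks are all zeros")
```

Run against one of the cores, a very high proportion of zero blocks would go a long way toward explaining both the compression ratios and the small du figures.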

So there’s something strange about the way core files are written to the filesystem the first time.

Manually rewriting the data in the file byte-by-byte seems to update whatever information was being misinterpreted initially. I suspect this is why decompressing the file has the effect but copying does not.

ammon@leela:~$ dd if=core-8224-9976.0 of=core-8224-9976.4
88680+0 records in
88680+0 records out
45404160 bytes (45 MB) copied, 0.373917 seconds, 121 MB/s

ammon@leela:~$ du -h core-8224-9976.4
44M     core-8224-9976.4

Might this be an artifact of the particular filesystem in question? Oh well.

I never discovered an answer to this puzzle and don’t really care enough to research/experiment any further. It’s obvious that the files really are occupying 45mb of disk, not 2.2mb… the real question is why du and df insist on misreporting until the file is rewritten?

2 Responses to “magic exploding corefiles”

  1. They’re called sparse files — basically files with “blanks” in the middle which the filesystem does not bother to store.

    GNU tar can handle sparse files, so you may have better luck if you tar the files first, then compress them. I don’t know whether bzip2 or gzip supports sparse files.

  2. David Bremner says:

    The core dumps were written to disk as sparse files. GNU cp detects and recreates sparse files but bzip2 and gzip do not.
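
The sparse-file explanation in the responses suggests a simple technique for copying a file without filling in its holes: seek over all-zero blocks instead of writing them. The sketch below illustrates that general idea only. It is not how GNU cp is implemented, and the 4096-byte block size, helper names and file names are assumptions made for the example.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, not queried from the filesystem

def allocated(path):
    """Bytes actually allocated on disk (st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512

def sparse_copy(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    blank = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == blank[:len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole
            else:
                fout.write(chunk)
        # Seeking alone never extends a file, so set the final length
        # explicitly in case the source ended in zeros.
        fout.truncate(fin.tell())

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "fake-core")
    dst = os.path.join(tmp, "fake-core.copy")
    with open(src, "wb") as f:
        f.write(b"head" * 1024)      # 4 KiB of data
        f.seek(4 * 1024 * 1024)      # jump to the 4 MiB mark, writing nothing
        f.write(b"tail" * 1024)      # 4 KiB of data
    sparse_copy(src, dst)
    for p in (src, dst):
        print(os.path.basename(p), os.path.getsize(p), allocated(p))
```

On a filesystem that supports holes, the copy should report the same length as the source and a similarly small allocation. From the shell, GNU cp's --sparse=always and GNU tar's -S (--sparse) options cover the same ground, and du --apparent-size shows the length rather than the allocation.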
