Spurious Sparseness.
I'm STILL working on recovering the dead server. Part of the problem has been a lack of adequate tools. I must say I am surprised at how hard it is to find tools to do the (I would think) relatively simple tasks required.
My latest hurdle was that reiserfsck can produce spurious sparse files when recovering a file system. How spurious? I have several hundred files (in a tree of hundreds of thousands) each claiming a size in the hundred-terabyte to hundred-petabyte range. Normally I would just shrug and tell one of the various copy programs that handle sparse files to copy them sparsely. After all, there might be a few blocks of useful info in these files.
Well, that's not possible. Try as I might, I could find no 'sparse-aware' program that could process the allocated blocks of a sparse file in less than time proportional to the reported size of the file. For multi-petabyte files, we're talking weeks to months. Each.
So I couldn't even find a tool to copy them, or one to selectively delete them for me. Which baffles me -- find will report the 'sparseness' of a file (its -printf '%S' directive), but it gives you no way to test that value and act on the result; find has no numeric comparison for it at all, and you can't just pipe the numbers into bash because bash doesn't do floats.
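Since find won't compare the ratio and bash can't do the float math, the test has to happen in a real scripting language. A minimal sketch of the idea (Python here as a stand-in; the 1% ratio and 1 TB cutoff are arbitrary thresholds for illustration, not values from my actual run):

```python
import os

def sparseness(path):
    """Allocated bytes / apparent bytes, roughly what find's %S reports."""
    st = os.stat(path, follow_symlinks=False)
    if st.st_size == 0:
        return 1.0
    # st_blocks counts 512-byte units regardless of the fs block size
    return (st.st_blocks * 512) / st.st_size

def is_spurious(path, max_apparent=1 << 40, min_ratio=0.01):
    """Flag files that claim a huge size but have almost nothing allocated."""
    st = os.stat(path, follow_symlinks=False)
    return st.st_size > max_apparent and sparseness(path) < min_ratio
```

The point is just that the float comparison lives in the script, where it belongs, instead of being contorted into find predicates.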
In the end it took me three days to learn what I could about Linux filesystems and sparse files, and how to write a recursive-descent walker over a filesystem that can cope with randomly generated symlinks (because reiserfsck generates those too). I'm now just deleting all these impossible-to-process files with a hand-written Perl script. Which kind of scares me, as I haven't done a lot of mucking around in Linux filesystem internals, and I'm doing recovery work here.
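The shape of that deletion pass is simple enough to sketch (this is a Python stand-in for the Perl script, not the script itself, and the 1 TB cutoff is illustrative):

```python
import os
import stat

TERABYTE = 1 << 40

def purge_spurious(root, cutoff=TERABYTE):
    """Walk root depth-first without following symlinks, deleting
    regular files whose apparent size exceeds the cutoff."""
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: never chase fsck's random symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size > cutoff:
                os.unlink(path)
                deleted.append(path)
    return deleted
```

The two things that matter are refusing to follow symlinks (so a randomly generated link can't send the walker somewhere it shouldn't go) and testing with lstat so the link itself, not its target, is what gets examined.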
What gives? I expected there to be a VFS API specifically for returning a map of the allocated blocks in a sparse file, but I could find no such thing. Without it, there's not much I can do. So, I am deleting them.
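For the record, the sort of thing I was hoping for would look roughly like this sketch. It assumes an lseek that understands "seek to next data" and "seek to next hole" whence values; some Unixes do expose these as SEEK_DATA/SEEK_HOLE, though not anywhere I could use them here:

```python
import os

def data_extents(path):
    """Yield (offset, length) for each allocated region of a sparse file,
    in time proportional to the number of extents, not the claimed size.
    Assumes os.SEEK_DATA / os.SEEK_HOLE are available (not portable)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.lseek(fd, 0, os.SEEK_END)
        pos = 0
        while pos < end:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError:   # ENXIO: no data past this point
                break
            hole = os.lseek(fd, start, os.SEEK_HOLE)
            yield (start, hole - start)
            pos = hole
    finally:
        os.close(fd)
```

With something like that, copying a multi-petabyte spurious file would take seconds, because you'd only ever touch the handful of blocks that actually exist.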
It shouldn't be the end of the world if there IS some important data there, as I'm going to regenerate the entire file system three times, once for each pair of the three partitions, and merge the results (which is also going to be fun -- whee!).
So far I'm almost done with the first copy, but I expect the second and third will go much faster now that I know which tools to use for what.