[personal profile] swestrup
I'm STILL working on recovering the dead server. Part of the problem has been a lack of adequate tools. I must say I am surprised at how hard it is to find tools to do the (I would think) relatively simple tasks required.

My latest hurdle was that reiserfsck can produce spurious sparse files when recovering a file system. How spurious? I have several hundred files (in a tree of hundreds of thousands), each claiming a size in the hundred-terabyte to hundred-petabyte range. Normally I would just shrug and tell the various copy programs that can handle sparse files to copy them sparsely. After all, there might be a few blocks of useful info in these files.

Well, that's not possible. Try as I might, I could find no 'sparse-aware' program that could process the allocated blocks in a sparse file in less than a time proportional to the reported size of the sparse file. For multi-Petabyte files, we're talking weeks to months. Each.

So I couldn't find a tool to copy them, or even one to selectively delete them for me. That baffles me: find will report the 'sparseness' of a file, but you can't test the value and perform an operation contingent on it. In fact, find seems to have no inequalities whatsoever, and you can't just pipe the info into bash because bash doesn't do floats.

In the end I took three days to find out what I could about Linux filesystems and sparse files, and how to write a recursive descent walker over a filesystem that can handle randomly-generated symlinks (because reiserfsck generates them too). I'm now just deleting all these impossible-to-process files with a hand-written Perl script. That kind of scares me, as I haven't done a lot of mucking around in Linux filesystem internals, and I'm doing recovery work here.
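For illustration only, here is a minimal Python sketch of that kind of walker. It is not the Perl script mentioned above. It assumes a Linux system where `st_blocks` counts 512-byte units, and the function name `find_sparse_files` and both thresholds are made up for the example. It never follows symlinks and only reports candidates, so any deletion stays a separate, deliberate step.

```python
import os
import stat


def find_sparse_files(root, min_apparent=10 * 2**40, max_ratio=0.01):
    """Return (path, apparent_size, allocated_bytes) for suspiciously sparse files.

    min_apparent and max_ratio are illustrative thresholds, not recommendations.
    """
    hits = []
    # followlinks=False keeps the walk from descending through symlinked
    # directories, so bogus or cyclic links cannot trap it.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect the link itself, not its target
            except OSError:
                continue  # unreadable entry; skip it rather than abort the walk
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets and so on
            allocated = st.st_blocks * 512  # assumption: 512-byte units, as on Linux
            if st.st_size >= min_apparent and allocated <= st.st_size * max_ratio:
                hits.append((path, st.st_size, allocated))
    return hits
```

Because the check uses only `lstat`, the cost is proportional to the number of directory entries, not to the reported size of any file.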

What gives? I expected there to be a VFS API specifically for returning a map of the allocated blocks in a sparse file, but I could find no such thing. Without it, there's not much I can do. So, I am deleting them.
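For readers on newer systems: Linux kernels from 3.1 onward, which postdate this post, accept `SEEK_DATA` and `SEEK_HOLE` in `lseek`, and Python 3.3+ exposes them as `os.SEEK_DATA` and `os.SEEK_HOLE`. A hedged sketch of listing the allocated regions follows. The helper name `data_extents` is hypothetical, and it assumes the filesystem actually tracks holes; one that does not will simply report the whole file as a single data range.

```python
import errno
import os


def data_extents(path):
    """Return a list of (start, end) byte ranges the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that belongs to a data region.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # nothing but a hole between offset and end of file
                raise
            # Jump to the next hole; end of file counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

The loop runs once per data region rather than once per byte of apparent size, which is the property the copy tools described above were missing.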

It shouldn't be the end of the world if there IS some important data there, as I'm going to regenerate the entire file system three times, once per pair in the three partitions, and merge the results (which is also going to be fun -- whee!).

So far, I'm almost done the first copy but I expect the 2nd and 3rd will go much faster as I now know what tools I will use to do what.

Date: 2009-06-22 03:46 pm (UTC)
From: [identity profile] sps.livejournal.com
As an alternative to deletion you could in principle rename them somewhere, I guess. Yeah, sparse files are an inadequately supported concept.

Date: 2009-06-22 04:45 pm (UTC)
From: [identity profile] cpirate.livejournal.com
I don't know if it's what you're meaning, and I suppose it's too late regardless, but find does support some inequalities. Just to the extent of comparing if something is greater or less than a particular number though, e.g. "find /path -size +1G" returns all files bigger than 1GB, and "find /path -size -1k" returns files smaller than 1 KB. And you can do the usual find-style things on a match, e.g. -delete. You don't get anything useful like checking if the size used on disk is less than the size reported, but checking if a file is >10TB should find you the troublesome ones.

But yes, sparse files are an abstraction that sometimes leaks too much, and other times not enough.

Date: 2009-06-28 12:24 pm (UTC)
From: [identity profile] hendrikboom.livejournal.com
Perhaps after the recovery you should consider moving to a different file system, such as ext3.

Date: 2009-06-28 05:09 pm (UTC)
From: (Anonymous)
I started a thread a long time ago on I think linux-user about file systems. Lots of people recommended file systems, but the only thing everyone seemed to agree on was that ext3 was stable. That's the primary reason I'm using ext3.
