General NAS-Central Forums

Welcome to the NAS community

All times are UTC




 Post subject: Double Disk Failure
PostPosted: Tue Feb 23, 2016 1:53 pm 

Joined: Tue Feb 23, 2016 1:25 pm
Posts: 4
Hi All,

So I know I'm a little late to the party (like... about 3 or 4 years late, it would seem!!!), but I just stumbled on this site hunting for an answer to a problem with my 5-big (v1).

A while ago (read: 9 months or so - I've just not had the time to do anything with it since) my 5Big dropped a disk. Fine, I thought, and I swapped it out with another one I'd acquired from somewhere else (NOT an official LaCie one, but that doesn't seem to matter, if other threads are to be believed). Sure enough, it started to chunter away, rebuilding happily.

Unfortunately, about 2 hours before the reconstruction was estimated to finish, it dropped a second disk. I'm not at all surprised by this, as the 5 disks in it were original ones provided by LaCie with their own extra sticker on, and I was gifted the NAS by a former company at one point (don't ask, just don't). It seems to me like they were part of the same batch and probably just dropped due to the stress of the reconstruction.

Anyway, I've got a bunch more 1TB disks to use in it (again, donations - handy working for storage companies, eh?). I'm pretty sure that the reconstruction will have covered all the data. It wasn't completely full - I had about 1TB of free space, if not more, across the disks - and assuming some kind of defragging or logic when writing to the disks in the first place, my data should all have been reconstructed already. Given that it had a 20-hour rebuild time, I'd say that with 2 hours left to go it had hit all, or at least almost all, of the data, and was 'just' reconstructing empty space.

Now I'm not 100% sure the disks have actually failed, as I dropped the first 'failed' one into a SATA caddy after it was replaced, and it showed me the partitioning table etc. I reckon it's probably an impending failure or some kind of major SMART error set, but again, I can't tell (and I'm probably rambling here at this point, so apologies for that).

Questions are:

1) Is there a simple way to get a command line on the 5Big v1, and if so, has anyone written a definitive guide on how to do this without faffing a large amount?
2) Assuming the disks are actually 'usable' still (ie not totally catastrophically dead), is there any way to 'force' the failed disk to appear as good for just long enough (read: 2 hours or so) to get the reconstruction of the 'new' disk completed?
3) If the answer to 2) above is 'no', is there a way to force the 'new' disk (ie the one that was rebuilding itself into the array) to show as 'rebuilt' in order to assess the level of data loss (if any) that I've actually sustained?

If there's really no way of doing either 2) or 3), then I can live with the data loss I guess - I've not needed any of it for 9 months or more, so I figure I probably never will, but it'd be nice to get back the data on there anyway - it's mostly backups of other data, and my old media collection, but all the same recovery would be nice.

Thanks in advance,

--DJUnreal


 Post subject: Re: Double Disk Failure
PostPosted: Tue Feb 23, 2016 7:38 pm 

Joined: Mon Jun 16, 2008 10:45 am
Posts: 5995
DJUnreal wrote:
I'm pretty sure that the reconstruction will have covered all the data (it wasn't completely full, and assuming some kind of defragging or logic when writing to the disks in the first place, my data should have all been reconstructed already
Unfortunately that's not how it works. The RAID level sits below the filesystem, so it doesn't know which sectors hold files and which are empty space. When resyncing it simply starts at the first sector and carries on until every sector is covered.
And no, the filesystem doesn't fill the space linearly. It puts files in 'random' places to prevent fragmentation. If the resync was 90% done, you can assume roughly 10% of your files were not yet covered.
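For what it's worth, on a Linux md box you can watch the resync position directly. A minimal sketch; the array name and member partitions are assumptions for a 5Big-style layout:

```shell
# Show resync/recovery progress for all md arrays.
# During a rebuild you'll see a progress bar, percentage and ETA per array.
cat /proc/mdstat

# More detail on one array (assumed to be /dev/md0): state, rebuild
# percentage, and which member is marked faulty or spare.
mdadm --detail /dev/md0
```

This is the same percentage the web UI reports, just with more detail about which member is being rebuilt.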

Quote:
Now I'm not 100% sure the disks have actually failed, as I dropped the first 'failed' one into a SATA caddy after it was replaced, and it showed me the partitioning table etc.
It failed because the kernel got an I/O error while reading the disk. AFAIK that is the only reason why Linux software RAID drops a disk. It is possible that there is only a single bad sector; in normal use you wouldn't notice that.
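If you can get the 'failed' disk into a caddy on a Linux machine, smartctl (from smartmontools) can hint at whether it's a single bad sector or something worse. A sketch; the device name /dev/sdX is an assumption, substitute the real one:

```shell
# Overall SMART health verdict for the suspect disk
smartctl -H /dev/sdX

# The attributes that matter for this failure mode: a nonzero
# Current_Pending_Sector count means sectors the drive could not read,
# which would trip the array again during a full resync.
smartctl -A /dev/sdX | egrep -i 'Reallocated|Pending|Uncorrectable'
```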

Quote:
1) Is there a simple way to get a command line on the 5Big v1, and if so, has anyone written a definitive guide on how to do this without faffing a large amount?
You can try the tricks for the 2Big Network.
Quote:
2) Assuming the disks are actually 'usable' still (ie not totally catastrophically dead), is there any way to 'force' the failed disk to appear as good for just long enough (read: 2 hours or so) to get the reconstruction of the 'new' disk completed?
Yes and no. You can force a failed disk back into the array, but AFAIK you can't resume the resync at 90%; you'll have to start over again.
The trick is to recreate the array with exactly the same settings, while telling the system that the array is clean. More info here.
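As a rough sketch of what that looks like with mdadm. All device names, device order, chunk size and metadata version below are assumptions: read the real values from the old superblocks first, because --create over existing disks with a wrong parameter is destructive:

```shell
# First try the gentle option: force-assemble with the 'failed' member.
# md will accept a member whose event count is only slightly behind.
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2

# If that fails, the recreate trick: note level, chunk size, device order
# and metadata version from the old superblocks...
mdadm --examine /dev/sd[abcde]2

# ...then recreate with IDENTICAL settings, telling md the array is
# already in sync so no resync (and no parity overwrite) happens.
mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64 \
      --metadata=0.90 --raid-devices=5 \
      /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2
```

If any value differs from the original layout the data will look like garbage, so mount read-only afterwards and check before writing anything.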
Quote:
3) If the answer to 2) above is 'no', is there a way to force the 'new' disk (ie the one that was rebuilding itself into the array) to show as 'rebuilt' in order to assess the level of data loss (if any) that I've actually sustained?
Yes, the same way. But as 10% of the surface isn't rebuilt yet, odds are that it won't mount. It's like removing 10% of all the shelves in a store and hoping you'll still be able to find your stuff without stumbling.
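If you do go down that road, a dry-run filesystem check followed by a read-only mount is the safe way to gauge the damage without making it worse. A sketch; the md device and mount point are assumptions:

```shell
# Report-only check: -n answers 'no' to every repair prompt,
# so nothing on the array is modified.
fsck -n /dev/md0

# Mount read-only to browse what survived; any missing 10% shows up
# as I/O errors or garbage files rather than being silently 'fixed'.
mkdir -p /mnt/recover
mount -o ro /dev/md0 /mnt/recover
```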


 Post subject: Re: Double Disk Failure
PostPosted: Tue Feb 23, 2016 10:30 pm 

Joined: Tue Feb 23, 2016 1:25 pm
Posts: 4
Thanks Mijzelf, I'm amazed anyone still reads this forum!

I work with hardware RAID all the time, but our systems tend to try to fill disks contiguously where possible, and do all kinds of clever optimisation that effectively reclaims empty space at the start of the drive and puts new data there if at all possible. I guess software RAID just isn't that clever...

With any luck you're right and it's just a sector that's dropped somewhere. Once I get a CLI running (and if the 2Big instructions work, I'll report back here and confirm it), I'll see if I can force the disk to show as good long enough to do another rebuild - although I guess there's a good chance it'll die again when it tries to read the same sectors, given it's already dropped them once...

It's certainly given me a starting point, so thank you for that!!!

--DJUnreal


 Post subject: Re: Double Disk Failure
PostPosted: Wed Feb 24, 2016 9:33 pm 

Joined: Mon Jun 16, 2008 10:45 am
Posts: 5995
DJUnreal wrote:
I work with hardware RAID all the time, but our systems tend to try to fill disks contiguously where possible, and do all kinds of clever optimisation that effectively reclaims empty space at the start of the drive and puts new data there if at all possible. I guess software RAID just isn't that clever...
I think you are mixing up the functions of a filesystem and a RAID array. A RAID array exposes a virtual disk, composed of several physical disks. It has no control over how that space is used.
A filesystem puts files on that surface, and decides where and how to place them.


 Post subject: Re: Double Disk Failure
PostPosted: Thu Feb 25, 2016 12:28 am 

Joined: Tue Feb 23, 2016 1:25 pm
Posts: 4
Mijzelf wrote:
DJUnreal wrote:
I work with hardware RAID all the time, but our systems tend to try to fill disks contiguously where possible, and do all kinds of clever optimisation that effectively reclaims empty space at the start of the drive and puts new data there if at all possible. I guess software RAID just isn't that clever...
I think you are mixing up the functions of a filesystem and a RAID array. A RAID array exposes a virtual disk, composed of several physical disks. It has no control over how that space is used.
A filesystem puts files on that surface, and decides where and how to place them.

Actually, I'm afraid you're wrong there. The kind of systems I work with do a lot of dynamic adjustment at the hardware level. But then, I'm talking about enterprise class solutions which are smarter than the average bear, and are designed from the ground up with performance as much in mind as resilience.

When you get to the level where you're shifting data at block level (yes, we do it at the block level, not at the file level) between different tiers on demand, and creating dynamic volumes stretched across multiple RAID groups, the OS and filesystem are so far detached from the physical block storage that the underlying hardware has to control this kind of stuff itself. It's why we have technologies such as zero-page reclamation and so on.

But anyway, that aside, I'm pleased to report a couple of good things:

- The 2Big CLI guide worked fine (the link in it to the utilities file didn't work, but I googled the filename and found the same howto on the creator's blog, complete with a working link to the file)
- The second disk that dropped appears to have actually been the first disk again, so it's far less catastrophic than I thought. I've still got to persuade it to rebuild (and I'm unsure if it's even faulty - the NAS now seems to have decided it's fine, even though it doesn't seem to want to re-add it to the array). But the data is all intact and can be copied off - which I'm doing before I take a fatal step, such as poking around with the RAID config at the CLI level, or blowing away the config completely and rebuilding it from scratch - so it's not all bad.

Thanks again for the tips :)

--DJUnreal


 Post subject: Re: Double Disk Failure
PostPosted: Thu Feb 25, 2016 6:36 pm 

Joined: Tue Feb 23, 2016 1:25 pm
Posts: 4
<snip>

Quote:
- The second disk that dropped appears to have actually been the first disk again, so it's far less catastrophic than I thought. I've still got to persuade it to rebuild (and I'm unsure if it's even faulty - the NAS now seems to have decided it's fine, even though it doesn't seem to want to re-add it to the array). But the data is all intact and can be copied off - which I'm doing before I take a fatal step, such as poking around with the RAID config at the CLI level, or blowing away the config completely and rebuilding it from scratch - so it's not all bad.


... Or so I thought...

Turns out a second disk really /is/ on its last legs. It 'died' again while I was copying the data off, before the replaced disk could rebuild (as before).

I guess I'll keep copying whilst it limps on, and hopefully get the data off before it finally gives up.
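For a disk that keeps limping and dropping out, a plain copy restarts from scratch every time. GNU ddrescue is the usual tool here: it reads the easy areas first and logs its position, so an interrupted run resumes where it left off. A sketch; the device and paths are assumptions:

```shell
# First pass: grab everything readable quickly, skipping bad areas (-n).
# The map file records which regions succeeded, so reruns can resume.
ddrescue -f -n /dev/sdX /mnt/spare/disk.img /mnt/spare/disk.map

# Second pass: go back and retry the bad regions up to 3 times (-r3).
ddrescue -f -r3 /dev/sdX /mnt/spare/disk.img /mnt/spare/disk.map
```

The resulting image can then be loop-mounted or handed to recovery tools without putting any more stress on the failing drive.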

--DJUnreal


Powered by phpBB® Forum Software © phpBB Group