General NAS-Central Forums

All times are UTC

Posted: Fri Feb 07, 2014 10:52 pm
Joined: Fri Feb 07, 2014 8:10 pm
Posts: 21
Hello,
i'm having a MAJOR issue with a PX4-300D unit in which all shares ARE EMPTY.
The unit has been working nonstop for 2 weeks, already running the latest 4.0.6.19294 (it was in fact set up from scratch on that version, no "working upgrades") and it has a UPS.
It has 2x1TB HDDs (factory install) in RAID1.

It has 2 volumes, each one with different shares(and different snapshot settings), only thing enabled is SMB, no security, pretty basic

I needed to add an extra share today so i tried to open the management web and got a "you do not have permission to browse /, use http://ip/manage" error, so i used /manage and got a different error this time: "resource does not exist". Files and shares remained perfectly accessible, front panel was "all blue", no alerts on the LCD, nothing.
I decided to reboot the unit then, and ~15min later it was still shutting down, so i forced a shutdown (all client computers had long closed any files).

After starting, i can access the management web ok, event log shows snapshots being created and then a weird "unit has been removed from the system" on 31/1 and then the shutdown/restart today.
I also removed mcafee app as it requires security and it's not desirable right now(i cancelled the install but it seems to have added it anyway)

THEN when i opened the shares, THEY'RE EMPTY!!!!!!!!!!!. :cry: :o :o
the snapshots also stop at the 31st and only for one volume (the other isn't showing anything). If i try to expose a snapshot for everyone (as a share), it doesn't appear when browsing through the network!!! (but it appears as a share on the share list).

checking with SSH i can see the share on the 2nd volume on mnt/volumes and the contents appear ok (haven't copied them to double check), but the other folders are all empty

i need to recover these files at all cost, the data is extremely important!(i'm working with support as well but time is of utmost importance)


Posted: Sat Feb 08, 2014 8:40 am
Joined: Mon Jun 16, 2008 10:45 am
Posts: 6014
The insipid answer is: Put back your backup. If you don't have a backup, the data apparently is not important.

You have two disks in raid1, and two volumes. So both volumes are part of that single raid1?
You have shares, and 'snapshots'. I guess these snapshots are some form of incremental backup of those shares. All shares on both volumes are gone. Your newest snapshot on one volume is from 31 January, and on one (the other?) volume you can see your data when logged in over ssh. Correct?

It's bad that it took that long to shut down. If the box went mad and just applied a factory reset or something like that, it had plenty of time to delete everything. Yet it didn't have the time to actually erase the data; that takes hours on a TB disk. So it might be possible to do a low level rescue of that data. What is the nature of the data? Office documents, photos, movies, ...? PhotoRec can recover a lot of different file types from a raw disk. Despite its name it can recover a lot more than photos alone. But there are some conditions. The file needs to have a recognizable header (so flat text like source code won't work) and the file should not be fragmented.
Further, it is possible that you lose your metadata (filenames, timestamps, ...).


Posted: Sat Feb 08, 2014 8:34 pm
Joined: Fri Feb 07, 2014 8:10 pm
Posts: 21
answers inline:
the backups ARE THE SNAPSHOTS, and data is business critical.

yes, both volumes are part of a single raid (they differ in size)
snapshots are set by volume
The shares are not gone, the CONTENTS of the shares are gone (i.e. you browse to \\NAS and see all the shares)
yes, the last visible snapshot on the web manager is from the 31st, and yes, when browsing to /mnt/ i can see what looks to be the contents of that 2nd volume.

the data is mainly office documents, pdfs, and on the other volume databases and apps. There are hundreds of files, so losing the metadata is not an option.
i know photorec and used it in the past but i steer clear of it because, like you say, it loses the metadata and only recovers the files it recognizes, whilst i need to recover everything, even unrecognized files.


Posted: Sun Feb 09, 2014 2:02 pm
Joined: Mon Jun 16, 2008 10:45 am
Posts: 6014
Basically there are two possibilities. The files are gone (deleted, or lost in a filesystem error), or not easily visible. If they are deleted or lost, only low level recovery is possible. I *think* the data partitions have an xfs filesystem, and undelete *might* be possible, however, PhotoRec is also mentioned here, which is basically low level. So I have a bad feeling about metadata.

If the files are only 'invisible', the command 'df' should still show their storage usage. The existence of snapshots on the same filesystem makes it a bit hard to read, but the command 'du -s /path/to/snapshots' can tell how much space the snapshots use, so the remaining usage (minus a bit for the filesystem itself) is somehow invisible.
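
The df/du comparison described above can be sketched roughly like this. It is only a sketch: the function names are made up for illustration, the paths in the example call are placeholders, and it assumes the box's 'df' accepts the POSIX '-P' and '-k' options (GNU and BusyBox builds normally do).

```shell
#!/bin/sh
# Sketch only: estimate how much used space is NOT explained by the snapshots.
# Function names and the example paths are hypothetical.

# used_kb MOUNTPOINT: print the used space of that filesystem in KiB.
# -P forces one line per filesystem, so the "Used" column is always field 3.
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# tree_kb DIR: print the space taken by a directory tree in KiB.
tree_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# unaccounted_kb MOUNTPOINT SNAPSHOT_DIR: used space minus snapshot space.
# A large result means data is still occupying the disk somewhere.
unaccounted_kb() {
    echo $(( $(used_kb "$1") - $(tree_kb "$2") ))
}

# Example call (placeholder paths, adjust to the real layout):
# unaccounted_kb /mnt/pools/A/A0 /path/to/snapshots
```

The result still includes the filesystem's own overhead, so only a difference of many megabytes or more says anything useful.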

If you have the idea the files still should be somewhere, you can try to find them:
Code:
find /mnt/pools/ -name <some-filename-you-remember>
or
Code:
find /mnt/pools/ -name '*.doc'
The filenames and extensions are case sensitive. (BTW, you can try this anyhow. It won't hurt.)
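
Since case matters, a case-insensitive variant may be handy. A small sketch, assuming the 'find' on the box supports '-iname' (GNU find does, BusyBox find usually does); the function name is made up for illustration:

```shell
#!/bin/sh
# Sketch only: case-insensitive search for regular files by name pattern.
# find_by_name is a hypothetical helper, not part of any firmware.

find_by_name() {
    # $1 = directory to search, $2 = name pattern such as '*.doc'
    # The pattern is passed on quoted, so find sees the wildcard itself
    # and not whatever the shell would expand it to.
    find "$1" -type f -iname "$2" 2>/dev/null
}

# Example call:
# find_by_name /mnt/pools/ '*.doc'
```

Keep the quotes around the pattern when calling it, otherwise the shell may expand '*.doc' against the current directory before find ever runs.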

Quote:
the backups ARE THE SNAPSHOTS
I guess by now you have realized that keeping backups on the same device as the 'originals' is not a good idea.


Posted: Mon Feb 10, 2014 1:00 pm
Joined: Fri Feb 07, 2014 8:10 pm
Posts: 21
Mijzelf,
thank you very much for the info and links. Before booting the NAS and risking further overwrite damage i decided to run a data recovery/image of one of the disks offline. I used UFS Explorer to recover the data after browsing your links and this is what i've found:
1) filesystem seems to be ext2/3/4, at least that's how UFSE recognizes it
2) each "volume" i made turned out to be an ext partition
3) i didn't even have to run any undelete or anything of the sort! The data is there on each partition in plain sight, i just copied it and the data is in perfect shape.
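
For anyone repeating the offline imaging step from a Linux machine instead of UFS Explorer, a minimal sketch with plain 'dd' could look like this. It is not what was used above; the function name is made up, and /dev/sdX is a placeholder for the NAS disk attached to the rescue machine. Double check the device name, since swapping source and destination destroys the data.

```shell
#!/bin/sh
# Sketch only: copy a disk (or partition) to an image file before touching it.
# image_disk is a hypothetical helper; /dev/sdX below is a placeholder.

image_disk() {
    # $1 = source (block device or file), $2 = destination image file
    # conv=noerror,sync keeps going after a read error and pads the failed
    # block with zeros, so later data stays at the right offset.
    # The last block is padded too, so the image can be slightly larger
    # than a source whose size is not a multiple of 64 KiB.
    dd if="$1" of="$2" bs=64k conv=noerror,sync
}

# Example call (run as root, destination on a DIFFERENT disk):
# image_disk /dev/sdX /mnt/rescue/nas-disk1.img
```

On a disk with real read errors a dedicated tool such as GNU ddrescue is the better choice; the recovery can then be done on the image and the original disk stays untouched.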

Now, why did this happen in the first place?, i've been reading the kernel logs and found something interesting:
i see normal logs until i get to the jan 31 date...:
Code:
Jan 31 13:34:31 Servidor kernel: [692642.000034] usb 2-1: reset high speed USB device number 2 using ehci_hcd
Jan 31 13:34:46 Servidor kernel: [692657.112028] usb 2-1: device descriptor read/64, error -110
Jan 31 13:35:01 Servidor kernel: [692672.328026] usb 2-1: device descriptor read/64, error -110
Jan 31 13:35:01 Servidor kernel: [692672.544025] usb 2-1: reset high speed USB device number 2 using ehci_hcd
Jan 31 13:35:16 Servidor kernel: [692687.656028] usb 2-1: device descriptor read/64, error -110
Jan 31 13:35:32 Servidor kernel: [692702.872034] usb 2-1: device descriptor read/64, error -110
Jan 31 13:35:32 Servidor kernel: [692703.088028] usb 2-1: reset high speed USB device number 2 using ehci_hcd
Jan 31 13:35:42 Servidor kernel: [692713.496022] usb 2-1: device not accepting address 2, error -110
Jan 31 13:35:42 Servidor kernel: [692713.608029] usb 2-1: reset high speed USB device number 2 using ehci_hcd
Jan 31 13:35:53 Servidor kernel: [692724.016021] usb 2-1: device not accepting address 2, error -110
Jan 31 13:35:53 Servidor kernel: [692724.016062] usb 2-1: USB disconnect, device number 2
Jan 31 13:35:53 Servidor kernel: [692724.016099] sd 6:0:0:0: Device offlined - not ready after error recovery
Jan 31 13:35:53 Servidor kernel: [692724.016116] sd 6:0:0:0: [sdc] Unhandled error code
Jan 31 13:35:53 Servidor kernel: [692724.016122] sd 6:0:0:0: [sdc]  Result: hostbyte=0x05 driverbyte=0x00
Jan 31 13:35:53 Servidor kernel: [692724.016129] sd 6:0:0:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 08 00 4e 00 00 08 00
Jan 31 13:35:53 Servidor kernel: [692724.016146] end_request: I/O error, dev sdc, sector 524366
Jan 31 13:35:53 Servidor kernel: [692724.016155] Buffer I/O error on device sdc1, logical block 65538
Jan 31 13:35:53 Servidor kernel: [692724.016160] lost page write due to I/O error on sdc1
Jan 31 13:35:53 Servidor kernel: [692724.016296] sd 6:0:0:0: [sdc] Unhandled error code
Jan 31 13:35:53 Servidor kernel: [692724.016302] sd 6:0:0:0: [sdc]  Result: hostbyte=0x01 driverbyte=0x00
Jan 31 13:35:53 Servidor kernel: [692724.016309] sd 6:0:0:0: [sdc] CDB: cdb[0]=0x28: 28 00 00 00 00 3e 00 00 08 00
Jan 31 13:35:53 Servidor kernel: [692724.016326] end_request: I/O error, dev sdc, sector 62
Jan 31 13:35:53 Servidor kernel: [692724.041114] EXT2-fs (sdc1): error: ext2_get_inode: unable to read inode block - inode=30594, block=65538
Jan 31 13:35:53 Servidor kernel: [692724.041144] EXT2-fs (sdc1): previous I/O error to superblock detected
Jan 31 13:35:53 Servidor kernel: [692724.041148]


then the logs are filled with the same error:
Code:
Feb  7 08:17:29 Servidor kernel: [1278419.832128] EXT2-fs (sdc1): error: ext2_get_inode: unable to read inode block - inode=30594, block=65538
Feb  7 08:17:29 Servidor kernel: [1278420.369919] Buffer I/O error on device loop1, logical block 2068
Feb  7 08:17:29 Servidor kernel: [1278420.369925] lost page write due to I/O error on loop1
Feb  7 08:17:32 Servidor kernel: [1278423.269744] quiet_error: 11 callbacks suppressed



a TON of repetition of those, now if i search for sdc1:
Code:
Feb  7 14:56:40 Servidor kernel: [    6.800975] scsi 6:0:0:0: Direct-Access     SMI      USB DISK         0100 PQ: 0 ANSI: 0 CCS
Feb  7 14:56:40 Servidor kernel: [    6.801669] sd 6:0:0:0: Attached scsi generic sg2 type 0
Feb  7 14:56:40 Servidor kernel: [    6.802082] sd 6:0:0:0: [sdc] 1957888 512-byte logical blocks: (1.00 GB/956 MiB)
Feb  7 14:56:40 Servidor kernel: [    6.802816] sd 6:0:0:0: [sdc] Write Protect is off
Feb  7 14:56:40 Servidor kernel: [    6.802824] sd 6:0:0:0: [sdc] Mode Sense: 43 00 00 00
Feb  7 14:56:40 Servidor kernel: [    6.802829] sd 6:0:0:0: [sdc] Assuming drive cache: write through
Feb  7 14:56:40 Servidor kernel: [    6.805067] sd 6:0:0:0: [sdc] Assuming drive cache: write through
Feb  7 14:56:40 Servidor kernel: [    6.805709]  sdc: sdc1
Feb  7 14:56:40 Servidor kernel: [    6.807566] sd 6:0:0:0: [sdc] Assuming drive cache: write through
Feb  7 14:56:40 Servidor kernel: [    6.807574] sd 6:0:0:0: [sdc] Attached SCSI removable disk


looks like a regular USB pendrive, internal of course, but what does it do? does it have the boot OS? (i thought it used a portion of the disks, in fact there's a whole partition dedicated to the system), and it "by chance" looks like it failed on jan 31st..., too much of a coincidence

but the last logs(on friday 7) before the support dump and after(when i ran ssh) don't show these errors anymore..
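
The log search above can be condensed into a per-day count, which makes it easier to see when the errors start and stop. A rough sketch, assuming the usual syslog layout shown in the excerpts (month and day as the first two fields); the function name and the log path in the example are placeholders:

```shell
#!/bin/sh
# Sketch only: count kernel I/O error lines per day for one device.
# errors_per_day is a hypothetical helper.

errors_per_day() {
    # $1 = log file, $2 = device name as it appears in the log (e.g. sdc)
    # Output: one line per day, "<count> <month> <day>".
    # Days are sorted alphabetically by month name, not chronologically.
    grep 'I/O error' "$1" | grep -F -- "$2" \
        | awk '{ print $1, $2 }' | sort | uniq -c
}

# Example call (placeholder log path):
# errors_per_day /var/log/messages sdc
```

Matching on "sdc" also catches "sdc1"; pass "loop1" instead to count the loop device errors.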


now i'm not confident of "factory resetting" this unit and putting the data in it again and pray it won't happen again


Posted: Mon Feb 10, 2014 7:42 pm
Joined: Mon Jun 16, 2008 10:45 am
Posts: 6014
First, congratulations!

I guess the 'initial firmware' which is used to install the disks, is stored on some internal usb thumb. But it's not clear to me (understatement) why that stick should be mounted/used in normal operation. And I also don't see why an I/O error on that stick (which is bad, BTW) would have influence on your data partitions.

Quote:
Feb 7 08:17:29 Servidor kernel: [1278419.832128] EXT2-fs (sdc1): error: ext2_get_inode: unable to read inode block - inode=30594, block=65538
Feb 7 08:17:29 Servidor kernel: [1278420.369919] Buffer I/O error on device loop1, logical block 2068
These 2 lines suggest that loop0 is somehow related to sdc1. All EMC boxes have the same firmware structure, AFAIK. When you look at the Stock configuration of the Home Media CE, you can see that loop0 is the file /boot/images/apps, which resides on sda1. So it seems to me that for some reason (a bug or a hardware error) the box was running from flash instead of harddisk, and the I/O error crippled the firmware. Possibly this stopped the snapshots from being made, and also caused the other problems that made you shut down the box. And ultimately it prevented the box from shutting down normally.

Quote:
now i'm not confident of "factory resetting" this unit and putting the data in it again and pray it won't happen again
I can imagine. Actually I can imagine you can't even do a factory reset if the internal flash is damaged. That should be easy to test: just put in an empty disk, and see if the box succeeds in initializing it. If not, you have a good reason to RMA the box.
Did Lenovo say anything useful?


Posted: Mon Feb 10, 2014 7:55 pm
Joined: Fri Feb 07, 2014 8:10 pm
Posts: 21
the moment i encountered the issue i called Lenovo, they opened a case and asked for the support dump files (the case email took more than 1:30hrs to be delivered...), which i sent them, since then i haven't heard back, not even an acknowledgement of them receiving it...
From what i see this probably merits an RMA as i don't think the internal USB flash is removable (soldered down maybe, i haven't been able to find a teardown video or pictures).

i'll wait until i hear back from Lenovo before trying with an empty drive/webmanage factory reset, i could also externally wipe the HDDs and put them back in together with the factory reset(either via the back button or the manager).

for now i've put a "windows 7 PC nas" in place at the customer, which gives me breathing room to wait for Lenovo to answer


Posted: Tue Feb 11, 2014 1:26 am
Joined: Fri Feb 07, 2014 8:10 pm
Posts: 21
this just in:
"we don't see any hardware problems in your unit" -REALLY :o -
"we recommend you do a FACTORY ERASE(losing all your data)" -facepalm, i told them i need to recover the data and they tell me the solution is TO DELETE EVERYTHING :lol: -

you can imagine that my answer hasn't been the most polite of ones at this blatant idiocy, i also shoved them the kernel log showing "no hardware error"


Posted: Fri Feb 14, 2014 3:10 pm
Joined: Fri Feb 07, 2014 8:10 pm
Posts: 21
They finally agreed to exchange the unit... have to RMA it now

