Error captured from dmesg and /var/adm/messages:
# tail /var/adm/messages
Aug 20 01:17:53 XXXXX scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf6f64ca,0 (ssd0):
Aug 20 01:17:53 XXXXX SCSI transport failed: reason 'timeout': retrying command
Aug 20 01:17:53 XXXXX md_stripe: [ID 641072 kern.warning] WARNING: md: d23: write error on /dev/dsk/c1t0d0s3
Aug 20 01:17:53 XXXXX scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf6f64ca,0 (ssd0):
Aug 20 01:17:53 XXXXX SCSI transport failed: reason 'timeout': giving up
Aug 20 01:17:53 XXXXX md_stripe: [ID 641072 kern.warning] WARNING: md: d20: write error on /dev/dsk/c1t0d0s0
Aug 20 01:17:53 XXXXX md_mirror: [ID 104909 kern.warning] WARNING: md: d23: /dev/dsk/c1t0d0s3 needs maintenance
Aug 20 01:17:53 XXXXX md_mirror: [ID 104909 kern.warning] WARNING: md: d20: /dev/dsk/c1t0d0s0 needs maintenance
Checks performed:
# iostat -En
c1t0d0 Soft Errors: 13 Hard Errors: 0 Transport Errors: 4 <<< No hard errors, but 13 soft and 4 transport errors.
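When several disks are involved, the counters can be pulled out of the `iostat -En` output in one pass. This is a minimal sketch, not part of the original procedure: the function name `iostat_err_summary` is hypothetical, it assumes a POSIX-compatible awk (on Solaris, `nawk` or `/usr/xpg4/bin/awk`), and it assumes each disk's first line has exactly the layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarise the per-disk error counters from `iostat -En`.
# Assumes each disk's first line has the layout:
#   <disk> Soft Errors: N Hard Errors: N Transport Errors: N
# Appends " CHECK" when the hard or transport count is non-zero.
iostat_err_summary() {
    awk '/Soft Errors:/ {
        # $4, $7 and $10 are the soft, hard and transport counts.
        note = ($7 > 0 || $10 > 0) ? " CHECK" : ""
        print $1 " soft=" $4 " hard=" $7 " transport=" $10 note
    }'
}

# Usage on a live system:
#   iostat -En | iostat_err_summary
```

Transport errors are flagged along with hard errors because the messages above are transport timeouts, even though the hard error count is 0.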
# metastat | more
d0: Mirror
    Submirror 0: d20
      State: Needs maintenance   <<< The submirrors d20 (of d0, /) and d23 (of d3, /var) are in the “Needs maintenance” state
    Submirror 1: d10
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 16779312 blocks (8.0 GB)

d20: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c1t0d0s0
    Size: 16779312 blocks (8.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No     Maintenance   Yes

d10: Submirror of d0
    State: Okay
    Size: 16779312 blocks (8.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes

d3: Mirror
    Submirror 0: d23
      State: Needs maintenance
    Submirror 1: d13
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12584484 blocks (6.0 GB)

d23: Submirror of d3
    State: Needs maintenance
    Invoke: metareplace d3 c1t0d0s3
    Size: 12584484 blocks (6.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s3          0     No     Maintenance   Yes
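The "Invoke:" lines already name the mirror and the failed component. A small dry-run sketch can turn them into the in-place re-enable commands used later in this post. The function name `svm_reenable_cmds` is hypothetical, it assumes a POSIX-compatible awk, and it only prints commands; nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper (dry run): read metastat output on stdin and print one
# "metareplace -e <mirror> <component>" line for every "Invoke: metareplace"
# hint. Nothing is executed. awk ignores leading blanks when splitting
# fields, so indented metastat output works too.
svm_reenable_cmds() {
    awk '$1 == "Invoke:" && $2 == "metareplace" { print "metareplace -e", $3, $4 }'
}

# Usage on a live system; review the output before running anything by hand:
#   metastat | svm_reenable_cmds
```

`metareplace -e` re-enables the same slice in place, which fits this case because the disk is being reused. If the disk were swapped for a different device, the two-component form (`metareplace mirror old-component new-component`) would be the one to use instead.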
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c1t1d0s7
     a    p  luo        1050            1034            /dev/dsk/c1t1d0s7
     a    p  luo        2084            1034            /dev/dsk/c1t1d0s7
     a    p  luo        16              1034            /dev/dsk/c1t0d0s7
     a    p  luo        1050            1034            /dev/dsk/c1t0d0s7
     a    p  luo        2084            1034            /dev/dsk/c1t0d0s7
<<<< The metadb replicas look healthy: all six are active (a) and none shows an uppercase error flag >>>>
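The same replica check can be scripted. This is a sketch under stated assumptions: the function name `metadb_check` is hypothetical, it assumes a POSIX-compatible awk, and it relies on the `metadb -i` legend, where lowercase "a" means the replica is active and uppercase flags (W, M, D, F, S, R) report problems.

```shell
#!/bin/sh
# Hypothetical helper: check `metadb -i` output read on stdin.
# A replica line ends with: <first blk> <block count> <device>, so every
# field before the last three is a flag. Lowercase "a" means active;
# any uppercase flag is treated as a problem.
metadb_check() {
    LC_ALL=C awk '$NF ~ /^\/dev\// {
        flags = ""
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if (flags ~ /a/ && flags !~ /[A-Z]/) {
            ok[$NF]++
        } else {
            bad[$NF]++
            print "PROBLEM:", $NF, "flags=" flags
        }
        seen[$NF] = 1
    }
    END {
        # One summary line per slice that holds replicas (order not fixed).
        for (d in seen) print d, "healthy=" (ok[d] + 0), "problem=" (bad[d] + 0)
    }'
}

# Usage on a live system:
#   metadb -i | metadb_check | sort
```

Only lines whose last field is a `/dev/...` path are counted, so the header and the flag legend that `metadb -i` prints are skipped.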
Action taken to address the issue:
# format c1t0d0
format> analyze
analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? yes
pass 0
24619/26/53
pass 1
24619/26/53
Total of 0 defective blocks repaired.
# metasync d0; metasync d3
# metareplace -e d0 c1t0d0s0
d0: device c1t0d0s0 is enabled
# metareplace -e d3 c1t0d0s3
d3: device c1t0d0s3 is enabled
The -e option transitions the component to the available state and resyncs the failed component.
# metastat | grep %
Resync in progress: 72 % done
Resync in progress: 72 % done
After the resync completes, check the metastat output again; both submirrors should be back in the Okay state.
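Rather than re-running `metastat | grep %` by hand, the wait can be scripted. A minimal sketch, assuming a POSIX-compatible awk and the "Resync in progress: N % done" wording shown above; the function names `resync_in_progress` and `wait_for_resync` and the 60-second default interval are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helpers for watching the resync started by metareplace -e.
# resync_in_progress reads metastat output on stdin, prints the percentage
# from every "Resync in progress: N % done" line, and exits 0 if at least
# one resync is still running (1 otherwise).
resync_in_progress() {
    awk '/Resync in progress:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) { print $i; found = 1; break }
    }
    END { exit (found ? 0 : 1) }'
}

# Poll metastat every INTERVAL seconds (default 60) until no resync remains.
wait_for_resync() {
    interval=${1:-60}
    while metastat | resync_in_progress >/dev/null; do
        sleep "$interval"
    done
    echo "No resync in progress; check metastat for Okay states."
}

# Usage on a live system:
#   wait_for_resync 30 && metastat
```

Keeping the parsing in a separate function means it can be tried on saved metastat output without touching the live mirrors.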
NOTE:
The error was “SCSI transport failed: reason 'timeout': retrying command”. In my view, the root cause is that the host could not send data to, or receive data from, the drive before the command timed out. This could be caused by the cable or the SCSI controller. I would get the data off and try a new drive before it fails.
Thanks a lot for this information......
It is a real help from you..
Sitansu
Hi Joshi,
Thanks a lot for your help.