Find it

Friday, August 21, 2009

SCSI transport failed: reason 'timeout': retrying command

Errors captured from dmesg and /var/adm/messages:


# tail /var/adm/messages
Aug 20 01:17:53 XXXXX scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf6f64ca,0 (ssd0):
Aug 20 01:17:53 XXXXX SCSI transport failed: reason 'timeout': retrying command
Aug 20 01:17:53 XXXXX md_stripe: [ID 641072 kern.warning] WARNING: md: d23: write error on /dev/dsk/c1t0d0s3
Aug 20 01:17:53 XXXXX scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w21000004cf6f64ca,0 (ssd0):
Aug 20 01:17:53 XXXXX SCSI transport failed: reason 'timeout': giving up
Aug 20 01:17:53 XXXXX md_stripe: [ID 641072 kern.warning] WARNING: md: d20: write error on /dev/dsk/c1t0d0s0
Aug 20 01:17:53 XXXXX md_mirror: [ID 104909 kern.warning] WARNING: md: d23: /dev/dsk/c1t0d0s3 needs maintenance
Aug 20 01:17:53 XXXXX md_mirror: [ID 104909 kern.warning] WARNING: md: d20: /dev/dsk/c1t0d0s0 needs maintenance
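On a busy host these warnings are easy to miss, so a small filter can list every metadevice and component that has been reported as needing maintenance. This is a sketch that assumes the md_mirror message format shown above ("WARNING: md: d23: /dev/dsk/c1t0d0s3 needs maintenance"). The function name list_maintenance and the AWK override are hypothetical helpers, not Solaris commands. On Solaris the default /usr/bin/awk is the old awk, so running with AWK=nawk may be safer.

```shell
# Hypothetical helper: summarise SVM "needs maintenance" warnings from syslog.
# Assumes lines like:
#   ... md_mirror: [ID 104909 kern.warning] WARNING: md: d23: /dev/dsk/c1t0d0s3 needs maintenance
# AWK can be overridden (e.g. AWK=nawk on Solaris); this variable is an assumption of the sketch.
AWK=${AWK:-awk}

list_maintenance() {
    # $1: log file to scan (defaults to /var/adm/messages).
    # Prints one "metadevice component" pair per line, without duplicates.
    grep 'needs maintenance' "${1:-/var/adm/messages}" |
        $AWK '{
            for (i = 1; i < NF; i++)
                if ($i == "md:") {
                    dev = $(i + 1)                      # e.g. "d23:"
                    print substr(dev, 1, length(dev) - 1), $(i + 2)
                }
        }' |
        sort -u
}

# Usage on the affected host:
#   list_maintenance /var/adm/messages
```

Only the "needs maintenance" lines are used, so the md_stripe "write error" lines for the same slices do not produce duplicates.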



Checks performed:



# iostat -En
c1t0d0 Soft Errors: 13 Hard Errors: 0 Transport Errors: 4 <<< No hard errors, but 4 transport errors.
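When there are many disks, the iostat -En summary lines can be filtered for the same pattern. This is a sketch assuming the field layout of the line above; flag_disks is a hypothetical name. It flags transport errors as well as hard errors, because in this case the hard error count is zero while the transport count is not.

```shell
# Hypothetical helper: flag disks whose `iostat -En` summary line shows
# hard or transport errors. Assumes the summary layout:
#   c1t0d0 Soft Errors: 13 Hard Errors: 0 Transport Errors: 4
AWK=${AWK:-awk}

flag_disks() {
    # Reads `iostat -En` output on stdin.
    # Prints "disk soft hard transport" for each disk with hard > 0 or transport > 0;
    # the Vendor/Size/... detail lines do not match the pattern and are ignored.
    $AWK '$2 == "Soft" && $5 == "Hard" && $8 == "Transport" {
        if ($7 + 0 > 0 || $10 + 0 > 0)
            print $1, $4, $7, $10
    }'
}

# Usage: iostat -En | flag_disks
```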

# metastat | more
d0: Mirror
    Submirror 0: d20
      State: Needs maintenance    <<< The submirrors of d0 and d3 (that is, / and /var) are in “Needs maintenance” state
    Submirror 1: d10
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 16779312 blocks (8.0 GB)

d20: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c1t0d0s0
    Size: 16779312 blocks (8.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No     Maintenance   Yes

d10: Submirror of d0
    State: Okay
    Size: 16779312 blocks (8.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes

d3: Mirror
    Submirror 0: d23
      State: Needs maintenance
    Submirror 1: d13
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12584484 blocks (6.0 GB)

d23: Submirror of d3
    State: Needs maintenance
    Invoke: metareplace d3 c1t0d0s3
    Size: 12584484 blocks (6.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s3          0     No     Maintenance   Yes
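With many metadevices it helps to pull just the broken components out of the metastat output. The sketch below assumes the layout shown above: a "dNN: Submirror of dM" header followed by a stripe table whose State column reads Maintenance. maintenance_components is a hypothetical helper, and the AWK override is an assumption (use nawk on Solaris if the default awk complains).

```shell
AWK=${AWK:-awk}

maintenance_components() {
    # Reads `metastat` output on stdin.
    # Prints "mirror submirror component" for each stripe component whose
    # State column is "Maintenance".
    $AWK '
        # Any other metadevice header ("d3: Mirror", ...) ends the current submirror.
        $1 ~ /^d[0-9]+:$/ && $2 != "Submirror" { sm = ""; mirror = "" }
        # "d20: Submirror of d0" starts a submirror section.
        $2 == "Submirror" && $3 == "of" {
            sm = substr($1, 1, length($1) - 1)   # "d20:" -> "d20"
            mirror = $4
        }
        # Component rows look like: "c1t0d0s0 0 No Maintenance Yes".
        sm != "" {
            for (i = 2; i <= NF; i++)
                if ($i == "Maintenance") { print mirror, sm, $1; break }
        }'
}

# Usage: metastat | maintenance_components
```

Matching the capitalised "Maintenance" in the component table, rather than the "Needs maintenance" state line, means the output names the actual slice to re-enable.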



# metadb -i
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c1t1d0s7
     a    p  luo        1050            1034            /dev/dsk/c1t1d0s7
     a    p  luo        2084            1034            /dev/dsk/c1t1d0s7
     a    p  luo        16              1034            /dev/dsk/c1t0d0s7
     a    p  luo        1050            1034            /dev/dsk/c1t0d0s7
     a    p  luo        2084            1034            /dev/dsk/c1t0d0s7



<<<< The metadb replicas seem to be in good shape: all six are active (a) and none shows an error flag >>>>
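The replica check can also be scripted. This sketch relies on the metadb convention that lower-case flags are status bits (a means active) while upper-case flags report problems such as W (write errors) or R (read errors). check_replicas is a hypothetical name. It also counts replicas per slice, which makes it easy to confirm they are split across both disks.

```shell
AWK=${AWK:-awk}

check_replicas() {
    # Reads `metadb -i` output on stdin.
    # Prints "slice count" for every slice holding replicas, plus a
    # "PROBLEM slice flags" line for any replica that is not active or has an
    # upper-case (error) flag. Exit status is 1 if any replica was flagged.
    $AWK '
        $NF ~ /^\/dev\// {
            flags = ""
            for (i = 1; i < NF && $i !~ /^[0-9]+$/; i++)
                flags = flags $i                 # joins "a" "m" "p" "luo" -> "ampluo"
            count[$NF]++
            if (flags !~ /a/ || flags ~ /[A-Z]/) {
                print "PROBLEM", $NF, flags
                bad = 1
            }
        }
        END {
            for (d in count) print d, count[d]   # order is unspecified; pipe to sort if needed
            exit (bad ? 1 : 0)
        }'
}

# Usage: metadb -i | check_replicas || echo "check the replicas"
```

The header line and the flag legend that metadb -i prints do not end in a /dev/ path, so they are skipped.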



Actions taken to address the issue:


# format c1t0d0
format> analyze
analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? yes

        pass 0
   24619/26/53

        pass 1
   24619/26/53

Total of 0 defective blocks repaired.

# metasync d0; metasync d3

# metareplace -e d0 c1t0d0s0
d0: device c1t0d0s0 is enabled

# metareplace -e d3 c1t0d0s3
d3: device c1t0d0s3 is enabled

The -e option transitions the component to the available state and resyncs the failed component.
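The two metareplace -e commands follow directly from the "Invoke:" hints that metastat printed, so they can be generated from that output. This is a sketch: enable_components and the RUN variable are hypothetical. By default it only prints the commands, because -e re-enables the same slice in place and should be used only once you are satisfied the disk itself is usable (here, after the format analyze pass).

```shell
AWK=${AWK:-awk}

enable_components() {
    # Reads `metastat` output on stdin. For each hint of the form
    #   Invoke: metareplace <metadevice> <component> [...]
    # prints "metareplace -e <metadevice> <component>"; with RUN=yes it
    # executes that command instead (re-enable in place and resync).
    $AWK '$1 == "Invoke:" && $2 == "metareplace" && NF >= 4 { print $3, $4 }' |
        while read -r md comp; do
            if [ "${RUN:-no}" = yes ]; then
                metareplace -e "$md" "$comp"
            else
                echo "metareplace -e $md $comp"
            fi
        done
}

# Dry run:  metastat | enable_components
# Execute:  RUN=yes; export RUN; metastat | enable_components
```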

# metastat | grep %
Resync in progress: 72 % done
Resync in progress: 72 % done

After the resync completes, check the metastat output again; both submirrors should be back in the Okay state.
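Instead of re-running metastat | grep % by hand, a small loop can wait for the resync to finish and then confirm that nothing is left in maintenance. This is a sketch: wait_for_resync and the STATUS_CMD override (there only so the loop can be exercised without SVM) are hypothetical, and the 60-second polling interval is an arbitrary default. Plain grep with output redirected is used because the old Solaris /usr/bin/grep may not support -q.

```shell
wait_for_resync() {
    # $1: seconds between polls (default 60).
    # STATUS_CMD: command printing metastat-style output (default: metastat).
    interval=${1:-60}
    while :; do
        progress=$(${STATUS_CMD:-metastat} | grep 'Resync in progress')
        [ -z "$progress" ] && break
        echo "$progress"
        sleep "$interval"
    done
    # Resync finished: anything still in (Needs) maintenance needs attention.
    if ${STATUS_CMD:-metastat} | grep -i 'maintenance' >/dev/null; then
        echo "Resync done, but some components still need maintenance"
        return 1
    fi
    echo "Resync done; no components in maintenance"
}

# Usage: wait_for_resync 300
```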

NOTE:

The error was “SCSI transport failed: reason 'timeout': retrying command”. In my view the root cause is that the system could not send data to, or receive data from, the drive. This could be the cable or the SCSI controller. Even so, I would get the data off and try a new drive before it fails.

2 comments:

  1. Thanks a lot for this information......
    it is a real help from you..
    Sitansu
