Saturday, January 16, 2010

Debug tips for a container that hangs during shutdown.

Sometimes, if an NFS share is mounted into a non-global zone (NGZ) through the global zone (GZ) and you shut that container down, the shutdown hangs more often than not and the zone stays stuck in the shutting_down state. If an NFS mount is what is hanging the zone, you should still see it listed in the /etc/mnttab file in the global zone: grep nfs /etc/mnttab. It will be mounted under the zonepath.  You should then be able to do a umount -f on that mount point from the global zone, and if you're really lucky the zone will finish shutting down.
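
To narrow the grep down to just the mounts under one zone, something like the sketch below may help. It assumes the usual tab-separated /etc/mnttab layout (special, mount point, fstype, options, time) and a ksh or bash style shell; the function name nfs_mounts_under and the example zonepath are made up for illustration.

```shell
# Hypothetical helper: print NFS mount points located under a zonepath.
# Usage: nfs_mounts_under ZONEPATH [MNTTAB_FILE]
nfs_mounts_under() {
    zonepath=${1%/}                 # drop a trailing slash, if any
    # Assumed field order: special, mount point, fstype, options, time.
    while IFS='	' read -r special mountp fstype rest; do
        case "$fstype:$mountp" in
            # Keep only NFS entries whose mount point is below the zonepath.
            nfs:"$zonepath"/*) printf '%s\n' "$mountp" ;;
        esac
    done < "${2:-/etc/mnttab}"
}
```

For example, nfs_mounts_under /zones/web01 would list the candidates, and each printed path is what you would then hand to umount -f from the global zone.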

Also check whether any process is holding up the zone shutdown, and try to kill it.  Another useful command is truss, which helps a lot while debugging. For example, initiating a shutdown starts some processes, so you can simply run truss against one of those process IDs and see exactly what it is doing and where it is stuck.

If the tips above don't work, you can use mdb to debug further -

Sun Container ref_count

# mdb -k
::walk zone | ::print zone_t zone_name zone_ref

The zone_ref > 1 means that something in the kernel is holding the zone.
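
With many zones on a box, the output can be filtered down to only the zones that are being held. The sketch below assumes mdb prints each zone as a pair of lines of the form zone_name = <address> "name" followed by zone_ref = <value>; I have not checked that layout on every release, so adjust the parsing if yours differs. The function name held_zones is made up for illustration.

```shell
# Hypothetical filter: read "::print zone_t zone_name zone_ref" output on
# stdin and print "name refcount" for every zone with zone_ref > 1.
held_zones() {
    name=
    while read -r field eq value rest; do
        case "$field" in
            zone_name)
                # Assumed line shape: zone_name = 0x... "web01"
                name=${rest#\"}
                name=${name%\"}
                ;;
            zone_ref)
                # Shell arithmetic accepts both hex (0x2) and decimal values.
                if [ "$(($value))" -gt 1 ]; then
                    printf '%s %d\n' "$name" "$(($value))"
                fi
                ;;
        esac
    done
}
```

It would be used as: echo '::walk zone | ::print zone_t zone_name zone_ref' | mdb -k | held_zones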

# mdb -k
::kmem_cache | grep rnode
ffffffffa6438008 rnode_cache               0200 000000      656    70506
ffffffffa643c008 rnode4_cache              0200 000000      968        0

Then run -

ffffffffa6438008::walk kmem | ::print rnode_t r_vnode | ::vnode2path

See if this gives any hints toward a solution. The output of this command may show a few files or filesystems that are still held by the zone and are keeping it from shutting down.

If nothing works, you will have to take a call and reboot the GZ server.

One important thing I learned from this experience: the zsched process is always unkillable.  It will only exit when instructed to by zoneadmd.

