Friday, September 25, 2009

Oracle Kernel Parameters in Solaris 10

In the case of Solaris 9, edit the shared memory and semaphore kernel parameters in /etc/system using the calculations and rules of thumb from the Oracle install manual.

The following table identifies the now obsolete IPC tunables and their replacement resource controls (the mapping follows the standard Solaris 10 documentation):

Obsolete /etc/system tunable      Replacement resource control
semsys:seminfo_semmni             project.max-sem-ids
semsys:seminfo_semmsl             process.max-sem-nsems
shmsys:shminfo_shmmax             project.max-shm-memory
shmsys:shminfo_shmmni             project.max-shm-ids

In the case of Solaris 10, the Solaris resource management facilities are used to set resource limits for Oracle rather than /etc/system parameters. Resource limits should not be placed in /etc/system; the only parameter that still needs to be added there is:

set noexec_user_stack=1

For Solaris 10, add a project for the oracle user and set the recommended resource limits:

# projadd -U oracle user.oracle
# projmod -s -K "project.max-sem-ids=(priv,100,deny)" user.oracle
# projmod -s -K "project.max-sem-nsems=(priv,256,deny)" user.oracle
# projmod -s -K "project.max-shm-memory=(priv,4294967295,deny)" user.oracle
# projmod -s -K "project.max-shm-ids=(priv,100,deny)" user.oracle

(To check the project settings: prctl -i project user.oracle)
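
A quick way to confirm the project is actually picked up when oracle logs in (a rough sketch; the values reported by prctl will of course differ per system):

# su - oracle -c "id -p"                                  <<<< should report projid=...(user.oracle)
# prctl -n project.max-shm-memory -i project user.oracle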

- Reboot the machine after changing /etc/system to put the new setting into effect.

Thursday, September 24, 2009

AIX extendvg issue

While working on a task to expand a VG, I got the error below -

0516-1163 extendvg: datavg2 already has maximum physical volumes. With the maximum
number of physical partitions per physical volume being 2032, the maximum
number of physical volumes for volume group datavg2 is 16.
0516-792 extendvg: Unable to extend volume group.



When I tried adding a LUN/disk to the VG, the system threw the above error right in my face... :)

When I checked why I was getting such an error, I realized that I had hit the maximum PV limit, so my next job was to increase the number of PVs allowed for the problematic VG.

# lsvg datavg2
VOLUME GROUP: datavg2 VG IDENTIFIER: 002703ff00004c0000000109f5693a39
VG STATE: active PP SIZE: 16 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 24598 (393568 megabytes)
MAX LVs: 512 FREE PPs: 9020 (144320 megabytes)
LVs: 78 USED PPs: 15578 (249248 megabytes)
OPEN LVs: 78 QUORUM: 16 (Enabled)
TOTAL PVs: 16 VG DESCRIPTORS: 16
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 16 AUTO ON: yes
MAX PPs per VG: 128016
MAX PPs per PV: 3048 MAX PVs: 16
LTG size: 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable


I tried with the command -

# chvg -B datavg2

However, I got an error while converting this VG to a big VG -


Anyway, the question now is how to get around this. We have to convert this normal VG to a big VG.

IMP: When you want to convert a normal VG to a big VG, the mandatory condition is that there must be enough free partitions available on each physical volume for the VGDA expansion - at least 2 PPs per PV as per my observation.

Because the VGDA resides on the edge of the disk and requires contiguous space for expansion, the free partitions are required on the edge of the disk. If those partitions are allocated for user usage, they will be migrated to other free partitions on the same disk. The rest of the physical partitions will be renumbered to reflect the loss of the partitions for VGDA usage. This will change the mappings of the logical to physical partitions in all the PVs of this VG. If you have saved the mappings of the LVs for a potential recovery operation, you should generate the maps again after the completion of the conversion operation.

Also, if the backup of the VG is taken with the map option and you plan to restore using those maps, the restore operation may fail since the partition number may no longer exist (due to reduction). It is recommended that a backup is taken before the conversion, and right after the conversion if the map option is utilized. Because the VGDA space has been increased substantially, every VGDA update operation (creating a logical volume, changing a logical volume, adding a physical volume, and so on) may take considerably longer to run.
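
Before attempting the conversion, a quick (hedged) one-liner to spot the PVs that do not have the roughly 2 free PPs needed; it assumes the standard lsvg -p column layout shown further below:

# lsvg -p datavg2 | awk 'NR > 2 && $4 < 2 {print $1, "has only", $4, "free PPs"}'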



When I checked all the PVs associated with VG datavg2 and looked at their used and free PPs, I realized why I was getting the above error: some of my disks have no free PPs at all - below you can see hdisk7, hdisk50, hdisk62 and so on...

# lsvg -p datavg2
datavg2:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk43 active 475 136 64..72..00..00..00
hdisk71 active 475 131 00..03..90..38..00
hdisk22 active 2047 236 00..236..00..00..00

[.....]

hdisk25 active 2047 856 00..409..137..00..310
hdisk7 active 952 0 00..00..00..00..00
hdisk50 active 475 0 00..00..00..00..00


[.....]

hdisk62 active 475 0 00..00..00..00..00
hdisk38 active 2047 824 00..24..00..390..410
hdisk36 active 2047 603 00..409..194..00..00

[.....]

Then I checked PP and LV allocations for hdisk7

# lspv -M hdisk7 | more
hdisk7:1 udbdevl4:1152
hdisk7:2 udbdevl4:1153
hdisk7:3 udbdevl4:1154
hdisk7:4 udbdevl4:1155
hdisk7:5 udbdevl4:1156

[.....]

hdisk7:949 lv52:1801
hdisk7:950 lv52:1802
hdisk7:951 lv52:1803
hdisk7:952 lv52:1804

OK.. so hdisk7 is completely full, while hdisk38 has plenty of free PPs. So how about migrating the logical partitions from hdisk7, hdisk50, hdisk62 and so on to hdisk38, which has enough space? It seems to be a good idea.

So let’s go for it.

First I am going to see how the PPs are laid out on hdisk38 -

# lspv -M hdisk38

[.......]
hdisk38:1244 JDQD.db:208
hdisk38:1245 JDQD.db:209
hdisk38:1246 JDQD.db:210
hdisk38:1247 JDQD.db:211
hdisk38:1248-2047

Ok.. from 1248 to 2047 we have free PPs. Now let us migrate one LP from hdisk7 to hdisk38, onto the next available PP, which is 1248 -

# migratelp lv52/1804 hdisk38/1248
migratelp: Mirror copy 1 of logical partition 1804 of logical volume
lv52 migrated to physical partition 1248 of hdisk38.

# migratelp lv52/1803 hdisk38/1249
migratelp: Mirror copy 1 of logical partition 1803 of logical volume
lv52 migrated to physical partition 1249 of hdisk38.
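
The other full disks (hdisk50, hdisk62, hdisk66) get the same treatment; here is only a rough sketch with placeholder LV/LP values, so check lspv -M on each disk for the real ones first:

# lspv -M hdisk50 | tail -2                    <<<< note the last couple of LV:LP entries on the full disk
# migratelp <lvname>/<lpnumber> hdisk38/1250
# migratelp <lvname>/<lpnumber> hdisk38/1251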

So here I made at least two PPs available on hdisk7, and in the same way (as sketched above) we free up hdisk50, hdisk62 and hdisk66, which are also full. See the output below -

# lsvg -p datavg2
datavg2:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk43 active 475 136 64..72..00..00..00
hdisk71 active 475 131 00..03..90..38..00
hdisk22 active 2047 236 00..236..00..00..00

[.....]

hdisk25 active 2047 856 00..409..137..00..310
hdisk7 active 952 2 00..00..00..00..02
hdisk50 active 475 2 00..00..00..00..02


[.....]

hdisk50 active 475 2 00..00..00..00..02
hdisk38 active 2047 816 00..24..00..382..410
hdisk36 active 2047 603 00..409..194..00..00

[.....]


You can also see the status changes on hdisk38 -

# lspv -M hdisk38

[.....]
hdisk38:1248 lv52:1804
hdisk38:1249 lv52:1803
hdisk38:1250 lv52:696
hdisk38:1251 lv52:695
hdisk38:1252 lv52:995
hdisk38:1253 lv52:994
hdisk38:1254 udbdevl4:1124
hdisk38:1255 udbdevl4:1123
hdisk38:1256-2047

Now we are going to try converting the VG to a big VG, or in other words increasing the maximum number of PPs and PVs allowed in the VG -

# smitty chvg


[Entry Fields]
* VOLUME GROUP name [datavg2] +

[Entry Fields]
* VOLUME GROUP name datavg2
* Activate volume group AUTOMATICALLY yes +
at system restart?
* A QUORUM of disks required to keep the volume yes +
group on-line ?
Convert this VG to Concurrent Capable? no +
Change to big VG format? yes +
Change to scalable VG format? no +
LTG Size in kbytes 128 +
Set hotspare characteristics n +
Set synchronization characteristics of stale n +
partitions
Max PPs per VG in units of 1024 32 +
Max Logical Volumes 256 +


OK.. see, now the VG has become a big VG -

# lsvg datavg2
VOLUME GROUP: datavg2 VG IDENTIFIER: 002703ff00004c000000010d8f8b6358
VG STATE: active PP SIZE: 32 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 11103 (355296 megabytes)
MAX LVs: 512 FREE PPs: 4547 (145504 megabytes)
LVs: 23 USED PPs: 6556 (209792 megabytes)
OPEN LVs: 23 QUORUM: 9 (Enabled)
TOTAL PVs: 17 VG DESCRIPTORS: 17
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 17 AUTO ON: yes
MAX PPs per VG: 130048
MAX PPs per PV: 2032 MAX PVs: 64
LTG size: 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable

Cool stuff, good learning...

Wednesday, September 23, 2009

My BIG Time FEAR.... Set user ID, set group ID, sticky bit. FIGHT YOUR FEAR

It's a shame to say that I have a real BIG-time fear of set user ID, set group ID and the sticky bit. I don't know why, but I see myself as very poor at remembering this topic, so I have now decided to fight my fear by writing one small informational document on it.

What is Sticky bit?

Historically it was used to make a program's image "stick" in memory after it finished; that usage is now obsolete. Currently its use is system dependent, and it is mostly used on directories to suppress deletion of files that belong to other users in a directory you have "write" access to, like /tmp.

How to set it up?

# chmod 1777 world_write
# ls -ld world_write
drwxrwxrwt 2 ignatz staff 512 Jul 15 15:27 world_write
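
If you ever want to hunt down which directories on a box already have the sticky bit set, a simple sketch that should work on most UNIX flavours:

# find / -type d -perm -1000 -exec ls -ld {} \; 2>/dev/null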


What is SUID or setuid?

Change user ID on execution. If the setuid bit is set, then when the file is executed by a user, the process will have the same rights as the owner of the file being executed.

For example, the setuid permission on the passwd command makes it possible for users to change passwords.

$ ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 22960 Jul 17 2006 /usr/bin/passwd


How to set it up?

# chmod 4555 dbprog
# ls -l dbprog
-r-sr-xr-x 1 db staff 12095 May 6 09:29 dbprog


What is SGID or setgid?

Change group ID on execution. Same as above, but the process inherits the rights of the group of the file's owner. For directories it also means that when a new file is created in the directory, it inherits the group of the directory (and not that of the user who created the file) - there is a directory example after the file example below.

How to set it up?

# chmod 2551 dbprog2
# ls -l dbprog2
-r-xr-s--x 1 db staff 24576 May 6 09:30 dbprog2
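
And for the directory case described above, a minimal sketch (the directory and group names are made up):

# chgrp staff /export/shared
# chmod g+s /export/shared       <<<< or chmod 2775 /export/shared
# ls -ld /export/shared          <<<< the group execute bit now shows as 's'; new files created here inherit group staff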




Hopefully, having written it down here, I will always keep this in mind. Blogging helps me a lot!!!

Tuesday, September 22, 2009

v490 and MPxIO issue

I was enabling and configuring MPxIO on a V490 server, and after enabling it I discovered something very strange: the system started doing FC multipathing to the internal disks as well. This gave the internal disks MPxIO disk names (c2t[WWPN]d0 style) and updated the metadevice config etc. to point to the new disks.

NOTE: The same behaviour is seen on the V880 - enabling MPxIO includes the internal drives if they're fibre-attached.

This can cause a lot of issues at server-recovery time in our environment. BTW, it is not such a bad idea to put your internal disks under MPxIO for redundancy and failover purposes.

As this configuration is a problem in our setup, we decided to exclude the internal disks from MPxIO. To accomplish this you simply need to first disable MPxIO using the stmsboot -d command, and then add the entry below to the /kernel/drv/fp.conf file -

name="fp" parent="/pci@9,600000/SUNW,qlc@2" port=0 mpxio-disable="yes";

This entry will prevent the internal disks from being multipathed.

This line tells Solaris to disable MPxIO on port 0 for all devices whose parent device is /pci@9,600000/SUNW,qlc@2. Of course, a similar line should be added for every HBA and port you do not want under MPxIO's control. You can get the parent device from your /var/adm/messages file or from the device links pointing to the internal disks.

Like -

# format
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000878bb811,0
1. c1t1d0
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000878bc25a,0
2. c4t60050768018E81F5B800000000000142d0
/scsi_vhci/ssd@g60050768018e81f5b800000000000142
3. c4t60050768018E81F5B800000000000145d0
/scsi_vhci/ssd@g60050768018e81f5b800000000000145

c1t0d0s0 & c1t1d0s0 are my internal drives... so I will find physical path for them from /dev or /devices location as below -

# ls -l c1t0d0*
lrwxrwxrwx 1 root root 70 Sep 21 10:04 c1t0d0s0 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000878bb811,0:a
# ls -l c1t1d0*
lrwxrwxrwx 1 root root 70 Sep 21 10:04 c1t1d0s0 -> ../../devices/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000000878bc25a,0:a

Just a summary -

Enable MPxIO in /kernel/drv/fp.conf

For the 280R/480R/V480 and servers in the same range, add the lines below to enable MPxIO while excluding the internal drives:

mpxio-disable="no";
name="fp" parent="/pci@8,600000/SUNW,qlc@4" port=0 mpxio-disable="yes";

For a v490/v880:

mpxio-disable="no";
name="fp" parent="/pci@9,600000/SUNW,qlc@2" port=0 mpxio-disable="yes";


For v240/v440/v245/v445/T5000 Series/M#000 Series

mpxio-disable="no";

Run the command stmsboot -u to enable MPxIO, followed by a system reboot.
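
After the reboot, a quick check that things look the way you expect (the internal drives should keep their old c1t... names); these are standard Solaris 10 commands, output omitted here:

# stmsboot -L          <<<< lists the non-STMS to STMS device name mappings
# mpathadm list lu     <<<< lists the logical units now under MPxIO control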

Hope someone finds this helpful!

Thursday, September 17, 2009

I was wrong... My UNIX Guru Alex showed me the way!!! - Adding capped-memory to a container "on-the-fly"

Yesterday I was asked to add capped-memory to a running container. Before going for it, I replied to the end user asking for container downtime to perform this task. However, late in the evening the same day I found an email from Alex in my mailbox explaining that "You don't need to reboot a Solaris container to increase its memory", along with detailed step-by-step instructions. I would like to take this opportunity to publicly say a big thank you to Alex... I am blessed with such a wonderful UNIX guru...

Before taking on this task, my assumption was as follows: I was under the impression that the prctl command only has a temporary effect and does not survive a reboot. In other words, my understanding was that prctl is an "on-the-fly" way to temporarily set resource control assignments, and the modified parameter only becomes permanent after a reboot.

Then Alex replied explaining how exactly it works...

prctl and rcapadm modify the running zone.

zonecfg defines the resource parameters of the zone when it boots.

So... To make a change dynamically you:
1) Update the zonecfg. The reason you do this is so that when rebooted it doesn't revert back to the old settings.
2) Use the prctl, rcapadm commands to modify the zone while it is online. The data you feed into prctl and rcapadm should match the changes you've made to zonecfg.


Below are the detailed steps to add capped-memory to a running container -

# zonecfg -z XXXXXX
zonecfg:XXXXXX> select capped-memory
zonecfg:XXXXXX:capped-memory> info
capped-memory:
physical: 1G
[swap: 2G]
[locked: 512M]
zonecfg:XXXXXX:capped-memory> set physical=2g
zonecfg:XXXXXX:capped-memory> set swap=3g
zonecfg:XXXXXX:capped-memory> info
capped-memory:
physical: 2G
[swap: 3G]
[locked: 512M]
zonecfg:XXXXXX:capped-memory> end
zonecfg:XXXXXX> exit

Now modify the zone's runtime settings:

XXXXXX:/
# rcapadm -z XXXXXX -m 2048m

XXXXXX:/
# sleep 60

XXXXXX:/
# rcapstat -z 1 1
id zone nproc vm rss cap at avgat pg avgpg
9 XXXXXX - 434M 377M 2048M 0K 0K 0K 0K
10 XXXXXX - 452M 370M 2048M 0K 0K 0K 0K
14 XXXXXX - 532M 328M 2048M 0K 0K 0K 0K

XXXXXX:/
# prctl -n zone.max-swap -v 3g -t privileged -r -e deny -i zone XXXXXX

Then verify your settings have taken effect:

XXXXXX:/
# zlogin XXXXXX
[Connected to zone 'XXXXXX' pts/9]
Last login: Wed Sep 16 06:34:27 from XXX.XXX.XX.XX
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
WARNING: YOU ARE SUPERUSER on XXXXXX!!
Your shell is /usr/bin/ksh

XXXXXX:/
# top -c
load averages: 1.36, 0.67, 0.59; up 101+08:46:15 08:55:12
51 processes: 50 sleeping, 1 on cpu
CPU states: 89.6% idle, 4.4% user, 6.1% kernel, 0.0% iowait, 0.0% swap
Memory: 2048M phys mem, 177M free mem, 3072M swap, 2627M free swap

It taught me a new lesson, and at the same time I am still wondering, "DID I KNOW THIS BEFORE, OR WAS I JUST LOST???" SHAME ON ME, VERY DISAPPOINTING. HOWEVER, THIS CONCEPT IS NOW HARDCODED INTO MY LITTLE BRAIN....

Thanks Alex, thanks a lot.

Hope this will help someone, somewhere!

Monday, September 14, 2009

Solaris Crash dump stuff

This morning I decided to concentrate on and learn more about crash/core dumps. While driving my car I was recalling the things I already know about this subject, and I finally realized that what I know is not enough!!! So I decided to take a deep look at core files and core file management.

Before moving ahead, first we will understand what a crash dump is and what a core dump is.

Crash dump --> A crash dump is a dump of the kernel. It is taken in case of a crash (kernel panic) of the system - a crashing kernel produces a crash dump. Configure it using the dumpadm utility.

Core dump --> A core dump is a dump of the memory of a single process - a crashing application can produce a core file. Configure it using the coreadm utility.
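
This post concentrates on crash dumps, but just for completeness, a small sketch of the coreadm side (the pattern and directory below are only an example):

# mkdir -p /var/core
# coreadm -g /var/core/core.%f.%p -e global      <<<< set a global core file pattern and enable global core dumps
# coreadm                                        <<<< display the current core file configuration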

Okay... Let's start with "how to generate a crash dump, or in fact how to force a crash dump".

Generating a crash dump on Solaris - this section covers the ways available to generate one.

There are 4 methods that I am aware of for getting crash dumps generated on a Solaris host.

1. OK> sync

This is the most common method for generating a crash dump on Solaris.

2. # reboot -d

This will reboot the host, and will generate a crash dump as part of the reboot.

3. # savecore -Lv

Where -

L - Save a crash dump of the live running Solaris system, without actually rebooting or altering the system in any way. This option forces savecore to save a live snapshot of the system to the dump device, and then immediately to retrieve the data and to write it out to a new set of crash dump files in the specified directory. Live system crash dumps can only be performed if you have configured your system to have a dedicated dump device using dumpadm.

NOTE: savecore -L does not suspend the system, so the contents of memory continue to change while the dump is saved. Dumps taken by this method are therefore not fully self-consistent.

v - verbose mode.

4. Using uadmin administrative command

# uadmin 5 0
Sep 14 06:01:06 slabinfra4 savecore: saving system crash dump in /var/crash/XXXX/*.0

panic[cpu1]/thread=300024a60a0: forced crash dump initiated at user request

000002a100aa1960 genunix:kadmin+4a4 (b4, 0, 0, 125ec00, 5, 0)
%l0-3: 0000000001815000 00000000011cb800 0000000000000004 0000000000000004
%l4-7: 0000000000000438 0000000000000010 0000000000000004 0000000000000000
000002a100aa1a20 genunix:uadmin+11c (60016057208, 0, 0, ff390000, 0, 0)
%l0-3: 0000000000000000 0000000000000000 0000000078e10000 00000000000078e1
%l4-7: 0000000000000001 0000000000000000 0000000000000005 00000300024a60a0

syncing file systems... 2 1 done
dumping to /dev/md/dsk/d1, offset 859701248, content: kernel
100% done: 82751 pages dumped, compression ratio 3.05, dump succeeded
Program terminated
{1} ok

===============================================


Creating a crash dump file using savecore, also known as a "LIVE CRASH DUMP" -

Okay.. while trying to generate the crash dump on the fly without suspending the system, I hit the issue shown below -

# savecore -Lv
savecore: dedicated dump device required

I checked whether my dump device was configured or not -

# dumpadm
Dump content: kernel pages
Dump device: /dev/md/dsk/d1 (swap)
Savecore directory: /var/crash/XXXXXX
Savecore enabled: yes

Well, it is configured - so then what is the issue?

After a few minutes of searching I found Sun Document ID 3284, "How to capture a live system core dump without having a dedicated dump device", and decided to go with it.

This solution talks about creating an additional device (or swap file) and configuring dumpadm to use it as the dedicated dump device.

Cool, no issues - we will now execute the steps...

1. Check the current dump device configuration.

# dumpadm
Dump content: kernel pages
Dump device: /dev/md/dsk/d1 (swap)
Savecore directory: /var/crash/XXXXXX
Savecore enabled: yes

2. Try to create a core dump on the live system using savecore command.

# savecore -L
savecore: dedicated dump device required

3. To get around this, create a new metadevice (or a blank file using mkfile).

# metainit d43 1 1 c4t60050768018A8023B800000000000132d0s0
d43: Concat/Stripe is setup

4. Change the dump device to point to the newly created metadevice. Also configure dumpadm to dump only the kernel memory pages; use "-c all" instead if you want all memory pages dumped.

# dumpadm -c kernel -d /dev/md/dsk/d43
Dump content: kernel pages
Dump device: /dev/md/dsk/d43 (dedicated)
Savecore directory: /var/crash/slabinfra4
Savecore enabled: yes

5. We can now dump the system core on the new dedicated dump device.

# savecore -L
dumping to /dev/md/dsk/d43, offset 65536, content: kernel
100% done: 80868 pages dumped, compression ratio 3.23, dump succeeded
System dump time: Mon Sep 14 06:00:52 2009
Constructing namelist /var/crash/XXXXXX/unix.0
Constructing corefile /var/crash/XXXXX/vmcore.0
100% done: 80868 of 80868 pages saved

6. We have now saved the crash dump files in the savecore directory under /var/crash.

# cd /var/crash/SystemName
# ls -lrt
total 1312738
-rw-r--r-- 1 root root 1699974 Sep 14 06:01 unix.0
-rw-r--r-- 1 root root 670072832 Sep 14 06:01 vmcore.0

7. Cool... we are done with our job, so let us revert the dump device back to the original -

# dumpadm -c kernel -d /dev/md/dsk/d1
Dump content: kernel pages
Dump device: /dev/md/dsk/d1 (swap)
Savecore directory: /var/crash/XXXXX
Savecore enabled: yes

8. If you don't want the newly created metadevice any more, you can remove it to reclaim your storage.

# metastat -p
d0 -m d10 d20 1
d10 1 1 c2t0d0s0
d20 1 1 c2t1d0s0
d3 -m d13 d23 1
d13 1 1 c2t0d0s3
d23 1 1 c2t1d0s3
d1 -m d11 d21 1
d11 1 1 c2t0d0s1
d21 1 1 c2t1d0s1
d43 1 1 /dev/dsk/c4t60050768018A8023B800000000000132d0s0
d42 1 1 /dev/dsk/c4t60050768018A8023B800000000000137d0s0
d41 1 1 /dev/dsk/c4t60050768018A8023B800000000000136d0s0
d40 1 1 /dev/dsk/c4t60050768018A8023B800000000000135d0s0
d30 -p d4 -o 2097216 -b 2097152
d4 -m d14 d24 1
d14 1 1 c2t0d0s4
d24 1 1 c2t1d0s4
d31 -p d4 -o 32 -b 2097152

# metaclear d43
d43: Concat/Stripe is cleared

=================================================

How to panic your own system?


First we will see how we can crash our system in a pretty sophisticated way -

We'll start by crashing a Solaris system. Is savecore ready? Okay, then, let's panic your system!

Ok.. adb is a very old tool and has now been replaced by mdb (the modular debugger). I am giving examples of crashing your system using both adb and mdb.

# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba uhci s1394 fctl nca lofs audiosup zfs random cpc crypto fcip ptm sppp nfs ipc ]

> rootdir/W 123
> $q


Run ls or something similar and watch your system panic.....

BTW, one good book for getting to know panics well is "Panic! UNIX System Crash Dump Analysis" by Chris Drake and Kimberley Brown.

# adb -k -w /dev/ksyms /dev/mem
physmem 1e05
rootdir/X
rootdir:
rootdir: fc109408
rootdir/W 0
rootdir: 0xfc109408 = 0x0
$q

How does this procedure crash your system? Solaris keeps track of the address of the root vnode structure in a symbol called rootdir. If this vnode pointer is zero, the next time the system tries to do anything that would require walking down a directory path, it will fall over trying to read location zero looking for the root directory's vnode. Reading memory location zero is an illegal operation which results in a bad trap, data fault.

Using adb we will write a zero into rootdir and the system will quickly panic.

If your system doesn't panic immediately, just use the UNIX ls command to get a directory listing of the root directory, /. That will surely do the trick!

==================================================

Fine, now let us learn a few tricks for analysing the dump; we are going to use adb for now. There are many tools/debuggers available - mdb and SCAT are the best ones!

# adb -k unix.0 vmcore.0
physmem 7d8b4

NOTE: adb returns the number of pages of physical memory in hexadecimal and then waits for your first command. Note that most versions of adb do not offer the user any prompt at this point. Don't be fooled by this!

${
sysname = [ "SunOS" ]
nodename = [ "XXXXXXX" ]
release = [ "5.10" ]
version = [ "Generic_139555-08" ]
machine = [ "sun4u" ]
}
hw_provider/s
hw_provider:
hw_provider: Sun_Microsystems
architecture/s
architecture:
architecture: sparcv9
srpc_domain/s
srpc_domain:
srpc_domain: uu.XXXX.com
$q

Fine, now we will check out the mdb debugger.

# mdb unix.0 vmcore.0
Loading modules: [ unix genunix specfs cpu.generic uppc scsi_vhci ufs ip hook neti sctp arp usba nca lofs zfs random nsctl sdbc rdc sppp ]
> ::ps
S PID PPID PGID SID UID FLAGS ADDR NAME
R 0 0 0 0 0 0x00000001 00000000018398b0 sched
R 3 0 0 0 0 0x00020001 00000600118fb848 fsflush
R 2 0 0 0 0 0x00020001 00000600118fc468 pageout
R 1 0 0 0 0 0x4a004000 00000600118fd088 init
R 24662 1 24662 24662 0 0x52010400 00000600155a5128 vasd
[.....]
> 00000600157b5138::pfiles
FD TYPE VNODE INFO
0 FIFO 00000600170ac300
1 FIFO 00000600170ac200
2 SOCK 00000600200921c0 socket: AF_UNIX /var/run/zones/slabzone1.console_sock
3 DOOR 0000060014c47600 [door to 'zoneadmd' (proc=600157b5138)]
4 CHR 0000060014b13a00 /devices/pseudo/zconsnex@1/zcons@1:zoneconsole
5 DOOR 000006001334d6c0 /var/run/name_service_door [door to 'nscd' (proc=6001217c020)]
6 CHR 000006001538d7c0 /devices/pseudo/zconsnex@1/zcons@1:masterconsole
{Here I am trying to look up which files or sockets were open at the moment of the crash dump for a particular process}
> ::msgbuf
MESSAGE
sd1 is /pci@1e,600000/ide@d/sd@0,0
pseudo-device: llc10
llc10 is /pseudo/llc1@0
pseudo-device: tod0
tod0 is /pseudo/tod@0
pseudo-device: lofi0
lofi0 is /pseudo/lofi@0
pseudo-device: fcode0
fcode0 is /pseudo/fcode@0
[....]
IP Filter: v4.1.9, running.
/pseudo/zconsnex@1/zcons@0 (zcons0) online
/pseudo/zconsnex@1/zcons@1 (zcons1) online
/pseudo/zconsnex@1/zcons@2 (zcons2) online
[....]

There is plenty more to learn in mdb; however, I am not covering it right now in this article.

There is a lot more, however I am running short on time and have to get back to work now! I will try to append some SCAT knowledge to this same post soon...

Cool... finally I got some time to write about SCAT - the Solaris Crash Analysis Tool.
Nowadays there are several versions of SCAT available - SCAT 4.1, 5.0, 5.1 - and the very fresh release is SCAT 5.2, so I am going with Solaris CAT 5.2. Let's install it quickly and start working with it.

Download SCAT 5.2 from the Sun website.

#gunzip SUNWscat5.2-GA-combined.pkg.gz
#pkgadd -G -d ./SUNWscat5.2-GA-combined.pkg

Here we go! SCAT is ready to use under /opt/SUNWscat. If required, get the scat command into your PATH as shown below -

#export PATH=$PATH:/opt/SUNWscat/bin

Now navigate to the crash dump location.

#cd /var/crash/XXXX
# ls -lrt
total 2655650
-rw-r--r-- 1 root root 1699974 Sep 14 06:01 unix.0
-rw-r--r-- 1 root root 670072832 Sep 14 06:01 vmcore.0
-rw-r--r-- 1 root root 1699974 Sep 15 01:57 unix.1
-rw-r--r-- 1 root root 685514752 Sep 15 01:59 vmcore.1

Ok, now let us run scat to analyze the crash dump.

# scat 0

Solaris[TM] CAT 5.2 for Solaris 10 64-bit UltraSPARC
SV4990M, Aug 26 2009
[.........]
core file: /var/crash/XXXXX/vmcore.0
user: Super-User (root:0)
release: 5.10 (64-bit)
version: Generic_139555-08
machine: sun4u
node name: XXXXXXX
domain: whois.XXXX.com
hw_provider: Sun_Microsystems
system type: SUNW,Sun-Fire-V240 (UltraSPARC-IIIi)
hostid: XXXXXXX
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/md/dsk/d43(15.9G)
time in kernel: Mon Sep 14 06:01:04 CDT 2009
age of system: 10 days 17 hours 13 minutes 33.78 seconds
CPUs: 2 (4G memory, 1 nodes)
panicstr:

sanity checks: settings...
NOTE: /etc/system: module nfssrv not loaded for "set nfssrv:nfs_portmon=0x1"
vmem...CPU...sysent...misc...
WARNING: 1 severe kstat errors (run "kstat xck")
WARNING: DF_LIVE set in dump_flags
NOTE: system has 2 non-global zones
done
SolarisCAT(vmcore.0/10U)>

Look at the output carefully. It shows how many non-global zones are installed on the system, which sanity checks were run, and which modules referenced by /etc/system parameters were not loaded (here, nfssrv for the nfs_portmon setting)...

The available commands are broken down into categories, which you can see using the "help" command. The first group is for "Initial Investigation" and includes: analyze, coreinfo, msgbuf, panic, stack, stat, and toolinfo.

There is a lot more to write about SCAT, however I will end this entry with my all-time favorite ZFS stuff -

SolarisCAT(vmcore.0/10U)> zfs -e
ZFS spa @ 0x60010cf4080
Pool name: zone1-zp00
State: ACTIVE
VDEV Address State Aux Description
0x60012379540 UNKNOWN - root

READ WRITE FREE CLAIM IOCTL
OPS 0 0 0 0 0
BYTES 0 0 0 0 0

EREAD 0
EWRITE 0
ECKSUM 0

VDEV Address State Aux Description
0x60012379000 UNKNOWN - /dev/dsk/
c4t60050768018A8023B800000000000134d0s0

READ WRITE FREE CLAIM IOCTL
OPS 68803 2107043 0 0 0
BYTES 5.39G 14.3G 0 0 0

EREAD 0
EWRITE 0
ECKSUM 0

ZFS spa @ 0x60011962fc0
Pool name: zone2-zp00
State: ACTIVE
VDEV Address State Aux Description
0x60011962a80 UNKNOWN - root

READ WRITE FREE CLAIM IOCTL
OPS 0 0 0 0 0
BYTES 0 0 0 0 0

EREAD 0
EWRITE 0
ECKSUM 0

VDEV Address State Aux Description
0x60011962540 UNKNOWN - /dev/dsk/
c4t60050768018A8023B800000000000133d0s0

READ WRITE FREE CLAIM IOCTL
OPS 5367 166547 0 0 0
BYTES 252M 795M 0 0 0

EREAD 0
EWRITE 0
ECKSUM 0

SCAT is a very powerful tool and I am really impressed with it. Using SCAT you can learn a lot of details about your system...

Hope it will help someone, somewhere! Happy debugging!

Thursday, September 10, 2009

How to check if Linux has Fibre card installed or not?

I got a request to check whether a Linux box has a Fibre Channel card installed or not.... so I thought of putting this information on my blog too. Below are a few methods you can use to tell the end user whether the underlying Linux server has a Fibre Channel card installed.

# lspci | grep -i emulex
0b:00.0 Fibre Channel: Emulex Corporation Zephyr-X LightPulse Fibre Channel Host Adapter (rev 02)

<<<< For more details like the speed card is operating etc >>>>

# lspci -vv | less
Fibre Channel: Emulex Corporation Zephyr-X LightPulse Fibre Channel Host Adapter (rev 02)
Subsystem: Emulex Corporation Zephyr-X LightPulse Fibre Channel Host Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Latency: 0, Cache Line Size 10
Interrupt: pin A routed to IRQ 217
Region 0: Memory at f9ff0000 (64-bit, non-prefetchable) [size=4K]
Region 2: Memory at f9fe0000 (64-bit, non-prefetchable) [size=256]
Region 4: I/O ports at 4000 [size=256]
Capabilities: [58] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [44] Express Endpoint IRQ 0
Device: Supported: MaxPayload 2048 bytes, PhantFunc 0, ExtTag+
Device: Latency L0s <4us, L1 <16us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 512 bytes, MaxReadReq 4096 bytes
Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s, Port 0
Link: Latency L0s <4us, L1 unlimited
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x4
Capabilities: [100] Advanced Error Reporting
Capabilities: [12c] Power Budgeting

<<<< You can check if required driver is installed or not >>>>

# dmesg | grep -i emulex
Emulex LightPulse Fibre Channel SCSI driver 8.0.16.34
Copyright(c) 2003-2007 Emulex. All rights reserved.


Another way to check this is to look under /proc/scsi.

Fibre Channel is visible in the /proc/scsi hierarchy, but the exact path depends on the manufacturer (Emulex, QLogic) of the Fibre Channel adapter and its device driver. In my case it is as below -

# ls -l /proc/scsi/lpfc/0
-rw-r--r-- 1 root root 0 Sep 10 03:36 /proc/scsi/lpfc/0
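
If the HBA were QLogic instead of Emulex, the /proc path would look different; a hedged example (the directory name depends on the qla driver version in use):

# ls /proc/scsi/qla2xxx/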

Yet another option is to check whether the kernel module is loaded or not -

# lsmod | grep -i lpfc
lpfc 170561 0
scsi_transport_fc 12353 1 lpfc
scsi_mod 120269 5 scsi_dump,lpfc,scsi_transport_fc,cciss,sd_mod

Wednesday, September 9, 2009

Modify tcp_nodelayack parameter on AIX

Disabling TCP-delayed acknowledgements on AIX systems -

On AIX systems, the default behavior for TCP connections results in delayed acknowledgements (ACK packets). When tcp_nodelayack is set to 0 (the default setting), TCP delays sending ACK packets by up to 200 ms so that the ACK can ride along with a response, which minimizes system overhead.

Setting the tcp_nodelayack parameter to 1 causes TCP to send immediate acknowledgement (Ack) packets to the sender.

Setting tcp_nodelayack to 1 will cause slightly more system overhead, but can result in much higher performance for network transfers if the sender is waiting on the receiver's acknowledgement.

To make the parameter setting, issue the following:

# no -p -o tcp_nodelayack=1
Setting tcp_nodelayack to 1
Setting tcp_nodelayack to 1 in nextboot file

The -p flag makes the change persistent, so that it will still be in effect at the next boot. This is a dynamic change that takes effect immediately.
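
If you only want to try the change until the next reboot, a variant without the persistence flag (standard no usage):

# no -o tcp_nodelayack=1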

To verify that the setting has been changed -

# no -a | grep tcp_nodelayack
tcp_nodelayack = 1

can't open /etc/ntp/drift.TEMP: Permission denied

Okay... today I came across an NTP drift file related issue. The error was - "Sep 8 13:39:01 XXXXXX ntpd[4694]: can't open /etc/ntp/drift.TEMP: Permission denied"

It has an easy solution, so I thought of sharing it with you all.

This error is caused by an incorrectly configured /etc/ntp.conf file. In earlier versions of Red Hat Enterprise Linux, the drift file was located in the /etc/ntp directory, which is owned by root. As the ntp daemon does not run as root, it cannot create a new drift file there. The preferred location for the drift file in newer releases of Red Hat Enterprise Linux is the /var/lib/ntp directory.

To correct this error, change the line in /etc/ntp.conf that reads:

driftfile /etc/ntp/drift

to:

driftfile /var/lib/ntp/drift
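
After the change, restart the daemon and make sure the new location is writable by the ntp user; a short sketch assuming a standard RHEL layout:

# ls -ld /var/lib/ntp           <<<< should be owned by the ntp user
# service ntpd restart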


Hope this will help someone.

Friday, September 4, 2009

Rename VG in AIX

# lsvg
CPIdatavg
CPIindexvg
CPIbkupvg

We need to rename the above VGs to QPIdatavg, QPIindexvg and QPIbkupvg respectively.

# lsvg -l CPIdatavg
# lsvg -l CPIindexvg
# lsvg -l CPIbkupvg

See which filesystems are associated with each VG and unmount all of them first.

Now check which disks are associated with each VG.

# lsvg -p CPIdatavg
CPIdatavg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk14 active 126 126 26..25..25..25..25
hdisk15 active 254 254 51..51..50..51..51



# lsvg -p CPIindexvg
CPIindexvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk16 active 126 126 26..25..25..25..25


# lsvg -p CPIbkupvg
CPIbkupvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk17 active 510 510 102..102..102..102..102

Vary off all the VGs that need to be renamed -

# varyoffvg CPIdatavg
# varyoffvg CPIindexvg
# varyoffvg CPIbkupvg

# lspv <<<< This command will now show all the above disks without an active VG status in the last field
hdisk14 00c21e0284e6a5b0 CPIdatavg ----No Active Status---
hdisk15 00cfaefca8f9ee92 CPIdatavg ----No Active Status---
hdisk16 00c21e0284e7e4a0 CPIindexvg ----No Active Status---
hdisk17 00cfaefca8fdef94 CPIbkupvg ----No Active Status---


Now let's export the VGs -

# exportvg CPIdatavg
# exportvg CPIindexvg
# exportvg CPIbkupvg

After exporting, look at the lspv output -

#lspv
hdisk14 00c21e0284e6a5b0 None
hdisk15 00cfaefca8f9ee92 None
hdisk16 00c21e0284e7e4a0 None
hdisk17 00cfaefca8fdef94 None

Now let us import the VGs under their new names as shown below -

# importvg -y QPIdatavg hdisk14
# varyonvg QPIdatavg
# mount -a

Do the same for the rest of the VGs - a sketch follows below.
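
For reference, a sketch of the remaining two renames, using the hdisks seen in the lspv output above:

# importvg -y QPIindexvg hdisk16
# varyonvg QPIindexvg
# importvg -y QPIbkupvg hdisk17
# varyonvg QPIbkupvg
# mount -a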

This method is not recommended for an HACMP setup.