
Monday, October 3, 2011

Oracle-Solaris Patching using Live Upgrade

In today's post I'll walk through a procedure for patching a Solaris server that has zones installed on it, using the Live Upgrade functionality.

The Solaris OS Recommended Patch Cluster provides critical Solaris OS security, data corruption, and system availability fixes, so it is advisable to patch your Solaris systems at least twice a year, in line with Oracle-Sun's Critical Patch Update release schedule. I prefer to run the patch cycle for my environment at the end of April and again in late October every year.

Oracle-Sun CPUs are released on the Tuesday closest to the 17th of January, April, July, and October –

See - http://www.oracle.com/technetwork/topics/security/alerts-086861.html

In my environment, I use Live Upgrade to patch our Solaris systems. The reasons I use Live Upgrade for patching are -

1. It creates a copy of the system environment; that is, a copy of the root (/) file system.

2. Live Upgrade has a built-in feature for splitting the mirrors of an SVM-mirrored root (the detach, attach, and preserve options of lucreate), so there is little overhead in dealing with the SVM mirror break separately.

3. Less downtime (not more than 15-20 minutes) and minimal risk.

4. A better back-out option. If something breaks after patching, you revert to the old BE and are back where you started; that too takes little downtime and is a safe option.

5. It is the most appropriate option for Solaris servers that have zones/containers installed on them.


There may be many more benefits out there, but I find the above benefits the best fit for my purpose.

So to summarize, all tasks except the reboot can be accomplished on an operational production system; the impact on any running process is minimal. Live Upgrade combines maximizing system availability while applying changes with minimizing risk, by offering the ability to reboot to a known working state (your original environment).

Well, let's see how to do it in real life. In my current environment we have many servers that use Solaris Volume Manager (SVM) as their primary volume manager for disks and data. So let's take a look at the procedure for patching servers that have SVM configured for the root disks and zones installed on a ZFS filesystem.
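Before we begin, it's worth confirming the zone layout on the box. A quick check (the dataset name below is a made-up example; yours will differ):

# zoneadm list -cv
# zfs list -r zonepool

zoneadm lists the configured and running zones, and zfs list shows the ZFS datasets holding the zone roots.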

Let us grab the output of metastat to understand metadevice placement -

# metastat -c
d32              p  1.0GB d4
d33              p  1.0GB d4
d36              p   40GB d4
d35              p  1.0GB d4
d34              p  4.0GB d4
d60              p   16GB d4
d30              p  1.0GB d4
d31              p  1.0GB d4
    d4           m  100GB d14 d24
        d14      s  100GB c1t0d0s4
        d24      s  100GB c1t1d0s4
d103             m   10GB d23 d13
    d23          s   10GB c1t1d0s3
    d13          s   10GB c1t0d0s3
d100             m   10GB d20 d10
    d20          s   10GB c1t1d0s0
    d10          s   10GB c1t0d0s0
d1               m   16GB d11 d21
    d11          s   16GB c1t0d0s1
    d21          s   16GB c1t1d0s1
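To double-check which metadevices actually back / and /var before touching anything, the current vfstab confirms the mapping:

# grep '/dev/md' /etc/vfstab

The entries should show d100 mounted on / and d103 on /var.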

Alright, my / is on d100 and /var is on d103. Let us create an alternate boot environment out of them.

# lucreate -c Sol10 -n Sol10pu -m /:/dev/md/dsk/d0:ufs,mirror -m /:/dev/md/dsk/d20:detach,attach -m /var:/dev/md/dsk/d3:ufs,mirror -m /var:/dev/md/dsk/d23:detach,attach

Here I'm creating a metadevice d0 representing the / UFS filesystem with a sub-mirror d20 (sub-mirror d20 first gets detached from d100 and then attached to d0). The same applies to the /var filesystem and its metadevice configuration.

In the above command I'm creating a new boot environment called Sol10pu using the "-n" option; the "-m" option specifies the vfstab information for the new UFS-based BE.

NOTE: The -m option is not supported for BEs based on ZFS file systems.

NOTE: In case you're performing an upgrade and patching in one go, here is a point to ponder - before upgrading, you must install the Oracle Solaris Live Upgrade packages from the release to which you are upgrading. New capabilities are added to the upgrade tools, so installing the new packages from the target release is important. For example, to upgrade from Oracle Solaris 10 update 4 to Oracle Solaris 10 update 8, you must get the Oracle Solaris Live Upgrade packages from the Oracle Solaris 10 update 8 DVD.
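As a sketch of that package refresh, assuming the target-release media is mounted at /cdrom/cdrom0 (the path on your system may differ), it looks like this:

# pkgrm SUNWlucfg SUNWluu SUNWlur
# pkgadd -d /cdrom/cdrom0/Solaris_10/Product SUNWlucfg SUNWlur SUNWluu

Remove the old Live Upgrade packages first, then install the ones shipped with the target release.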


Once the lucreate command finishes, you will see that your metadevice configuration has changed as follows -

# metastat -c
d32              p  1.0GB d4
d33              p  1.0GB d4
d36              p   40GB d4
d35              p  1.0GB d4
d34              p  4.0GB d4
d60              p   16GB d4
d30              p  1.0GB d4
d31              p  1.0GB d4
    d4           m  100GB d14 d24
        d14      s  100GB c1t0d0s4
        d24      s  100GB c1t1d0s4
d103             m   10GB d13
    d13          s   10GB c1t0d0s3
d100             m   10GB d10
    d10          s   10GB c1t0d0s0
d3               m   10GB d23
    d23          s   10GB c1t1d0s3
d0               m   10GB d20
    d20          s   10GB c1t1d0s0
d1               m   16GB d11 d21
    d11          s   16GB c1t0d0s1
    d21          s   16GB c1t1d0s1


d0 and d3 each have one sub-mirror, and d100 and d103 each have one sub-mirror associated with them.

Also you will be able to see two boot environments on your Solaris system -

# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
Sol10                      yes      yes    yes       no     -
Sol10pu                    yes      no     no        yes    -
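You can also verify the filesystem layout recorded for the new BE; lufslist prints it per boot environment:

# lufslist Sol10pu

It should report / on /dev/md/dsk/d0 and /var on /dev/md/dsk/d3.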



Fine, so now we have two boot environments, and we are going to patch the alternate BE (Sol10pu) using a patching tool called PCA (Patch Check Advanced), which is what I use to apply patches to our Solaris systems. PCA has been set up to download patches via a local web proxy that can reach outside systems.

PCA setup tips can be found at - http://www.par.univie.ac.at/solaris/pca/usage.html

What PCA needs when setting it up -

- A Perl distribution
- At least one internet-facing server (this server then acts as a proxy for the rest of the servers)
- The patch cross-reference file patchdiag.xref (always the latest one while patching)
- A valid Oracle support (MOS) user ID and password
- If required, some wrapper scripts around PCA (see the sketch below)
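As an illustration of the wrapper idea, here is a minimal sketch; the proxy host is hypothetical, and it relies on the standard http_proxy convention (PCA downloads via wget/LWP, which honor it), reusing the same pca invocations shown in the steps below:

#!/bin/sh
# fetch-patches.sh - hypothetical PCA wrapper script
# Route downloads through the local web proxy (hypothetical host).
http_proxy=http://webproxy.example.com:8080/
export http_proxy

PDIR=/patchman/patches

# List missing Recommended/Security patches for the ABE mounted at /a,
# then download them into $PDIR.
/bin/pca missingrs -R /a -H --format "%p-%c" > $PDIR/patch_order
/bin/pca missingrs -R /a -d -P $PDIR/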


To do so, let us mount the alternate BE on a mount point, say /a -

# lumount Sol10pu /a
/a


Now I'll create a temporary directory to download the missing & required patches,

# mkdir -p /patchman/patches
My next job is to generate the patch_order file -

# /bin/pca missingrs -R /a -H --format "%p-%c" > /patchman/patches/patch_order

Where -R specifies an alternative root directory
Where -H suppresses the descriptive headers
Where --format sets the output format
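For illustration, with the "%p-%c" format each line of patch_order is a patch ID and its revision, so the file looks something like this (the entries here are made-up examples):

118833-36
119254-75
120011-14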


Now go and download them -

# /bin/pca missingrs -R /a -d -P /patchman/patches/ 

Where -d downloads the patches
Where -P specifies the patch download directory


Unmount the ABE -

# luumount Sol10pu

Now if you list the contents of the /patchman/patches directory, you will see the downloaded patches in there.

Unzip all those patches -

# cd /patchman/patches
# for i in *.zip; do
  unzip $i
  rm $i
done


Okay, at this stage we are ready to upgrade the ABE with the patches available -

# cd /patchman/patches; luupgrade -n Sol10pu -s /patchman/patches -t `cat patch_order`

NOTE: Reactive patching may occasionally be necessary to address break-and-fix issues; in that case you can use LU with something like the following.

Apply a single patch to the ABE -

# luupgrade -n Sol10pu -s /patchman/patches -t <patch_id>

This will apply the patches to the global as well as the non-global zones.

Once the patches are installed, luupgrade automatically unmounts the ABE again (it mounts Sol10pu itself while patching).

Now it's time to activate the ABE Sol10pu, which has just been patched, using the Live Upgrade utility.

# luactivate Sol10pu
A Live Upgrade Sync operation will be performed on startup of boot environment <Sol10pu>.
**********************************************************************

The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:

1. Enter the PROM monitor (ok prompt).

2. Change the boot device back to the original boot environment by typing:

     setenv boot-device /pci@0/pci@0/pci@2/scsi@0/disk@0,0:a

3. Boot to the original boot environment by typing:

     boot

**********************************************************************

Modifying boot archive service
Activation of boot environment successful.


# init 6
updating /platform/sun4v/boot_archive
SYSTEM GOING DOWN!!!!

NOTE: With Live Upgrade, always use the init 6 or shutdown commands. The halt and reboot commands bypass the boot environment switch and will create a big-time bang, so be aware!!!

Once the system is up, it should show the new kernel patch version.
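A quick way to verify this right after the reboot (output will vary with your patch set):

# uname -v
# showrev -p | tail -5
# zoneadm list -cv

uname -v shows the running kernel patch level, showrev -p lists the applied patches, and zoneadm confirms the zones came back up.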

Great, a week has passed since patching; the application and DB owners are happy with the patched systems, so with their confirmation we can now perform the post-patching work.

POST-PATCHING WORK TO DO - A WEEK LATER
========================================

Now, a week later, I need to delete the old boot environment and rebuild the metadevices into a mirrored layout.

# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
Sol10                      yes      no     no        yes    -
Sol10pu                    yes      yes    yes       no     -


# ludelete Sol10
Determining the devices to be marked free.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.
Boot environment deleted.

So the metadevices now look like this -

# metastat -c
d32              p  1.0GB d4
d33              p  1.0GB d4
d36              p   40GB d4
d35              p  1.0GB d4
d34              p  4.0GB d4
d60              p   16GB d4
d30              p  1.0GB d4
d31              p  1.0GB d4
    d4           m  100GB d14 d24
        d14      s  100GB c1t0d0s4
        d24      s  100GB c1t1d0s4
d103             m   10GB d13
    d13          s   10GB c1t0d0s3
d100             m   10GB d10
    d10          s   10GB c1t0d0s0
d3               m   10GB d23
    d23          s   10GB c1t1d0s3
d0               m   10GB d20
    d20          s   10GB c1t1d0s0
d1               m   16GB d11 d21
    d11          s   16GB c1t0d0s1
    d21          s   16GB c1t1d0s1

Now clear the d100 & d103 mirrors.

# metaclear d100 d103
d100: Mirror is cleared
d103: Mirror is cleared

# metastat -c
d32              p  1.0GB d4
d33              p  1.0GB d4
d36              p   40GB d4
d35              p  1.0GB d4
d34              p  4.0GB d4
d60              p   16GB d4
d30              p  1.0GB d4
d31              p  1.0GB d4
    d4           m  100GB d14 d24
        d14      s  100GB c1t0d0s4
        d24      s  100GB c1t1d0s4
d3               m   10GB d23
    d23          s   10GB c1t1d0s3
d0               m   10GB d20
    d20          s   10GB c1t1d0s0
d1               m   16GB d11 d21
    d11          s   16GB c1t0d0s1
    d21          s   16GB c1t1d0s1
d13              s   10GB c1t0d0s3
d10              s   10GB c1t0d0s0

Next attach the sub-mirrors d10 & d13 to metadevices d0 and d3 respectively.

# metattach d0 d10
d0: submirror d10 is attached

# metattach d3 d13
d3: submirror d13 is attached
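The attaches kick off background resyncs; you can keep an eye on the progress until the resync percentage disappears from the output:

# metastat -c d0 d3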

Hence my final metadevice placement looks as follows -

# metastat -c
d32              p  1.0GB d4
d33              p  1.0GB d4
d36              p   40GB d4
d35              p  1.0GB d4
d34              p  4.0GB d4
d60              p   16GB d4
d30              p  1.0GB d4
d31              p  1.0GB d4
    d4           m  100GB d14 d24
        d14      s  100GB c1t0d0s4
        d24      s  100GB c1t1d0s4
d3               m   10GB d23 d13 (resync-25%)
    d23          s   10GB c1t1d0s3
    d13          s   10GB c1t0d0s3
d0               m   10GB d20 d10 (resync-45%)
    d20          s   10GB c1t1d0s0
    d10          s   10GB c1t0d0s0
d1               m   16GB d11 d21
    d11          s   16GB c1t0d0s1
    d21          s   16GB c1t1d0s1

That's it. Now you're done with patching your Solaris server and the zones deployed on it.