Thursday, March 18, 2010

Solaris Link Aggregation using dladm

A Solaris "Link Aggregation" is the act of bonding several interfaces together to create a single logical interface.

A link aggregation consists of several interfaces on a system that are configured together as a single, logical unit. Link aggregation, also referred to as Trunking, is defined in the IEEE 802.3ad Link Aggregation Standard.

The IEEE 802.3ad Link Aggregation Standard provides a method to combine the capacity of multiple full-duplex Ethernet links into a single logical link. This link aggregation group is then treated as though it were, in fact, a single link.

The following are features of link aggregations:

1. Increased bandwidth – The capacity of multiple links is combined into one logical link.

2. Automatic failover/failback – Traffic from a failed link is failed over to working links in the aggregation.

3. Load balancing – Both inbound and outbound traffic is distributed according to user-selected load-balancing policies, such as source and destination MAC or IP addresses.

4. Support for redundancy – Two systems can be configured with parallel aggregations.

5. Improved administration – All interfaces are administered as a single unit.

6. Less drain on the network address pool – The entire aggregation is assigned one IP address.
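To make the load-balancing and failover features a bit more concrete, here is a rough Python sketch of how a policy can map a flow to one member port. It is only an illustration: the CRC32 hash, the pick_port name, and the frame dictionary are my own assumptions, not the algorithm the Solaris aggr driver really uses. The L2/L3/L4 policy names follow the dladm policy names (MAC, IP, and TCP/UDP header respectively).

```python
import zlib

# Header fields each policy hashes on (illustrative; not the aggr driver's real hash).
POLICY_FIELDS = {
    "L2": ("src_mac", "dst_mac"),
    "L3": ("src_ip", "dst_ip"),
    "L4": ("src_port", "dst_port"),
}

def pick_port(policy, frame, ports):
    """Return the member port a frame would use under the given policy."""
    if not ports:
        raise ValueError("aggregation has no working ports")
    key = "|".join(str(frame[field]) for field in POLICY_FIELDS[policy])
    return ports[zlib.crc32(key.encode()) % len(ports)]

# Hypothetical frame; addresses are placeholders.
frame = {
    "src_mac": "0:14:4f:2b:be:18", "dst_mac": "0:3:ba:11:22:33",
    "src_ip": "192.0.2.10", "dst_ip": "192.0.2.20",
    "src_port": 40000, "dst_port": 80,
}
ports = ["bge0", "bge1"]
chosen = pick_port("L4", frame, ports)

# Failover: if the chosen port fails, the same flow is hashed onto the survivors.
survivors = [p for p in ports if p != chosen]
fallback = pick_port("L4", frame, survivors)
```

The point of hashing on header fields is that every packet of one flow lands on the same port, so packets are not reordered, while different flows spread across the members.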

Before getting to the how-to, note that an aggregation configuration is bound by the following requirements:

1. You must use the dladm command to configure aggregations.

2. An interface that has been plumbed cannot become a member of an aggregation.

3. Interfaces must be of the GLDv3 type: bge, e1000g, xge, nge, rge, ixgb.

4. All interfaces in the aggregation must run at the same speed and in full duplex mode.

5. Legacy Data Link Provider Interface (DLPI) devices, such as the ce interface, do not support Solaris link aggregations. You cannot configure aggregations for them with the dladm command; use Sun Trunking instead.
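The requirements above are easy to turn into a pre-flight check. Here is a small Python sketch, assuming you have already collected each candidate's name, speed, duplex, and plumbed state yourself; the check_members function and the dictionary layout are hypothetical, and the driver list is simply the one from requirement 3.

```python
# GLDv3 drivers listed in requirement 3 of this post.
GLDV3_DRIVERS = {"bge", "e1000g", "xge", "nge", "rge", "ixgb"}

def driver_of(ifname):
    """Strip the instance number: 'e1000g0' -> 'e1000g', 'ce0' -> 'ce'."""
    return ifname.rstrip("0123456789")

def check_members(members):
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    for m in members:
        if driver_of(m["name"]) not in GLDV3_DRIVERS:
            problems.append(f"{m['name']}: not a GLDv3 driver")
        if m["plumbed"]:
            problems.append(f"{m['name']}: still plumbed")
        if m["duplex"] != "full":
            problems.append(f"{m['name']}: not full duplex")
    if len({m["speed_mbps"] for m in members}) > 1:
        problems.append("members run at different speeds")
    return problems

good = [
    {"name": "bge0", "speed_mbps": 1000, "duplex": "full", "plumbed": False},
    {"name": "bge1", "speed_mbps": 1000, "duplex": "full", "plumbed": False},
]
bad = [
    {"name": "ce0", "speed_mbps": 100, "duplex": "half", "plumbed": True},
    {"name": "bge1", "speed_mbps": 1000, "duplex": "full", "plumbed": False},
]
```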

How does aggregation work?

The MAC layer, which is part of GLDv3, is the central point of access to Network Interface Cards (NICs) in the kernel. At the top, it provides a client interface that lets a client send and receive packets to and from NICs, as well as configure, stop, and start them. At the bottom, it provides a provider interface that NIC drivers use to plug into the network stack. The client here is the Data-Link Service (DLS), which provides SAP demultiplexing and VLAN support for the rest of the stack. The Data-Link Driver (DLD) provides a STREAMS interface between Nemo (the project name for GLDv3) and DLPI consumers.

The core of the link aggregation feature is provided by the "aggr" kernel pseudo driver. This driver acts as both a MAC client and a MAC provider. Because aggr implements the MAC provider interface, it looks like any other MAC device, so the rest of Solaris can manage an aggregation as if it were a regular NIC.

Make sure your eeprom’s local-mac-address? variable is set to true.

# eeprom local-mac-address?


# eeprom "local-mac-address?=true"
# eeprom local-mac-address?


PS: The step above does not apply to x86 systems.

1. Unplumb the interfaces to be aggregated:

# ifconfig bge0 down unplumb
# ifconfig bge1 down unplumb

2. Create a link-aggregation group with key 1. The key is the number that identifies the aggregation. The lowest key number is 1; zero is not allowed as a key.

LACP mode is off by default; add -l active or -l passive to the command if your switch uses LACP:

# dladm create-aggr -d bge0 -d bge1 1

# mv /etc/hostname.bge0 /etc/hostname.aggr1

3. Reboot. A reconfiguration reboot is a good idea but is not required.

# reboot -- -rv
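If you script step 2, it helps to build the dladm command line in one place and reject a bad key before anything runs. A minimal Python sketch follows; the create_aggr_argv helper is hypothetical, and it only assembles the argument list without executing it. The -P (policy) and -l (LACP mode) options are the ones I know from the Solaris 10 dladm create-aggr syntax, so check your own man page before relying on them.

```python
def create_aggr_argv(devices, key, policy=None, lacp_mode=None):
    """Build the argument list for 'dladm create-aggr' (does not run it)."""
    if not isinstance(key, int) or key < 1:
        raise ValueError("aggregation key must be an integer >= 1")
    if not devices:
        raise ValueError("at least one device is required")
    argv = ["dladm", "create-aggr"]
    if policy is not None:
        argv += ["-P", policy]      # load-balancing policy, e.g. L2, L3, L4
    if lacp_mode is not None:
        argv += ["-l", lacp_mode]   # LACP mode: off, active, or passive
    for dev in devices:
        argv += ["-d", dev]
    argv.append(str(key))
    return argv

# Same command as step 2 above.
argv = create_aggr_argv(["bge0", "bge1"], 1)
```

On a real system you would hand the list to subprocess.run(argv, check=True) as root.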

After the reboot, you can check the status of the aggregation with the dladm command:

# dladm show-aggr
key: 1 (0x0001) policy: L4      address: 0:14:4f:2b:be:18 (auto)
           device       address            speed        duplex  link    state
           bge0         0:14:4f:2b:be:18   1000 Mbps    full    up      attached
           bge2         0:14:4f:2b:be:1a   1000 Mbps    full    up      attached
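If you want a monitoring script to watch this, the port rows are easy to parse. Here is a small Python sketch that works on output shaped like the sample above; the parse_show_aggr and all_attached names are my own, and the parsing assumes the seven-column port row shown here, which may differ on other Solaris releases.

```python
def parse_show_aggr(text):
    """Return {device: {"link": ..., "state": ...}} for each port row."""
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        # Port rows: device, address, speed, "Mbps", duplex, link, state
        if len(fields) == 7 and fields[3] == "Mbps":
            ports[fields[0]] = {"link": fields[5], "state": fields[6]}
    return ports

def all_attached(text):
    """True if there is at least one port and every port is up and attached."""
    ports = parse_show_aggr(text)
    return bool(ports) and all(
        p["link"] == "up" and p["state"] == "attached" for p in ports.values()
    )

SAMPLE = """key: 1 (0x0001) policy: L4 address: 0:14:4f:2b:be:18 (auto)
device address speed duplex link state
bge0 0:14:4f:2b:be:18 1000 Mbps full up attached
bge2 0:14:4f:2b:be:1a 1000 Mbps full up attached"""

status = parse_show_aggr(SAMPLE)
```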

Hope that helps.

Friday, March 12, 2010

FB DIMM slots... By the way what is Fully Buffered DIMM (FB DIMM)?

I just got a new Sun SPARC Enterprise T5220 Server and was reading through the server specs when I came across its FB DIMM memory slots. To be honest, I'm not very good with Sun hardware, so obviously I didn't know what an FB DIMM slot was.

Here is a short introduction to the FB DIMM (Fully Buffered DIMM) memory slot.

FB-DIMM (Fully Buffered DIMM) is a recently developed memory module technology for servers, created to increase both the memory speed and the maximum memory capacity of a server.

The main difference between FB-DIMM modules and regular DIMM modules is that on FB-DIMM the communication between the memory controller (chipset) and the module is serial, in the same way that occurs with PCI Express, while on standard DIMM modules this communication is parallel.

Serial communication needs fewer wires to connect the chipset to the memory module, which in turn allows more memory channels to be created, and that increases memory performance. With FB-DIMM technology it is possible to have up to eight modules per channel and up to six memory channels, so the technology increases both memory capacity and speed.
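The capacity side of that claim is simple arithmetic. A tiny Python sketch, using the limits quoted above (six channels, eight modules per channel); the 4 GB module size is only a hypothetical value for illustration, not a spec of any particular server.

```python
def max_modules(channels=6, modules_per_channel=8):
    """Upper bound on FB-DIMM modules given the limits quoted above."""
    return channels * modules_per_channel

def max_capacity_gb(module_gb, channels=6, modules_per_channel=8):
    """Upper bound on memory capacity for a given module size in GB."""
    return max_modules(channels, modules_per_channel) * module_gb

modules = max_modules()            # 6 channels x 8 modules
capacity = max_capacity_gb(4)      # hypothetical 4 GB modules
```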

Another important aspect of FB-DIMM is that it uses separate paths for data transmission and data reception, whereas standard DIMM modules use the same path for both. A standard DIMM is like a road with two-way traffic, with more chance of a traffic jam or even a collision; FB-DIMM is like a pair of one-way roads, so traffic flows smoothly and quickly with almost no chance of collision.

The system used on FB-DIMM modules helps to increase the performance of the memory subsystem.

So the bottom line is that FB-DIMM slots give us greater memory capacity and higher performance.

Some implementations of Fully Buffered DIMM:

• Sun Microsystems uses FB-DIMMs with the Niagara II (UltraSPARC T2) server processor. A few servers that I know of:

    Sun SPARC Enterprise T5220 Server
    Sun SPARC Enterprise T5140 Server
    Sun SPARC Enterprise T5240 Server
    Sun SPARC Enterprise T5440 Server

• Intel has adopted the technology for its newer Xeon 5000/5100 series and beyond, which it considers "a long-term strategic direction for servers".
• Intel's enthusiast platform Skulltrail uses FB-DIMMs for its dual-CPU-socket, multi-GPU system.
• Some HP ProLiant servers.

Thursday, March 11, 2010

/tmp: File system full, swap space limit exceeded

Solaris 10 places /tmp on swap by default. This is good for speed, but not so good on a general-purpose box where some applications may fill up /tmp. If you fill /tmp, you essentially reduce the amount of available swap to 0. That leads to trouble: once the system also runs out of physical RAM, new processes may fail to start.

You will get errors like the ones shown below:

# ps -ef
-bash: fork: Not enough space

# prstat
-bash: fork: Not enough space


# dmesg
Jan 7 02:56:51 tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded
Jan 7 02:56:57 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 8223 (exim)
Jan 7 02:57:26 genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 563 (httpd)

The quickest fix is to immediately disable any services that eat RAM using svcadm disable and then clear out /tmp; however, this means taking application downtime. After that you can move /tmp to a physical partition by editing /etc/vfstab, increase the amount of swap, or, best of all, limit the amount of swap /tmp can use by adding a mount option to /etc/vfstab:

# grep /tmp /etc/vfstab
swap - /tmp tmpfs - yes size=2048m

Unfortunately with this you have to reboot the box!!
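To audit a batch of servers for an uncapped /tmp, you can parse the vfstab entry. Below is a minimal Python sketch; the tmpfs_size_bytes helper is hypothetical, it expects the standard seven-field vfstab layout, and it only understands a lowercase size= option with a k or m suffix (or plain bytes), which is what I remember from the mount_tmpfs man page.

```python
UNITS = {"k": 1024, "m": 1024 * 1024}

def tmpfs_size_bytes(vfstab_line):
    """Return the size= cap in bytes for a tmpfs entry, or None if uncapped."""
    fields = vfstab_line.split()
    # vfstab fields: device, fsck device, mount point, type, pass, at-boot, options
    if len(fields) != 7 or fields[3] != "tmpfs":
        raise ValueError("not a tmpfs vfstab entry")
    for option in fields[6].split(","):
        name, _, value = option.partition("=")
        if name == "size" and value:
            suffix = value[-1].lower()
            if suffix in UNITS:
                return int(value[:-1]) * UNITS[suffix]
            return int(value)  # no suffix: plain bytes
    return None

capped = tmpfs_size_bytes("swap - /tmp tmpfs - yes size=2048m")
uncapped = tmpfs_size_bytes("swap - /tmp tmpfs - yes -")
```

A result of None means /tmp can grow until swap is exhausted, which is exactly the failure described in this post.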

Tuesday, March 9, 2010

Solaris – Add/remove network interface to a running zone (dynamic Change)

This post describes how to add a network interface to a running non-global zone without having to reboot the zone. The new interface will persist across reboots.

First you add the entry to the zone configuration. This is the part that lets it persist between reboots. This is done from the global zone:

# zonecfg -z zone1
zonecfg:zone1> add net
zonecfg:zone1:net> set address=XXX.XXX.XX.XXX
zonecfg:zone1:net> set physical=bge0
zonecfg:zone1:net> end
zonecfg:zone1> verify
zonecfg:zone1> commit
zonecfg:zone1> exit
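The same change can be made without the interactive session, since zonecfg accepts its subcommands as one semicolon-separated argument. Here is a small Python sketch that builds that argument; the add_net_command helper is hypothetical, and 192.0.2.15 is only a placeholder address.

```python
def add_net_command(address, physical):
    """Build the semicolon-separated zonecfg subcommand string for 'add net'."""
    return "; ".join([
        "add net",
        f"set address={address}",
        f"set physical={physical}",
        "end",
        "verify",
        "commit",
    ])

# Argument list for the non-interactive form (placeholder address).
argv = ["zonecfg", "-z", "zone1", add_net_command("192.0.2.15", "bge0")]
```

Passing the list to subprocess.run(argv, check=True) from the global zone should have the same effect as typing the session above.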

Now we have to manually add the new interface to the running zone. Do this from the global zone as well:

# ifconfig bge0 addif XXX.XXX.XX.XXX netmask XXX.XXX.X.X zone zone1 up

Created new logical interface bge0:3

Note: ‘addif’ tells ifconfig to create a logical interface using the next available logical interface number.

# ifconfig -a
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet XXX.XXX.XX.XXX netmask ffffff00 broadcast XXX.XXX.XX.XXX
bge0:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet XXX.XXX.XX.XXX netmask ffffff00 broadcast XXX.XXX.XX.XXX
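One thing that trips people up here: ifconfig prints the netmask in hex (ffffff00), while the addif command takes it in dotted form. A quick Python sketch of the conversion, using only the standard library; the function names are my own.

```python
import ipaddress

def hex_netmask_to_dotted(hex_mask):
    """Convert an ifconfig-style hex netmask to dotted-quad notation."""
    return str(ipaddress.IPv4Address(int(hex_mask, 16)))

def prefix_length(hex_mask):
    """Count the set bits of the mask, i.e. the /N prefix length."""
    return bin(int(hex_mask, 16)).count("1")

dotted = hex_netmask_to_dotted("ffffff00")
prefix = prefix_length("ffffff00")
```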

That's it! You're done.

To remove the interface from a running zone, first determine which logical interface (alias) you want to remove, then run the following from the global zone:

# ifconfig bge0:3 down
# ifconfig bge0:3 unplumb
# zonecfg -z zone1
zonecfg:zone1> remove net address=XXX.XXX.XX.XXX
zonecfg:zone1> commit
zonecfg:zone1> exit


Monday, March 8, 2010

Solaris mdb debugger tool & memstat information.

mdb is a good debugger tool in Solaris. We had an issue with excessive memory usage on one of our production Oracle DB servers. Using mdb we could easily see who was using how much memory, which helped us at least figure out what was causing the performance bottleneck.

# mdb -k
Loading modules: [ unix krtld genunix md ip ipc usba ptm cpc random nfs ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     868241              6783   11%
Anon                      7077614             55293   88%
Exec and libs                7308                57    0%
Page cache                   8933                69    0%
Free (cachelist)           131769              1029    2%
Free (freelist)  18446744073709539318    17592186044319    0%
Total                     8081567             63137

In this output, Kernel is the memory used by the kernel, and Anon is anonymous memory.

What is anonymous memory?

Anonymous memory refers to pages that are not directly associated with a vnode. Such pages are used for a process's heap space, its stack, and copy-on-write pages.

Exec and libs covers pages used for executables and shared libraries.

Next it shows page cache. The page cache is used for caching of file data for file systems other than the ZFS file system. The file system page cache grows on demand to consume available physical memory as a file cache and caches file data in page-size chunks. Pages are consumed from the free list as files are read into memory. The pages then reside in one of three places: the segmap cache, a process's address space to which they are mapped, or on the cache list.

The free list and the cache list hold pages that are not mapped into any address space and that have been freed by page_free(). Pages in the cache list are not really free: they still contain a valid vnode/offset pair and are a valid cache of pages from files. Pages in the free list are not associated with any vnode/offset pair. Pages are put on the free list when a process using them exits.
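The huge Free (freelist) figure in the ::memstat output above deserves a sanity check. Here is a Python sketch that redoes the arithmetic. Two things in it are my own inferences from the numbers and are not stated by mdb: the 8 KB page size (868241 pages showing as 6783 MB) and the reading of the freelist count as a negative value that wrapped around an unsigned 64-bit counter.

```python
PAGE_SIZE = 8192  # bytes; inferred from the output, not reported by memstat

def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Convert a page count to whole megabytes, as in the MB column."""
    return pages * page_size // (1024 * 1024)

def as_signed64(value):
    """Reinterpret an unsigned 64-bit counter as a signed value."""
    return value - 2**64 if value >= 2**63 else value

# Page counts copied from the ::memstat output above.
rows = {
    "Kernel": 868241,
    "Anon": 7077614,
    "Exec and libs": 7308,
    "Page cache": 8933,
    "Free (cachelist)": 131769,
    "Free (freelist)": 18446744073709539318,
}

freelist = as_signed64(rows["Free (freelist)"])
total = sum(as_signed64(pages) for pages in rows.values())
kernel_mb = pages_to_mb(rows["Kernel"])
```

Read this way, the freelist entry is a small negative number, and the rows add up to exactly the Total line of the output, so the other rows can be trusted even though the freelist row itself is bogus.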

Hope this will help.