Adventures in upgrading to System Release 10.5

I recently took some time to help a customer upgrade their system from CSR (Cisco System Release) 9.1 to 10.5.  As with any good upgrade, it wasn’t without some significant drama…

[In particular, they were upgrading CUCM 9.1(2) to 10.5(2); CUCM IM&P 9.1 to 10.5(2); CUC 9.1(2) to 10.5(2); UCCX 9.0(2) to 10.6(1); and Expressway X8.1 to X8.5.]

Expressway-C and Expressway-E were textbook upgrades using the .gz upgrade files.

Unity Connection (aka CUC) was the first core component that we chose to bite off because it doesn’t have any version dependencies with the other components.  It was a textbook upgrade without issues.

The one minor snag was that CUC went unlicensed immediately because it was pointing to ELM on CUCM 9.1, and CUCM didn’t have 10.x licenses installed.  So watch out for that.  It was rather odd that CUC didn’t give us a 60-day grace period.  I moved it to its own PLM with the appropriate licensing installed there.

We next chose to upgrade UCCX as 10.6(1) is compatible with CUCM 9.1(2). [http://docwiki.cisco.com/wiki/Unified_CCX_Software_Compatibility_Matrix_for_10.6%281%29]

This is a refresh upgrade, so you must install a refresh upgrade COP file on UCCX before installing 10.6(1).  Because it is a refresh upgrade, the system upgrades the underlying VoS (RHEL-based Voice Operating System), and CCX is down while the system reboots and upgrades the OS.  I chose to have CCX stay on 9.0(2) after the upgrade, so that 10.6(1) is the inactive version and only needs a switch-version reboot to go live.

The switch-version reboot turned into a bit of a mess.  I issued the switch-version command, but the server didn’t seem to actually do anything.  About 2 minutes later I issued utils system reboot from the CLI, and it screamed back that I shouldn’t do that because the system was in the middle of a version switch and the database could be corrupted.  I let it sit for about 30 minutes and tried again.  This time it rebooted without complaining and came up on 10.6(1).

I had to run the typical process of updating the CAD client (this customer will move to Finesse in the next phase of the upgrade), using the Client Configuration tools you download and install from CCX.

Next was the CUCM publisher.  This ended up being a multi-hour affair.  I’d already heard about the very common problem of the Common partition being full, so I’d taken the liberty of using RTMT to clear out old log files.  You basically go to Trace and Log Central in RTMT, select Collect Files, choose a time period (I chose the previous 5 months), leave the files unzipped (to save time) and, most importantly, check the box to delete the files from the server.

I had the Common partition down to 50% utilized before the upgrade.

It failed citing the good old bugid CSCuc63312:

There is not enough disk space in the common partition to perform the
upgrade. For steps to resolve this condition please refer to the Cisco
Unified Communications Manager 9.1(1) Release Notes or view defect CSCuc63312
in Bug Toolkit on cisco.com.

So, second-guessing myself, I decided to use the sledgehammer known as the ciscocm.free_common_space_v1.1.cop.sgn COP file to clean out the Common partition (this COP file’s script nukes the currently installed inactive version).  I gave that a run and rebooted the server.

The next attempt at install also failed with the same CSCuc63312 error!

Knowing that the Common partition wasn’t the issue (which you can see in the install logs that you can copy and paste from the GUI), I started doing some digging around.  It turns out that if the main partition doesn’t have enough room (which you can see in the log files by searching for the word “needed”), the installer will erroneously throw the CSCuc63312 error anyway.
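Searching for “needed” is easy enough to do by eye, but if you have pasted the install log into a text file, a few lines of Python will pull out the relevant lines.  This is only a rough sketch: the function name and the file name in the usage comment are made up for illustration, and it assumes nothing about the log format beyond the keyword appearing somewhere on a line.

```python
def find_keyword_lines(log_text, keyword="needed"):
    """Return (line_number, line) pairs for every line containing keyword.

    The match is case-insensitive; line numbers start at 1.
    """
    needle = keyword.lower()
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Usage (install_log.txt is a placeholder for wherever you saved the log):
#
#   with open("install_log.txt", encoding="utf-8", errors="replace") as handle:
#       for number, line in find_keyword_lines(handle.read()):
#           print(f"{number}: {line}")
```

The matching lines should tell you which partition the installer actually considers too small, regardless of what the error dialog claims.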

I found a couple of TAC cases where the next step to free up room on the main partition was to clean up the TFTP directory.  This customer’s TFTP directory was over 5GB in size from multiple versions of large firmware for endpoints like the DX650 and Telepresence codec firmware for endpoints like the C40, SX20, SX10, etc.

Even after cleaning up the TFTP folder I was still hit with the same stupid error message: Not enough space in the Common partition.  This was despite knowing that Common had plenty of room and that I was really fighting a main partition space issue.

Some enlightenment

I remembered that from CUCM 10.0 on, the OVA template had increased the disk size from 80GB to 110GB.  It turns out that if you increase the size of the VM’s disk in 10.0 or greater, CUCM will automatically see the space and take advantage of it.  CUCM 9.x doesn’t do this automatically.

The special sauce to get 9.1 to expand the disk is to install the ciscocm.vmware-disk-size-reallocation-1.0.cop.sgn COP file, shut the VM down, resize the VM HDD in ESXi to 110GB and then boot it back up.  CUCM will reboot a couple of times during the process and then come all the way up.

The detailed instructions are here: http://www.cisco.com/web/software/282204704/18582/ciscocm.vmware_disk_size_reallocation_v1.0.pdf

After doing this, CUCM 10.5(2) was able to install successfully.  Keep in mind that this is a refresh upgrade, so the system will be down for an hour or so while VoS is updated, and the server will go through a couple of reboots.

CUCM IM&P (CUP)

I initially tried to install the upgrade ISO and received an invalid checksum error.  Thinking it was the version of IM&P I’d downloaded from CCO (it’s been updated twice in the past week), I re-downloaded it and hit the same error.  Note that the current version of CUP is 10.5(2a) as of this writing.  If I’d paid attention to the documentation I’d have realized that you need to install the ciscocm.version3-keys.cop.sgn COP file, which has the new keys that the 10.5(x) software images are signed with.  After installing this, the upgrade process recognized the ISO and the upgrade went through.
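Before re-downloading a multi-gigabyte ISO on the strength of a checksum error, it may be worth hashing the file locally and comparing the result with the checksum shown on the download page.  The following is a minimal sketch, assuming the page publishes an MD5 or SHA-512 value; the function names are mine, and the file name in the usage comment is a placeholder.  It only rules out a corrupted download.  It would not have caught the missing signing keys that were the real problem here.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read until handle.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha512"):
    """Compare the local digest with a checksum copied from the download page."""
    return file_digest(path, algorithm) == published_hex.strip().lower()


# Usage (both arguments are placeholders):
#
#   matches_published("upgrade.iso", "<checksum copied from the download page>")
```

If the digests match and the installer still rejects the image, the download is fine and the cause lies elsewhere, such as the signing keys.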
