TFS Production upgrades: stay calm and stick to the plan!

When planning a big migration upgrade to TFS 2017 (from TFS 2013) on new hardware, the exact planning of all actions can be very important to make sure the downtime of the TFS environment can be as short as possible and there’s at least some buffer to fix unexpected issues. That’s why I always try to perform production migrations in a week-end and that’s why you should always run a trial migration to have an idea about the total duration.

For this specific migration I’m doing this week-end, TFS 2017 is only an intermediate step because the customer also wants to migrate to VSTS from the TFS 2017 environment. I managed to do this without any issues during a trial run.

So, the plan for the production migration was: bringing the TFS environment offline on Friday evening and already launching the TFS 2017 upgrade wizard on Friday evening to make sure the long upgrade process can continue to run during the night. During the trial upgrade, this process took about 4 hours.

Unfortunately when logging back in on Saturday morning, I noticed the upgrade process failed after more than 3 hours (step 1523 of 1621) due to error TF30042: The log file for the database is full.

UpgradeError

The dedicated log disk on the server was indeed full. Seeing this error might freak you out because first you will believe that the complete upgrade failed and you need to start all over again. This might jeopardize the full plan to have a working VSTS environment on Monday morning.

This is for sure a moment to stay calm and to properly assess the situation and read all text which is available for you in the log file and also have a good look at the warning message in the TFS Upgrade wizard:

One or more project collections failed to upgrade … Start the Administration Console and navigate to the Team Project Collections node to attempt retrying the upgrade for each failed collection.

No need to start all over again! Fix the error which can be found in the error log and try to resume the upgrade process. In my situation I had to clean up the dedicated log file disk before rerunning the job from the TFS Administration Console.

ResumeUpgradeProcess

And indeed, the upgrade process resumed from step 1523 …

I only lost a bit of processing time, but still ok to finish the complete upgrade process before Monday morning …

Having done about 50 TFS upgrades in the last couple of years, I never had to cancel a production upgrade. I always delivered the new environment on time. Of course, there were times were unexpected issues came up or where I needed to perform some aftercare when the new environment was already up-and-running.

Rule #1: always have a backup plan in case of a hard failure

Rule #2: stay calm and properly assess the situation

Rule #3: call help before doing crazy stuff in a production environment

Leave a comment