From the frontline: Octopus Server performance tuning
We’ve been running Octopus Deploy since 2.0, back in January 2014, as part of our head-to-head battle royale with Release Management 2013. We decided on Octopus in February, and our pilot testing started in late March/early April with a handful of teams and users. Our shiny server (*ahem* virtual machine) with Octopus Deploy was chugging along and we all had an oh-so-wonderful user experience. Since 2.0, we’ve upgraded steadily through the releases up to 2.5.4 in late July, which is what we’re currently running. Over the past few months, our Octopus UI response time has been less than stellar, but perhaps that’s too positive a spin on things. It’s been bad. Awful. But before I go on, there are several factors to take into account, so I’ll start with the basics:
- CPU – Started out with 2 cores but now we’re up to 10; more than enough power.
- RAM – Initially we had 4GB, but after our upgrade we now have 16GB.
- Disk – We have plenty of disk space on both the system disk and the data disk (for RavenDB + backups).
- Configuration – we use Active Directory.
- Database size – growing very rapidly each day. In December, one full backup was 500MB. Today, we’re over 800MB. It seems to grow by 5 to 10MB per day.
- Network – our Server is used all over the world. Network I/O can be a factor.
Hardware-wise, things look fine. But the slow response times crept up gradually, and by late last year they were very noticeable. There are more details to provide:
- Version – We’re not using the latest and greatest. As of this writing, that’s 2.6.1. We’re still on 2.5.4. We plan to upgrade to 2.5.12 shortly for the UI performance patches. 2.6 actually breaks* some of these improvements but that’s not the entire reason we’re avoiding 2.6 for the interim.
- Load – Our user base has increased substantially (500+ users), as has the number of project groups (150+), projects (240+) and tentacles (700+), so it’s safe to say our resource demand has grown substantially since August of 2014.
- NuGet – We’d had issues crop up with our NuGet server (completely independent of Octopus) that led to some odd interactions with Octopus Deploy which affected performance.
* Update: Vanessa Love brought to my attention that my phrasing was a little off, and she’s right. I should be clearer about detail #1. Version 2.5.5 makes a serious dashboard performance improvement, as listed in the release notes (issue 1084). Those changes are included in 2.6. However, if you read the release notes for 2.6, it appears at first glance that they revert those performance fixes (issue 1392), but they don’t. The 2.6 release simply reverts to the previous dashboard, which keeps the performance fix in place; it just doesn’t use the newer dashboard that was included in the 2.6 alpha. So this is my attempt to rectify that poor explanation.
Several weeks ago, performance was terrible. Something had to be done. Pages were taking minutes to load. And yet... our CPU and RAM utilization was average at best – both hovered within their normal range of 20-40%. No peaks/spikes/valleys or anything. 10 cores and 16GB of RAM – yet no visible load. What gives? Doing some research on the Octopus Deploy documentation site, I stumbled onto this gem: Switch from single-threaded workstation GC to concurrent server GC. The article mentions this as a solution if you’re running into server crashes (which we weren’t) but the single-threaded part of the title caught my eye. On our server, 8 of the 10 cores were slacking off while the other two were doing all the work. Hmmmmm. How’d I miss this?
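To show how a healthy-looking average can hide a couple of pegged cores, here's a minimal Python sketch. The readings are made-up numbers for a 10-core VM, not measurements from our server, and summarize_cores is a hypothetical helper name. Real per-core figures could come from Performance Monitor or a library such as psutil.

```python
from statistics import mean


def summarize_cores(per_core, busy_threshold=80.0):
    """Return the average load and the indexes of cores at or above the threshold."""
    busy = [i for i, pct in enumerate(per_core) if pct >= busy_threshold]
    return mean(per_core), busy


# Hypothetical readings (percent busy): two cores saturated, eight nearly idle.
sample = [95.0, 90.0, 10.0, 12.0, 8.0, 15.0, 9.0, 11.0, 10.0, 10.0]

average, busy = summarize_cores(sample)
print(f"average: {average:.1f}%  busy cores: {busy}")
```

With these sample numbers the average lands at 27%, squarely inside a "normal" 20-40% band, even though cores 0 and 1 are nearly maxed out. That's why the overall graph alone wasn't enough to spot the problem.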
As the directions indicate: in the Octopus.Server.exe.config file, find the <runtime> tag. If the gcConcurrent tag is there with the enabled attribute set to false, you’re running in Workstation Garbage Collection (single-threaded) mode.
<runtime> <gcConcurrent enabled="false" /> </runtime>
My jaw dropped. My inner monologue went something like this:
“Oh… my.. gaaaaaaawd. You must be kidding me. Why the hell didn’t I think to look here?”
After ten to fifteen seconds of disbelief, I changed the tag to <gcServer> and set the enabled attribute to “true”.
<runtime> <gcServer enabled="true" /> </runtime>
Over ten months of running a huge Octopus Server VM in single-threaded, non-concurrent Workstation Garbage Collection mode. Deskpalm.
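If you'd rather check a config programmatically than eyeball it, here's a rough Python sketch of the rules. It's not anything Octopus ships: describe_gc is a hypothetical helper name, and it assumes the standard .NET Framework defaults (gcServer is off and gcConcurrent is on when the element is missing).

```python
import xml.etree.ElementTree as ET


def _enabled(runtime, tag, default):
    """Read the 'enabled' attribute of a child element, falling back to the default."""
    element = runtime.find(tag) if runtime is not None else None
    if element is None:
        return default
    return element.get("enabled", str(default)).strip().lower() == "true"


def describe_gc(config_xml):
    """Describe the GC mode implied by the contents of a .NET .exe.config file."""
    root = ET.fromstring(config_xml)
    # Accept either a full <configuration> document or a bare <runtime> fragment.
    runtime = root if root.tag == "runtime" else root.find("runtime")
    server = _enabled(runtime, "gcServer", False)        # assumed default: off
    concurrent = _enabled(runtime, "gcConcurrent", True)  # assumed default: on
    if server:
        return "server"
    return "workstation (concurrent)" if concurrent else "workstation (non-concurrent)"


before = '<configuration><runtime><gcConcurrent enabled="false" /></runtime></configuration>'
after = '<configuration><runtime><gcServer enabled="true" /></runtime></configuration>'

print(describe_gc(before))  # workstation (non-concurrent)
print(describe_gc(after))   # server
```

For a real file, read it as bytes (for example with pathlib's read_bytes) and pass the result to describe_gc; the location of Octopus.Server.exe.config depends on your install.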
I decided to do a little research and found Chris Lyon’s blog post, Server, Workstation and Concurrent GC. His post is pretty dated (2004), but the explanation is really good (and I don’t know if anything has drastically changed since then):
Server GC is only available on multi-proc machines. It creates one GC heap (and thus one GC thread) for each processor, which are collected in parallel. This GC mode maximizes throughput (number of requests per second) and shows good scalability (performance really shines with 4 or more processors).
Workstation is the default GC mode. On a single-proc machine, it’s the only option.
Concurrent GC is used in Workstation mode on a multi-proc machine. It performs full collections (generation 2) concurrently with the running program, minimizing the pause time. This mode is particularly useful for applications with graphical user interfaces or applications where responsiveness is essential.
I’ve heard of garbage collection modes before, but I’ve never worked with them first-hand in C# or as an application user. That said, once we changed to Server GC and restarted the service, Octopus Deploy’s UI response time improved tremendously. I don’t have official HTTP Watch sessions saved or any kind of coded UI response tests or metrics to point to, but even from my day-to-day usage, the UI response time is far better than it was in Workstation GC mode. I’m going to revisit garbage collection modes in the near future, but this is a really simple fix if your situation calls for better performance (and you have the resources to take the hit).
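For anyone who wants actual numbers before and after a change like this, here's a minimal Python sketch of a response-time check. It's an illustration, not what we ran: http_get, time_requests and summarize are hypothetical names, authentication is left out entirely, and a plain GET only approximates what a browser does when loading a page.

```python
import time
import urllib.request
from statistics import median


def http_get(url, timeout=120):
    """Fetch a URL and read the whole body so the full transfer is timed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def time_requests(url, samples=5, fetch=http_get, clock=time.perf_counter):
    """Time several sequential requests and return the durations in seconds."""
    durations = []
    for _ in range(samples):
        start = clock()
        fetch(url)
        durations.append(clock() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": median(durations),
        "max": max(durations),
    }
```

Running summarize(time_requests(url)) against your own server's address before and after the config change, at a similar time of day, would give a rough but repeatable comparison. The fetch and clock parameters are only there so the timing logic can be tested without touching the network.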
If you’re running a relatively large Octopus Deploy server, hopefully this is of some help in terms of UI performance. Make sure to check Octopus.Server.exe.config for the garbage collection mode. That said, 3.0 (when it arrives) will most certainly change the landscape of Octopus performance tuning. With SQL Server back-ends, Octopus application servers and load balancers, GC mode will still be important, but there will be other factors in scaling Octopus Deploy for the enterprise from 3.0 onward.