RFC Octopus Data Center Manager

I’ve been using Octopus Deploy since early 2014. Around version 2.2 or 2.3 is when I started investigating Octopus and then implementing it in an enterprise scale environment. It’s been over a three year long journey for myself. A ton of changes have been made to Octopus during that time and it’s not without a sense of appreciation and respect that I have for the folks at Octopus Deploy in part due to fact that:

it’s simply awesome
very easy to use and understand
it keeps getting better
they take input from request for comments (RFC)

It’s not often that companies do RFCs this although I will say UserVoice forums for numerous products are far more common today then just a few years ago. While I have remained silent on numerous RFCs as I felt my issues were more a matter of scale and would not apply to the majority of Octopus users but this latest RFC for Octopus Data Center Manager (ODCM) has my full attention and within this post, has my comments on their RFC.
If you’re looking at deploying at enterprise scale, we’ve already demonstrated that it can be done and I’ll explain what/why/how we customized Octopus for the enterprise in more detail in another post. ODCM is a big step within the enterprise space towards mastering enterprise-level deployment administration. I emphasize mastering because up to this point, I’ve managed to administer Octopus using .. well frankly Octopus and PowerShell. So, it’s with the hope I have that ODCM to either remove and replace some of the custom built automation we’ve made over that past few years in favor of more broader approach like with ODCM or at the very least smooth some of the rough points for us and our users. While implementing our own home-grown PowerShell to extend Octopus is awesome in and of itself, I’d prefer not to roll too much customization if possible. Some of our custom automation is exclusive for our purposes and should not be removed but for a lot of what we do, I can easily imagine something like ODCM taking the reigns.
While many Octopus Deploy users are more likely within the small to medium size organizations, the fact that Octopus can be applied at nearly any size organization speaks to its versatility. Looking forward to Octopus 4.0, just the notion of ODCM is enticing for enterprises that already have Octopus installed but should also be a benefit for small and medium sized organizations as well. To know that Octopus can grow with an organization is a big benefit that I think is often overlooked.
My Current State
Today, our Octopus installation stands at 16 High-Availability (HA) nodes; 8 groups of 2-HA node clusters, a SQL Server Always-On cluster, and 2 Windows DFS servers for storage of NuGet, Artifacts, and TaskLogs. Since January of this year through March we’re averaging about 13,656 deployments per month. Of those 13,636 deployments, 1534 on average per month were to production. From January to March, we had an increase between 5%-6% of deployments month over month. We have 320 octopus instances, however we have some instances that share multiple project groups so we actually have 413 octopus project groups which paints a more accurate picture. With all of our instances we have approximately 3000 servers being deployed to in datacenters, AWS, and Azure.
For what used to take 30 minutes (usually longer) teams now deploy within minutes and have little to no interaction with the process. The time gained and budgeting recouped from all the deployment automation has been estimated to be at least seven-figures. So to answer the question if Octopus can scale, it most assuredly can. I mentioned earlier we are using Octopus to manage all of our Octopus HA front ends as well as our NuGet/Artifact/Tasklog storage within DFS. Using PowerShell and the Octopus REST API or Client API library, we’ve managed many activities all through out Octopus to automate most repetitious of tasks on our behalf.
Fully leveraging the REST/Client APIs has made us more robust and efficient and we’re going to step it up a notch in the months to come. I’ll discuss more about that in another post, but I still get a sense that many people treat or view Octopus as a toy or something niche – it looks really cool but it doesn’t quite look like the solution. Look, I get it. Octopus looks like a toy. Oh look, it’s little deployment automation tool! How cute! I know there are other tools – Visual Studio Team Services definitely comes to mind. But I would make the argument that while Octopus is a fantastic deployment automation tool, I would state that it is one of the best automation tools (regardless of deployments or not). Yes, Octopus caters to applications but the infrastructure and plumbing make it feasible to automate other tasks that aren’t directly related to applications themselves. I think a lot of people realize this but often Octopus is sold as a deployment automation tool when in fact it can do far more than just automate deployments.
Regardless of my minor diatribe, Octopus Deploy’s user base appears to be only growing larger and ultimately more diverse – which is great but a diverse audience can introduce a lot of different wish-list features, functionality, and priorities. This latest RFC for the Octopus Data Center Manager while seemingly targeted squarely on enterprise level installs also affects those who are not at such a scale. So to properly respond to Octopus’s RFC on ODCM, I’ll address their topics one by one.
Giving teams their own Space
I think the term space is significant because for the longest time I had no idea what to call it. Currently, we implement the concept of spaces by using individual octopus instances for teams with their own unique URL. On a project-by-project basis, we create a brand new octopus instance for any team that is not sharing servers with any other projects. For projects that share servers they all get lumped into one octopus instance. The benefit primarily is tentacle management and within Octopus 3.x HA installations you’ll want to go with listening tentacles as they are significantly easier to configure and manage. Polling tentacles work but you need to configure each tentacle individually for each Octopus HA server they need to poll. We rarely move an Octopus instance to another Nodeset but it does happen. If we had polling tentacles, we would have to find a way to automate their configuration changes as well. It’s one more thing we don’t need in the enterprise to do but having it as an option is good enough. If you have several hundred or thousands of servers, re-configuring polling tentacles becomes too time consuming and challenging to manage en masse. I’m not knocking polling tentacles – they’re best used when the situation is appropriate however listening tentacles are far simpler and easier to manage at a large scale. Having each listening tentacle listen for a specific octopus instance based on thumbprint makes life a lot easier when you’re working with a high volume of tentacles.
Additionally, by having shared octopus instances the tentacles are very easily shared across environments. There’s no need to install multiple tentacle instances and although we can, we prefer not to. Less is more. The only penalty of having one tentacle instance per server is if two teams are deploying to the same server at the same time, one of the two projects has to wait – but that’s really not a major concern. Once the first deployment is done the tentacle will pickup the second deployment and process it. The Octopus tentacle today is (oddly enough) dumber from previous versions but it performs better than previous versions.
We can leverage the concept of spaces by using octopus instances primarily because of our custom teams, roles and using Active Directory makes things so easy that our team effectively does zero user management. This is a must in the enterprise. Active Directory (AD) is the glue that makes it all work with TFS/VSTS AD groups and having Octopus Deploy simply leverage said AD groups makes things very easy for us to manage and teams can control whom gets access to what without needing full administrative access in Octopus. We (the DevOps team) stay out of the picture as we should. There are too many teams to support and to know each and every team’s minutiae of details is simply not feasible or supportable.
For the RFC itself, the term spaces seems to be more of a narrative piece that ties everything together. Thanks to Octopus Deploy for phrasing such a simple thing that didn’t really have a name. We refer to spaces today as ‘instances’ as it’s the most literal term describing our environment, but spaces seems more appropriate and doesn’t obfuscate the implied meaning of instances within Octopus Deploy.

Identity management

Leveraging Active Directory and creating custom roles and teams in a consistent manner has been the foundation of our identity management. While we’re able to have spaces-like feature using Octopus instances, it still requires users to constantly re-log into an instance as each ‘space’ is a different URL. The Identity management feature suggested in the RFC would be a huge feature for us as we basically avoid all that friction altogether. As long as there’s some Active Directory integration that makes this all work, I don’t see any problems with feature. In fact everyone across any organization size will easily benefit from this because this allows an organizational boundary between two spaces, yet makes the user visibly see both spaces and allow the user to easily switch from one to another. This is a win-win for everyone.

I suspect Octopus will of course require space names (as all names within an instance such as project group, project, teams, etc.) need to be unique. I would be interested to see if Octopus will apply scopes for environments/teams/etc. based on spaces. That allows for sharing of sorts but that topic is covered below.

Access control

Sharing
Sharing isn’t too common for our users today. I could imagine some scenarios when teams want to share assets across octopus instances (or spaces) and that becomes more painful. Will spaces allow for sharing scopes between each other? Who is to say it’s not a good idea or a bad one. Either way, sharing today is not impossible but I would argue it’s more important to achieve synchronicity between shared assets and that is far more difficult to maintain. Now we could leverage the REST API or Client API to make synchronization happen, but it wouldn’t be an integral part of Octopus and thus, always an external process. But let’s breakdown the sharing points Octopus Deploy mentioned:

Step templates and server extensions – Step templates are really easy for teams to obtain independently; we don’t have server extensions as we’re not using the newer versions of Octopus, but eventually we will upgrade and have to deal with server extensions.
Variables – While there are variable sets that cover this issue at the instance or space level, being able to spread variables or even just variable sets across all Octopus instances or spaces – that would certainly be a helpful feature. Syncing changes too from a “master” variable set would be desirable as well.
Releases – I’m not sure why one space would want to share releases with another space. Granted the RFC says they’ll follow up on this at a later time so I can’t say I fully understand this yet.
Tentacles – Tentacle sharing had been a major pain point for us in Octopus 2.x. We let teams share tentacles across Octopus Servers back in 2.5.13 and it led to mass confusion (for the teams that shared their servers of course). Let’s be clear, Octopus is perfectly capable of sharing one tentacle instance or making multiple tentacle instances on a server and having those instances being used for different Octopus instances. In an enterprise environment, uniformity and consistency are preferred rather than high customization. When you have thousands servers with the tentacle agent running, having unique configurations leads to troubleshooting nightmares. So for our 3.x environment, our rule is simple: 1 octopus tentacle per server for 1 octopus instance (and use the same port). If we have teams that want to share servers, we lump them into a shared instance and that alone has saved our sanity from our previous nightmares of troubleshooting tentacle communication issues back from 2.5.13.

Multiple Octopus Deploy versions
This is probably our biggest issue today – versions and upgrading. Because Octopus cannot really roll-back or revert to a previous version (without backing up the database – keep in mind I have 320 databases!), we have to be extremely cautious about what we upgrade in our production. Today, we primarily run 3.2.24 (a release that’s over a year old!). It works just fine but for teams that want to leverage the Azure Resource Manager (ARM) you need to have version 3.3.x. So to get around this, we have one specific Nodeset within our environment that runs version 3.3.27 exclusively for teams that need ARM support. Since each octopus instance has its own database in our SQL Server Always On cluster, there’s no need to worry about upgrades on a single Nodeset as it won’t affect other database. Since each instance has it’s own database, we can easily upgrade Octopus instance per nodeset – but not down to the individual instance (or space) within a given nodeset.

After running both versions in parallel (far more 3.2.24 instances than 3.3.27), we decided to start looking into streamlining upgrades. Part of our problem is when you have a lot of projects and different deployment patterns and styles, it’s difficult to figure out how to regression test anything before upgrading. Having this option of having multiple Octopus Deploy versions on the same host would make things easier for sure. I’m not sure as a practice if it’s good to allow different version of Octopus instances running concurrently on the same host or not. Either way, we are looking to upgrade our environment far more frequently and in an automated fashion and with HA cluster pairs, it should be feasible to have zero impact on availability. We’re still in the early stages of figuring this out, but if upgrading to all instances or spaces in one shot or concurrently running different versions on the same host would both be welcome features.

Octopus Deploy monitoring and reporting

Reporting – There’s a basic spreadsheet template for reporting for our teams. It would be nice to have some sort of reporting baked into the Octopus UI with specific permissions to allow a select few to view the history and progress of deployment history. From a metrics perspective, it’s hard to say what’s useful as the default template today is still pretty nice to have. I think I would prefer to omit the spreadsheet altogether and have it all in Octopus. Granted, this is another feature that could have more performance and feature requirements but if the reports are simple enough perhaps this request wouldn’t be too demanding. Another thought I have is to link PowerBI to Octopus REST API for reporting metrics. I’m sure it’s possible, I just haven’t tried it out myself.
Monitoring – We have managed to cobble together some basic reporting using our own stored procedures that query the databases across all instances (spaces) within SQL Server. It only makes sense within the ODCM to include comprehensive monitoring in a dashboard style. To be able to view deployment history, errors, tentacle communication issues from a dashboard would be helpful as well, but I’m sure given enough time I could think of a bazillion widgets or dashboard items for Octopus Deploy to incorporate.

Licensing

Yeah I can only see this being a separate add-on type of product. Not everyone needs ODCM.
Conclusion
The discussion around Octopus Data Center Manager is exciting. Not only for the enterprise but for all sizes of organizations. As I mentioned earlier, ODCM is a step towards mastering the enterprise-level deployment administration. A lot of the issues the Octopus team has described in their RFC blog post is really centered around larger installments of Octopus Deploy. If ODCM is a separate product, it’s likely it will have a significantly smaller user base compared to Octopus Deploy. Having said that, it’s utility and functionality would be extremely helpful for those organizations who are trying to manage everything from one console or site. If ODCM is made in a similar fashion as Octopus Deploy, meaning everything is designed from the ground-up/API-first approach, then ODCM has the potential to be extremely powerful and to allow organizations to customize ODCM first before customizing down to the Octopus Deploy instances or spaces level. That would be a remarkable asset for any organization.
A big thanks and congratulations to the Octopus Deploy team for all their hard work and dedication towards building a fantastic deployment automation tool for .NET developers!
* Octopus image from Zsolt Vajda

Categories:ALMContinuous DeploymentDevOpsOctopus DeployRequest For Comment (RFC)