Zen of Venn — DevOps, DevSecOps, and Ops
Like Information Technology (IT) and Agile, the term Operations (Ops) has become overloaded and confusing. We'd argue Ops is Ops, or at least that there is no need to overcomplicate the differences and necessary separations. In this post, we'll discuss the Venns of Ops. If you build or integrate software, have (or are) an Ops staff, or are just keen on Venn diagrams, this post has something for you.
We'll assume a general understanding of Ops, DevOps, and DevSecOps, but there are embedded links for background. The Venn circles of the DevOps variants intersect heavily to start, so we'll treat them as equal for this post. We'll roll them up and focus on the Venn of DevOps and the more traditional "Prod(uction)Ops" (a.k.a. operations, hosting, IT, etc.). ProdOps "owns" the infrastructure and deployments in a private cloud, public cloud, or hybrid, noting that public clouds (e.g., Azure and AWS) are forcing major changes across Ops. We enjoy debates on the right Ops Venn, but advocate less separation via a thickening of the intersection of the DevOps :: ProdOps Venn. Ops convergence and a broader intersection reduce complexity and can yield BIG benefits in efficiency, cost, security, and organizational harmony that everyone from technologists to CFOs will appreciate.
To realize these benefits, the non-intersecting aspects of the Ops Venn are a worry and need extra scrutiny, especially for security and privacy. Differences, seams, and gaps are the natural habitat for vulnerabilities that hackers seek to exploit. There are important differences between "other" Ops and ProdOps (e.g., you need some controls and gates on production deployments, with limited, MFA-protected, and tracked admin access), but the processes and practices should be similar or the same. Differences and non-intersections should be the exception rather than the rule, and must be justified and documented. If you are working in an environment where there are significant differences between Dev and Prod, unnecessary duplication, or unaligned priorities (or cost centers), convergence should be a high priority! Let's consider an example:
WitJets Company developed private cloud services for use by its internal business units. Over time, these services were extended for use by external partners and clients. Now, the usage is about 80:20 (external:internal) with corresponding prioritization of investments. WitJets is agile and leverages DevOps and continuous integration and deployment (CI/CD) for its development processes. The company has a public cloud migration in its road map, but no immediate plans to execute it.
For production releases or anything client facing (e.g., beta or acceptance), development or release management provides packages and documentation to ProdOps for deployment. The WitJets ProdOps organization evolved from, and is still connected to, its IT/LoB Ops teams, which makes a hand-off such as this more natural. ProdOps "owns" the environments and deployments, with responsibility for making the services generally available to clients at scale via a private cloud vendor (of infrastructure and connectivity). So, ProdOps has a direct effect on revenue and has that "stick" when there is pushing and shoving on resourcing or decisions. The ProdOps team uses some aspects of DevOps procedures and practices, but there are significant differences to account for production-specific needs: scaling, tenanting, configuration, authentication and authorization, certificates, endpoints, perimeters, and controls such as firewalls, endpoint protection, anti-virus and anti-malware (AV/AM), etc. They also must maintain at least two versions of the services to support all tenants/clients. [Cylidify recommends one, and no more than three, production-supported versions.] So, while the operational intentions and capabilities are similar between Dev and Prod, the implementations are very different and mostly manual.
Consequently, WitJets has experienced serious problems in consistency and reliability, including functional and performance issues in production (that cannot be reproduced in development) and an added-on tenanting model (which requires organization and replication not present in development systems). WitJets has been able to handle the issues as they arise, but they are increasing in frequency and severity: the Dev::Prod relationship is strained, clients are unhappy, and the business does not have a handle on the overall impacts and costs of its Ops situation. WitJets has had at least one outage caused by a development certificate making its way to production and expiring unexpectedly, and it struggles with change management. Even though the company is aware of and concerned about the situation, it is waiting to restructure the technology and business to dovetail with its public cloud transformation and migration, which continues to slide further out in the road map.
The WitJets example illustrates a difficult situation with some aspects that have affected every technology business at some point. Cylidify believes that there are big benefits in operational convergence that can be realized in the short term (in a tactical or agile fashion) without waiting for long-term restructuring of the technology or business. The earlier in the process you can align Dev and Prod, the larger the benefits. We recommend that a business not commit to any development effort without due diligence and good confidence in an aligned operational plan. Why build something if you can't cost-effectively make it available at scale to paying customers?
Getting a deployment and configuration correct for a specific target is difficult even in ideal conditions. Cylidify recommends operational convergence, with automation and orchestration covering any differences between targets, whether they are pre-Prod (Dev, acceptance, stage, etc.) or Prod (service versions, instances, tenants, etc.). Security is just one of the areas that will benefit from these recommendations, but it should not be underestimated. Production security incidents have many hidden costs and are usually visible to the public, which can tarnish your brand and reputation. Here is a list of some important security aspects that should be understood and aligned across operations:
Anti-Virus/Anti-Malware (AV/AM) and Active Endpoint Protection (AEP) implementation and configuration — these can have unexpected impacts on performance or functionality
Data(base) replication and backups including accesses or service accounts
Deployment structures and orchestrations (either technology or controls): trust boundaries and controls have wide-ranging but subtle impacts
Parity of structure and configuration in scaling of systems, virtual instances, and stores — regardless of life-span
Certificates, keys, and cryptography implementations
Access Control Lists (ACLs)
Remote access (e.g., RDP) and its configuration: production should have few remote-access paths, with white-listing, MFA, and tracking required
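As one illustration of the certificate item in the list, here is a minimal Python sketch of a pre-deployment audit that would flag a development certificate headed for production, as in the WitJets outage. It assumes you keep some inventory of deployed certificates; the `CertRecord` type, its fields, the `audit_certificates` function, and the 30-day warning window are all hypothetical names and values chosen for this example, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical inventory entry for a deployed certificate."""
    name: str
    issued_for: str      # environment the certificate was issued for, e.g. "dev" or "prod"
    not_after: datetime  # expiry timestamp (timezone-aware)


def audit_certificates(certs, target_env, now, warn_days=30):
    """Return human-readable findings for one deployment target."""
    findings = []
    for cert in certs:
        # A certificate issued for another environment should never reach this target.
        if cert.issued_for != target_env:
            findings.append(
                f"{cert.name}: issued for {cert.issued_for}, deployed to {target_env}"
            )
        # Flag certificates that have expired or will expire within the warning window.
        remaining = cert.not_after - now
        if remaining <= timedelta(0):
            findings.append(f"{cert.name}: expired")
        elif remaining <= timedelta(days=warn_days):
            findings.append(f"{cert.name}: expires in {remaining.days} days")
    return findings


# Example inventory (made-up data for illustration only).
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
inventory = [
    CertRecord("api-gateway", "prod", now + timedelta(days=200)),
    CertRecord("tenant-portal", "dev", now + timedelta(days=10)),
]
for finding in audit_certificates(inventory, "prod", now):
    print(finding)
```

Run against the example inventory, the audit reports that `tenant-portal` was issued for dev but deployed to prod, and that it expires in 10 days. The same function can run unchanged against Dev, stage, and Prod inventories, which is the point: one practice, with the target passed in as a parameter.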
A deployment checklist including the security aspects listed above is a good start toward the goal of having a small set of documented, justified, and verifiable differences and separations for ProdOps. If Dev and Prod aren't aligned early, there will be gaps and disconnects, including some that won't be found until well after a deployment (e.g., when a new tenant is added or resources are added for scale). Fixing these later-stage issues will always cost more and can impact your partners and clients.
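The idea of a small set of documented, justified, and verifiable differences can be checked mechanically. The following Python sketch compares a Dev and a Prod configuration and reports every setting that differs without a recorded justification. The configuration keys, the `justified` register, and the function names are assumptions made up for this example; a real setup would load them from whatever configuration store and change records you already use.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} so settings compare key by key."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def undocumented_differences(dev, prod, justified):
    """Return sorted setting paths that differ between dev and prod
    and have no entry in the register of justified differences."""
    dev_flat, prod_flat = flatten(dev), flatten(prod)
    missing = object()  # sentinel: distinguishes "absent" from a stored None
    differing = {
        path
        for path in dev_flat.keys() | prod_flat.keys()
        if dev_flat.get(path, missing) != prod_flat.get(path, missing)
    }
    return sorted(differing - set(justified))


# Made-up configurations for illustration only.
dev = {"auth": {"mfa": False}, "av": {"enabled": False}, "replicas": 1}
prod = {
    "auth": {"mfa": True},
    "av": {"enabled": True},
    "replicas": 6,
    "firewall": {"allow": ["10.0.0.0/8"]},
}
# Hypothetical register: setting path -> documented reason for the difference.
justified = {
    "auth.mfa": "MFA is required for production admin access",
    "replicas": "Production is scaled for tenant load",
}

print(undocumented_differences(dev, prod, justified))
# ['av.enabled', 'firewall.allow']
```

Here the AV/AM setting and the firewall rules differ between Dev and Prod with no justification on record, so they surface as items to either align or document. A check like this could run as a gate in the CI/CD pipeline so that new differences are caught before a deployment rather than after.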
So, converge and align your Ops! It doesn't have to be strategic and won't be a complete intersection. However, specific short-term alignments and incrementally converging Ops over the long term will save time and money, and reduce risk.