Talking DevOps with Dave Mangot
I recently interviewed Dave Mangot, founder of Mangoteque, on an episode of the Private Equity Funcast to discuss the ins and outs of DevOps. Dave has a wealth of experience on the front lines at companies like Salesforce, Barracuda Networks, Cable & Wireless, MeridianLink, and SolarWinds. The following is an edited transcript of our conversation.
[Jim Milbery] Today’s topic is DevOps. So tell us a little bit about your background. You’ve done a lot of exciting stuff.
[Dave Mangot] I moved out to Silicon Valley in the late ’90s with no idea that this thing called the dot-com boom was going on. I stumbled into it because I knew a bit about computers. I’ve been lucky enough to work with both startups and big enterprises. Eventually, I wound up as an architect in infrastructure engineering at Salesforce, where I designed a lot of the way Salesforce runs, and many of the concepts we introduced back then are still in use today. From there, I went on to run a global site reliability engineering organization for SolarWinds, with engineering teams around the world. Then, a few years ago, I started working with private equity portfolio companies to help them grow during the holding period, using their technology organizations and many of the techniques and ideas from DevOps. DevOps borrows heavily from existing paradigms like lean software development, extreme programming, agile development, and automation, all of which get folded into this umbrella word, DevOps, which Andrew Shafer and Patrick Debois came up with back in 2009.
[Jim Milbery] What is DevOps? Define DevOps for me because we all use it loosely, and I think many people don’t know what it means.
[Dave Mangot] When I talk about DevOps, I mean the international software delivery movement, which rests on two ideas: we want to deliver software with speed, and we want to deliver software with quality. So what goes into delivering software with speed and with quality? For a long time, it was tough to talk about the specifics. But around 2018, a book called Accelerate came out, based on the research behind the State of DevOps Reports, and it identified four key measures that are indicative of high-performing engineering organizations:
Deployment frequency — How often do I deploy?
Lead time for changes — How long does it take from when I want to make a change until it’s actually out in production?
Change failure rate — When I make a change, how often does something fail?
Time to recover — When something does break, how long does it take me to fix it?
What they discovered is that high-performing organizations are twice as likely to meet or exceed their organizational performance goals. Those goals can be any business goal: customer satisfaction, revenue, or customer retention. We have the data to show that getting good at deployment frequency, lead time for changes, change failure rate, and time to recover is excellent for your business. I have the luxury of working with recently acquired private equity portfolio companies, looking at their engineering processes and how they deliver software to find the bottlenecks that make it hard for them to deliver software or recover from failures. Then I work with those engineering organizations on fixing those things.
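To make the four measures concrete, here is a minimal Python sketch that summarizes them from a list of deployment records. The `Deployment` record, its fields, and the `four_key_metrics` helper are hypothetical, invented for illustration; Accelerate defines the measures, not this data model. The sketch also assumes, for simplicity, that a failure begins at deploy time, so time to restore is measured from deployment to restoration. Real teams usually pull these timestamps from their CI/CD and incident tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record of one production deployment.
    committed_at: datetime                  # when the change was checked in
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Summarize the four Accelerate measures over one reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deploys if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    # Simplifying assumption: a failure starts at deploy time, so time to
    # restore is measured from deployed_at to restored_at.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Example: two deployments in one day, one of which failed and was fixed.
t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=2), failed=True, restored_at=t0 + timedelta(hours=3)),
]
print(four_key_metrics(deploys, period_days=1))
```

Medians are used for the time-based measures because a single slow change or long outage can skew an average badly; that choice is an assumption of this sketch, not a rule from the book.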
[Jim Milbery] Let’s talk about that for a minute. Silicon Valley has this reputation of delivering features every day, every hour. Yet, many enterprise software companies don’t WANT frequent feature deliveries. What do you say to companies who think they don’t need to do DevOps because they’re not releasing features every hour or daily?
[Dave Mangot] Well, in some organizations, technology isn’t the problem. I’ve gone into organizations, done assessments, and found that they’re meeting the business requirements, and that’s fine. Maybe their problem is in sales. Perhaps the problem is in the product. But for companies trying to grow, there are still things we do with software that are not features per se.
[Jim Milbery] What I see as one of the problems is when you don’t have to release software very often. Say your features take a relatively long time to develop, and your customers don’t want any visible change to the system or their workflow, so you’re releasing a few times a year with some new features or updates. Because these companies don’t release software often, they’re not good at it, and that’s when all hell breaks loose. Do you agree with the idea that you should get into a cadence of releasing more often, so you’re good at it when you have to do it?
[Dave Mangot] I don’t believe in releasing just for the sake of releasing, but there is a book called Project to Product by Dr. Mik Kersten, in which he describes the four flow items:
Features
Defects
Risk
Technical debt
Features are just one piece of the puzzle. Maybe you’re fixing bugs; those are defects. Risk covers things like security issues. And then there’s technical debt. The bottom line is that a software release is not always visible to the customer. For example, maybe we optimize how we store customer data so that it takes up only 25% of the space, and we save a bunch of money on data storage costs, which increases our profits. It depends on what the business needs, but people should always be able to deploy software, and companies should be able to make improvements continually. Sure, sometimes there is software out there that’s on deathwatch. It just sits there, and I don’t advocate that people go in and muck around with it if there’s no reason to. But in software, there are many things to do beyond “what’s happening on the screen.”
[Jim Milbery] I just talked to a friend of mine in the manufacturing business. He worked for one of the large automotive manufacturers and recently moved into the software business. He said, “When I build a car, I know exactly how long it will take, from design to delivering that car to customers. Now I’m managing software companies that don’t work that way, and it’s incredibly frustrating.” As a large manufacturer, he controlled all his suppliers. But on the software side, you don’t always control your “suppliers.” Microsoft can release a new version of the operating system or patch a security hole, and now you need to respond, even on a piece of software that is on deathwatch. Maybe they no longer support the operating system version, or the database has gone out of support. We don’t control our infrastructure in the software world, and it’s a real risk.
One thing that surprises most of our portfolio companies that have jumped into mobile is that initial development is just one part of the overall cost. When I ask, “What’s your annual budget for maintaining your mobile app?” they don’t get what I mean. I have to remind them that every June, Apple comes out with a bunch of iOS changes that you’ve got to respond to in order to stay in the App Store and stay compliant. So you will be maintaining this thing whether you want to or not.
[Dave Mangot] The maintenance part is far longer and more expensive than the initial development. What I find interesting in your example is the car manufacturing. We talk a lot in the DevOps movement about Toyota and how they do not rest on their laurels; they are always trying to improve their delivery and their quality. We draw heavily on W. Edwards Deming and his work on manufacturing quality, and we bring many of those principles into software delivery so that we can have more faith in the quality of the process of delivering software.
If we have crazy old legacy things like PowerBuilder applications, how do we increase confidence in what we’re delivering when we make changes, so that we know we’re not going to break something? We want to deliver software with speed and quality, yet many old frameworks and languages weren’t written to be unit tested. I regularly see customers with some big bundle of software that’s not broken apart into manageable pieces. They claim it “wasn’t written to be tested,” but we have to do something about it. The answer isn’t to throw up your hands and say, “It’s over. We can never test it.” Instead, we talk about what we can do with end-to-end testing. Rather than worrying about the little components, if I can write automated tests that go against the web interface and check what comes up on the page, maybe that’s a way of testing it. I’m not saying any of this is easy, but we have the data to show that it’s a worthwhile investment.

Mikey Dickerson came over from Google when healthcare.gov was launching, and healthcare.gov was having a lot of challenges, to put it nicely. He walked in and asked, “So what do you use for monitoring? What do you use to see if the site is up or down?” They pointed to a TV in the other room and said, “CNN. When the site’s down, CNN will talk about it.” And he was like, “Yeah, I think we’re going to want to close the loop on that a little bit so that we find out about problems before our customers do.”
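One low-tech way to start on both ideas here, automated end-to-end checks against a web interface and finding out about outages before customers do, is a scripted smoke check that loads a page and looks for expected content. The Python sketch below uses only the standard library; the `check_page` helper is hypothetical, and it is a deliberately simple stand-in, not a replacement for browser-automation test suites or a real monitoring service.

```python
import http.client
import urllib.request


def check_page(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers HTTP 200 and its body contains `expected_text`.

    Any network error, HTTP error status, timeout, or malformed URL counts
    as a failed check rather than raising.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # an unrecognized URL scheme raises ValueError.
        return False
```

In use, you would point `check_page` at a page your customers depend on, with text that only appears when the page rendered correctly, run it on a schedule, and alert someone whenever it returns False. The same check can run at the end of a deployment as a minimal end-to-end test.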
[Jim Milbery] So let me change gears. What does an assessment from you look like? Let’s say it’s a software company with $25 to $30 million in revenue, maybe a couple of thousand customers, and several servers. Walk me through your typical engagement.
[Dave Mangot] I developed an assessment during the lockdown, when I had a bunch of time to write code, and based it on Accelerate, the book we talked about earlier. Because we know that high-performing organizations are twice as likely to meet or exceed their organizational performance goals, we look at what we call a “pipeline,” which is the path by which software gets delivered. We joke with the engineers that we only have three questions:
What are all the things that happen from the moment code is checked in until that code is out in production? That includes all the testing: performance tests, security tests, end-to-end tests, and unit tests.
The second question is the same idea, but for infrastructure. Say I want a new load balancer or a new database. How hard is that to get?
The third question is: what are all the things that happen from the moment you decide you have an incident until that incident is fully resolved? We ask because we want to hear how it gets fixed and what happens after it’s resolved. Are we learning how or why it broke in the first place? How can we make it better?
It usually takes about seven to eight hours of interviews, during which we score people on 50-odd capabilities. In the end, there’s a report with a bunch of scores, but it also includes suggestions in two categories:
Areas where we think you’d get a considerable boost if you invested some engineering time.
Things that you’re doing well, but here’s how you take that to the next level.
We’ve had companies all over the board tell us they’re struggling. Maybe they’ve just moved from on-premises into the cloud and have a thousand databases they manage by hand. By the way, I always recommend moving into the cloud because that’s where you can move fast; you can’t move fast in a data center. When you get there, what you want to do is reduce the variation across all your different layers (databases, application servers, operating systems). If you’re used to having everything on-prem, every customer is a custom, bespoke, beautiful little creation that is hard to manage. In the cloud, you want to reduce that variation, and that’s how you start moving toward being a true SaaS platform. Suddenly, the tools, processes, and procedures you put in place apply to 1,000 customers instead of one. Then you can start iterating on your architecture and all your other fun stuff, so that you’re getting all these operational efficiencies, which is what all the PE firms want.
[Jim Milbery] My most significant challenge in private equity is that I am a software architect working with ex-investment bankers, so my job is mainly translating. My standard joke is that a founder trades in his BMW every year but wants to know why a 10-year-old server needs replacing. It happens more often than you think. Many of our portfolio companies were started by non-technical founders who didn’t like the software out there and wanted to improve it.
For example, we own a company that does invoicing and billing for law firms. The founders were CPAs, and they ended up with many law firms as clients. They weren’t software engineers, so they started building this thing, which was all on-prem. When you build your first application, that first version is for one customer, and it will be highly bespoke because that customer is helping drive the requirements. Then companies reach a steady state with 50 or 100 customers and fall into the pattern of, “Well, every time we land a new customer, they want a few changes. Let’s agree to make those changes.” They don’t have product management driving requirements on behalf of customers. So when we first engage with these companies, there’s still a lot of bespoke software. The move to the cloud becomes a painful process, because when you’ve got a lot of legacy programming languages, legacy databases, and operating systems, and you move all of that to Amazon or Azure, all you’re doing is trading your data center for Amazon’s data center. I have portfolio companies with a monolithic stack running on VMware that I can’t break apart. It costs me more to host it on Amazon than in my own data center, but we still moved it. It’s the right decision because we’ve got to move away from the challenges of running our own hardware. But initially, that’s what scares a lot of our portfolio companies: they’ve got a legacy stack that doesn’t fit well with the cloud vendors.
[Dave Mangot] I had a client come to me and say, “Hey, Dave, I hear you on this newfangled cloud stuff. It’s all great. But you have to admit that the cloud is way more expensive. I went on Amazon and priced out a server at $38,000 a year, then priced out the equivalent server from Dell at $25,000 a year. So how can you tell me that the cloud is even close?” I responded, “What happens when your server dies? Dell support might come around, and if they have the right part, you might be fine. But when my server dies, I make an API call and get another one that looks exactly like it. If you want to do that, you’d better buy another server, which means you’re now paying 50 grand, and I’m still paying the same 38.”
The advantage of the cloud is speed and flexibility, not cost. You can get cost advantages out of the cloud, but that’s on you, not Amazon. Amazon gives you a platform to move super fast and break up your application faster than you ever could inside your data center. The ability to reduce costs is called engineering. Amazon can’t make your engineering organization better; all they can do is offer you the hardware and equipment.
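The server comparison in the anecdote above is simple arithmetic, worked out here with the client’s annual prices and the assumption, implied by the anecdote, that matching the cloud’s replace-on-demand behavior on-premises means keeping a second Dell server on standby:

```latex
\begin{aligned}
\text{Dell server plus a standby spare} &= 2 \times \$25{,}000 = \$50{,}000 \text{ per year} \\
\text{Cloud instance, replaced on demand} &= \$38{,}000 \text{ per year}
\end{aligned}
```

Under those assumptions, the cloud comes out $12,000 a year cheaper, even though its sticker price per server is higher.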
[Jim Milbery] The initial cost of moving is difficult for many companies, mainly because they’ve got a monolith. And I agree that going to a cloud-native architecture isn’t a hard requirement. It’s about what makes sense for the application, not chasing the latest trends.
[Dave Mangot] The trend will change, so who cares?
[Jim Milbery] Yes, exactly. I started on PDP-11s and then upgraded to VAXes. We had terminals, and then along came client-server, and all the pundits claimed that client-server was the last thing we’d ever need. But wait a minute, we should add an application server and make it a three-tier architecture. Then mobile, then reactive design. The point is that something new is ALWAYS going to come along.
[Dave Mangot] The ability to embrace the next new thing depends on where you are. I have had clients tell me, “The answer, of course, to any question, is Kubernetes. And in two years, we’re going to be on Kubernetes.” And I’m thinking, “You can’t even run in the cloud yet, yet somehow you know the answer in two years is Kubernetes.” I’m a big fan of crawl-walk-run:
Get into the cloud.
Learn how to operate.
Learn what all the problems are.
Learn what happens when your server magically disappears.
Get good at breaking up your stuff into pieces that you can manage.
You can’t just decide that in two years you’ll be on Kubernetes unless you’ve mastered all those other skills. If Kubernetes is still the correct answer in two years, you’ll migrate to it much more easily once you’ve mastered the basics.
[Jim Milbery] If somebody’s going to engage with you, do you recommend they buy Accelerate and read the book first? What’s the best way to dive into it with you?
[Dave Mangot] I recommend everybody read Accelerate anyway. When the history books are written about the software industry 50 years from now, people will look back on the day Accelerate was published and see how much it changed the entire industry. So whether people engage with me or not, Nicole Forsgren, Gene Kim, and Jez Humble gave a gift to the world. It’s a fantastic book.