[Podcast] The Through-lines of Business Continuity

Compliance Unfiltered is TCT’s tell-it-like-it-is podcast, dedicated to making compliance suck less. It’s a fresh, raw, uncut alternative for anyone who needs honest, reliable compliance expertise with a sprinkling of personality.

Show Notes: The Through-lines of Business Continuity

Quick Take

On this week’s episode of Compliance Unfiltered, Adam and Todd cover the widely applicable topic of Business Continuity.

Business Continuity seems like a simple enough concept, but many organizations seem to struggle with it year after year.

Is it a lack of planning?
Are there technical needs and considerations your organization just isn’t accounting for?
Do you have issues with your Backups?
Is your DRaaS up to snuff, or do you need a refresher on exactly what DRaaS is?

Either way, the CU guys have you covered. These topics and more on this week’s episode of Compliance Unfiltered!

Follow Compliance Unfiltered on Twitter and Instagram at @compliancesucks

Read The Transcript

So let’s face it, managing compliance sucks. It’s complicated, it’s so hard to keep organized, and it requires a ton of expertise in order to survive the entire process.

Welcome to Compliance Unfiltered, a podcast dedicated to making compliance suck less. Now, here’s your host, Todd Coshow, with Adam Goslin.

Well, welcome in to another edition of Compliance Unfiltered. I’m Todd Coshow, alongside a man who’s probably solved a compliance issue or three before this podcast has even been recorded this morning.

Adam, how the heck are you?

I’m doing great, Todd. How about yourself?

You know, I can’t complain. It’s a Friday where we are, and all good things start on Friday.

So today, we’re gonna have an amazing chat about something that, I don’t know, folks may oversimplify in concept, and that is business continuity. Business continuity seems simple enough, yet it appears, at least from my perspective, Adam, that organizations tend to struggle with it from time to time. And I’m curious why that is.

Well, I mean, our last podcast had to do with incident response, and it’d be a similar notion for the realm of business continuity. One of the biggest struggles is just lack of planning. If you’ve got a business continuity plan, that’s obviously a good first step.

But it takes putting the time, the effort, the thought, and the brainstorming into the various realms of the organization that could be negatively impacted, the ones that require some sense of order around how we go about navigating this particular challenge. And for every business, it’s different. There are kind of two fundamental sides to it, right? There’s the technology side of the business. So, thinking about what happens if my production environment, or my web server, or my DB server, or my firewall goes poof. Those are all the technical elements. How do we keep things moving when technology fails us?

But there’s also, and I think COVID was a great learning lesson for a number of organizations, the non-technical side. What happens when it’s not ideal to have everyone driving into the office? I had some entertaining conversations with different folks as they walked into that arena. There were some organizations that literally didn’t have a game plan. They had desktops attached to desks at corporate, and now they’ve got to go from that to, well, we need everybody working fully remote. Starting from that point without a game plan gave them some challenges.

So there are all sorts of things that can happen. And again, it depends on the business itself. There are just a number of different things that need to be thought through and considered, and really, that lack of planning is at the center of where a lot of organizations struggle with their business continuity.

Sure, no, that makes sense. Now from the tech side, though, what are some of the considerations there?

So, and this obviously isn’t some type of complete compendium, but these are just some of the highlights from the technology side. One realm is the notion of high availability, in other words, making sure that you’ve got redundancy in your technology, especially as it relates to your production systems. I’d recommend folks go through and review their tech stack. As you do that review, you’re looking for the various single points of failure, and it’s surprising what turns up.

Part of the challenge is that you can’t really answer this generically, because every single client, every organization is different. On one end of the spectrum, if you’re self-hosting, where your server room or closet, as the case may be, is on site, or if you’re doing something like colo, where your own equipment sits in a hosting facility, then everything that’s inside of that room or that rack is your responsibility.

So as folks are going through the process of the review, just walk through from the outside in. You’ve got your internet pipe coming in. Trace it through the various devices, and every time that wire hits another device, start asking yourself questions. Do I have multiple connections from the internet into the rack, whether that’s handled physically or logically? Then look at the first device that those connections plug into. What happens if that device goes down? Is there another one sitting there that’s ready to go? Does it have automated failover to that secondary instance? And you basically trace all the way through, making sure that for each piece of equipment you have some manner or mechanism for handling the various things that could happen to it. So you’ve got different realms of consideration there. You’ve got things like the networking and the physical cabling that carries network connectivity.
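One rough way to capture that outside-in walk is to write the traffic path down and count how many devices back each hop. The sketch below is only an illustration: the device names, the role labels, and the `single_points_of_failure` helper are all made up for the example, and a real review would still need to look at cabling and at whether failover is actually automatic.

```python
from collections import Counter


def single_points_of_failure(path_roles, inventory):
    """Return the roles on the traffic path that are backed by fewer than two devices."""
    counts = Counter(device["role"] for device in inventory)
    return [role for role in path_roles if counts[role] < 2]


# Hypothetical rack inventory, listed from the internet pipe inward.
inventory = [
    {"name": "isp-a", "role": "internet_uplink"},
    {"name": "isp-b", "role": "internet_uplink"},
    {"name": "fw-1", "role": "firewall"},
    {"name": "sw-1", "role": "core_switch"},
    {"name": "sw-2", "role": "core_switch"},
    {"name": "db-1", "role": "db_server"},
]

# The order in which traffic hits each kind of device.
path = ["internet_uplink", "firewall", "core_switch", "db_server"]

spofs = single_points_of_failure(path, inventory)
print(spofs)
```

Counting devices per role is the crudest possible test. It says nothing about whether the second device is powered, patched, or configured to take over, which is why the hands-on testing still matters.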

So that’s one side of it. You’ve got power. Is there a single power cord that goes into fill-in-the-blank? Does it have a receptacle for being dual powered? If you’re plugging dual plugs into the back of something, are those plugs each coming from, almost, an A side and a B side for power?

That way, if the facility happens to lose power on the A side, you’ve got everything connected to the B side as well. I’ve seen some entertaining stuff when you go in to do the physical connectivity review of a colo rack or stack, where somebody inadvertently had attached both power cords for fill-in-the-blank device to the A side. Well, guess what? The minute that the power goes out on that side, even though you have a B side available, you’re dead in the water.
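The A-side and B-side check is easy to express once the cabling has been written down. Here is a minimal sketch, assuming a hand-maintained mapping from each device to the feeds its power supplies are plugged into. The rack contents and the `power_feed_problems` helper are hypothetical.

```python
def power_feed_problems(devices):
    """Flag devices that would go dark if a single power feed were lost."""
    problems = {}
    for name, feeds in devices.items():
        if len(feeds) < 2:
            problems[name] = "single power supply"
        elif len(set(feeds)) < 2:
            problems[name] = "all supplies on feed " + feeds[0]
    return problems


# Hypothetical rack: one entry per power cord, naming the feed it is plugged into.
rack = {
    "fw-1": ["A", "B"],  # dual powered, split across both feeds
    "sw-1": ["A", "A"],  # dual powered, but both cords landed on the A side
    "db-1": ["B"],       # only one power supply
}

problems = power_feed_problems(rack)
print(problems)
```

A spreadsheet can do the same job. The point is that the mapping has to exist somewhere before anyone can review it.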

So, it’s going through all of those various pieces, the traffic and the devices, looking at network, looking at power, looking at even things like device failovers. Is there redundancy in terms of processors, RAM, things along those lines, and the disk that’s on the device? What happens if it’s only got a single disk and the disk goes poof? That type of thing. So there are really a lot of considerations when you’re looking at the physical connectivity side. When you get into the cloud hosting arena, that provides organizations with more options. Now it’s not your responsibility, from a continuity perspective, to be worrying about a lot of the physical connections, but you’ve got logical considerations to think through in a similar vein as you’re looking at your cloud hosting environment.

A different arena in the technology aspect is backups. So, gee, I don’t know, making sure that you actually have them is one good one. It’s a good start. Well, there’s nothing like a, there’s backups, you know. A lot of this sounds like common sense, but in the same sense, I’ve seen organizations that have run into this wall.

The other piece, as it relates to backups, is making sure that your backups are segmented from your production network. Let’s say, for the sake of this discussion, I’ve got a server, and on that server I’ve got my production systems, but I also store my backups on the same server. Well, even if they’re on a separate disk within the server itself, if something happens to that physical device, you don’t have geographic separation, and you don’t have logical segmentation between your production network and your backup network. So you want to make sure that the backups are going off-prem, off-site, somewhere else, so that you can avoid those geographic issues. In addition, you want to make sure they’re logically separated from your production arena, because if you get hit with ransomware, now you’ve got a problem. There are organizations where, just for ease, they go run the backup job, and the backup location, even though it’s geographically disparate, is set up as a mapped drive on the same box.

Well, if it’s logically connected to your existing prod environment and that prod environment gets hit with ransomware, the ransomware is just going to start going through the network, going through the devices, and encrypting everything that it possibly can. If you’ve got a direct connection to your backups, now you’ve got the issue that not only has your production box been encrypted, but all your backups are encrypted too. And that’s actually happened to folks. So you want to pay attention to that.

Depending on your business, you’ve also got to put some consideration into how many daily backups you need to keep, and how many weeklies, monthlies, and annuals, so that you can mitigate the various risks to your business or your environment. That’s another realm of consideration.

A lot of folks will use a third party and sign up for some form of disaster recovery as a service (DRaaS). Your backups are effectively a snapshot in time of the current state of a technology asset. A disaster recovery as a service system will effectively replicate your existing production box over to a secondary environment. It’s almost like a hot standby mode, where the client would come in, declare a disaster, and set the wheels in motion to fail over to that instance.
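On the retention question raised above (how many dailies, weeklies, and monthlies to keep), here is a minimal sketch of one common way to express such a policy. It assumes one backup per day, and the counts of 7, 4, and 12 are placeholders rather than a recommendation. The `backups_to_keep` helper is hypothetical.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, dailies=7, weeklies=4, monthlies=12):
    """Pick which backup dates survive a daily/weekly/monthly retention policy."""
    ordered = sorted(backup_dates, reverse=True)  # newest first
    keep = set(ordered[:dailies])

    newest_per_week = {}
    newest_per_month = {}
    for day in ordered:
        iso = day.isocalendar()
        # setdefault keeps the first (newest) backup seen for each week and month
        newest_per_week.setdefault((iso[0], iso[1]), day)
        newest_per_month.setdefault((day.year, day.month), day)

    keep.update(list(newest_per_week.values())[:weeklies])
    keep.update(list(newest_per_month.values())[:monthlies])
    return sorted(keep)


# Sixty consecutive daily backups ending on 31 March 2024.
last_backup = date(2024, 3, 31)
history = [last_backup - timedelta(days=offset) for offset in range(60)]

kept = backups_to_keep(history)
print(len(history), "backups on hand,", len(kept), "retained")
```

The right numbers depend on how far back the business might need to reach, so treat the defaults as something to argue about rather than adopt.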

And the big difference between backups and disaster recovery is that with disaster recovery, there are a couple of elements you want to pay attention to, typically referred to as RPO and RTO. RPO is your recovery point objective. In other words, how near real time is the replicated instance? It’s typically stated in either minutes or hours. RTO, the recovery time objective, is how long it’s going to take you to go from your existing production instance over to the secondary once you declare a disaster. It depends on the organization, too, right? If the business is not hypercritically dependent on the uptime of its production environment, well, who cares if it’s going to take three days to spin it up in the secondary location. But if you have some type of massive issue in your production arena, and it’s time sensitive that you get everything back up and running, then you want the recovery point to be as near real time as you can stand. You also want the amount of time it takes to fail over to be as short as you can stand.
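To make the two objectives concrete, here is a small worked example that scores a failover test against them. The timestamps, the 15-minute RPO, and the 4-hour RTO are invented for illustration, and the `check_objectives` helper is hypothetical.

```python
from datetime import datetime, timedelta


def check_objectives(last_replicated, outage_start, service_restored, rpo, rto):
    """Compare one failover against the recovery point and recovery time objectives."""
    data_loss_window = outage_start - last_replicated  # work that was never replicated
    downtime = service_restored - outage_start         # how long production was down
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical failover test.
result = check_objectives(
    last_replicated=datetime(2024, 5, 1, 9, 45),
    outage_start=datetime(2024, 5, 1, 10, 0),
    service_restored=datetime(2024, 5, 1, 14, 30),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=4),
)
print(result["rpo_met"], result["rto_met"])
```

In this made-up run the replica was only 15 minutes behind, so the recovery point objective holds, but getting back online took four and a half hours, which misses a four-hour recovery time objective.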

Now, I mean, that all tracks to me, but I’m curious if there are any remaining pro tips that we haven’t covered yet on the business continuity front that we want to make sure we hit today.

Sure. First and foremost is just making sure that you’re doing testing. It sounds odd, but I’ve seen organizations where they set up this amazing program, but then they don’t actually exercise it. Every now and then, declare a maintenance window and actually test your high availability assumptions.

We were talking earlier about how we’ve got two different cables that come in to carry the internet. Well, during a maintenance window, go ahead and pull your primary. Does it automatically flip over to the secondary? Pull your primary power. Does the box automatically keep going? It’s interesting. Unfortunately, those that don’t test get their opportunity to do so when there’s a true something-went-horribly-wrong event. That’s about the last time that you want to be figuring this stuff out.

Going over to the backup arena, when was the last time that you actually validated that the backups are complete? Are you able to go ahead and restore a box? Have you ever done that? Even if the software is telling you, oh, yeah, this backup was successful. Okay, great. But every now and then, go in and bring a backup back into place.

Some of the considerations in the backup arena really come down to the use of the system, and it’s something for the organization to think through. Are you able to just go back to a point in time easily? Here’s an example. Let’s say it’s a web server that doesn’t have a whole bunch of data on it. It gets updated once a week or whatever it may be. Well, it’s not going to be the end of the world to go grab the last image for the web server, bring it back, and drop it in. But say my database backup is a snapshot of the physical device, and that last snapshot was 16 hours ago when I need to take advantage of it. Meanwhile, a whole bunch of transactional-level data has been generated and updated on the system since the last backup. Whether it’s something in that arena, or the DRaaS we talked about earlier, where you’ve got transactions up to date within a particular period of time, if you have a transaction-based system, you’ve got to look at how you would go about putting Humpty Dumpty back together again if you need to come back from this older instance that doesn’t have all the transactions. Is it possible? I don’t know. Maybe you just have to bite the bullet and say, hey, this is the way it is for how we go about doing this. It really just depends.

With all of this stuff that we’re talking about, DRaaS as an example, make sure that you’re actually testing that failover to the secondary instance. Depending on the organization, that’s a couple of times a year type of thing, but just make sure that it’s actually up, running, working, functioning, and that you don’t run into any glitches. It’s easier.
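For the restore check described above, one simple approach is to restore a backup into a scratch location and compare file hashes against the source, rather than trusting the backup software’s success message. The following is a minimal sketch under that assumption. The `verify_restore` helper is hypothetical, and the two temporary folders only stand in for a real source and a real restore target.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """List files that are missing from, or differ in, the restored copy."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            mismatches.append(relative.as_posix())
    return mismatches


# Stand-in data: one good file, one corrupted file, one file missing from the restore.
with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as restored:
    Path(source, "config.txt").write_text("same")
    Path(restored, "config.txt").write_text("same")
    Path(source, "orders.db").write_text("complete")
    Path(restored, "orders.db").write_text("truncated")
    Path(source, "users.db").write_text("present")
    mismatches = verify_restore(source, restored)

print(mismatches)
```

A comparison like this only works against a source that has not changed since the backup was taken. For a live transactional system, the check would have to run against a snapshot or against checksums recorded at backup time.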

A lot of the stuff that we’ve been talking through, for some it sounds like common sense, and yet for some it’s not quite so common.

It’s just making sure that you get this arena together, and that you go through and do that testing on a regular basis, so that you’re not running into unexpected issues.

Sure. So far, we’ve talked about the technical side of things, but what about the more non-technical side? Because I’m sure that applies.

Well, it used to be a lot more of a concern before COVID, as many companies just weren’t taking the non-tech side very seriously, but I think the COVID experience taught many organizations a great deal about just how unprepared they were. I recall that March to June timeframe in 2020. It was just nuts, because so many organizations were struggling with issues and problems. Can you just imagine needing to acquire, whatever, 200 laptops during that period of time? Holy moly. I think it got real ugly for a while for folks. But the reality is that you need to go and consider your business operations and those non-technical implications.

So, some examples within that space. How do you go about interacting with your vendors in this non-technical business continuity arena? Look at various scenarios, like what happens if we all have to immediately work from home, and yet my vendor is somebody that, because of my business, I need to physically meet up with, or drop something off to, or pick something up from. How are we going to handle this? What happens to the staff if corporate is inaccessible? COVID was one scenario, but you could bring up a number of different ones. There was a tornado down south sometime within the last year or so. It hit a candle factory and basically razed it to the ground. What happens if your corporate facility has a gas leak, blows up, is deemed off limits, whatever it may be? How are you going to handle that?

On the working-remotely considerations for staff, I think a lot of folks learned a lot during that last period, but it could even be regional impacts too. Depending on the type of organization that you’ve got, you could have six or eight different offices, or you could have pockets of personnel. What happens if something happens to that region? Look at what’s going on in the world today. What happens if your development is outsourced, and it happens to have been outsourced to folks that are in Eastern Ukraine? What do you do now? So it’s thinking about things outside the box, making up scenarios. Every single business is different, but we just need to go through those exercises and stretch and flex the testing and the scenarios you can possibly run through.

All right, it’s about that time, Adam. Business continuity horror stories, break them down.

I don’t know, I’ve got a couple. So, one was a data center. They thought that all their power systems had been set up properly, so that when the grid power goes down, they flip to the UPS, the UPS holds everything until the generators come online, and poof, everything’s off and running. And it wasn’t even their fault.

This was quite a while ago, but it was a multi-floor building, and a fire alarm had gone off on one of the floors. So the fire department was in there doing their thing. I’m not sure if you’ve seen these, but you know how, when you go up and down an elevator, there’s this keyhole that’s on the wall? Have you seen that? Sure. Yeah, I didn’t realize this at the time. Some of them operate the elevators, but in other cases, that’s actually the kill switch for the power for the floor. So, as the fire department’s in there doing their thing, they’ll key into that, turn the key, and boom, the power to the floor literally drops. The fire department was going through doing their checks, and they hit the kill switch on that floor. What was supposed to happen is that a switch was supposed to get flipped, which allowed the UPS and the generators to kick on. I forget the specifics, but on the line coming from the UPS and generators back to the floor, there was a switch that was supposed to flip. The switch failed. So, effectively, they dropped the power to the entire floor and the secondary systems didn’t come on. If they had been doing testing, they would have figured that out. But that was an entertaining one.

There was another organization that had planned out their backups. They tested. They validated. They got hit with a ransomware attack. It hit the network, it started encrypting devices, and because they had a direct connection to the location where the backups were, it went through and encrypted all the backups as well. This business literally couldn’t do anything. They couldn’t bring a system back online. They had no choice but to go down the road of negotiating with the ransomware folks and paying a boat ton of money, just so they could get back in business. It underscores that earlier point about looking at how you’ve got your various repositories stored, and making sure you’ve got separation between your various technology assets so you’re not in that position.

I’ve seen other organizations that were faced with a similar situation, but they had air-gapped backups and/or a DRaaS system, where they could basically wipe their existing production boxes (it took them a minute), go to those backups, restore from there, and put Humpty Dumpty back together again. And in another case, they had DRaaS, and they basically went in and just dialed it back.

Now, keeping in mind, the DRaaS is going to replicate everything over to the DRaaS side, so they just had to dial the DRaaS point back to a stage before the fit hit the shan. And then they were able to get that handled as well.

But if you’ve got these things in place and they’re working properly, you’ve got ways to get around the whole notion of having to pay the ransomware folks. Part of the problem is, I saw some numbers recently, and I don’t remember exactly, but it was approximately 50%. They said even if you pay the ransom, only about half of these people are actually going to get their data back. And for the other 50%, think about it this way: just because I pay these people to get my data back in hand does not mean that they aren’t also going to spill it out onto the black market and give you just as big of a black eye anyway. So there are just no guarantees when the bad guys are inside the walls. You know what I mean?

No doubt about it, no doubt about it. Well, listen, as we head down the home stretch here, are there any remaining pro tips or tricks of the trade that we want to make sure to pass along?

Did I say we needed to do testing, testing, testing, more testing, test it again? Yeah, exactly.

I mean, honestly, it centers on prep and testing. Run the scenarios from your non-technical side. I remember back in the day, at one of the organizations that I worked for, and it was actually brilliant, because they did it with no notice to anybody. This was multiple hundreds of people working at this place. We showed up, and there was a note on the door that said: we are practicing our business continuity plan today. The scenario that you have just encountered is that there was a major gas leak. The city has shut down our corporate headquarters. So please get back in your car and head over to this hotel for further instructions. And that’s literally what everybody walked into. You couldn’t get into your office. You weren’t allowed to go in there. It had to be just whatever you had on you at the time.

So it taught the organization a good amount. Oh, this is the best one. There were folks who had been issued laptops and were supposed to be taking them home every day, who had left them on their desks. That was just their thing. They’re like, I don’t want to have to drag this laptop over every day. And so they’d left it on their desk. Well, guess what? They couldn’t get to their laptop, and they had to somehow figure out how to work all day without it. It was actually really, really entertaining.

The last one that I’ve got is paying special attention to any of those business-critical vendors. Which vendors are critical is going to be different for every organization, but depending on what type of business you’re in, it means having redundancy in those vendors. If this is a parts manufacturer, do you have a secondary source that you can go leverage? Do you already have a relationship set up? Maybe you order a little bit from them regularly so that you can take advantage of that in an emergency. That type of thing. But look through your list of critical vendors and put scenarios together: if I had to, how would I replace one of these guys? Because just like your organization is planning for the contingency of what happens if I can’t get to my corporate office, well, any of your vendors can be in the same boat, right? However they do what they do, maybe they run into some type of disaster that then ripples over to your organization. So certainly looking at those business-critical vendors and how to handle them would be a really good idea as well.

Outstanding. That’s the good stuff. Well, that’s all the time we have for this episode of Compliance Unfiltered. I’m Todd Coshow. And I’m Adam Goslin. Hope we helped to get you fired up to make your compliance suck less.
