[Podcast] Central Logging Sanity Checks - Total Compliance Tracking

Compliance Unfiltered is TCT’s tell-it-like-it-is podcast, dedicated to making compliance suck less. It’s a fresh, raw, uncut alternative for anyone who needs honest, reliable compliance expertise with a sprinkling of personality.

Show Notes: Central Logging Sanity Checks

Quick Take

The CU Guys dive into the critical topic of central logging sanity checks. They explore the common pitfalls organizations face when they set up central logging systems and then leave them on autopilot.

Adam emphasizes the importance of regular sanity checks to ensure that logging systems are functioning as expected and highlights the risks of assuming everything is working perfectly. The discussion also covers the need for compliance professionals to validate assumptions, spot-check logs, and ensure that alerts are being properly handled.

Tune in to learn how to maintain a robust compliance program that truly supports organizational security.

Read The Transcript

So let’s face it, managing compliance sucks. It’s complicated. It’s so hard to keep organized and it requires a ton of expertise in order to survive the entire process.

Welcome to Compliance Unfiltered, a podcast dedicated to making compliance suck less. Now here’s your host, Todd Coshow with Adam Goslin.

Well, welcome in to another edition of Compliance Unfiltered. I’m Todd Coshow, alongside the James Hetfield of your compliance Metallica, Mr. Adam Goslin. How the heck are you, sir?

Oh man, I would just love to be that dude. I think it would be the best life ever. Ever. That’s just cool as hell.

Well, sir, I can’t disagree in any way, shape or form, even if it was just for one day, I kind of like where I’m at, but even for one day, being that dude would be awesome.

Today, we’re going to talk about, you know, another central theme here, not just a central member to a band, but central logging, specifically central logging sanity checks. So a lot of companies that have mature compliance programs set up their central logging and then kind of put it on autopilot. What are the downsides there, Adam?

Well, I mean, I’ve been for a long time, a huge fan of trust, but verify. And, you know, when the, when the companies go in and, and kind of set up their, their central logging, you know, they, they really do just kind of, okay, we’re done, you know, we’re done, we’ve, we’ve established all the things, you know, we’ve done all the checks and we’ve set up the system and we have all the right processes and, you know, we, the, the reviews are happening and alerts are flying and, you know, so then they just, you know, move into this mode where they just literally let her roll and, you know, and then don’t tend to go back to it, you know, for, you know, for a recheck or a sanity check or, or whatnot. They just go into the guiding assumption that everything’s good because it’s up and it’s, nothing’s gone boom and, you know, blah, blah, blah.

So, you know, the, the, the most important part for, for these organizations is that they, they go back in and, you know, double check, you know, is, is what I think happening, is it actually happening? You know, but, you know, they got, they got to go back in and, and just do a sanity check on, you know, on things. So, you know, that’s kind of the, the, the driving force here with the, with this particular topic.

Sure. Now with that in mind, what are some of the concerns that compliance professionals should be focusing on?

Well, I mean, first and foremost, you know, is everything that I think is logging actually logging, you know, is it are things that I set up to, to, you know, to log, are they still logging? Did something go off the rails? Um, it’s really, really easy, uh, depending on the system and the, and the structure that’s set up, what checks and things that they put in place, it’s really easy to, I don’t know, I’m just gonna make a number up. So let’s just pretend, you know, out of the gate, there were a hundred different things that were, you know, that were sending stuff to central logging. Well, you know, fast forward a couple of months or in a lot of cases, a couple of years, um, you know, the, uh, are the things that we, uh, are those hundred things still, still doing what they’re doing?

I mean, you know, there’s all sorts of possibilities for something going wrong. You’ve got updates or patches that may go ahead and interfere with the capability for those devices to push their logs. I mean, it could be something as simple as, you know, somebody was messing with a firewall rule to try to do some troubleshooting and locked down some ports so they could get some things isolated, et cetera, and then forgot to put Humpty Dumpty back together again, and in the process blocked the outbound logging capability from, you know, fill-in-the-blank device, that type of thing. Hey, there’s a billion things that can happen between initial setup and validation and whenever you come back to it down the road. So, you know, that’s one side. And really, if you think about it, right, it’s not just compliance, it’s compliance and the security of the target organization. Certainly from a compliance perspective, we’ve got an obligation to be pushing those logs into central logging and having them actually get there, et cetera. And if I’m not, then I am missing the mark for my obligations to maintain a compliant organization.

The flip side of it, from a security perspective, is if the logs are not getting pushed into central logging, but are just kind of happening on the device itself, never getting passed to central logging, well, now I’ve got a gigantic blind spot in my purview over my environment, and that creates a substantial risk for the organization. The other thing that the compliance folks should be focused in on is, you know, what about new devices that got added into the network that ought to be logging? Are they? Don’t know.

Um, you know, maybe you’ve got process and procedure within your change control to go ahead and enable that logging, and it’s supposed to be sanity checked that it actually got there, but we’re also dealing with human beings, right? Human beings that screw up, make mistakes, forget to do things, get busy with some other shiny object, whatever it may be.

And now all of a sudden I’ve got, you know, newly added devices that have been plopped onto the network that ought to be logging that aren’t. So, you know, there’s a number of things that the compliance folks should be interested in and focused on.

Well, if the organization has established rules for logging inspection, how can those become bloated over time?

Well, you know, you figure, you know, in any type of a mechanism where we’re doing log inspections, et cetera, you know, oftentimes there’s rules that get, there’s rule patterns that get set up based on specific devices, based on specific versions of devices, you know, et cetera. And so, you know, as you have this, I’m going to call it a natural churn on a, you know, on a given, you know, target scope network, et cetera, you’ve got, you know, devices being deprecated.

You’ve got devices that are getting, getting added back in. You’ve got consolidation happening. You’ve got brand new devices that are hitting the, hitting the scope network, et cetera. So as you’re, you know, kind of doing these swap outs, right? You’ve got certain of these devices where you now have established logging rules and whatnot in there, but, you know, what happens when, you know, in a smaller organization where you’re only deprecating or swapping something out, you know, once or twice a year, probably not as big of a deal, but where I have a, you know, where my scoped environment is, you know, certainly hundreds, if not thousands of endpoints that are, you know, pushing logs, the level of churn in those organizations is going to be substantively higher. And so as I’m pulling certain elements out of rotation, if you will, you know, now I could, I could end up with, you know, kind of log pattern recognition, you know, on the, on the target central system, that’s no longer even applicable to the, to the environment that we’re in, you know, so, you know, as you’ve got these deprecated devices, do you have to do it with every single deprecation? No, I wouldn’t go that crazy, but, you know, certainly, you know, looking at when you, when you kind of, when you pull back and you look at your, at your asset management, you know, procedures, anything that got deprecated and whatnot, you can, you can typically find a correlation between deprecated devices and log patterns that are no longer, no longer being hit. Usually when, with the central system that’s doing the logging, you can pull some type of a report or a tally on, you know, how many times has this particular log pattern been recognized across the, you know, across the logs that are happening and where I’m starting to see, let’s say I do a quarterly check, right? Or even a semi-annual check, you know, of the, you know, of the rules engine. 
If I’m looking at the rules and I’m saying, oh, you know what, in the last six months this rule hasn’t been hit at all, or the number of times that this rule was hit suddenly dropped by 50 or 75%. So let’s say in the prior six months this particular rule got triggered 12,000 times, and now in the latest six months this particular rule only got hit, you know, 1,500 times. Well, you’re probably going to find there’s a correlation between deprecated assets and that rule. That rule was basically still getting hit at the beginning of this six-month period, before they actually deprecated the device, and now all of a sudden that rule is no longer needed to provide coverage for the environment.

Now I’m not saying that you need to go ahead and straight delete any of those rules, but maybe you can, you know, pull some of those, you know, some of those rules out, put them over into like a parking lot in the event that you need to come back or refer to it again, et cetera. You don’t want to lose the brain trust that went into formulation of the rule.
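The rule review described above (compare hit counts between two review periods, flag rules that went silent or dropped sharply) can be sketched roughly as follows. This is only an illustration: the rule names, the dictionary format, and the 50% threshold are assumptions, and a real central logging product would expose hit counts through its own reporting.

```python
def review_rule_hits(previous, current, drop_threshold=0.5):
    """Flag rules with no hits, or a sharp drop in hits, between two periods.

    previous/current map a rule name to its hit count for each review period
    (a hypothetical export format, not any specific product's report).
    """
    findings = {}
    for rule, before in previous.items():
        after = current.get(rule, 0)
        if after == 0:
            # Candidate for the "parking lot": move it aside, do not delete it.
            findings[rule] = "no hits: parking-lot candidate"
        elif before > 0 and (before - after) / before >= drop_threshold:
            drop = (before - after) / before
            findings[rule] = f"hits dropped {drop:.1%}: check for deprecated assets"
    return findings


# Hypothetical six-month tallies, using the 12,000 -> 1,500 example from the episode.
previous = {
    "legacy-vpn-auth-fail": 12000,
    "old-switch-config-change": 300,
    "web-login-fail": 8000,
}
current = {
    "legacy-vpn-auth-fail": 1500,
    "old-switch-config-change": 0,
    "web-login-fail": 7600,
}

findings = review_rule_hits(previous, current)
for rule, note in findings.items():
    print(f"{rule}: {note}")
```

With these made-up numbers, the 12,000 to 1,500 rule is flagged as an 87.5% drop, the rule with zero hits is flagged as a parking-lot candidate, and the rule that only dipped slightly is left alone.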

But the whole point here is that if I don’t have non-pertinent rules within the inspection engine, now I can improve the efficiency of the overall, you know, kind of logging process. And it doesn’t sound like a lot, but let’s say that I’ve got an environment where I have, I don’t know, 15,000 rules that I’ve got log patterns for in my engine. If only two thirds of those are actually being leveraged, then I’ve literally got 5,000 rules that are having to be inspected against the traffic with zero effect. Pulling those out will assist in streamlining and speeding up the log inspection process, shortening the time between log inspection and appropriate alerts getting fired, et cetera. So there’s a lot of good reasons to limit that. And especially when you talk about people that are in, I’m going to call it a public cloud style environment, where the providers are nickeling and diming you for every ounce of CPU and RAM and disk storage space, et cetera. Hey, you know, cleaning house a little bit, pulling out those rules you don’t need, certainly will kind of free up some of your resources as well.

No doubt. Now, in central logging, there are a certain amount of assumptions, right? What type of assumptions in central logging are good to periodically validate?

Well, certainly, sanity checking that you actually have logs from all of the things that you expect to be getting logs from, that’s a good assumption to validate because, like I said earlier, there’s so many things that can happen that can cause the logs to stop coming in. And so, you know, it’s one thing to say, oh, well, I have this, I have this alert set up. And so if I don’t see logs from fill in the blank device, then I’m going to get an alert and now I’m just depending on the alert. Well, what happens if your alerting system pukes, you know, et cetera, remove all of the assumptions that you got in play about how it should be operating and working and just, you know, roll up your sleeves and go in and do the sanity check.
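The manual check described above (compare what should be logging against what actually is, without leaning on the alerting system) might look something like this sketch. The device names, the 24-hour silence window, and the idea of a "last seen" export from central logging are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_silent_sources(expected, last_seen, now, max_silence):
    """Return expected sources with no log inside the allowed window, sorted by name."""
    silent = []
    for source in sorted(expected):
        seen = last_seen.get(source)
        if seen is None or now - seen > max_silence:
            silent.append(source)
    return silent


def find_unexpected_sources(expected, last_seen):
    """Return sources that are sending logs but are missing from the expected list."""
    return sorted(set(last_seen) - set(expected))


# Hypothetical inventory and "last log received" timestamps.
now = datetime(2025, 1, 15, 12, 0)
expected = {"fw-01", "web-01", "web-02", "db-01"}
last_seen = {
    "fw-01": now - timedelta(minutes=5),
    "web-01": now - timedelta(days=40),   # stopped logging weeks ago
    "db-01": now - timedelta(hours=2),
    "lab-99": now - timedelta(minutes=1),  # logging, but not in the inventory
}

silent = find_silent_sources(expected, last_seen, now, timedelta(hours=24))
unexpected = find_unexpected_sources(expected, last_seen)
print("Silent:", silent)
print("Unexpected:", unexpected)
```

Here web-01 (stale) and web-02 (never seen) come back as silent, and lab-99 shows up as a device that is logging but was never added to the expected list, which is its own asset management question.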

So, if I’m supposed to be getting logs from a hundred things and I get logs from fewer than a hundred things now, okay, well, which ones are missing? And now you can start, you know, going down that rabbit hole, if you will. The other thing is spot checking of the inbound logs against your inspection engine to make sure that things that you expect should be caught actually are getting caught. It sounds funny, but, you know, you think about it, right? It’s one of the most dangerous things about the central logging and the rules that are set up, et cetera. If somebody screws up, makes an oops, you know, whatever, just wasn’t thinking that day, didn’t have enough coffee, whatever, and all of a sudden deems a particular log as benign when it should be a holy moly effin’ alert, now you’ve got a massive problem. So it’s good to go back and sanity check the inspection engine, look at the actual logs, going through those and making sure that it’s behaving and acting as you expect. You don’t have to be doing this daily or weekly or whatever. I mean, maybe it’s a once a quarter thing, maybe it’s a once every six months thing, but it’s just good pruning to go in and do that. Another sanity check is making sure that the alerts that are raised from your logging, people are actually doing something with. And also test the process periodically: throw some traffic that you know is going to trip an alert. And almost treat it, you know, the recommendation that I’d have for the compliance crew is you get a couple core members off of the team that’s operationally responsible for the logging and things along those lines.
Get one person, you know, in tow, get some assistance from them for literally manually triggering things that ought to be varying degrees of, you know, of alerting that shipped out and quite honestly use it as a test. Almost look at it as, you know, you go and you run your pen testing and you go, you don’t tell any of the teams because you want to see if they, if they’re getting alerts and stuff and how they handle those alerts.

Almost as a, you know, kind of an incident response approach. Do the same thing with your logging: trigger logging alerts and monitor and watch and see what happens. If you trigger alerts that ought to be raising five-alarm fires and you’re not hearing anything, you know, it was just crickets, now you know you got a problem. So go in, trigger those, et cetera. Sometimes individuals will inadvertently bury the alerts that they’re supposed to be monitoring through their mail rules. And so maybe the alerts that were intended aren’t really, you know, actually triggering for the folks that are frontline. So you want to go through and you want to double check and make sure. I had one instance relatively recently where an organization’s vendor was the one responsible for doing logging and alerting. And because we were going through their annual compliance run, they were pulling samples of logs and samples of alerts, et cetera. And there’s like a monthly report that would come out of logging activity. And in there, there were some things that I basically was like, wait a second, hold on a second. You’re seeing this on your monthly report. This stuff happened three weeks ago. You’re telling me that there was not a, you know, a red rocket flare that was flung when this type of activity was happening on your network? They said, no, we didn’t hear anything about it. So it’s that type of thing.
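One way to keep score on the unannounced alert drills described above is to record when each test alert was triggered and when, if ever, somebody acknowledged it. The sketch below assumes a made-up record format, drill names, and a 30-minute response target; none of those come from the episode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertDrill:
    name: str
    triggered_at: datetime
    acknowledged_at: Optional[datetime]  # None means nobody responded


def evaluate_drills(drills, max_response):
    """Label each drill 'ok', 'late', or 'crickets' based on the response time."""
    results = {}
    for drill in drills:
        if drill.acknowledged_at is None:
            results[drill.name] = "crickets"
        elif drill.acknowledged_at - drill.triggered_at > max_response:
            results[drill.name] = "late"
        else:
            results[drill.name] = "ok"
    return results


# Hypothetical drill log: three manually triggered test alerts.
start = datetime(2025, 3, 3, 9, 0)
drills = [
    AlertDrill("failed-admin-logins", start, start + timedelta(minutes=10)),
    AlertDrill("log-source-stopped", start, start + timedelta(hours=3)),
    AlertDrill("malware-signature", start, None),
]

results = evaluate_drills(drills, max_response=timedelta(minutes=30))
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Anything that comes back as "crickets" is the situation from the vendor story: the activity showed up in a report, but nobody on the front line ever heard about it.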

You know, you want to make sure that you’re getting alerted on stuff you should, and that all the systems are working properly. You know, another assumption is, with all of these recommendations that I’m kind of walking through, the important part is this should apply to all manner of your logging systems, whether we’re doing the logging and log reviews through internal systems and solutions that we’ve either built or grabbed off the shelf, whatever, or whether it’s a vendor that’s doing them, and honestly, especially those that have the AI integrations. I know I’ve harped on it over the course of the last couple of years, but there’s a lot of AI zombie mode going on where, oh, if it’s AI, then it just must work perfectly and it’s just going to solve world peace, right? The AI system is only as good as the people that are doing the training of the AI system. So it doesn’t mean that it’s perfect or impervious to screwing up. So, you know, it doesn’t matter: internal, vendor provided, AI stuff, sanity check it all. You may be quite surprised by what you end up finding out. So those are kind of some of the things that I’m keying in on for assumptions that are good for periodic validation.
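The spot check of inbound logs against the inspection engine, mentioned a little earlier, can be pictured as replaying a handful of log lines whose correct outcome you already know and listing every one the engine gets wrong. The regex rules below are a toy stand-in for whatever engine is actually in use (internal, vendor, or AI driven), and the sample lines and labels are invented for illustration.

```python
import re

# Hypothetical rule set: (pattern, severity). The first matching rule wins.
RULES = [
    (re.compile(r"failed password for (root|admin)"), "alert"),
    (re.compile(r"audit log cleared"), "alert"),
    (re.compile(r"failed password"), "benign"),
]


def classify(line, rules=RULES):
    """Return the severity of the first rule matching the lowercased line."""
    lowered = line.lower()
    for pattern, severity in rules:
        if pattern.search(lowered):
            return severity
    return "unmatched"


def spot_check(samples, rules=RULES):
    """Return (line, expected, actual) for every sample the engine got wrong."""
    mismatches = []
    for line, expected in samples:
        actual = classify(line, rules)
        if actual != expected:
            mismatches.append((line, expected, actual))
    return mismatches


# Sample lines a reviewer believes should all raise an alert.
samples = [
    ("Failed password for root from 203.0.113.7", "alert"),
    ("Audit log cleared by user jdoe", "alert"),
    ("Failed password for svc_backup from 203.0.113.7", "alert"),
    ("New local administrator account created: tmp_admin", "alert"),
]

mismatches = spot_check(samples)
for line, expected, actual in mismatches:
    print(f"expected {expected}, got {actual}: {line}")
```

In this toy setup two samples slip through: one is deemed benign when the reviewer expected an alert, and one matches no rule at all. Those are exactly the cases worth a closer look.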

Sure. Well, I mean, I guess there’s some, some other sanity checks that need to happen as well, right?

Like, how often would you recommend a mature compliance program sanity check things like their logging configuration, log processing, alerting, or alert response?

Well, you know, we, I guess I would kind of break this up into, you know, it kind of into different pieces. I think that there’s a number of factors that kind of come into play. So every, every organization’s going to be different, but you know, at the end of the day, it really depends on, you know, kind of the complexity of the, of the environment that we’re talking about. If I’m dealing, if on the one hand, I’m dealing with a, with an environment that’s a grand total of five devices. Oh, do I really need to be checking that? You know, with, with, you know, astounding regularity, probably not, but in the same sense, doing the, doing the sanity checks is going to be much simpler, uh, when I’m dealing with five things, where if I’m in another environment where they’ve got 2000, then, you know, 2000 endpoints that are supposed to be logging, you know, now I’ve got a far more complicated environment and the more devices that there are, the more chances there are that things are going to get, uh, merged, replaced new assets are going to be hitting, uh, assets are being deprecated and reissued or, uh, deprecated and consolidated, et cetera.

So, you know, in those larger scale environments, you’ve got a lot more turnover and a lot more opportunity for things to go sideways than you do in the smaller environment. But I think the most important part of the process, certainly, is that you’re not doing you or the company any damn good if the stuff that’s supposed to be logging isn’t actually getting the logs to central logging so they’re even capable of being inspected. So I would say that the most important element is the validation: are the logs we expect to get there getting there? So certainly for a really small organization that has the right sanity checks and whatnot in place, at least once a year. If I’m in a larger scale environment, quite honestly, this is a sanity check that I may want to generate some capabilities for streamlined automation of, through reporting and systematic approaches. I would say do that as near term as you can, because it is a big effin’ deal if you’ve got devices that aren’t even logging. Holy crap, man, that is a gigantic risk.

So I’m not sure I would have the stomach for a huge risk profile there. Is daily too much? Maybe. Is once a month too infrequent? Depends on how much you’ve got going. So for most organizations, I would say, if you’re a moderately sized organization, maybe it’s quarterly or monthly that you go in and make sure that all the logs you expect to be there are there. But in a really big organization, I mean, I wouldn’t be opposed to doing that check once a week, because the sooner that I can tell that I’ve got a problem, the sooner I can go ahead and effect a fix or resolution.

Now, when you get into the middle ground, which is the processing of the logs arena, making sure that the appropriate things are getting handled properly and sanity checking there, you know, somewhere in the region of maybe quarterly to semi-annually, to make sure that you’re doing the cleanup on non-used, you know, non-used interaction, things along those lines. And then as I go into the smaller organizations, maybe that goes more like semi-annually to annually, type of a thing. And then certainly the alerting and alert response: am I shipping alerts?

Are there people on the team responding to alerts? Again, I would condition that based on the size or scale of the organization. Certainly, if I was in a larger scale organization, there’s obviously going to be multiple teams of people that are receiving alerts. So I would plan on probably a quarterly cadence of a sanity check on the alert and alert response. But I would sprinkle that out, and you don’t want to hit the same team every single quarter. They’re just going to expect it, right? So maybe if I’ve got six teams that are responding to alerts coming off of central logging, maybe Q1 I do two teams, Q2 I do one team, Q3 I do two teams, Q4 I do one team, you know, that type of deal. And just kind of spread the love: every team once a year, at least. If you’re experiencing problems, hey, guess what? You can ratchet it up. But in a larger scale organization, I wouldn’t do the sanity check less than quarterly. In smaller scale organizations, maybe it is twice a year, once a year, that type of an arena. But really, the kind of pruning that I would expect out of the sanity checking of the logging infrastructure, I would expect that to be commensurate with the organization’s appetite for risk. That would be the best way I could put it.
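The six-team rotation described above (two teams in Q1, one in Q2, two in Q3, one in Q4, every team at least once a year, and not in a predictable order) can be sketched like this. The team names and the use of a random shuffle are assumptions for illustration; any method that keeps the order hard to guess would do.

```python
import random


def plan_alert_drills(teams, slots_per_quarter, seed=None):
    """Assign every team to exactly one quarter, in shuffled order.

    slots_per_quarter maps a quarter label to how many teams get drilled in it.
    Passing a seed makes the plan reproducible; leave it as None for a fresh shuffle.
    """
    if sum(slots_per_quarter.values()) != len(teams):
        raise ValueError("slots must add up to the number of teams")
    order = list(teams)
    random.Random(seed).shuffle(order)
    plan = {}
    start = 0
    for quarter, slots in slots_per_quarter.items():
        plan[quarter] = order[start:start + slots]
        start += slots
    return plan


# Hypothetical team names; the 2/1/2/1 split is the one from the episode.
teams = ["team-a", "team-b", "team-c", "team-d", "team-e", "team-f"]
slots = {"Q1": 2, "Q2": 1, "Q3": 2, "Q4": 1}

plan = plan_alert_drills(teams, slots, seed=2025)
for quarter, drilled in plan.items():
    print(quarter, drilled)
```

If problems show up, the same function can be run with larger slot counts and a longer team list (for example, listing a struggling team twice) to ratchet up the frequency.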

That makes sense. Parting shots and thoughts for the folks this week, Adam.

Well, you know, you hear it a lot, you used to hear it a lot more, actually, people talking about how compliance is not security. And my thought on this, at its most basic of levels, is I agree with the notion of what these people are saying: if I’m literally just making a fervent attempt to do the bare effin’ minimum of checking the boxes for compliance, then no, it doesn’t play as strongly into the security arena. That said, for those organizations that can use some form of a compliance framework to put in place the controls that they need for their organization, take the scope of those controls seriously, and apply them everywhere within the organization that has exposure to sensitive data, then it has all of the bones of making for a compliance program that actually gives an F about security.

And, you know, the vast majority of the folks that I’ve met and interacted with in the security and compliance space, you know, at the end of the day, everybody actually does care about security. It’s not that they don’t give an F, you know, but, you know, taking that sanity checking seriously, you know, is, is going to mean an, an absolutely direct related benefit for the organization. And, you know, really in, in its truest sense, fulfill the, you know, the, the intended objective, you know, of those controls that the organization has around their, around their central logging, logging capabilities, and, and quite frankly forms a, you know, yet another realm of active protection for the organization itself.

And that right there, that’s the good stuff. Well, that’s all the time we have for this episode of Compliance Unfiltered. I’m Todd Coshow. And I’m Adam Goslin. Hope we helped to get you fired up to make your compliance suck less.
