If you’re considering acquiring an information security product or service that touts its AI capabilities (which you should be), then you need to understand the difference between a rule, a correlation, and a model. They are definitely not the same thing, and the difference has an important effect on the efficacy of the security solution provided.
So, let’s start with what a rule is. Arguably, the best-known use of rules in information security is Snort, a well-known IDS/IPS product. The rule header is written in a fixed format and syntax. Rule options are available (e.g., general and description), but any option that is used must likewise follow a fixed format and syntax.
Rules are very fast to process and execute because their syntax is fixed and known. They are also extremely accurate, which should result in a very low false positive (FP) rate. However, the obvious downside of rules is that they catch only known threats for which a rule has been written and deployed, leading to a significant false negative (FN) rate. Rules cannot detect unknown or new attacks.
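To make that trade-off concrete, here is a minimal sketch of a fixed-format rule in Python. It is not Snort syntax or Snort's matching engine; the `Rule` class, the `matches` function, and the sample payloads are all hypothetical, chosen only to show how an exact signature catches a known attack and misses a trivially altered one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical fixed-format rule: protocol, destination port, exact signature."""
    name: str
    protocol: str
    dest_port: int
    signature: bytes


def matches(rule: Rule, protocol: str, dest_port: int, payload: bytes) -> bool:
    # Every field is compared exactly, which is why evaluation is fast
    # and why anything outside the written pattern slips through.
    return (
        protocol == rule.protocol
        and dest_port == rule.dest_port
        and rule.signature in payload
    )


# Hypothetical rule for a known directory-traversal string.
traversal_rule = Rule("dir-traversal", "tcp", 80, b"../../etc/passwd")

known_attack = b"GET /../../etc/passwd HTTP/1.1"
encoded_variant = b"GET /..%2f..%2fetc/passwd HTTP/1.1"  # same intent, URL-encoded

known_hit = matches(traversal_rule, "tcp", 80, known_attack)
variant_hit = matches(traversal_rule, "tcp", 80, encoded_variant)
print(known_hit, variant_hit)
```

The known request matches; the URL-encoded variant does not, and it would stay invisible until someone writes and deploys a second rule for it.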
To lessen this inflexibility of rules, correlations were devised. While correlations are an improvement over rules, they still have limited flexibility. Correlations also have a fixed syntax (i.e., commands), but their format and available options are far broader than what rules provide. For example, correlations support wildcards, Boolean logic, regular expressions (RegEx), and, in Splunk, CASE() and TERM() to match phrases. Some correlations are quite simple to write, such as one that finds live but completely inactive accounts, or one that flags excessive failed logins. However, some correlations must be extremely complex in order to reduce FPs. That complexity can effectively turn correlations into static rules.
I say “effectively” because comprehensive, effective correlations are difficult to write and even more difficult to test for efficacy. This means that once written, correlations are rarely tuned to adjust for changed circumstances (e.g., changes in network topology). Additionally, correlations are only as good as: 1) my knowledge of the query language and commands; 2) my knowledge of the environment being monitored; and 3) my knowledge of the threats to watch for. Obviously, correlations also share the main downside of rules: they catch only known threats for which a correlation has been written and deployed, and they are not effective against unknown or new attacks.
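The "excessive failed logins" correlation mentioned earlier can be sketched in a few lines of Python. This is an illustration only, not a query from Splunk or any other product; the function name, the event tuple layout, and the thresholds `MAX_FAILURES` and `WINDOW` are assumptions made for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical thresholds that an analyst would have to pick and maintain.
MAX_FAILURES = 3
WINDOW = timedelta(minutes=5)


def excessive_failed_logins(events):
    """Return users with more than MAX_FAILURES failed logins inside WINDOW.

    events: iterable of (timestamp, user, outcome) tuples, sorted by timestamp.
    """
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    flagged = set()
    for ts, user, outcome in events:
        if outcome != "failure":
            continue
        window = recent[user]
        window.append(ts)
        # Drop failures that have aged out of the sliding window.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) > MAX_FAILURES:
            flagged.add(user)
    return flagged


start = datetime(2024, 1, 1, 9, 0, 0)
# alice: four failures within 90 seconds; bob: four failures spaced 10 minutes apart.
events = [(start + timedelta(seconds=30 * i), "alice", "failure") for i in range(4)]
events += [(start + timedelta(minutes=10 * i), "bob", "failure") for i in range(4)]
events.sort(key=lambda e: e[0])

flagged = excessive_failed_logins(events)
print(flagged)
```

Here alice is flagged and bob is not, even though both failed four times. Anyone who paces attempts to stay under the fixed thresholds goes unnoticed, which is the sense in which an untuned correlation behaves like a static rule.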
With machine learning, we use models, not rules or correlations. So, what is a model? The short answer is that a model is the output, or product, of an algorithm. (An algorithm is a self-contained, step-by-step set of operations to be performed. Algorithms perform calculation, data processing, and/or automated reasoning tasks. Among other things, algorithms can be used to train machine learning models.) And a model of what? Behaviors of systems or users, described through entities such as src_ip, dest_ip, domains, users, and applications. While running algorithms is considerably more compute-intensive than evaluating rules, that cost has been largely mitigated by vastly increased processing power. And the use of algorithms to create these models allows far greater flexibility in describing what should be looked for. So now, instead of being limited to looking for known threats, I can search for unknown threats too, with considerably lower FP and FN rates.
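As a rough illustration of "a model is the output of an algorithm," the sketch below fits a per-user baseline from historical data and scores new observations against it. A mean-and-standard-deviation baseline is about the simplest model possible and is used here only as a stand-in; it is not the method of any particular machine learning platform. The function names, the sample data, and the idea of counting distinct dest_ip values per day are all assumptions made for the example.

```python
from statistics import mean, stdev


def train_baseline(history):
    """Fit a per-user baseline (mean, standard deviation) from observed daily counts.

    history: dict mapping user -> list of daily counts (at least two per user).
    """
    return {user: (mean(counts), stdev(counts)) for user, counts in history.items()}


def anomaly_score(model, user, observed):
    """Return how many standard deviations the observation is from the user's baseline."""
    mu, sigma = model[user]
    if sigma == 0:
        # A perfectly constant history: any deviation at all is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


# Hypothetical training data: distinct dest_ip values contacted per day, per user.
history = {
    "alice": [10, 12, 11, 9, 10, 11, 12],
    "bob": [100, 120, 90, 110, 105, 95, 115],
}
model = train_baseline(history)  # the model is the product of the algorithm

alice_score = anomaly_score(model, "alice", 40)   # far above alice's usual range
bob_score = anomaly_score(model, "bob", 110)      # ordinary for bob
print(round(alice_score, 1), round(bob_score, 1))
```

Nobody wrote a rule saying "40 destinations is suspicious for alice." The baseline was learned from her own behavior, so the same code flags her 40 while treating bob's 110 as normal. A single static threshold could not do both.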
While rules are effectively “dead,” many practitioners are still using correlations. However, truly savvy practitioners are moving, or have already moved, to models used in machine learning platforms. Maybe you should be investigating use of models in a machine learning platform?