We often encounter confusion and hype surrounding the terminology of Artificial Intelligence. This post is intended as a quick reference guide for security practitioners, covering some of the more important and common terms.
Note that this is a limited set. We discovered that, once you start defining these terms, the terms themselves introduce new terms that require definition. We had to draw the line somewhere...
- Artificial Intelligence - “The goal of work in Artificial Intelligence is to build machines that perform tasks that normally require human intelligence.” This quote from Nils Nilsson is an excellent definition, but it is not the only one; many other definitions of Artificial Intelligence have been proposed.
- Algorithms - self-contained, step-by-step sets of operations to be performed. Algorithms perform calculation, data processing, and/or automated reasoning tasks. Among other things, algorithms can be used to train Machine Learning models.
- Machine Learning (ML) - A discipline or subfield of Artificial Intelligence. Paraphrasing the definition by Tom Mitchell, ML is the study of computer algorithms whose performance on a defined set of tasks improves with experience.
- Machine Learning Models - The output of a machine learning algorithm. Two common types of machine learning models are those generated by Supervised algorithms and those generated by Unsupervised algorithms. See below.
- The difference between “algorithms” and “models”: this is a common question, and still quite difficult to answer. In the context of Artificial Intelligence, we can say that learning algorithms generate models. The learning algorithms discussed here are either Supervised or Unsupervised. N.B. People often use “models” and “algorithms” interchangeably, which is a common source of confusion. For the layperson: think of algorithms as programs, and of models as the output of those programs.
- Unsupervised Learning (algorithm) - a family of machine learning algorithms that learn without labels (labels are defined below). Unsupervised Learning algorithms output models that capture the structure of the data, identify groups, or find statistical outliers. For example, Unsupervised Learning models can show you behaviors that are unlike other behaviors in a corpus of data.
- Supervised Learning (algorithm) - a family of machine learning algorithms that learn from labeled data. Supervised Learning algorithms output predictive models that can classify or assign a score to a data pattern. For example, trained Supervised Learning models can classify behavior patterns into different attack tactics, or assign a risk score to a behavior. In cyber-security, Supervised Learning models predict the label a human analyst would assign to a given behavior pattern.
- Labeling - the act of classifying or describing something. In PatternEx, labeling is something a human analyst does every day: he or she marks a behavior as malicious or benign. Generally, the more labels are provided, the more accurate the system becomes.
- Active Learning - a machine learning process in which a learning algorithm interactively requests inputs from an external source to improve a model. It is most commonly applied when the goal is to train Supervised Learning models, little or no labeled data is available, the external source is a human expert who provides labels, and labeling is expensive and/or slow. Active learning strategies are also useful when, as in InfoSec, the data changes quickly.
- Behavior Vectors - a quantified description of the activity of the modeled entities.
- Entities - things with a distinct, independent existence to which the behaviors relate. In cyber-security, examples include users, IP addresses, domains, and so on.
- Human-Assisted Artificial Intelligence - the synergy between human intuition and artificial intelligence. Humans assist the AI by providing feedback (e.g., labels), and the trained AI assists humans by automating and scaling tasks that require human intelligence.
- Predictions - the system's anticipation of how the security analyst would classify an event.
- Rare Events - very similar to “anomalies” and “outliers,” Rare Events are activities seen in log data that are unusual or out of the ordinary but not yet determined to be either malicious or benign.
- Transfer Learning - porting knowledge acquired in one environment to another to improve model accuracy. For example, a model trained at company X can be transferred to companies A, B and C, increasing the detection capabilities of the entire group.
- Virtual Analyst - the term that describes the effect of a fully trained AI system. Because a trained AI system greatly scales the analytic capability of the human analyst team, we say it is like expanding your team with “virtual analysts.”
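To make the Behavior Vectors and Entities definitions concrete, here is a minimal Python sketch that aggregates raw log events into one fixed-length numeric vector per entity. The log fields (`entity`, `action`, `bytes_out`), the sample events, and the `behavior_vectors` helper are all hypothetical, chosen only for illustration; real deployments use their own log schemas and much richer features.

```python
from collections import defaultdict

# Hypothetical log records; the field names are illustrative, not a real schema.
EVENTS = [
    {"entity": "alice", "action": "login_ok", "bytes_out": 1200},
    {"entity": "alice", "action": "login_fail", "bytes_out": 0},
    {"entity": "bob", "action": "login_ok", "bytes_out": 300},
    {"entity": "bob", "action": "login_ok", "bytes_out": 450},
    {"entity": "10.0.0.7", "action": "login_fail", "bytes_out": 0},
    {"entity": "10.0.0.7", "action": "login_fail", "bytes_out": 0},
]

# Fixed feature order, so every entity's vector has the same layout.
FEATURES = ("login_ok", "login_fail", "bytes_out")


def behavior_vectors(events):
    """Aggregate raw events into one numeric vector per entity."""
    totals = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for event in events:
        row = totals[event["entity"]]
        if event["action"] in ("login_ok", "login_fail"):
            row[event["action"]] += 1  # count login outcomes
        row["bytes_out"] += event["bytes_out"]  # sum outbound traffic
    return {entity: [row[name] for name in FEATURES] for entity, row in totals.items()}


if __name__ == "__main__":
    for entity, vector in behavior_vectors(EVENTS).items():
        print(entity, vector)
```

Each entity (a user or an IP address here) ends up as a list of numbers in the same order, which is the form that both Unsupervised and Supervised Learning algorithms typically consume.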
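The Unsupervised Learning and Rare Events definitions can be illustrated with a sketch that flags entities whose behavior deviates strongly from the rest of the population, without using any labels. It assumes a simple per-feature z-score rule, a hypothetical `rare_entities` helper, and made-up sample vectors; this is one elementary outlier technique, not a description of any particular product's detector.

```python
import statistics

# Hypothetical behavior vectors: [successful logins, failed logins, MB sent out].
VECTORS = {
    "alice": [5, 0, 10.0],
    "bob": [5, 1, 12.0],
    "carol": [6, 0, 11.0],
    "dave": [4, 1, 9.0],
    "erin": [5, 0, 10.0],
    "frank": [6, 1, 12.0],
    "grace": [4, 0, 11.0],
    "svc-backup": [5, 0, 900.0],
}


def rare_entities(vectors, threshold=2.0):
    """Return entities with any feature more than `threshold` population
    standard deviations from that feature's mean. No labels are used."""
    names = list(vectors)
    flagged = set()
    for column in zip(*vectors.values()):  # one column per feature
        mean = statistics.fmean(column)
        spread = statistics.pstdev(column)
        if spread == 0:
            continue  # all entities identical on this feature
        for name, value in zip(names, column):
            if abs(value - mean) / spread > threshold:
                flagged.add(name)
    return sorted(flagged)


if __name__ == "__main__":
    print(rare_entities(VECTORS))  # ['svc-backup']
```

A flagged entity is only rare, not necessarily malicious; deciding that still requires a label from an analyst.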
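The distinction between algorithms and models, and the role of labels in Supervised Learning, can be sketched in a few lines of Python. Here the learning algorithm is a nearest-centroid classifier, chosen only because it is short (it is not implied to be what any product uses); the labeled training data is made up, and `train_nearest_centroid` and `predict` are hypothetical names. The algorithm consumes labeled examples; the model it returns is just data, one average vector per label.

```python
from math import dist

# Hypothetical analyst-labeled behavior vectors: [failed logins per hour, MB sent out].
TRAINING_DATA = [
    ([0.0, 5.0], "benign"),
    ([1.0, 8.0], "benign"),
    ([0.5, 6.0], "benign"),
    ([30.0, 4.0], "malicious"),
    ([25.0, 90.0], "malicious"),
    ([40.0, 60.0], "malicious"),
]


def train_nearest_centroid(examples):
    """The learning algorithm: consumes labeled examples, returns a model.

    The model is plain data, one mean vector (centroid) per label.
    """
    groups = {}
    for vector, label in examples:
        groups.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def predict(model, vector):
    """Apply the model: return the label whose centroid is nearest."""
    return min(model, key=lambda label: dist(model[label], vector))


if __name__ == "__main__":
    model = train_nearest_centroid(TRAINING_DATA)
    print(model)
    print(predict(model, [35.0, 40.0]))  # malicious
    print(predict(model, [1.0, 7.0]))    # benign
```

Retraining with more (or corrected) labels moves the centroids, which is one simple way to see why additional labels tend to improve a supervised model.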
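Active Learning can be sketched as a loop: train a model on the few labels available, ask the analyst to label the example the model is least sure about, and retrain. The sketch below uses uncertainty sampling (querying the unlabeled vector whose two nearest class centroids are almost equally close) on top of a nearest-centroid model. The data, the `analyst_label` stand-in for the human expert, and all function names are hypothetical; this is one common query strategy, not a description of how PatternEx implements it.

```python
from math import dist

# Hypothetical unlabeled behavior vectors: [failed logins per hour, MB sent out].
POOL = [
    [0.0, 4.0], [1.0, 6.0], [2.0, 5.0], [12.0, 30.0],
    [15.0, 25.0], [28.0, 70.0], [35.0, 80.0], [40.0, 90.0],
]
# Initial labeled examples; at least one per class is required.
SEED = [[0.0, 4.0], [40.0, 90.0]]


def analyst_label(vector):
    """Hypothetical stand-in for the human analyst (the external source).

    A fixed rule is used only so the sketch runs; in practice a person
    supplies this verdict.
    """
    return "malicious" if vector[0] >= 10.0 else "benign"


def fit_centroids(labeled):
    """Nearest-centroid model: the mean vector of each label's examples."""
    groups = {}
    for vector, label in labeled:
        groups.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def margin(model, vector):
    """Gap between the two nearest centroids; small means the model is unsure."""
    distances = sorted(dist(centroid, vector) for centroid in model.values())
    return distances[1] - distances[0]


def active_learning(pool, seed, rounds):
    labeled = [(vector, analyst_label(vector)) for vector in seed]
    unlabeled = [vector for vector in pool if vector not in seed]
    for _ in range(min(rounds, len(unlabeled))):
        model = fit_centroids(labeled)
        if len(model) < 2:
            raise ValueError("seed must contain at least one example per label")
        # Uncertainty sampling: query the vector the model is least sure about.
        query = min(unlabeled, key=lambda vector: margin(model, vector))
        unlabeled.remove(query)
        labeled.append((query, analyst_label(query)))  # ask the analyst
    return fit_centroids(labeled), labeled


if __name__ == "__main__":
    model, labeled = active_learning(POOL, SEED, rounds=3)
    for vector, label in labeled[len(SEED):]:
        print("queried", vector, "->", label)
```

Because each round spends the analyst's time on the most ambiguous case, fewer labels are often needed than when labeling examples at random. In this toy data, the second query, [15.0, 25.0], is one the model would have classified as benign before the analyst labeled it malicious.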
PatternEx's Unique Approach
PatternEx comes with many out-of-the-box algorithms that create predictive models to select what the analyst should review. Because the sea of events is constantly changing, humans will always be needed to identify, in context, what is malicious. Human-Assisted Artificial Intelligence systems therefore learn from analysts to identify which events are malicious. This results in greater detection accuracy at scale and a reduced mean time to identify an attack.