An ever-expanding list of concepts in the field of AI to give myself and others an easy reference. Each item in the list contains a short, rudimentary definition I’ve written, as well as a link to a resource that can explain it better.

Ablation Study: Removing parts of a machine learning model to measure their impact on performance.

Advantage Function: The difference between the Q-value of a state-action pair and the value of that state, $A(s, a) = Q(s, a) - V(s)$. Useful for determining how good an action is relative to what is expected from its state.
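
As a rough sketch of the subtraction described above, here is a small Python example. The function name `advantage`, the action names, and the Q-values are all made up for illustration, and I'm assuming a uniform random policy so that the state value is just the mean of the Q-values.

```python
def advantage(q_values, state_value):
    """Return A(s, a) = Q(s, a) - V(s) for every action in one state."""
    return {action: q - state_value for action, q in q_values.items()}


# Hypothetical Q-values for a single state with two actions.
q_values = {"left": 1.0, "right": 3.0}

# Assumption: a uniform random policy, so V(s) is the mean of the Q-values.
state_value = sum(q_values.values()) / len(q_values)

advantages = advantage(q_values, state_value)
print(advantages)
```

A positive advantage means the action is better than what the policy gets on average from that state, and a negative one means it is worse.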

Attention: Neural networks are able to “pay attention” to specific parts of input or output, useful in translating languages or predicting sequences. For example, when trying to predict the next word in “the clouds in the”, you pay attention to the word “clouds” to predict the word “sky”.

Autoencoder: A type of neural network that attempts to take raw data, convert it into a simpler representation (usually by limiting the number of nodes in a hidden layer), and then decode the representation back into the original data. They are primarily used to extract the useful properties from data automatically. They can do this in an unsupervised fashion, since their output targets are the given inputs, requiring no labeling.

AutoML: Systems where the entire machine learning process, from data preparation to network design, is automated.

Autoregression: A time series model that uses observations from previous time steps to predict the value at the next time step.
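
A minimal sketch of what an autoregressive prediction looks like once the coefficients are known. The function `ar_predict` and the coefficient values are my own assumptions for illustration; in practice the coefficients would be fitted to data.

```python
def ar_predict(history, coefficients, intercept=0.0):
    """Predict the next value from the last len(coefficients) observations.

    coefficients[0] multiplies the most recent observation,
    coefficients[1] the one before it, and so on.
    """
    recent = history[-len(coefficients):][::-1]  # most recent first
    return intercept + sum(c * x for c, x in zip(coefficients, recent))


# A straight-line series satisfies x_t = 2 * x_{t-1} - x_{t-2}.
series = [1.0, 2.0, 3.0, 4.0]
next_value = ar_predict(series, coefficients=[2.0, -1.0])
print(next_value)
```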

Backpropagation: The algorithm that uses the chain rule from calculus to compute the gradient of the loss with respect to each weight, traditionally used with gradient descent to train feed-forward neural networks.

Bayesian: An interpretation of probability where the term “probability” expresses a degree of belief between 0 and 1, 0 representing false and 1 representing true. Uses expected values and random variables.

Bootstrapping: Using samples to approximate a quantity that can't be computed directly. In statistics, this usually means resampling the data with replacement. In reinforcement learning, bootstrapping usually means using estimates of future values to approximate a current value.

Convolutional Neural Network (CNN): A neural network primarily used for image processing. These networks learn filters that are applied across an image to extract higher-level information, such as a filter that detects a certain type of edge.

Covariate Shift: The distribution of inputs differs between training and testing, while the relationship between inputs and outputs stays the same.

Data Mining: Discovering knowledge and patterns from massive amounts of data, usually in an unsupervised fashion.

Deep Learning: A subset of machine learning with a multi-step learning process, usually referring to neural networks with multiple hidden layers.

Discount Factor: A variable (usually $\gamma$, between 0 and 1) that determines how much a model cares about rewards in the distant future compared to immediate rewards.

• $\gamma = 0$ cares only about the immediate reward,
• $\gamma = 1$ cares about every future reward as much as the immediate reward.
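
To make the effect of $\gamma$ concrete, here is a small sketch that computes a discounted return for a made-up list of rewards. The function name `discounted_return` and the reward values are assumptions for illustration only.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step is: reward + gamma * (return of the rest).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]  # hypothetical rewards, earliest first

print(discounted_return(rewards, gamma=0.0))  # only the first reward counts
print(discounted_return(rewards, gamma=0.5))  # later rewards count for less
print(discounted_return(rewards, gamma=1.0))  # plain sum of all rewards
```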

Eligibility Trace: In temporal difference learning, the eligibility trace is a vector of decaying values that represent how recently each weight was used. When we encounter an error, this vector allows us to update recently used weights more strongly than weights used long ago.

Feed-forward Neural Network: A simple neural network where information is passed strictly from one layer to the next.

Generative Adversarial Network (GAN): A set of two or more neural networks that can generate new data based on existing training data. A simple example is https://thispersondoesnotexist.com/, which generates fake pictures of humans. GANs consist of a generator network with the goal of making realistic new data, and a discriminator network with the goal of telling real data from fake. As the networks train, they each improve at their individual task, forcing their adversaries to improve in turn.

Genetic Algorithms: Algorithms that try to mimic the evolutionary process by randomly modifying the best-performing sets of parameters while discarding those with the worst performance, then repeating.

GPT-2/3: Language models from OpenAI that can generate text that mimics a writing style incredibly well.

Gradient Descent: An iterative process that attempts to find a minimum of a function by repeatedly taking a step in the direction of the negative gradient (the direction in which the function decreases fastest) until a local minimum is reached.
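
Here is a minimal one-dimensional sketch of the idea. The function `gradient_descent`, the learning rate, the step count, and the example function $f(x) = (x - 3)^2$ are all choices I've made for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Example: f(x) = (x - 3) ** 2 has gradient 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

With a learning rate that is too large the steps can overshoot and diverge, which is one reason learning rate is treated as a hyperparameter.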

Hyperparameters: Manually defined parameters of the model, such as the size of a neural network, or manually defined parameters of the machine learning algorithm, such as learning rate.

(IID) Independent and Identically Distributed: Random variables are independent when the value of one variable doesn’t affect the probability of another, and identically distributed when they all follow the same probability distribution. For example, the outcome of one flip of a coin does not affect the outcome of another flip of the same coin, and every flip has the same odds, so the flips are IID.

KL Divergence: An asymmetric measure of how much one probability distribution differs from another. Useful for determining how different fake data is from real data (GANs) or for determining how different two policies are in trust-region reinforcement learning.
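
A small sketch of the standard formula for discrete distributions, $D_{KL}(P \| Q) = \sum_i p_i \log(p_i / q_i)$. The function name `kl_divergence` and the example distributions are assumptions for illustration, and the code assumes `q` is non-zero wherever `p` is non-zero.

```python
import math


def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) in nats for two discrete distributions.

    Assumes p and q are same-length lists of probabilities and that
    q[i] > 0 wherever p[i] > 0. Terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]  # hypothetical "real" distribution
q = [0.9, 0.1]  # hypothetical "fake" distribution

print(kl_divergence(p, q))
print(kl_divergence(q, p))  # a different number: KL is not symmetric
```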

Level 0-5: The different “levels” of autonomy related to self-driving cars. 0 represents full human control while 5 represents full vehicle control.

Long Short Term Memory (LSTM): A type of recurrent neural network that works exceptionally well with sequential input. A unique trait of these networks is their memory cells' “forget gates”, which allow them to control how long they hold onto data.

The Lottery Ticket Hypothesis: A randomly-initialized, dense neural network contains a subnetwork that is initialised such that — when trained in isolation — it can match the test accuracy of the original network after training for at most the same number of iterations.

Markov Decision Process: A system of states, actions, and probabilities of getting to other states given actions taken from previous states.

Mode Collapse: When the distribution of samples produced by a generative adversarial network covers only a subset of the training data distribution, instead of the entire distribution. For example, you train a network to produce pictures of animals but it only learns to produce pictures of cats.

Multi-armed Bandit: A core component of reinforcement learning, the multi-armed bandit problem is the classic “exploration versus exploitation” tradeoff. In this problem, expected gain must be maximized in an environment with varying rewards, forcing an agent to decide between keeping an option they know to be safe versus exploring new options that might be better.
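
One simple way to handle the tradeoff is an epsilon-greedy strategy: explore a random arm a small fraction of the time, otherwise exploit the arm with the best estimate so far. The sketch below is my own illustration; the function name, the arm means, the Gaussian reward noise, and the value of `epsilon` are all assumptions.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a bandit with Gaussian rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward of each arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.0, 1.0, 3.0])
print(estimates)
print(counts)
```

The arm with the highest true mean should end up with most of the pulls, while the other arms are still sampled occasionally through exploration.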

Online/Offline Learning: Online learning happens as data comes in. Offline learning happens after data is collected and made into a batch.

Parametric & Non-Parametric Models: Parametric models use a fixed, finite number of parameters, like weights in linear regression, to represent a learned hypothesis. Non-parametric models have no fixed set of parameters; what they store grows with the data, like the data points kept by nearest neighbors, to represent a learned hypothesis.

Precision: The fraction of relevant instances among the retrieved instances. Also known as positive predictive value.

Recall: The fraction of relevant instances that were retrieved. Also known as sensitivity.
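
A short sketch showing precision and recall side by side for binary labels, where 1 means "relevant" (in `actual`) or "retrieved" (in `predicted`). The function name `precision_recall` and the example labels are made up for illustration.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two same-length lists of 0/1 labels."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)

    # Guard against division by zero when nothing is retrieved or relevant.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


predicted = [1, 1, 0, 0, 0, 0]  # the model retrieved two items
actual = [1, 0, 1, 1, 1, 0]     # four items were actually relevant

precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Here half of the retrieved items are relevant, but only a quarter of the relevant items were retrieved, so the two numbers can tell quite different stories about the same model.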

Recurrent Neural Network (RNN): A type of network that can store state, giving it a type of memory that can process a series of inputs. This can be accomplished by having a cycle within the network.

Regression: A set of models that determine relationships between input data and a dependent value. For example, linear regression tries to approximate a continuous dependent value, while logistic regression tries to determine the probability that a dependent value belongs to a class.

Residual Neural Network: Networks with connections that skip some layers and connect to non-adjacent ones. This type of network helps counter the vanishing gradient problem.

Reinforcement Learning: Algorithms are given an environment they are able to interact with, and learn from the rewards their interactions give them.

• Model-Based: You create a model of the world and can predict what the next state and reward will be for each action.
• Model-Free: You learn what action to take without predicting what comes next, since you don’t have a model of the world.
• Value: Networks that estimate the value of a state.
• Policy: Networks that choose actions. The policy can be optimized directly instead of being derived from computed values, which is useful when you have a continuous action space. Learns only from the reward signal. Requires a score function to evaluate the performance of the policy, usually the total reward accumulated over a period of time.
• Actor Critic: The backbone of many state-of-the-art reinforcement learning algorithms.
  • Uses a value-based Critic to measure how good the action taken is
  • Uses a policy-based Actor to choose actions

Stochastic: A randomly determined process. Usually refers to the probabilistic outputs of machine learning systems.

Supervised Learning: A model learns to produce a desired output from labeled examples, so it is told what the correct output is during training. Example: Image Recognition

Temporal Difference: A model-free reinforcement learning method that learns by bootstrapping: it uses its own current value estimates to approximate a value function. Once more information about the true value is revealed at later timesteps, you can update the earlier, low-information guess by using the newly acquired outcome as a “ground truth” to train a network.

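• Temporal Difference error is the difference between consecutive predictions: the reward received plus the discounted value estimate of the next state, minus the value estimate of the current state, $\delta = r + \gamma V(s') - V(s)$.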
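
A minimal tabular sketch of a single TD(0) update, using a dictionary in place of a network. The function name `td0_update`, the state names, and the values of `alpha` (step size) and `gamma` (discount factor) are assumptions for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to `values` in place and return the TD error."""
    # The target bootstraps from the current estimate of the next state.
    td_error = reward + gamma * values[next_state] - values[state]
    # Nudge the current state's estimate a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


values = {"A": 0.0, "B": 1.0}  # hypothetical value estimates
delta = td0_update(values, "A", reward=0.5, next_state="B")

print(delta)
print(values["A"])
```

A positive TD error means the transition turned out better than the current estimate predicted, so the estimate for that state is raised.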

Trajectories: The history of states (and potentially actions) taken during a walk of a Markov Decision Process.

Transfer Learning: Taking parts of a network trained on one dataset, and using them in a different network with a different dataset.

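Transformer: A type of neural network primarily used with sequential data, like language translation. Instead of recurrence, these networks rely on an attention model to relate every part of the sequence to every other part, which improves performance and allows the sequence to be processed in parallel.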

Unsupervised Learning: A model learns patterns or structure in data without being told what it’s looking for, since the data has no labels. Example: Clustering similar data points

Vanishing Gradient Problem: In a network, a gradient can become vanishingly small as it is propagated back through many layers, which will stop a weight from changing its value, since weights are updated in proportion to the gradient of the loss with respect to them.
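
A rough numerical sketch of why depth makes this worse, under simplifying assumptions of my own: every layer uses a sigmoid activation, every weight is 1, and every pre-activation is 0. The derivative of a sigmoid is at most 0.25, so the backpropagated signal shrinks by at least that factor per layer.

```python
import math


def sigmoid_derivative(x):
    """Derivative of the sigmoid function; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def backprop_scale(depth, pre_activation=0.0):
    """Product of `depth` sigmoid derivatives (assumes all weights are 1)."""
    return sigmoid_derivative(pre_activation) ** depth


for depth in (1, 5, 10):
    print(depth, backprop_scale(depth))
```

Even in this best case for the sigmoid, the scale factor after ten layers is below one in a million.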

Feel free to contact me with any suggested additions/changes at jarbus@tutanota.com.