We thought we would start a list of machine learning terms and terminology. It's as much a benefit for the team at Black Belt Digital as it is for our readers. Here we go!

A/B testing

  • A statistical way of comparing two (or more) techniques, typically an incumbent against a new rival.
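
To make A/B testing concrete, here is a minimal sketch of one common way to compare an incumbent (A) against a rival (B): a two-proportion z-test on conversion counts. The function name and the visitor numbers are made up for illustration, and a real test would also need a sample size and significance level chosen in advance.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 100 of 1000 visitors convert on A, 130 of 1000 on B.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be down to chance alone.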

Apache Spark

  • An open source engine for distributed computing, used for large-scale data manipulation and machine learning.


Backpropagation

  • An algorithm for training neural networks in which errors are propagated backwards through the network, giving each weight the gradient it needs to be adjusted.
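
As a rough illustration of backpropagation, the sketch below trains about the smallest network we could think of: one input, one tanh hidden unit and one linear output. The network shape, the starting weights and the learning rate are all assumptions picked for the example. Real networks do the same thing with matrices and many more weights.

```python
import math

def forward(x, w1, w2):
    """Tiny network: one input, one tanh hidden unit, one linear output."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def backward(x, target, w1, w2):
    """Propagate the squared error backwards to get a gradient for each weight."""
    h, y = forward(x, w1, w2)
    d_y = 2 * (y - target)         # derivative of (y - target)^2 with respect to y
    d_w2 = d_y * h                 # gradient for the output weight
    d_h = d_y * w2                 # error passed back to the hidden unit
    d_w1 = d_h * (1 - h * h) * x   # through tanh, whose derivative is 1 - tanh^2
    return d_w1, d_w2

def train(x, target, w1=0.5, w2=0.5, learning_rate=0.1, steps=200):
    """Plain gradient descent using the gradients from backward()."""
    for _ in range(steps):
        d_w1, d_w2 = backward(x, target, w1, w2)
        w1 -= learning_rate * d_w1
        w2 -= learning_rate * d_w2
    return w1, w2

w1, w2 = train(x=1.0, target=0.8)
_, prediction = forward(1.0, w1, w2)
print(f"prediction after training: {prediction:.3f}")
```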

Bag of words

  • A representation of the words in a phrase or passage, irrespective of order. Two sentences that use the same words in a different order have the same bag-of-words representation.
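
A bag of words is easy to sketch with Python's standard library. The helper name and the two sentences below are our own illustration. Note that the sentences mean different things but end up with identical bags, because word order is thrown away.

```python
from collections import Counter

def bag_of_words(text):
    """Count each lowercased word, ignoring order and surrounding punctuation."""
    words = (word.strip(".,!?;:\"'") for word in text.lower().split())
    return Counter(word for word in words if word)

first = bag_of_words("The dog chased the cat.")
second = bag_of_words("The cat chased the dog.")

print(first)
print(first == second)
```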


Classification

  • A machine learning problem in which an observation is assigned to one of two or more classes. Classifying data can also help identify the right mathematical model to analyse it.
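
For a feel of what a classification model does, here is a minimal sketch of a nearest-centroid classifier on single numbers. This is just one simple technique among many, and the function name, labels and data are hypothetical.

```python
def nearest_centroid_classifier(training):
    """Build a classifier from (value, label) pairs using the mean value per label."""
    totals = {}
    for value, label in training:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + value, count + 1)
    centroids = {label: total / count for label, (total, count) in totals.items()}

    def classify(value):
        # Predict the label whose centroid is closest to the new observation.
        return min(centroids, key=lambda label: abs(value - centroids[label]))

    return classify

classify = nearest_centroid_classifier(
    [(1.0, "small"), (2.0, "small"), (9.0, "large"), (11.0, "large")]
)
print(classify(3.0), classify(8.0))
```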


Clustering

  • Grouping data observations that are similar according to a given criterion.
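
One well-known clustering method is k-means, sketched below for plain numbers. The starting centres, the iteration count and the data are assumptions for the example, and here "similar" simply means "close in value".

```python
def k_means_1d(values, centres, iterations=10):
    """Cluster numbers by repeatedly assigning each one to its nearest centre."""
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for value in values:
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [
            sum(cluster) / len(cluster) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
    return centres, clusters

centres, clusters = k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], centres=[0.0, 5.0])
print(centres)
print(clusters)
```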

Confusion matrix

  • A table that summarises how successful a classification model’s predictions were, by counting each combination of actual class and predicted class.
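
A confusion matrix can be built by counting pairs of actual and predicted labels. The sketch below uses made-up spam and ham labels, and the helper name is ours. Each printed row is an actual class and each column a predicted class.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for six emails.
actual = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam"]

matrix = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))

for actual_label in labels:
    row = [matrix[(actual_label, predicted_label)] for predicted_label in labels]
    print(actual_label, row)

# Accuracy is the share of predictions on the diagonal of the table.
accuracy = sum(matrix[(label, label)] for label in labels) / len(actual)
print(f"accuracy = {accuracy:.2f}")
```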

Data Science

  • The field covering machine learning, data cleaning and preparation, and data analysis techniques such as visualisation.

Deep learning

  • A subset of machine learning that structures algorithms in layers to create an “artificial neural network” that can learn and make intelligent decisions on its own.

Graphics Processing Units (GPUs)

  • Graphics cards, used here for high performance computing tasks as opposed to graphical tasks.
  • Because they have a large number of individual cores, GPUs can process far more pictures and graphical data per second than a conventional CPU.
  • This provides significant performance gains for tasks such as machine learning (e.g. processing a large bank of images).


JupyterHub

  • With JupyterHub you can create a multi-user Hub which spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
  • Project Jupyter created JupyterHub to support many users. The Hub can offer notebook servers to a class of students, a corporate data science workgroup, a scientific research project, or a high performance computing group.


Kaggle

  • An online platform that hosts data science competitions, and a great way to test potential data scientists.


Keras

  • Keras makes prototyping easy and user-friendly. It takes an object-oriented approach and lets you build neural networks one layer at a time. In just a few lines of code you can create a sequential neural network with the standard bells and whistles like dropout.


Kubeflow

  • Kubeflow helps you build composable, portable, and scalable machine learning stacks. With Kubeflow, businesses can speed up the installation of AI tools and frameworks, particularly when leveraging GPGPUs from Nvidia.
  • Kubeflow simplifies the process of building production-ready machine learning stacks and lowers the barriers to machine learning by being easy to deploy and reusable.

Natural Language Processing (NLP)

  • The use of programs to process and analyse large amounts of natural language. It enables computers to understand text.


NumPy

  • The fundamental package for scientific computing with Python.

Optical Character Recognition (OCR)

  • The conversion of images of typed, handwritten or printed text into machine-encoded text. Widely used to read documents and convert them to text.


Pandas

  • The Python Data Analysis Library.


Python

  • A general-purpose programming language that is widely used for data science.


R

  • A programming language primarily used for statistical analysis.


Regression

  • A machine learning problem involving the prediction of a real-valued scalar or vector.
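
The simplest regression example we know is fitting a straight line by ordinary least squares, sketched below. The function name and the data points are made up for illustration. The fitted line can then predict a real-valued output for a new input.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```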


Scikit-learn

  • An open source toolkit for Python used for data mining and data analysis.


TensorFlow

  • An open source machine learning framework developed by Google.
  • TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
  • It comes with strong support for machine learning and deep learning, and its flexible numerical computation core is used across many other scientific domains.


Theano

  • A tensor manipulation library for Python which can run code on the GPU.

Training Set

  • A set of examples/observations used for training a machine learning algorithm. Training on a smaller set first means you can test your model more quickly before moving to the complete set of data.
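
A training set is usually carved out of a larger pool of examples, with the rest held back to check the model afterwards. Here is a minimal sketch of such a split. The helper name, the 80/20 ratio and the fixed seed are our own assumptions for the example.

```python
import random

def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into a training set and a held-out test set."""
    shuffled = list(examples)
    # A fixed seed keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

training_set, test_set = split_examples(range(10))
print(len(training_set), len(test_set))
```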

Source(s): Wikipedia

Google Developers, Machine Learning Glossary