The team at SuperDataScience, Kirill Eremenko and Hadelin de Ponteves, have shared their thoughts on “Data Science Trends for 2018”. I thought it was worthwhile summarising some of the key points. If you want to listen to the whole podcast, it is available here.

Artificial Intelligence

Companies are integrating artificial intelligence (AI) into their business processes. An estimated 60% of companies have made the move to AI, with automated processes reducing costs. For now AI is being used with a very narrow focus, in parts of the business and in specific areas such as marketing. Companies are also looking to build AI teams.


Blockchain

Blockchain is a shared, distributed, decentralised ledger that removes the middleman, such as a bank, enabling secure transactions even when the participants do not know each other. The decentralised nature of Blockchain is one of its key features, as it enables much faster connections.
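To make the "ledger" idea a little more concrete, here is a minimal sketch in Python of a hash-chained list of blocks. It is only an illustration and not how any real blockchain is implemented: there is no network, no consensus between participants and no digital signatures. The function names (`block_hash`, `add_block`, `is_valid`) and the transaction fields are made up for this example. What it does show is why tampering with an old entry is easy to detect: each block stores the hash of the block before it.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents (hypothetical layout, for illustration only)."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a block that points at the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to the one before."""
    expected_prev = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


ledger = []
add_block(ledger, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(ledger, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(ledger))  # True

# Quietly changing an old transaction breaks the stored hash of that block.
ledger[0]["transactions"][0]["amount"] = 500
print(is_valid(ledger))  # False
```

In a real system every participant holds a copy of the ledger and runs a check like this themselves, which is what removes the need to trust a middleman.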


Data Security

The focus on the security of data on the Internet is going to start growing again. In 2017 we saw major security breaches, including May 2017’s WannaCry cyber attack. One of the biggest breaches in history also came to light in 2017, when the credit rating agency Equifax announced a data breach affecting some 147 million customers. The proliferation of data is making it harder to keep it safe and secure, and hackers have access to more sophisticated tools.

A great analogy is that it is a war of technology. Once technology protects your data, someone develops better technology to break it, so there is a leapfrogging technology battle. We are going to see deep learning and machine learning used in the security space to help find areas that can be breached.

Deep Learning to become mainstream

Deep learning is supporting technologies such as image classification, machine translation, facial recognition and chat bots, among others.

The proliferation of open-source platforms like GitHub is democratising AI and deep learning, providing an easier way to apply the technology to business problems. This wasn’t available a few years ago, and it is driving faster adoption of deep learning.

The most commonly used AI models are still classical machine learning models such as logistic regression, random forest, XGBoost and decision trees, with deep learning models like CNNs, RNNs or GANs used to a lesser extent, although their use is growing.

Big Data Systems

Big data systems such as Hadoop, Spark, Hive and Pig are becoming more prominent as the amount of data constantly increases. You need these big data systems because of the speed with which they manage and organise data so that you can draw insights from it.

In order to train deep learning models and AI algorithms, you need a lot of stored data, and you need to be able to access it quickly.

Big Data in the Cloud

Big data in the cloud is cutting costs, with the cloud offering a more accessible and scalable proposition. Cloud computing offers economies of scale, as providers find it easier to upgrade hardware.

Digital Twins

A digital twin is pretty much an identical digital copy of a physical object. It stays connected to the object, and information can be transferred between the twin and you, so that you can use your objects more and more effectively. It is a model that learns how the objects behave, how they interact and what dependencies exist between them.

Augmented Reality

Virtual reality (VR) already exists, but augmented reality offers greater potential. Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are “augmented” by computer-generated perceptual information, ideally across multiple sensory modalities, including visual, auditory, haptic, somatosensory and olfactory. Augmented reality alters one’s current perception of a real-world environment, whereas virtual reality replaces the real-world environment with a simulated one.

Self-Serve Analytics

Data science is growing, and it is important to have data scientists, but at the same time it is important for everybody in your business to be able to look into the data and get insights from it.

Self-serve analytics is the automation of systems that take your data as input and return the output without needing a data scientist to carry out the whole process of data analysis. Self-managed data systems have the potential to be automated. There will always be a need for data scientists to improve these systems, check them, verify that they give the right insights and confirm that the results make sense, because sometimes decisions can only be good decisions if you include the human factor. Self-serve analytics will grow, but it will never grow to the point of replacing data science jobs.