Artificial Intelligence and Machine Learning: How reliable are AutoML platforms?
If you are undertaking a data science or machine learning project in your organisation for the first time, you are likely weighing many aspects of the project. One key decision is whether to hire data scientists or depend on AutoML (automated machine learning) platforms. In this article, I will explain why you need to hire data scientists or artificial intelligence (AI)/machine learning (ML) experts instead of relying on AutoML platforms alone.
A brief overview of AutoML platforms
Why are we even talking about relying on AutoML platforms? While ML has indisputably proven its value, several factors limit its widespread adoption. A key factor is the shortage of qualified data scientists and AI/ML experts. Data science, AI, and ML are niche skills, and experienced professionals in these fields are hard to find. Several technology companies and other developers have stepped in to fill this gap by creating AutoML platforms. These platforms automate several tasks in a data science project, enabling developers with relatively little ML experience to train ML models that meet the business needs of the project. Examples of AutoML platforms created by technology companies include:
- Google’s Cloud AutoML
If you are looking for open-source AutoML packages, you can explore Auto-Sklearn or Auto-Keras. AutoML platforms may cover several AI capabilities, such as vision and natural language processing (NLP). Some observers claim that AutoML platforms can reduce the dependence on data scientists and AI/ML experts. We will now examine this claim.
[Image: Data scientists trying to make sense of data visualizations]
Despite the promise of AutoML platforms, you still need data scientists and AI/ML experts on your project, for the following reasons:
1. AutoML platforms can handle only a few parts of a data science project.
A data science project passes through several lifecycle stages. Frameworks such as the Team Data Science Process (TDSP) and CRISP-DM describe these stages in slightly different ways; broadly, they are as follows:
- Business understanding: This is when you define the scope of your data science project after analysing the business requirements. Needless to say, you need data scientists here.
- Data understanding: At this stage, your team needs to understand the data sources and the environment in which they are available. Data scientists are indispensable for this analysis.
- Data preparation: This stage involves exploring and cleansing the data. You need subject matter experts here, and AutoML platforms can do little beyond routine steps such as filling in missing values or encoding categorical variables.
- Modelling: This stage involves feature engineering, training the ML model on suitable data sets, and evaluating it. AutoML platforms can help with training and evaluating the model, but they cannot supply the domain-driven feature engineering this stage needs; we discuss this task shortly.
- Evaluation: This stage includes activities like scoring and performance monitoring, where human expertise is indispensable.
- Deployment: Deploying a data science solution involves complex processes and coordination where you need experienced professionals.
In effect, AutoML platforms help with only two tasks, both in the modelling stage: training and evaluating models. Both involve repetitive work, because you need many iterations to train and compare the ML models you choose. Since data science and AI/ML expertise is scarce, you want your skilled professionals working on higher-value tasks rather than repetitive ones, so using AutoML platforms here can augment the effort of the qualified data scientists and AI/ML experts on your team. Beyond these two tasks, however, AutoML platforms cannot play a meaningful part in the project.
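The repetitive train-and-evaluate loop that AutoML platforms take over can be sketched in a few lines. This is a minimal, hypothetical illustration rather than how any particular platform works: it assumes a toy k-nearest-neighbours classifier, made-up data, and a held-out validation set, then tries each candidate value of k and keeps the best-scoring one. Real platforms search over many model families and hyperparameters, but the loop has the same shape.

```python
import math
from collections import Counter

# Hypothetical toy data: ((x, y), label). The point at (0.4, 0.4) is a
# deliberately mislabelled "b" sitting among the "a" points.
TRAIN = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((0.4, 0.4), "b"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
VALID = [((0.5, 0.5), "a"), ((5.5, 5.5), "b")]


def knn_predict(train, x, k):
    """Classify point x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of validation points that k-NN classifies correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in valid)
    return correct / len(valid)


def auto_search(train, valid, candidate_ks):
    """Train and score every candidate, then keep the best (ties go to the first)."""
    scores = {k: accuracy(train, valid, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    best_k, scores = auto_search(TRAIN, VALID, [1, 3, 5])
    print(best_k, scores)
```

With these inputs the search picks k = 3: k = 1 is misled by the mislabelled point at (0.4, 0.4) and scores 0.5, while k = 3 and k = 5 both score 1.0 and the tie goes to the first. The loop can discover that k = 1 does badly, but it cannot tell whether that point is a data-entry error to fix or a genuine case to keep; that call still needs a person who understands the data.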
2. The feature engineering part of the modelling stage requires data scientists with domain knowledge.
We talked about how AutoML platforms can help during the modelling stage of your data science or ML project. While they help with model evaluation and training, this stage also includes feature engineering, an important task in any data science project. In feature engineering, your team identifies the relevant and meaningful aspects of the process you are trying to model and turns them into input variables (features) for the model. These processes vary widely across industry domains, so you need data scientists with in-depth domain knowledge for this work. At the time of writing, AutoML platforms have not incorporated such industry domain knowledge, and you can expect such developments to take quite some time. By its very nature, feature engineering requires analysis, imagination, and flexibility, which is why it calls for data scientists with significant subject matter expertise and experience.
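To make the idea concrete, here is a minimal sketch of domain-driven feature engineering. The lending scenario, the field names, and the 0.4 threshold are all hypothetical assumptions chosen for illustration. The point is that knowing which combination of raw columns matters (here, the debt-to-income ratio) and which cut-off is meaningful comes from understanding how lenders assess repayment risk.

```python
from dataclasses import dataclass


@dataclass
class LoanApplication:
    # Hypothetical raw fields, as they might arrive from a source system.
    monthly_income: float
    monthly_debt_payments: float
    months_in_current_job: int


# Assumed cut-off, for illustration only; in practice a domain expert sets it.
HIGH_DTI_THRESHOLD = 0.4


def engineer_features(app: LoanApplication) -> dict:
    """Derive model inputs that encode (hypothetical) lending domain knowledge."""
    # Debt-to-income ratio: income and debt say little on their own, but their
    # ratio is a common way lenders gauge repayment capacity.
    if app.monthly_income > 0:
        dti = app.monthly_debt_payments / app.monthly_income
    else:
        dti = float("inf")  # no income: treat as maximally stretched
    return {
        "debt_to_income": dti,
        "high_dti": dti > HIGH_DTI_THRESHOLD,
        # Job stability expressed in years rather than raw months.
        "years_in_job": app.months_in_current_job / 12,
    }


if __name__ == "__main__":
    print(engineer_features(LoanApplication(5000.0, 1500.0, 30)))
```

A generic tool sees three numeric columns; the engineered features carry the analyst's judgement about what those numbers mean together.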
[Image: Data scientist thinking about important data features]
3. You need data scientists and AI/ML experts to take advantage of unsupervised and reinforcement-based ML algorithms.
The main kinds of learning algorithms in machine learning are as follows:
- Supervised learning algorithms: These ML algorithms work with data sets that contain both the inputs (the questions) and the correct outputs (the answers). We commonly call this “labelled data.” Supervised learning algorithms train ML models on this data. Labelled data sets can be very hard to obtain, though, and they typically cost more to produce.
- Unsupervised learning algorithms: These ML algorithms work without labelled data sets; the input data does not contain the answers. Instead, the algorithms look for hidden patterns and structures in the data, such as groups of similar records. Because there are no answers to check against, it is harder to tell whether an unsupervised algorithm is succeeding. You need experienced data scientists and AI/ML experts to make that judgement and to correct course when the algorithm fails.
- Reinforcement-based learning algorithms: These ML algorithms learn by trial and error. Their input data does not contain clear answers. Instead, the algorithm tries a solution to a problem, and a feedback loop tells it how good that attempt was. The algorithm learns from its errors and improves over many such trials.
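The three kinds of learning can be contrasted with a deliberately tiny, standard-library-only sketch. The data, function names, and parameters below are hypothetical illustrations: per-class centroids with nearest-centroid prediction stand in for supervised learning, one-dimensional k-means for unsupervised learning, and an epsilon-greedy multi-armed bandit for reinforcement learning. Real projects use far richer algorithms.

```python
import random

# Hypothetical 1-D toy data: two well-separated groups of values.
POINTS = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
LABELS = ["low", "low", "low", "high", "high", "high"]


def supervised_centroids(points, labels):
    """Supervised: the labels tell us which points belong together, so we
    can compute one mean (centroid) per named class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def classify(centroids, x):
    """Nearest-centroid prediction for a new, unlabelled value."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def unsupervised_two_means(points, iterations=10):
    """Unsupervised: 1-D k-means with k=2 groups the points without labels.
    It returns two unnamed centres; it cannot say what the groups mean."""
    c1, c2 = min(points), max(points)  # simple deterministic initialisation
    for _ in range(iterations):
        g1 = [x for x in points if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in points if abs(x - c1) > abs(x - c2)]
        if not g1 or not g2:  # degenerate split; stop early
            break
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted([c1, c2])


def reinforcement_bandit(reward_probs, steps=1000, epsilon=0.1, seed=0):
    """Reinforcement: learn by trial and error which action pays off best.
    reward_probs are hidden from the learner; it only sees 0/1 feedback."""
    rng = random.Random(seed)
    n = len(reward_probs)
    estimates = [0.0] * n  # running average reward per action
    pulls = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:  # explore: try a random action
            action = rng.randrange(n)
        else:  # exploit: pick the best action found so far
            action = max(range(n), key=lambda a: estimates[a])
        # Feedback loop: the environment returns reward 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        pulls[action] += 1
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return max(range(n), key=lambda a: estimates[a])


if __name__ == "__main__":
    centroids = supervised_centroids(POINTS, LABELS)
    print(centroids, classify(centroids, 2.5))
    print(unsupervised_two_means(POINTS))
    print(reinforcement_bandit([0.1, 0.9]))
```

On this toy data, both the supervised and the unsupervised functions find centres at 2.0 and 11.0, but only the supervised version can name them "low" and "high". The k-means function also had to be told to look for exactly two groups; choosing that number and judging whether the groups mean anything is the kind of human judgement described above. The bandit, for its part, never sees the hidden reward probabilities and has to discover the better action from 0/1 feedback alone.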
At the time of writing, mainstream AutoML platforms deal almost exclusively with supervised learning algorithms, and it will be quite some time before they can incorporate unsupervised and reinforcement-based learning algorithms. Your data science or ML project may well need such algorithms, which is why you need experienced data scientists and AI/ML experts who understand them.
[Image: Data scientists are needed to intervene and build strategies using the necessary concepts and logic]
4. You need data scientists and AI/ML experts to deal with complex data types.
AutoML platforms can process structured, relational data. Several popular AutoML platforms can also process some forms of unstructured data, such as images and text. However, they have limitations when you need to deal with complex data forms such as network data. Building ML models that process such data is beyond what AutoML platforms can do, and for this you need data scientists and AI/ML experts.
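Network data does not fit neatly into rows and columns, so someone has to decide how to represent it. One common approach, sketched below, is for a data scientist to hand-craft per-node features from the graph so that an ordinary tabular model can use them. The edge list, the names, and the two features chosen (node degree and average neighbour degree) are hypothetical; which features actually matter depends on the domain.

```python
from collections import defaultdict

# Hypothetical network data: who is connected to whom (undirected edges).
EDGES = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]


def node_features(edges):
    """Flatten a graph into one feature row per node for a tabular model."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    rows = {}
    for node, nbrs in neighbours.items():
        degree = len(nbrs)
        # How well connected this node's contacts are, on average.
        avg_nbr_degree = sum(len(neighbours[n]) for n in nbrs) / degree
        rows[node] = {"degree": degree, "avg_neighbour_degree": avg_nbr_degree}
    return rows


if __name__ == "__main__":
    for node, row in node_features(EDGES).items():
        print(node, row)
```

Here "alice" gets a degree of 3 and an average neighbour degree of 5/3, while "dave" gets 1 and 3.0. Each number compresses relationship information that a row-by-row view of the data would not show, and deciding which such summaries to compute is a modelling decision rather than a routine one.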
We examined whether you could undertake a data science or ML project with AutoML platforms alone, without experienced data scientists and AI/ML experts. Most stages of the data science project lifecycle need human expertise. AutoML platforms cannot incorporate some vital ML algorithms, and they cannot process some complex data forms. You certainly need experienced data scientists and AI/ML experts on your team, and you should use AutoML platforms to augment their effort, since these platforms can handle repetitive tasks.