Posted on 23rd August 2019 in Machine Learning

Predict the viscosity of a water-glycerol mixture using Machine Learning

In this post we will explore the use of different machine learning algorithms to predict the viscosity of a water-glycerol mixture as a function of the concentration of the glycerol and of the temperature of the mixture.

The database used to train and test the machine learning algorithms gives the viscosity (cP) of the mixture as a function of the glycerol concentration (wt %) and of the temperature (°C). It consists of experimental data taken from Segur, J.B. and Oberstar, H., Viscosity of glycerol and its aqueous solutions, Industrial and Engineering Chemistry, 43 (9), 2117-2120 (1951).

The Machine Learning Algorithms we will consider are:

  • Linear regression
  • K-Nearest Neighbours
  • Decision tree and random forest
  • Support Vector Machine
  • Artificial Neural Network
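As a rough idea of how these algorithms can be compared, here is a minimal sketch using scikit-learn regressors. It is not the notebook from this post: the data are synthetic placeholders (not the Segur and Oberstar measurements), and the hyperparameters, the scaling pipelines and the 75/25 split are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Placeholder data, NOT the experimental database: two features standing in
# for concentration (wt %) and temperature (deg C), and a made-up smooth
# response standing in for the logarithm of the viscosity.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 100, 300), rng.uniform(0, 100, 300)])
y = 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(0, 0.05, 300)

# Hold out a quarter of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Distance- and gradient-based models are wrapped in a scaling pipeline
# because the two features can have different ranges (an assumed design choice).
models = {
    "Linear regression": LinearRegression(),
    "K-nearest neighbours": make_pipeline(
        StandardScaler(), KNeighborsRegressor(n_neighbors=5)
    ),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Support vector machine": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on the held-out samples
    print(f"{name:24s} R^2 = {scores[name]:.3f}")
```

With the real database, X would hold the measured concentration and temperature and y the measured viscosity (or its logarithm, since the viscosity spans several orders of magnitude).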

The performance of the Machine Learning algorithms will also be compared with a state-of-the-art correlation taken from the scientific literature (Cheng, N.S., Formula for the viscosity of a glycerol-water mixture, Industrial & Engineering Chemistry Research, 47, 3285-3288 (2008)).
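For reference, Cheng's correlation can be written as a short Python function. This is a sketch: the function and argument names are our own, and the numerical coefficients are transcribed from the published formula as we understand it, so they should be checked against the paper before being relied upon. The correlation takes the glycerol mass fraction (wt % divided by 100) and the temperature in °C, and returns the viscosity in cP.

```python
import math


def cheng_viscosity(glycerol_mass_fraction, temperature_c):
    """Viscosity (cP) of a glycerol-water mixture, after Cheng (2008).

    glycerol_mass_fraction: between 0 (pure water) and 1 (pure glycerol)
    temperature_c: temperature in degrees Celsius
    Coefficients transcribed from the paper; verify before use.
    """
    cm = glycerol_mass_fraction
    t = temperature_c

    # Pure-component viscosities as functions of temperature (cP).
    mu_water = 1.790 * math.exp((-1230.0 - t) * t / (36100.0 + 360.0 * t))
    mu_glycerol = 12100.0 * math.exp((-1233.0 + t) * t / (9900.0 + 70.0 * t))

    # Temperature-dependent coefficients of the weighting factor alpha.
    a = 0.705 - 0.0017 * t
    b = (4.9 + 0.036 * t) * a ** 2.5
    alpha = 1.0 - cm + (a * b * cm * (1.0 - cm)) / (a * cm + b * (1.0 - cm))

    # Weighted geometric mean of the two pure-component viscosities.
    return mu_water ** alpha * mu_glycerol ** (1.0 - alpha)


# Example: 50 wt % glycerol at 20 deg C.
print(cheng_viscosity(0.50, 20.0))
```

Note that alpha equals 1 for pure water and 0 for pure glycerol, so the formula recovers the two pure-component viscosities at the ends of the concentration range.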

All the Machine Learning analyses will be performed using the well-known scikit-learn Python library within the Jupyter Notebook environment. In addition, a home-made Python module will be used to perform an analysis of variance (ANOVA) for the linear regression.
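To give an idea of what an ANOVA for a linear regression computes, here is a minimal NumPy sketch. It is not the home-made module used in this post: the function name `regression_anova`, its return format and the synthetic data are all hypothetical. The sketch splits the total sum of squares into a regression part and a residual part and forms the overall F-statistic.

```python
import numpy as np


def regression_anova(X, y):
    """ANOVA table entries for an ordinary least-squares fit with intercept.

    Hypothetical helper for illustration, not the module used in the post.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef

    # Decomposition of the variability: total = regression + residual.
    ss_total = float(np.sum((y - y.mean()) ** 2))
    ss_resid = float(np.sum((y - fitted) ** 2))
    ss_reg = ss_total - ss_resid

    df_reg, df_resid = p, n - p - 1
    f_stat = (ss_reg / df_reg) / (ss_resid / df_resid)

    return {
        "ss_regression": ss_reg,
        "ss_residual": ss_resid,
        "ss_total": ss_total,
        "df_regression": df_reg,
        "df_residual": df_resid,
        "f_statistic": f_stat,
        "r_squared": ss_reg / ss_total,
    }


# Synthetic placeholder data (NOT the experimental database).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 100, size=(50, 2))
y_demo = 2.0 + 0.05 * X_demo[:, 0] - 0.03 * X_demo[:, 1] + rng.normal(0, 0.1, 50)

table = regression_anova(X_demo, y_demo)
for key, value in table.items():
    print(f"{key:14s} {value}")
```

A p-value for the overall regression can then be obtained from the F distribution with (df_regression, df_residual) degrees of freedom, for instance with `scipy.stats.f.sf`.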

The Jupyter Notebook, the experimental database and the Python function for the ANOVA analysis can be found in this GitHub repository.

As a final remark, it is interesting to observe that Cheng’s correlation and the ANN have a very similar, and very high, accuracy. It is worth pointing out that Cheng’s correlation was obtained by fitting the entire database (i.e. it was never evaluated on data held out from the fit). In addition, physical considerations were used to obtain a suitable functional form for the relationship between the features and the response. The ANN, on the other hand, had no a priori knowledge of the physics of the problem or of the functional relationship between the predictors and the response, and learned it from the training data alone. This relatively simple application already shows the capability of deep learning to extract “knowledge” and features from the training data in a way that can be easily extended and generalised.

If you found this post useful, feel free to share it and leave us your feedback in the comment section below.
