Analysing non-linear relationships with partial dependence plots (PDPs), mutual information and feature importance


When you first start driving you are much less experienced and, sometimes, much more reckless. As you age, you gain more experience (and sense) and it becomes less likely that you're involved in an accident. However, this trend won't continue forever. Once you reach old age your eyesight may deteriorate or your reactions may slow. Now, as you age, it becomes more likely that you're involved in an accident. In this way, the probability of an accident has a non-linear relationship with age. Finding and incorporating relationships like these can improve the accuracy and interpretation of your models.




Source: Author

In this article, we will dive into non-linear relationships. We will see how they can be visualised using scatter plots and partial dependence plots (PDPs). We will then move on to ways of highlighting potential non-linear relationships in your data. These include metrics like feature importance and mutual information. You can find the R code used for this analysis on GitHub. Before we start, it's worth explaining exactly what we mean by a non-linear relationship.

What are non-linear relationships?


Figure 1: example of a linear relationship

If two variables have a linear relationship, we can summarise that relationship with a straight line. The line can have either a positive or negative slope but the slope will always remain constant. You can see an example in Figure 1. In this case, we have a positive linear relationship. Another way of looking at this is that an increase in the variable X will result in the same increase in Y regardless of the starting value of X.

On the other hand, for non-linear relationships, the change in variable Y due to a change in variable X will depend on the starting value of X. You can see some examples of this in Figure 2. The age-accident relationship given above could be quadratic. That is, the probability of an accident decreases and then later increases with age. Ultimately, any relationship that cannot be summarised by a straight line is a non-linear relationship. To be precise, these would also include interactions, but we focus on those types of relationships in another article.

Non-linear models, like random forests and neural networks, can automatically model non-linear relationships like those above. If we want to use a linear model, like linear regression, we would first have to do some feature engineering. For example, we could add age² to our dataset to capture the quadratic relationship. To do more effective feature engineering it helps to first find these relationships in our data.
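The age² idea can be sketched quickly. The snippet below is a minimal illustration on made-up data (the true relationship is assumed to be a symmetric quadratic), not the article's own code: a plain linear regression on age fails, while the same model with an engineered age² column fits almost perfectly.

```python
# Sketch: capturing a quadratic effect in a linear model by adding an age^2 feature.
# Synthetic data; the true relationship here is assumed to be (age - 10)^2 + noise.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
age = rng.uniform(0, 20, 500)
price = (age - 10) ** 2 + rng.normal(0, 1, 500)

# A linear model on age alone cannot capture the U-shape...
linear = LinearRegression().fit(age.reshape(-1, 1), price)

# ...but adding age^2 makes the relationship linear in the parameters.
X_quad = np.column_stack([age, age ** 2])
quadratic = LinearRegression().fit(X_quad, price)

print(linear.score(age.reshape(-1, 1), price))   # R^2 close to 0
print(quadratic.score(X_quad, price))            # R^2 close to 1
```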


To help explain how we can find these relationships, we will use a randomly generated dataset. You can see the list of features in Table 1, where price is our target variable. We are going to try to predict the price of a second-hand car using the 4 features. The dataset has been designed so that car_age and repairs have a non-linear relationship with price, whereas km_driven has a linear relationship and owner_age has no relationship.
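A dataset with these properties can be generated along the following lines. The exact functional forms used in the article are not published, so the ones below (a U-shape for car_age, an inverted U for repairs, a linear slope for km_driven) are illustrative assumptions:

```python
# Sketch of a comparable synthetic dataset. The functional forms are assumptions,
# chosen only to reproduce the relationships described in the text.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

car_age = rng.uniform(0, 40, n)         # years; price falls then rises (classics)
repairs = rng.integers(0, 50, n)        # service/repair count; non-linear effect
km_driven = rng.uniform(0, 200_000, n)  # linear negative effect on price
owner_age = rng.uniform(18, 80, n)      # no effect on price

price = (
    20_000
    + 30 * (car_age - 25) ** 2       # non-linear: decreases, then increases
    - 15 * (repairs - 30) ** 2       # non-linear: increases, then decreases
    - 0.05 * km_driven               # linear
    + rng.normal(0, 2_000, n)        # statistical variation
)

data = pd.DataFrame({"car_age": car_age, "repairs": repairs,
                     "km_driven": km_driven, "owner_age": owner_age,
                     "price": price})
print(data.shape)
```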

You can see what we mean by this in the scatterplots in Figure 3. Here we can see the two features with non-linear relationships. If this was a real dataset we would expect some intuitive reasons for these. For example, it makes sense that the price of a car decreases as it ages, but why does it then start to increase? Perhaps many of the older cars are classic/collectable cars and so the price increases with age.

We can come up with a similar narrative for the repairs feature. This is the number of times the car has been for a service or received a repair. During its lifetime, it is common to take a car for routine services. So, a small value for this feature may suggest that the car has been neglected. On the other hand, large values may indicate that the car has needed additional repairs on top of these standard services. These cars will likely give the new owner more issues in the future.

You can see the remaining relationships in Figure 4. As mentioned, owner_age has no relationship with price. We can see this in the chart as the points are randomly scattered. We can also see that km_driven has a negative linear relationship with price. We've included these as it will be useful to compare the analysis of these relationships to that of the non-linear ones.

Scatterplots like these are a simple way to visualise non-linear relationships but they will not always work. For each chart, we are visualising the relationship between the target variable and only one feature. In reality, the target variable will have relationships with many features. This, and the presence of statistical variation, means the points will be spread around the underlying trends. We can already see this in the charts above and, in a real dataset, it will be even worse. Ultimately, to clearly see relationships we need to strip out the effect of other features and statistical variation.

Partial dependence plots (PDPs)

This brings us to PDPs. To create a PDP we first have to fit a model to our data. Specifically, we use a random forest with 100 trees. In Table 2, we have two rows from the dataset used to train the model. In the last column, we can see the predicted price of the second-hand car. These are the predictions made by the random forest given the feature values.

Table 2: car price prediction examples

To create a PDP, we start by varying the value of one feature while holding the others constant. We then plot the resulting predictions for each value of the feature. Looking at Figure 5, this may make more sense. Here we have taken the two cars in Table 2. We have plotted the predicted price (partial yhat) for each possible value of car_age while keeping the original values of the other features (e.g. repairs will remain at 25 and 12). The two black points correspond to the actual predictions in Table 2 (i.e. for their actual car_age).

Figure 5: prediction plots for the two examples

We follow this process for every row in our dataset. You can see all these individual prediction lines in Figure 6. Finally, to create the PDP we calculate the average predicted value for each value of car_age. This is shown by the bolder yellow line. You can now clearly see the non-linear relationship. That is, the predicted price initially decreases but then later increases. Similarly, we can see the non-linear relationship for repairs in Figure 7.

Figure 7: PDP of repairs
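The averaging procedure just described is straightforward to implement by hand. The sketch below does it in Python on assumed synthetic data (the article's figures were produced in R, so this is an equivalent, not the original code): for each grid value of the first feature, every row's copy of that feature is overwritten while the other columns stay untouched, and the predictions are averaged.

```python
# Sketch: computing a PDP manually, following the procedure described above.
# Synthetic two-feature data standing in for car_age and km_driven.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([rng.uniform(0, 40, n),         # "car_age" (U-shaped effect)
                     rng.uniform(0, 200_000, n)])   # "km_driven" (linear effect)
y = 30 * (X[:, 0] - 25) ** 2 - 0.05 * X[:, 1] + rng.normal(0, 500, n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

grid = np.linspace(0, 40, 25)          # values of car_age to sweep over
pdp = []
for value in grid:
    X_mod = X.copy()
    X_mod[:, 0] = value                # set car_age for EVERY row; others unchanged
    pdp.append(model.predict(X_mod).mean())  # average prediction at this value
pdp = np.array(pdp)

# U-shape: average predictions at the ends of the grid exceed those in the middle.
print(pdp[0] > pdp[12], pdp[-1] > pdp[12])
```

In practice you would not write this loop yourself; scikit-learn's `sklearn.inspection.PartialDependenceDisplay.from_estimator` produces the same plot directly.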

In comparison, we can see the PDP for km_driven in Figure 8 and the PDP for owner_age in Figure 9. As mentioned, km_driven has a linear relationship with price. We can see this in the PDP, where the average predicted value decreases linearly. Similarly, there is no relationship with owner_age. Here the average predicted value remains relatively constant.

Figure 9: PDP of owner_age

These plots provide clearer visualisations of the trends for two reasons. Firstly, by holding the other feature values constant, we can focus on the trend of one feature. That is, how predictions change due to changes in this feature. Secondly, the random forest will model the underlying trends in the data and make predictions using these trends. Hence, as we are plotting predictions, we are able to strip out the effect of statistical variation.

Getting the most out of your PDPs

Looking at Figure 10, you can get an idea of the accuracy of the random forest used to create these PDPs. The model is not perfect but it does a fairly good job of predicting car price. In fact, the accuracy of the model is not that important. The goal is to visualise non-linear relationships, not to make exact predictions. However, the better your model, the more reliable your analysis will be. An underfitted model may not capture the relationships and an overfitted model may show relationships that are not actually there.

Figure 10: accuracy on the test set

The choice of model is also not that important. This is because PDPs are a model-agnostic technique. In this analysis, we have used a random forest but you can use any non-linear model, such as XGBoost or a neural network. Depending on your dataset, different models may be better at capturing the underlying non-linear relationships.

Finding non-linear relationships

Just using PDPs may not be enough to find non-linear relationships. This is because you may have many features in your data and trying to analyse all the PDPs will be time-consuming. We need a way of narrowing down our search. That is, we need a metric that tells us if there is a significant relationship between our features and the target variable. We can then focus on those features. In the rest of the article, we'll explore how you can do this using feature importance or mutual information.

Why we can't use correlation

Before we dive into those metrics, it is worth discussing why correlation is not appropriate. The Pearson correlation coefficient is a common metric used to find significant relationships. However, it is a measure of linear correlation, meaning it can only be used to find linear relationships. We can see this in Figure 11, where there is a large negative correlation between km_driven and price. In comparison, the correlation for car_age is much lower.

Figure 11: correlation of features and car price

In some cases, a linear trend may be good at approximating a non-linear one. So we may still see some high correlation values even for non-linear relationships. We can see this for the repairs feature, where there is still a fairly large negative correlation. In general, this metric will not help us identify non-linear relationships.
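The failure mode is easy to demonstrate. In the sketch below (made-up data, not the article's), a symmetric quadratic relationship is strong but produces a Pearson correlation near zero, while a linear relationship of similar strength produces a correlation near -1:

```python
# Sketch: Pearson correlation detects a linear relationship but completely
# misses a symmetric quadratic one.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
y_linear = -2 * x + rng.normal(0, 0.1, 2000)   # linear relationship
y_quad = x ** 2 + rng.normal(0, 0.1, 2000)     # strong but non-linear relationship

print(round(pearsonr(x, y_linear)[0], 2))  # close to -1
print(round(pearsonr(x, y_quad)[0], 2))    # close to 0, despite clear dependence
```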

Mutual Information

Mutual information provides a measure of how much the uncertainty about one variable is reduced by observing another. It does this by comparing the joint distribution of the variables to the product of the marginal distributions. As seen below, the joint distribution of independent variables will be equal to the product of their marginal distributions. So, if the joint distribution is different, it suggests there is a dependency and we will calculate higher mutual information values. Ultimately, a dependency means that the two variables have a relationship.

For independent variables: f(x,y) = f(x)f(y)

Mutual information measures how far the joint distribution departs from this product: I(X;Y) = ∬ f(x,y) log[ f(x,y) / (f(x)f(y)) ] dx dy, which is zero exactly when X and Y are independent.

A relationship does not need to be linear for two variables to be dependent. This means that this metric can be used to highlight non-linear relationships. See the values for the mutual information between price and our 4 features in Figure 13. Compared to correlation, we can now see higher values for all the features that have a relationship with the target variable.

Figure 13: mutual information of features and car price
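An equivalent calculation can be done in Python with scikit-learn's `mutual_info_regression` (the article's own figures were produced in R, so this is a stand-in on assumed synthetic data). Both the quadratic and the linear feature score well above the irrelevant one:

```python
# Sketch: estimating mutual information between each feature and the target.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.uniform(-1, 1, n),    # quadratic (non-linear) effect
                     rng.uniform(-1, 1, n),    # linear effect
                     rng.uniform(-1, 1, n)])   # no effect
y = X[:, 0] ** 2 - X[:, 1] + rng.normal(0, 0.1, n)

mi = mutual_info_regression(X, y, random_state=0)
print(mi.round(2))  # first two scores clearly exceed the third
```

Note that `mutual_info_regression` uses a nearest-neighbour estimator, so the values are estimates in nats rather than exact integrals.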

For this analysis, we have only looked at continuous variables. Mutual information can also be used with discrete variables. That is, when one variable is discrete and the other continuous, or when both are discrete. This is another benefit over correlation, as correlation can only be used with continuous variables.

Feature Importance

Another approach would be to first train a model and then use the feature importance scores of this model. Feature importance gives a measure of how much a particular feature has improved the accuracy of a model. In Figure 12, you can see the scores obtained from the same random forest we used to create the PDPs. Specifically, we use the percentage increase in MSE as our measure of feature importance.

Figure 12: feature importances (% increase in MSE)
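The % increase in MSE score from R's randomForest package is a permutation-based measure: shuffle a feature, remeasure the error, and see how much it grows. A Python analogue (on assumed synthetic data, not the article's) uses `permutation_importance` from scikit-learn:

```python
# Sketch: a permutation-importance analogue of randomForest's %IncMSE score.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([rng.uniform(-1, 1, n),    # non-linear effect
                     rng.uniform(-1, 1, n),    # linear effect
                     rng.uniform(-1, 1, n)])   # no effect
y = 3 * X[:, 0] ** 2 - X[:, 1] + rng.normal(0, 0.1, n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffling an informative feature degrades accuracy; shuffling noise does not.
result = permutation_importance(model, X, y,
                                scoring="neg_mean_squared_error",
                                n_repeats=10, random_state=0)
print(result.importances_mean.round(3))
```

The non-linear feature scores highly here precisely because the random forest can use it; a linear model in the same role could not, which is the point made in the next paragraph.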

Like with PDPs, we can use any model for this approach as long as it is non-linear. We cannot use linear models, like linear regression, as these are unable to model non-linear relationships. In other words, features with a non-linear relationship may not improve the accuracy, leading to lower feature importance scores.

In the end, we can use both mutual information and feature importance to highlight non-linear (and linear) relationships. However, these metrics do not tell us anything about the nature of these relationships, i.e. whether the relationship is quadratic, exponential, stepwise, etc. So, once we have highlighted these potential relationships, we have to go back to the PDPs to determine their nature.

As mentioned, interactions are a special type of non-linear relationship. These occur when the relationship between the target variable and a feature depends on the value of another feature. We analyse these types of relationships in a similar way in the article below.

Finding and Visualising Interactions

Analysing interactions using feature importance, Friedman's H-statistic and ICE Plots

Image Sources

All images are my own or obtained from . In the case of the latter, I have a "Full license" as defined under their Premium Plan.




Risk Data Scientist — building credit risk and fraud models for the man. Exploring AI topics for myself.