What are the types of regression? Here they are:
Linear Regression
Logistic Regression
Polynomial Regression
Stepwise Regression
Ridge Regression
Lasso Regression
ElasticNet Regression
1. Linear Regression
It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics people pick up while learning predictive modeling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.
Linear Regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as a regression line).
It is represented by the equation Y = a + b*X + e, where a is the intercept, b is the slope of the line and e is the error term. This equation can be used to predict the value of the target variable from given predictor variable(s).
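As a minimal sketch of what fitting such a line looks like in practice (using scikit-learn on made-up data; the numbers are illustrative, not from the article):

```python
# Minimal linear regression sketch with scikit-learn (toy data for illustration)
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated as Y = 2 + 3*X plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 + 3 * X[:, 0] + rng.normal(0, 1, size=50)

model = LinearRegression().fit(X, y)
print("intercept (a):", model.intercept_)    # estimate of a
print("slope (b):", model.coef_[0])          # estimate of b
print("prediction at X=4:", model.predict([[4.0]])[0])
```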
2. Logistic Regression
Logistic regression is used to find the probability of event=Success and event=Failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature. Here the predicted probability p ranges from 0 to 1, and the model can be represented by the following equations.
odds = p/(1-p) = probability of the event occurring / probability of the event not occurring
ln(odds) = ln(p/(1-p))
logit(p) = ln(p/(1-p)) = b0+b1X1+b2X2+b3X3....+bkXk
Above, p is the probability of the presence of the characteristic of interest. A question that you should ask here is “why have we used the log in the equation?”.
Since we are working here with a binomial distribution (dependent variable), we need to choose a link function which is best suited for this distribution, and that is the logit function. In the equation above, the parameters are chosen to maximize the likelihood of observing the sample values rather than minimizing the sum of squared errors (as in ordinary regression).
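A small sketch of fitting a logistic regression with scikit-learn (which fits by maximizing the likelihood, as described above); the data and coefficients below are invented purely for illustration:

```python
# Logistic regression sketch: binary outcome, predicted probabilities between 0 and 1
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                       # two independent variables
# Toy rule: success is more likely when 0.5 + 1.5*x1 + x2 is large
p = 1 / (1 + np.exp(-(0.5 + 1.5 * X[:, 0] + X[:, 1])))
y = rng.binomial(1, p)                              # 0/1 dependent variable

clf = LogisticRegression().fit(X, y)
print("intercept (b0):", clf.intercept_[0])
print("coefficients (b1, b2):", clf.coef_[0])
print("P(event) for a new point:", clf.predict_proba([[0.2, -0.1]])[0, 1])
```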
3. Polynomial Regression
A regression equation is a polynomial regression equation if the power of the independent variable is more than 1. The equation below represents a polynomial equation:
y=a+b*x^2
In this regression technique, the best fit line is not a straight line. It is rather a curve that fits into the data points.
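One common way to fit such a curve is to expand x into polynomial terms and then fit a linear model on those terms. A minimal sketch with scikit-learn, on made-up quadratic data:

```python
# Polynomial regression sketch: create the x^2 term, then fit linearly on the expanded features
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=(100, 1))
y = 1 + 2 * x[:, 0] ** 2 + rng.normal(0, 0.5, size=100)   # quadratic relationship

# degree=2 adds the x^2 term, so the fitted "line" becomes a curve
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print("prediction at x=1.5:", model.predict([[1.5]])[0])
```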
4. Stepwise Regression
This form of regression is used when we deal with multiple independent variables. In this technique, the selection of independent variables is done with the help of an automatic process, which involves no human intervention.
This feat is achieved by observing statistical values like R-squared, t-statistics and the AIC metric to discern significant variables. Stepwise regression basically fits the regression model by adding/dropping covariates one at a time based on a specified criterion. Some of the most commonly used stepwise regression methods are listed below:
Standard stepwise regression does two things: it adds and removes predictors as needed at each step.
Forward selection starts with the most significant predictor in the model and adds a variable at each step.
Backward elimination starts with all predictors in the model and removes the least significant variable at each step.
The aim of this modeling technique is to maximize the prediction power with a minimum number of predictor variables. It is one of the methods to handle higher dimensionality of data set.
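A minimal forward-selection sketch is shown below. The article mentions R-squared, t-statistics and AIC as possible criteria; purely for simplicity, this sketch uses cross-validated R-squared as the criterion, and the data is invented for illustration:

```python
# Forward-selection sketch: greedily add the predictor that most improves
# cross-validated R^2 (one concrete choice of criterion; AIC or t-statistics
# could be used instead, as the article notes)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, max_features):
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining and len(selected) < max_features:
        scores = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(LinearRegression(), X[:, cols], y, cv=5).mean()
            scores.append((score, j))
        score, j = max(scores)
        if score <= best_score:        # stop when no candidate improves the fit
            break
        best_score, selected = score, selected + [j]
        remaining.remove(j)
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 6))
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(0, 1, size=150)  # only 2 informative columns
print("selected columns:", forward_selection(X, y, max_features=3))
```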
5. Ridge Regression
Ridge Regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated). Under multicollinearity, even though the least squares (OLS) estimates are unbiased, their variances are large, which can push the estimates far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces their standard errors.
Above, we saw the equation for linear regression. Remember? It can be represented as:
y = a + b*x
This equation also has an error term. The complete equation becomes:
y = a + b*x + e (error term), where the error term is the value needed to correct for the prediction error between the observed and predicted value.
For multiple independent variables, this becomes: y = a + b1*x1 + b2*x2 + .... + e
In a linear equation, prediction errors can be decomposed into two sub-components: the first is due to bias and the second is due to variance. Prediction error can occur due to either of these components or both. Here, we’ll discuss the error caused due to variance.
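A small sketch of the effect on highly correlated predictors, using scikit-learn's Ridge on made-up data (the penalty strength alpha below is an arbitrary illustrative value):

```python
# Ridge regression sketch: shrink coefficients when predictors are nearly collinear
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(0, 0.01, size=200)        # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(0, 1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)             # alpha controls the added bias (penalty strength)
print("OLS coefficients (high variance):", ols.coef_)
print("Ridge coefficients (shrunk):", ridge.coef_)
```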
6. Lasso Regression
Similar to Ridge Regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the absolute size of the regression coefficients. In addition, it is capable of reducing the variability and improving the accuracy of linear regression models. Lasso regression differs from ridge regression in that it uses absolute values in the penalty function, instead of squares. Penalizing (or equivalently, constraining) the sum of the absolute values of the estimates causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates shrink towards zero. This results in variable selection out of the given n variables.
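A minimal sketch of this variable-selection behaviour with scikit-learn's Lasso; the data, number of predictors and alpha value are invented for illustration:

```python
# Lasso sketch: the L1 penalty drives some coefficients exactly to zero
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))
y = 4 * X[:, 0] - 3 * X[:, 3] + rng.normal(0, 1, size=200)   # only 2 of 8 predictors matter

lasso = Lasso(alpha=0.1).fit(X, y)             # larger alpha => more coefficients forced to zero
print("coefficients:", np.round(lasso.coef_, 2))
print("selected variables:", np.flatnonzero(lasso.coef_))
```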
7. ElasticNet Regression
ElasticNet is a hybrid of the Lasso and Ridge Regression techniques. It is trained with both the L1 and L2 penalties as regularizers. Elastic-net is useful when there are multiple features which are correlated with one another. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.
A practical advantage of trading-off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge’s stability under rotation.
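A minimal sketch with scikit-learn's ElasticNet on made-up correlated data; the alpha and l1_ratio values are arbitrary illustrative choices (l1_ratio balances the Lasso-style and Ridge-style penalties):

```python
# Elastic-Net sketch: combined L1 and L2 penalties on correlated predictors
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(6)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(0, 0.05, size=200), rng.normal(size=200)])
y = 2 * x1 + rng.normal(0, 1, size=200)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("coefficients:", enet.coef_)   # the two correlated predictors tend to share the weight
```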
How to select the right regression model?
Life is usually simple when you know only one or two techniques. One of the training institutes I know of tells their students: if the outcome is continuous, apply linear regression; if it is binary, use logistic regression! However, the higher the number of options at our disposal, the more difficult it becomes to choose the right one. A similar case happens with regression models.
Within multiple types of regression models, it is important to choose the best-suited technique based on the type of independent and dependent variables, dimensionality in the data and other essential characteristics of the data. Below are the key factors that you should practice to select the right regression model:
Data exploration is an inevitable part of building a predictive model. It should be your first step before selecting the right model: identify the relationships among the variables and their impact.
To compare the goodness of fit of different models, we can analyze metrics like the statistical significance of the parameters, R-squared, adjusted R-squared, AIC, BIC and the error term. Another one is Mallow’s Cp criterion, which essentially checks for possible bias in your model by comparing it with all possible submodels (or a careful selection of them).
Cross-validation is the best way to evaluate models used for prediction. Here you divide your data set into two groups (train and validate). A simple mean squared difference between the observed and predicted values gives you a measure of the prediction accuracy (see the sketch after this list).
If your data set has multiple confounding variables, you should not choose the automatic model selection method because you do not want to put these in a model at the same time.
It’ll also depend on your objective. It can occur that a less powerful model is easy to implement as compared to a highly statistically significant model.
Regression regularization methods (Lasso, Ridge, and ElasticNet) work well in the case of high dimensionality and multicollinearity among the variables in the data set.
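As referenced in the cross-validation point above, here is a minimal hold-out validation sketch on made-up data: split into train and validate sets, fit on the training part, and score the predictions with mean squared error:

```python
# Hold-out validation sketch: train/validate split plus mean squared error
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 3))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, size=300)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("validation MSE:", mean_squared_error(y_val, model.predict(X_val)))
```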
End note...
By now, I hope you have an overview of regression. These regression techniques should be applied after considering the conditions of the data. One of the best tricks to find out which technique to use is to check the family of variables, i.e. discrete or continuous.