Hi Super John
“ridge regression assumes the predictors are standardized and the response is centered”
From https://www.datacamp.com/community/tutorials/tutorial-ridge-lasso-elastic-net, which also has formulae for the bias and variance terms in OLS so you can see what they look like for your favorite OLS dataset. (Post coming soon.)
I think the same requirement should hold for the lasso as well, and certainly for the elastic net. In addition, in both ridge and lasso the penalty term is the L2 (or L1) norm of the coefficients, so the fit is pushed toward smaller coefficient values. However, if the orthogonal features 'z' are linear combinations of the original features 'x', then the two sets of coefficients are related by the inverse transform. So what regularization should really be doing to lower the variance is setting the coefficients of the z's to 0, but those are linear combinations of the original coefficients.
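A quick numerical sketch of that last point (toy data, hypothetical coefficients): if Z = XV with V the orthogonal eigenvector matrix of the covariance, then the x-space coefficients are beta_x = V beta_z, the L2 penalty is unchanged by the rotation, but the L1 penalty generally is not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix with a near-collinear column (made-up data).
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)

# Orthogonalize via the eigenvectors V of the covariance matrix (a PCA rotation).
cov = np.cov(X, rowvar=False)
_, V = np.linalg.eigh(cov)
Z = X @ V  # the orthogonal features z are linear combinations of the x's

# If the fit in z-coordinates is Z @ beta_z, the same fit in x-coordinates
# uses beta_x = V @ beta_z, since X @ (V @ beta_z) = (X @ V) @ beta_z.
beta_z = np.array([1.0, -2.0, 0.5])
beta_x = V @ beta_z
assert np.allclose(Z @ beta_z, X @ beta_x)

# The rotation preserves the L2 penalty (ridge) but not, in general,
# the L1 penalty (lasso).
print(np.linalg.norm(beta_z, 2), np.linalg.norm(beta_x, 2))  # equal
print(np.linalg.norm(beta_z, 1), np.linalg.norm(beta_x, 1))  # typically differ
```

So for ridge the penalty is indifferent to which basis you regularize in, while for lasso the basis matters, which is part of why the orthogonalization step below seems to me worth doing explicitly.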
Hence: before the predictors can be standardized (which I do as the second step in the code), you have to orthogonalize them w.r.t. the covariance matrix. This is what leads to WarmFuzzy encoding.
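Here is a minimal sketch of the order of operations I mean; the data and alpha are hypothetical, and I'm using a plain eigendecomposition of the covariance as a stand-in for however WarmFuzzy encoding actually does the orthogonalization:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(size=200)

# Step 1: orthogonalize w.r.t. the covariance matrix -- rotate the centered
# predictors onto the eigenvectors of cov(X), so the new features are uncorrelated.
Xc = X - X.mean(axis=0)
_, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ V

# Step 2: standardize the (now orthogonal) predictors and center the response.
Z = Z / Z.std(axis=0)
y_c = y - y.mean()

# Ridge on the orthogonalized, standardized features; fit_intercept=False
# because both sides are already centered.
model = Ridge(alpha=1.0, fit_intercept=False).fit(Z, y_c)
print(model.coef_)
```

The point is just the ordering: orthogonalize first, then standardize, then regularize, so the penalty shrinks coefficients of genuinely uncorrelated directions.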
LMK what you think!