# The key to the success of the Generalized Linear Model (GLM)

GLM is actually connected to the kernel method in the machine learning community.

One key to GLM’s success is what the “linearity” of the model actually means:

“linear in the model parameters”,

not to be confused with:

“linear in the input variables”.

Example 1.

The input variables live in a 2D space, denoted (x1, x2). We have a data set D of size N, { d1, d2, …, dN }. For a simple GLM, we can set up a model like this:

Y = (1 ; X) * w + e;

where (1 ; X) contains all data points with a column of 1s augmented on the left, a trick that folds the intercept into the model parameters: w = [w0; w1; w2].

This model can be solved trivially with the generalized (Moore–Penrose) inverse, giving the optimal model parameters w*.
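As a minimal sketch of the solve above, assuming synthetic 2D data (the names `X`, `w_true`, and the noise level are illustrative, not from the post):

```python
import numpy as np

# Toy data set: N points (x1, x2) with a noisy linear target.
rng = np.random.default_rng(0)
N = 100
X = rng.normal(size=(N, 2))           # rows are data points (x1, x2)
w_true = np.array([0.5, 2.0, -1.0])   # [w0; w1; w2], w0 is the intercept

A = np.hstack([np.ones((N, 1)), X])   # the (1 ; X) augmented design matrix
y = A @ w_true + 0.1 * rng.normal(size=N)

# Solve for w* with the generalized (Moore-Penrose) inverse: w* = pinv(A) @ y.
w_star = np.linalg.pinv(A) @ y
```

With enough data and modest noise, `w_star` recovers the generating parameters closely.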

Example 2.

Same notation as above, but this time we are playing with a more sophisticated GLM: polynomial fitting within GLM.

You might say, “Hey, hold on a second! A polynomial is no longer linear! You are still using G-Linear-M, right?”

Good catch. Yes, we are still using GLM, but with the “kernel trick”!

Say we are using a 2nd-order polynomial. Then we are essentially constructing a new “feature space” from the original space (x1, x2); it is now (x1^2, x1*x2, x2^2).

Hmm… technically, for each data point di = [di.x1, di.x2], you compute its new coordinates in the feature space, leading to this:

di_new = [(di.x1)^2, di.x1*di.x2, (di.x2)^2].

The GLM is now:

Y = [1; Z]*wz + e;

Z is the 3D feature-space mapping of the original 2D input space, and the model parameters are wz = [wz0; wz1; wz2; wz3].

See? The new model is still “linear” in the parameters, but “non-linear” in the input variables!
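The whole Example 2 pipeline can be sketched in a few lines, again on synthetic data (the variable names and noise level are assumptions for illustration):

```python
import numpy as np

# N points in the original 2D input space.
rng = np.random.default_rng(1)
N = 200
X = rng.uniform(-1.0, 1.0, size=(N, 2))
x1, x2 = X[:, 0], X[:, 1]

# Map each point into the 2nd-order polynomial feature space:
# di_new = [(di.x1)^2, di.x1*di.x2, (di.x2)^2].
Z = np.column_stack([x1**2, x1 * x2, x2**2])
A = np.hstack([np.ones((N, 1)), Z])          # the [1; Z] design matrix

wz_true = np.array([1.0, 3.0, -2.0, 0.5])    # [wz0; wz1; wz2; wz3]
y = A @ wz_true + 0.05 * rng.normal(size=N)

# Still solved exactly like Example 1 -- the model is linear in wz,
# even though it is quadratic in the inputs (x1, x2).
wz_star = np.linalg.pinv(A) @ y
```

Note that only the design matrix changed between the two examples; the solver is identical, which is precisely the point about linearity in the parameters.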

That’s why GLM is quite successful and powerful!
