GLM is actually closely connected to kernel methods in the machine learning community.

One key point about GLM is that the “linearity” of the model means:

“linear in the model parameters”,

not to be confused with:

“linear in the input variables”.

Example 1.

The input variable lives in a 2D space, denoted (x1, x2). We have a data set D of size N, { d1, d2, …, dN }. Now for a simple GLM, we can set up a model like this:

Y = (1 ; X) * w + e;

where (1 ; X) contains all data points with a column of 1s augmented on the left, a trick that folds the intercept into the model parameters: w = [w0; w1; w2].

Trivially, this model can be solved by the generalized (Moore-Penrose) inverse. That gives us the optimal model parameters w*.
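A minimal sketch of Example 1 in NumPy, on synthetic data (the data and the true parameters here are made up purely for illustration):

```python
import numpy as np

# Synthetic 2D data set D of size N (assumed for illustration)
rng = np.random.default_rng(0)
N = 100
X = rng.normal(size=(N, 2))           # columns are x1, x2
true_w = np.array([0.5, 2.0, -1.0])   # hypothetical [w0, w1, w2]

# Augment a column of 1s on the left: (1 ; X)
A = np.column_stack([np.ones(N), X])

# Y = (1 ; X) * w + e, with a small noise term e
Y = A @ true_w + 0.01 * rng.normal(size=N)

# Solve with the generalized (Moore-Penrose) inverse
w_star = np.linalg.pinv(A) @ Y
print(w_star)  # should be close to true_w
```

In practice `np.linalg.lstsq` is the more numerically robust route, but the pseudo-inverse matches the derivation in the text directly.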

Example 2.

Same notation as above, but this time we are playing with a more sophisticated GLM: polynomial fitting within GLM.

You might say, “Hey, hold on a second! A polynomial is no longer linear! And you are still using G-Linear-M, right?”.

Good catch. Yes, we are still using GLM, but with an explicit feature map, the same idea that underlies the “kernel trick”!

Say we are using a 2nd-order polynomial. Then basically we are constructing a new “feature space” from the original space (x1, x2): it is now (x1^2, x1*x2, x2^2).

Hmm.. technically, for each data point di = [di.x1, di.x2], you compute its new coordinates in the feature space, leading to this:

di_new = [(di.x1)^2, di.x1*di.x2, (di.x2)^2].

The GLM is now:

Y = [1; Z]*wz + e;

Z is the 3D feature-space mapping of the original 2D input space; the model parameters are wz = [wz0; wz1; wz2; wz3].
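Example 2 can be sketched the same way: map the data into the degree-2 feature space, then run the exact same linear solve. Again, the data and parameter values are assumed for illustration:

```python
import numpy as np

def poly2_features(X):
    """Map each 2D point (x1, x2) to the feature space (x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, x1 * x2, x2**2])

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(1)
N = 200
X = rng.normal(size=(N, 2))
true_wz = np.array([1.0, 0.3, -0.7, 2.0])  # hypothetical [wz0, wz1, wz2, wz3]

# Y = [1; Z] * wz + e
Z = poly2_features(X)
A = np.column_stack([np.ones(N), Z])
Y = A @ true_wz + 0.01 * rng.normal(size=N)

# Same linear solve as before: the model is still linear in wz
wz_star = np.linalg.pinv(A) @ Y
print(wz_star)  # should be close to true_wz
```

Note that the only new piece is `poly2_features`; the fitting step is unchanged, which is exactly the point of the example.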

See? The new model is still “linear” in the parameters, but “non-linear” in the input variables!

That’s why GLM is quite successful and powerful!