research resources

Understand Latent Semantic Analysis

LSA is basically a SVD on the term-document occurrence matrix, X.

X= U S V.

X: Nxd

X can be approximated by taking the top K columns of U matrix( term-term) ; K row&col of S diagnal matrix (eigenvalues); K rows of V matrix (doc-doc).

U: Nxk, orthonormal matrix, i.e. UU’ = I

S: kxk

V: kxd, orthonormal matrix, i.e. VV’ = I

3 similarity can be computed:

1) term-term similarity, which can be understand as doing a correlation on all terms that are represented by document vectors, e.g. which set of docs contains the word “computer”:

= X*X’  = USV * V’ S’ U’ = U SS U’ => basically, this means the term-term similarity (correlation) can be computed by the “squaring” the K – low rank approximation of U matrix, which is termxK size.

2) doc-doc similarity: similar to 1), just a transposed way:

= X’ * X = V’ S’ U’ U S V = V’ SS V.

3) term-doc

X_apprx = USV, where X_apprx here is the low rank approximation of the SVD, i.e. the reconstruction.

Actually, one more powerful and industry related application is the concept of “pseudo document”, which can be applied to “indexing”, enabling the better query.

Basically, a query vector (Just like a new document) will be linearly transformed into the LSA space, and compared to all “pseudo document” by doing an inner product.

Dq = Xq’ * U * inv(S)

Or:

q_new = q’ * U * inv(S)

References:

http://www.miislita.com/information-retrieval-tutorial/latent-semantic-indexing-fast-track-tutorial.pdf

Data:

human 1 0 0 1 0 0 0 0 0
interface 1 0 1 0 0 0 0 0 0
computer 1 1 0 0 0 0 0 0 0
user 0 1 1 0 1 0 0 0 0
system 0 1 1 2 0 0 0 0 0
response 0 1 0 0 1 0 0 0 0
time 0 1 0 0 1 0 0 0 0
EPS 0 0 1 1 0 0 0 0 0
survey 0 1 0 0 0 0 0 0 1
trees 0 0 0 0 0 1 1 1 0
graph 0 0 0 0 0 0 1 1 1
minors 0 0 0 0 0 0 0 1 1

Standard
research resources

Word Collocation 词语搭配

A collocation is a habitual word combination. Collocational knowledge is essential for many tasks in
natural language processing.

A collocation is a habitual word combination, such
as “weather a storm”, “ﬁle a lawsuit”, and “the
falling dollar”. Many collocations are idiosyncratic
in the sense that they are unpredictable by syntactic and semantic features. For example, “baggage”
and “luggage” are synonyms. However, only “baggage” can be modiﬁed by “emotional”, “historical”,
or “psychological”.

Standard

machine learning in industry

It’s all about how you encode the knowledge into mathematics modeling, and turn it into an algorithmic system.

Standard

SVM SMO sequential minimal optimization John Platt 1998

target = desired output vector
point = training point matrix
procedure takeStep(i1,i2)
if (i1 == i2) return 0
alph1 = Lagrange multiplier for i1
y1 = target[i1]
E1 = SVM output on point[i1] – y1 (check in error cache)
s = y1*y2
Compute L, H via equations (13) and (14)
if (L == H)
return 0
k11 = kernel(point[i1],point[i1])
k12 = kernel(point[i1],point[i2])
k22 = kernel(point[i2],point[i2])
eta = k11+k22-2*k12
if (eta > 0)
{
a2 = alph2 + y2*(E1-E2)/eta
if (a2 < L) a2 = L
else if (a2 > H) a2 = H
}
else
{
Lobj = objective function at a2=L
Hobj = objective function at a2=H
if (Lobj < Hobj-eps)
a2 = L
else if (Lobj > Hobj+eps)
a2 = H
else
a2 = alph2
}
if (|a2-alph2| < eps*(a2+alph2+eps))
return 0
a1 = alph1+s*(alph2-a2)
Update threshold to reflect change in Lagrange multipliers
Update weight vector to reflect change in a1 & a2, if SVM is linear
Update error cache using new Lagrange multipliers
Store a1 in the alpha array
Store a2 in the alpha array
return 1
endprocedure
procedure examineExample(i2)
y2 = target[i2]
alph2 = Lagrange multiplier for i2
E2 = SVM output on point[i2] – y2 (check in error cache)
r2 = E2*y2
if ((r2 < -tol && alph2 < C) || (r2 > tol && alph2 > 0))
{
if (number of non-zero & non-C alpha > 1)
{
i1 = result of second choice heuristic (section 2.2)
if takeStep(i1,i2)
return 1
}11
loop over all non-zero and non-C alpha, starting at a random point
{
i1 = identity of current alpha
if takeStep(i1,i2)
return 1
}
loop over all possible i1, starting at a random point
{
i1 = loop variable
if (takeStep(i1,i2)
return 1
}
}
return 0
endprocedure
main routine:
numChanged = 0;
examineAll = 1;
while (numChanged > 0 | examineAll)
{
numChanged = 0;
if (examineAll)
loop I over all training examples
numChanged += examineExample(I)
else
loop I over examples where alpha is not 0 & not C
numChanged += examineExample(I)
if (examineAll == 1)
examineAll = 0
else if (numChanged == 0)
examineAll = 1
}

Standard

Feature and learner

Finding a good feature for the data analytic task is far more important than a sophisticated model.
A good feature crafted by expert experience captures the essence of the data, and it needs much less effort to try to model it.

the recent “feature learning” is definitely the way to go.

Standard