Retrofitting Analysis

To understand how the update rule for the retrofitting[1] objective is obtained, we work through the math below.

Forward Derivation

\[
\psi(Q) = \sum_{i=1}^{n}\left[ \alpha_i\|q_i-\hat{q}_i\|^2 + \sum_{j:(i,j)\in E}\beta_{ij}\|q_i-q_j\|^2 \right] \\
\frac{\partial \psi(Q)}{\partial q_i} = 2\alpha_i(q_i-\hat{q}_i) + 2\sum_{j}\beta_{ij}(q_i-q_j) = 0 \\
\Bigl(\alpha_i+\sum_{j}\beta_{ij}\Bigr)q_i - \alpha_i\hat{q}_i - \sum_{j}\beta_{ij}q_j = 0 \\
q_i = \frac{\sum_{j}\beta_{ij}q_j+\alpha_i\hat{q}_i}{\sum_{j}\beta_{ij}+\alpha_i}
\]
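The closed-form update above can be applied iteratively, sweeping over all words until convergence, which is how the paper's online algorithm proceeds. Below is a minimal sketch, not the authors' implementation; the function name `retrofit`, the `edges` adjacency dict, and the `beta` callback are illustrative assumptions:

```python
import numpy as np

def retrofit(q_hat, edges, alpha, beta, n_iters=10):
    """Iteratively apply the closed-form update
    q_i = (sum_j beta_ij * q_j + alpha_i * q_hat_i) / (sum_j beta_ij + alpha_i).

    q_hat : (n, d) array of the original word vectors
    edges : dict mapping node i to a list of neighbour indices j
    alpha : (n,) weights pulling q_i back toward q_hat_i
    beta  : callable (i, j) -> edge weight beta_ij
    """
    q = q_hat.copy()
    for _ in range(n_iters):
        for i, nbrs in edges.items():
            if not nbrs:
                continue  # no neighbours: q_i stays at q_hat_i
            num = alpha[i] * q_hat[i]
            den = alpha[i]
            for j in nbrs:
                num = num + beta(i, j) * q[j]
                den = den + beta(i, j)
            q[i] = num / den
    return q
```

Because each sweep uses the already-updated neighbour vectors, this behaves like a Gauss–Seidel iteration and typically converges in around ten sweeps, as reported in the paper.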

Backward Derivation

Here is how I made sense of this update equation.

The paper[1] states: "We take the first derivative of \(\psi\) with respect to one \(q_i\) vector, and by equating it to zero", which gives:

\[ \frac{\partial\psi(Q)}{\partial q_i} = 0 \]

Starting from the update rule and clearing the denominator:

\[
q_i = \frac{\sum_{j}\beta_{ij}q_j+\alpha_i\hat{q}_i}{\sum_{j}\beta_{ij}+\alpha_i} \\
\alpha_i q_i - \alpha_i\hat{q}_i + \sum_{j}\beta_{ij}q_i - \sum_{j}\beta_{ij}q_j = 0 \\
\alpha_i(q_i-\hat{q}_i) + \sum_{j}\beta_{ij}(q_i-q_j) = 0
\]

Multiplying through by 2 recovers exactly the first-order condition:

\[ \frac{\partial\psi(Q)}{\partial q_i} = 2\alpha_i(q_i-\hat{q}_i) + 2\sum_{j}\beta_{ij}(q_i-q_j) = 0 \]
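As a sanity check on the derivation, the snippet below verifies numerically that the closed-form \(q_i\) makes the gradient vanish. The example data (random vectors, the weights `alpha_i` and `betas`) are arbitrary assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical example: one node i with original vector q_hat_i,
# two fixed neighbour vectors, and arbitrary positive weights.
rng = np.random.default_rng(0)
q_hat_i = rng.normal(size=3)
neighbours = [rng.normal(size=3), rng.normal(size=3)]
alpha_i = 0.7
betas = [1.2, 0.5]  # beta_{ij} for the two edges

# Closed-form solution from the derivation above.
num = alpha_i * q_hat_i + sum(b * qj for b, qj in zip(betas, neighbours))
den = alpha_i + sum(betas)
q_i = num / den

# Gradient of psi w.r.t. q_i (up to the factor of 2, which cancels at zero).
grad = alpha_i * (q_i - q_hat_i) + sum(
    b * (q_i - qj) for b, qj in zip(betas, neighbours)
)
print(np.allclose(grad, 0.0))  # prints True
```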

Reference

[1] Faruqui M., Dodge J., Jauhar S. K., et al. Retrofitting Word Vectors to Semantic Lexicons. NAACL, 2015.

Original post: https://www.cnblogs.com/fengyubo/p/11158923.html