Debiasing Word Embeddings

We are building word embeddings from our previous usage of words. So word embeddings can reflect gender, ethnicity, age, sexual orientation, and other biases of the text used to train the model. For example, word embeddings can make analogies such as ‘man is to computer programmer as woman is to homemaker’.

Then how do we stop this? How do we debias word embeddings? I’ll talk about debiasing gender bias as an example.

1. Identify bias direction.

One way to identify bias direction is to average differences between opposite definitial words. For example, we average $e_{he}-e_{she}$ , $e_{male}-e_{female}$, ,…

Then the resulting vector is a ‘bias’ axis. In the below image, bias axis is drawn as x-axis and all the remaining dimentions are simplified into y-axis.

debiasing word embeddings

2. Neutralize

For every word that is not definitial, project them to make bias 0. In the image above, we’ve projected ‘doctor’ and ‘babysitter’ to y-axis.

3. Equalize Pairs

Lastly, adjust definitial words so that they are of equal distance from non-bias axis. In the above image, we’ve adjusted ‘grandmother’ and ‘grandfather’ so that they have equal distance from non-bias axis.

You can see that ‘grandfather’ and ‘grandmother’ now have equal distance from ‘babysitter’.

Leave a Comment