Daniel Garcia, 2018

Einstein is to scientist as Messi is to midfielder.

Paris is to France as Tokyo is to Japan.

Jobs is to Apple as Ballmer is to Microsoft.

These analogies are examples of what artificial intelligence (AI) systems can now produce with high accuracy, thanks to breakthroughs in Natural Language Processing (NLP). However, although these systems succeed in many areas, such as e-mail filters, smart assistants, and predictive text, they sometimes fail because they pick up gender-biased behaviour from the text corpora they are trained on.

So, what do we mean by “bias”?

Bias in machine learning is a bit different from what we generally mean by bias. It is “a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning process.” In other words, the algorithm keeps making the same mistakes over and over again.

In NLP, however, we have to consider both types of bias: bias in the everyday sense, meaning a generally unfair prejudice for or against a person or group, and ML bias as defined above. Existing societal biases affect how we speak and what we speak about, which shapes written language, which eventually becomes the data used to build ML systems. In the end, people train algorithms on biased data, so our own biases are learned and preserved.

These biased systems reinforce dangerous stereotypes in real-world downstream applications. For example, some automatic resume filtering systems appear to prefer male applicants even when the only differentiating factor between applicants is their gender. Also, in banking, despite two applicants having almost the same income, expenses, and debt level, a company set a much higher credit limit for the male applicant than for the female one. In that case, the company could not explain why the algorithm assumed women are less creditworthy than men.

Man is to computer programmer as woman is to …

Naturally, “computer programmer” is not a gendered term, so we should complete the sentence with “computer programmer.” However, this word embedding tool completes the sentence with “homemaker.” Of course, you can try your own analogies with different combinations too.
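To make the mechanism concrete, here is a minimal sketch of how analogy completion with word embeddings is commonly done: take the vector for the second word, subtract the first, add the third, and pick the nearest remaining word by cosine similarity. The vectors below are hypothetical, hand-picked toy values (the first coordinate plays the role of a gender direction), not output from any real embedding model, and the function name `complete_analogy` is made up for this illustration.

```python
import math

# Hypothetical toy vectors, chosen by hand for illustration only.
# Assumed axes: [gender direction, technical work, domestic work].
TOY_EMBEDDINGS = {
    "man": [1.0, 0.0, 0.0],
    "woman": [-1.0, 0.0, 0.0],
    "programmer": [0.6, 1.0, 0.0],   # carries a "male" component it should not have
    "homemaker": [-0.8, 0.1, 0.6],
    "engineer": [0.7, 0.9, 0.0],
    "teacher": [0.0, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def complete_analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' via the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the three query words, as analogy tools usually do.
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


answer = complete_analogy("man", "programmer", "woman", TOY_EMBEDDINGS)
print(answer)
```

With these toy values the gender component baked into "programmer" pulls the result toward "homemaker", which mirrors the behaviour described above: the arithmetic is neutral, but the vectors are not.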

Another stereotypical example comes from machine translation. For instance, when translating from a gender-neutral language such as Hungarian into English, the sentence “Ő egy orvos. Ő egy nővér.” is translated as “He’s a doctor. She is a nurse.”

Machine learning systems do what they are programmed to do, so they are what they eat. The corpora that ML systems are fed are largely web-crawled datasets. Because the internet and other content contain actual human language, ML systems end up showing the same biases that humans do.

Debiasing NLP

In “Mitigating Gender Bias in Natural Language Processing: Literature Review,” three approaches are considered in debiasing the ML systems.

1. Data Augmentation

Data augmentation is a technique used in ML to increase the diversity of a dataset. It helps models trained on limited datasets make more accurate predictions. For debiasing NLP, data augmentation adds a gender-swapped counterpart of each sentence and uses name-anonymized versions of the original data. So, for example, “Mary likes her mother Beth” becomes “A1 likes his father A2.” Data augmentation is easy to implement; however, since it doubles the dataset size, training time also increases. Also, when applied carelessly, it can create illogical sentences, such as swapping “she gave birth” to “he gave birth.”
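The augmentation described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the procedure from the literature review: the swap table `GENDER_SWAPS` is a tiny hypothetical word list, the names to anonymize are passed in by hand, and the function names are invented for this example.

```python
import re

# Tiny hypothetical swap table; a real one would be far longer.
# Note that "her" is ambiguous ("his" or "him"); this sketch always picks "his".
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "him": "her",
    "mother": "father", "father": "mother",
    "son": "daughter", "daughter": "son",
}


def anonymize_names(sentence, names):
    """Replace each listed name with A1, A2, ... in order of first appearance."""
    if not names:
        return sentence
    mapping = {}

    def repl(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"A{len(mapping) + 1}"
        return mapping[name]

    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.sub(pattern, repl, sentence)


def swap_gender(sentence):
    """Swap gendered words using GENDER_SWAPS, keeping initial capitalization."""
    def repl(match):
        word = match.group(0)
        swapped = GENDER_SWAPS.get(word.lower())
        if swapped is None:
            return word  # leave every other word untouched
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"\b[A-Za-z]+\b", repl, sentence)


def augment(sentences, names):
    """Return the anonymized sentences plus their gender-swapped counterparts."""
    augmented = []
    for sentence in sentences:
        anonymized = anonymize_names(sentence, names)
        augmented.append(anonymized)
        augmented.append(swap_gender(anonymized))
    return augmented


print(augment(["Mary likes her mother Beth"], ["Mary", "Beth"]))
```

The output list is twice as long as the input, which is where the extra training time comes from. The same blind word-level swap also turns “she gave birth” into “he gave birth”, so a practical version would need exceptions for cases like that.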

2. Gender Tagging

Gender tagging is implemented by adding a tag to each sentence in the original dataset. For instance, “I’m excited” would be changed into “MALE I’m excited.” This approach is practical since it can improve translation quality (e.g., “I’m happy” can be translated into French in two ways: “Je suis heureux [M]” and “Je suis heureuse [F]”). However, gender tagging can be expensive since it needs metadata (data that provides information about other data), which requires much more memory.
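As a minimal sketch of the tagging step, assuming the gender metadata is already available for each sentence: the `MALE` tag comes from the example above, while the `FEMALE` tag, the lowercase metadata labels, and the function names are assumptions made for this illustration.

```python
# Tag vocabulary: "MALE" is taken from the example in the text;
# "FEMALE" and the lowercase metadata labels are assumptions.
GENDER_TAGS = {"male": "MALE", "female": "FEMALE"}


def tag_sentence(sentence, gender):
    """Prefix a sentence with its gender tag."""
    return f"{GENDER_TAGS[gender]} {sentence}"


def tag_dataset(records):
    """Tag every (sentence, gender) pair in a dataset."""
    return [tag_sentence(sentence, gender) for sentence, gender in records]


# Hypothetical dataset with per-sentence gender metadata.
records = [("I'm excited", "male"), ("I'm happy", "female")]
print(tag_dataset(records))
```

The code itself is trivial; the cost discussed above lies in obtaining and storing the `gender` field for every sentence in the first place.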

3. Bias Fine-Tuning

It can be hard to create or find an unbiased dataset for a particular task; however, there may be an existing unbiased dataset for a related task. Bias fine-tuning transfers what a model has learned from that unbiased dataset to the new task, so that the resulting model carries much less bias. Although this technique is somewhat effective when the datasets are similar, it can be misleading when they differ significantly in context.

Detecting and eliminating gender bias in training data for ML algorithms may be a challenge, but it is definitely not an unsolvable problem. Just as people can learn not to be biased towards any gender, computers can too. Accomplishing that is crucial, since gender bias in AI negatively affects many women’s quality of life. Also, keep in mind that the vast majority of current work on debiasing NLP considers only binary gender. Non-binary genders are largely disregarded in NLP and should be considered in future work.