Hadas Orgad

I am a PhD candidate at the Technion-Israel Institute of Technology, advised by Yonatan Belinkov.

As a PhD student in the field of Natural Language Processing, I have a strong interest in improving the robustness and interpretability of deep neural models. My goal is to contribute new methods for evaluating, understanding, and improving the robustness of NLP models, and to create models that are both effective and fair. I am particularly focused on using interpretability as a tool for improving models, as I believe it also makes them more transparent and trustworthy.

I want to hear from you! Feel free to contact me for brainstorming and potential collaborations.

Past

I completed my bachelor's and master's degrees at the Technion. During my master's, I was selected for the 2022 EMEA Generation Google Scholarship. Previously, I worked at Microsoft in cloud security research, focusing on data-science problems and the application of NLP to security products.

Email: orgadhadas at gmail dot com


Publications

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

Hadas Orgad, Yonatan Belinkov

Preprint

Text-to-image models are trained on extensive amounts of data, leading them to implicitly encode factual knowledge within their parameters. While some facts are useful, others may be incorrect or become outdated (e.g., the current President of the United States). We introduce ReFACT, a method for updating text-to-image models. ReFACT updates the weights of a specific layer in the text encoder, modifying only a tiny portion of the model's parameters and leaving the rest of the model unaffected.

2023

Editing Implicit Assumptions in Text-to-Image Diffusion Models

Hadas Orgad*, Bahjat Kawar*, Yonatan Belinkov

Preprint

Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second.

2023

Debiasing NLP Models Without Demographic Information

Hadas Orgad, Yonatan Belinkov

ACL 2023

In this work, we propose a debiasing method that operates without any prior knowledge of the demographics in the dataset: it detects biased examples using an auxiliary model that predicts the main model's success, and down-weights them during training. Results on racial and gender bias demonstrate that it is possible to mitigate social biases without a costly demographic annotation process.

2022

Choose Your Lenses: Flaws in Gender Bias Evaluation

Hadas Orgad, Yonatan Belinkov

GeBNLP 2022

Considerable efforts to measure and mitigate gender bias in recent years have led to the introduction of an abundance of tasks, datasets, and metrics used in this vein. In this position paper, we assess the current paradigm of gender bias evaluation and identify several flaws in it.

2022

How Gender Debiasing Affects Internal Model Representations, and Why It Matters

Hadas Orgad, Seraphina Goldfarb-Tarrant, Yonatan Belinkov

NAACL 2022

Common studies of gender bias in NLP focus either on extrinsic bias measured by model performance on a downstream task or on intrinsic bias found in models’ internal representations. However, the relationship between extrinsic and intrinsic bias is relatively unknown. In this work, we illuminate this relationship by measuring both quantities together.

2022