PhD Defense by Hantian Zhang

Title: Data-Centric Bias Mitigation in Machine Learning

 

Hantian Zhang

Ph.D. Candidate in Computer Science

School of Computer Science

Georgia Institute of Technology

 

Date/Time: April 24th, 2023, 8:00 AM to 10:00 AM Eastern Time (US and Canada)

Location: Join via Zoom at https://gatech.zoom.us/j/95067902080?pwd=Tkp6cjFGUVhaaERTSzFVY3lQUm1PQT09

Meeting ID: 950 6790 2080

Passcode: 692342  

 

 

Committee:

Dr. Xu Chu (co-advisor), School of Computer Science, Georgia Institute of Technology

Dr. Kexin Rong (co-advisor), School of Computer Science, Georgia Institute of Technology

Dr. Joy Arulraj, School of Computer Science, Georgia Institute of Technology

Dr. Shamkant Navathe, School of Computer Science, Georgia Institute of Technology

Dr. Steven Whang, School of Electrical Engineering, KAIST

 

 

Abstract:

As Machine Learning (ML) becomes increasingly central to decision-making in our society, it is crucial to recognize that ML models can inadvertently perpetuate biases, disproportionately harming certain demographic groups and individuals. For instance, some ML models used in judicial systems have shown bias against African Americans when predicting recidivism rates. Addressing these inherent biases and ensuring fairness in ML models is therefore imperative. While fairness can be improved by changing the ML models directly, I argue that a more foundational solution lies in correcting the data, since biased data is often the root cause of unfairness.

 

In my proposed thesis, I aim to systematically understand and mitigate biases in ML models across the full ML life-cycle, from data preparation (pre-processing) to model training (in-processing) and model validation (post-processing). First, I develop iFlipper, a system that optimizes for individual fairness in ML. iFlipper enhances training data during data preparation by adjusting labels, mitigating the inconsistencies that arise when similar individuals receive different outcomes. Next, I introduce OmniFair, a declarative system for improving group fairness in ML. OmniFair lets users specify group fairness constraints and adjusts the weight of each training sample during training so that those constraints are satisfied. Finally, I present a method to discover and explain semantically coherent subsets (slices) of unstructured data on which trained ML models underperform. Specifically, I introduce a new perspective for quantifying the explainability of unstructured data slices by borrowing the concept of separability from the machine learning literature. I find that separability, which captures how well a slice can be differentiated from the rest of the dataset, complements the coherence measure, which focuses on the commonalities among all instances within a slice. With a good understanding of where the ML models perform poorly, we can then improve them by augmenting the dataset with more examples for those specific slices.
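
To make the sample-reweighting idea concrete, below is a minimal Python sketch. The helper names and the simple gap-driven update rule are illustrative assumptions of this write-up, not the actual OmniFair API or its optimization procedure. The sketch iteratively adjusts per-sample weights so that a weighted classifier's positive-prediction rates become similar across two groups, a demographic-parity style group fairness constraint.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def positive_rate(model, X, mask):
        # Fraction of positive predictions within the group selected by `mask`.
        return model.predict(X[mask]).mean()

    def reweight_for_parity(X, y, group, steps=20, lr=0.5):
        # Heuristic sketch: nudge per-sample weights until the demographic-parity
        # gap of a weighted logistic-regression model is small.
        # `group` is a boolean array marking membership in the protected group.
        w = np.ones(len(y), dtype=float)
        model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
        for _ in range(steps):
            gap = positive_rate(model, X, group) - positive_rate(model, X, ~group)
            if abs(gap) < 0.01:      # constraint (approximately) satisfied
                break
            # Down-weight positive examples in the advantaged group and
            # up-weight them in the disadvantaged group, in proportion to the gap.
            adjust = np.where(group, -np.sign(gap), np.sign(gap)) * (y == 1)
            w = np.clip(w + lr * abs(gap) * adjust, 0.1, 10.0)
            model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
        return w, model

The same scaffolding extends to other group fairness metrics (for example, equalized odds) by swapping the per-group statistic being compared.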
