Scan: Overlapping slices should be grouped in the scan generated issues #1845

Open
kevinmessiaen opened this issue Mar 14, 2024 · 2 comments
Labels: enhancement, good first issue


kevinmessiaen commented Mar 14, 2024

🚀 Feature Request

If a slice is completely contained in another slice, we should report only the larger one.

🔈 Motivation

It will make the scan report more concise and avoid duplication. Furthermore, checking those sub-slices takes time and memory, and it doesn't really provide any value.

@kevinmessiaen added the enhancement and good first issue labels on Mar 14, 2024
@abhibongale

Hi @kevinmessiaen,

I noticed this issue and would like to contribute to it. Is it still open and relevant? If so, I would appreciate any guidance or additional information that could help me get started.

Thank you!

@kevinmessiaen

Hello @abhibongale,

Yes, the issue is still relevant, and we would appreciate your contribution on this one!

Basically, in the Scanner (giskard.scanner.scanner.py) we run a bunch of evaluators depending on the model type.

For regression and classification models, the detectors use the SliceFinder (giskard.slicing.slice_finder.py) to generate slices that are then tested. Some of those slices might overlap (e.g. we can have a slice for the car sub-category that is inside the slice for the transportation category). This is fine, since the dataset might have an issue for only one of those categories.

However, there can be cases where the whole transportation category has an issue (meaning that the car and other sub-categories would also contain this issue). That's why we want to filter the sub-slices out of the scan report in order to improve it.

I think you can start by having a look at the PerformanceDetector (giskard.scanner.performance.performance_bias_detector.py).
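
As a rough illustration of the filtering described above, here is a minimal sketch in plain Python. It does not use the actual Giskard classes: `SliceIssue` and `drop_contained_slices` are hypothetical names, and it assumes each reported issue can be reduced to the set of dataset row indices its slice covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceIssue:
    """Hypothetical stand-in for one scan issue: a slice label plus the rows it covers."""
    name: str
    rows: frozenset


def drop_contained_slices(issues):
    """Keep only issues whose rows are not fully covered by another kept slice."""
    # Visit larger slices first, so a potential parent is always kept
    # before any of its sub-slices is checked against it.
    ordered = sorted(issues, key=lambda issue: len(issue.rows), reverse=True)
    kept = []
    for issue in ordered:
        # `<=` on sets is a subset test; it is also true for identical
        # row sets, so exact duplicates are dropped as well.
        if any(issue.rows <= parent.rows for parent in kept):
            continue
        kept.append(issue)
    return kept


# Made-up example data: two sub-categories inside "transportation".
issues = [
    SliceIssue("category == 'transportation'", frozenset(range(0, 10))),
    SliceIssue("sub_category == 'car'", frozenset(range(0, 4))),
    SliceIssue("sub_category == 'bike'", frozenset(range(4, 7))),
    SliceIssue("category == 'food'", frozenset(range(10, 15))),
]

for issue in drop_contained_slices(issues):
    print(issue.name)
```

With this example data, only the transportation and food slices remain. Note that a sub-slice is dropped only when its parent slice was itself reported as an issue, so a problem that affects the car sub-category alone would still show up. In practice the filter would probably need to run separately per detector and per metric, so that a sub-slice is compared only against parents flagged for the same kind of issue.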
