Inconsistency issue with single_cls functionality and dataset class count #13028

Closed
1 task done
Le0v1n opened this issue May 20, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Le0v1n

Le0v1n commented May 20, 2024

Search before asking

Question

I found a small issue related to single_cls, and I'm not quite clear on the purpose of the code involved.

In train.py, there is the following statement:

names = {0: "item"} if single_cls and len(data_dict["names"]) != 1 else data_dict["names"]  # class names

This statement can be broken down into:

if single_cls and len(data_dict["names"]) != 1:  # The user has enabled --single_cls, but the dataset configuration file has more than one class
    names = {0: "item"}
else:  # The user has not enabled --single_cls or len(data_dict["names"]) == 1
    names = data_dict["names"]

Here, single_cls indicates that the task has only one class; data_dict["names"] are the names of different classes defined in the dataset configuration file; len(dict) is used to determine the number of keys in a dictionary.
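
To check the breakdown above, here is a minimal standalone sketch of the same conditional. The helper name resolve_names and the sample dictionaries are hypothetical and are not part of train.py; only the condition itself is taken from the statement quoted above.

```python
def resolve_names(single_cls, data_dict):
    """Standalone copy of the conditional quoted from train.py (hypothetical helper name)."""
    if single_cls and len(data_dict["names"]) != 1:
        # --single_cls is set but the config lists several classes: use a placeholder name
        return {0: "item"}
    # Either --single_cls is off, or the config already defines exactly one class
    return data_dict["names"]


# Made-up dataset configs, for illustration only
multi = {"names": {0: "person", 1: "car"}}
single = {"names": {0: "cat"}}

print(resolve_names(True, multi))    # {0: 'item'}
print(resolve_names(True, single))   # {0: 'cat'}
print(resolve_names(False, multi))   # {0: 'person', 1: 'car'}
```

The three calls cover every branch: the placeholder name appears only when the flag is set and the config has more than one class.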

I don't understand why len(data_dict["names"]) != 1 is used. In the current code, names = {0: "item"} is assigned in only one case: when --single_cls is enabled and the dataset configuration file defines multiple classes. Is this case too rare to matter? Suppose the dataset is MS COCO, which has 80 classes; after enabling --single_cls, only one class remains. Will the model still train and run inference normally in this case?

Also, I suggest adding a warning to avoid misuse by users:

if single_cls and len(data_dict["names"]) != 1:
    LOGGER.warning("WARNING ⚠️ Please check the dataset to ensure that when --single_cls is enabled, the number of classes in the dataset is 1.")
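
The suggested check can be tried outside the training script with the sketch below. It assumes a standard-library logger as a stand-in for the project's LOGGER, and the function name warn_if_multi_class is hypothetical; the wording of the message is illustrative rather than what the project actually emits.

```python
import logging

# Stand-in for the project's LOGGER (assumption: any stdlib logger behaves the same here)
LOGGER = logging.getLogger("single_cls_check")


def warn_if_multi_class(single_cls, data_dict):
    """Log a warning and return True when --single_cls is set but the config lists several classes."""
    num_classes = len(data_dict["names"])
    if single_cls and num_classes != 1:
        LOGGER.warning(
            "--single_cls is enabled, but the dataset config defines %d classes; "
            "please check that this is intended.",
            num_classes,
        )
        return True
    return False


# Made-up configs, for illustration only
warn_if_multi_class(True, {"names": {0: "person", 1: "car"}})  # logs a warning
warn_if_multi_class(True, {"names": {0: "cat"}})               # silent
```

Returning a boolean keeps the check easy to test without inspecting log output.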

Additional

No response

Le0v1n added the question label on May 20, 2024
@glenn-jocher
Member

@Le0v1n hello! Thanks for bringing this up. 🌟

The single_cls flag in YOLOv5 is indeed used to train the model assuming there is only one class across the entire dataset. The condition len(data_dict["names"]) != 1 checks if the dataset configuration incorrectly specifies more than one class despite the single_cls flag being set. This is to ensure that the model does not mistakenly train on multiple classes when it's supposed to focus on just one.

Your observation about adding a warning is insightful! It would help users realize a potential misconfiguration in their dataset, especially in cases where they might not be aware that their dataset should only contain one class when single_cls is enabled. Implementing a warning like the one you suggested could indeed prevent confusion and ensure proper model training behavior.

Feel free to contribute this as a feature request or even a pull request if you're up for it. Contributions like these help make YOLOv5 even better for everyone! 😊

Thanks for your input and support!

@Le0v1n
Author

Le0v1n commented May 21, 2024

@glenn-jocher Thank you very much for your reply, which has resolved my confusion. 🥰

Inspired by you, I am delighted to submit a PR to Ultralytics to help users become aware of potential misconfigurations in the dataset. 😊

Le0v1n closed this as completed on May 21, 2024
@glenn-jocher
Member

@Le0v1n That's fantastic to hear! 😊 We're thrilled that your confusion was cleared up and even more excited about your willingness to contribute with a PR. Contributions like yours help improve the experience for everyone using YOLOv5. Looking forward to seeing your PR and thank you for being a proactive member of the community! 🌟
