Inappropriate dataset structure for the TextClassification model #7

Open
AbSsEnT opened this issue Sep 21, 2023 · 1 comment

AbSsEnT commented Sep 21, 2023

I tested the OpenAssistant/reward-model-deberta-v3-large-v2 model. Although the model is of the TextClassification type, the related datasets do not have the structure of a 'classification' dataset. As a result, the feature mapping stage (the _get_feature_mapping method) fails with the following errors, depending on the dataset:

openai/summarize_from_feedback, Dahoas/instruct-synthetic-prompt-responses

/Users/mykytaalekseiev/Work/GiskardPipVersion/venv/bin/python /Users/mykytaalekseiev/Work/cicd/cli.py --loader huggingface --model OpenAssistant/reward-model-deberta-v3-large-v2 --dataset openai/summarize_from_feedback --dataset_split train --dataset_config comparisons --output ${model_name}__default_scan_with__${dataset_name}.html 
Traceback (most recent call last):
  File "/Users/mykytaalekseiev/Work/cicd/cli.py", line 43, in <module>
    report = runner.run(**runner_kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/pipeline/runner.py", line 35, in run
    gsk_model, gsk_dataset = loader.load_giskard_model_dataset(**kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 53, in load_giskard_model_dataset
    feature_mapping = self._get_feature_mapping(hf_model, hf_dataset)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 128, in _get_feature_mapping
    raise RuntimeError(msg)
RuntimeError: Could not find a suitable mapping for feature for `label`.

openai/webgpt_comparisons

Traceback (most recent call last):
  File "/Users/mykytaalekseiev/Work/cicd/cli.py", line 43, in <module>
    report = runner.run(**runner_kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/pipeline/runner.py", line 35, in run
    gsk_model, gsk_dataset = loader.load_giskard_model_dataset(**kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 53, in load_giskard_model_dataset
    feature_mapping = self._get_feature_mapping(hf_model, hf_dataset)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 123, in _get_feature_mapping
    candidates = [f for f in available_features if dataset_features[f].dtype == expected_type]
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 123, in <listcomp>
    candidates = [f for f in available_features if dataset_features[f].dtype == expected_type]
AttributeError: 'dict' object has no attribute 'dtype'
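
The AttributeError above comes from reading `.dtype` on a feature that is a plain dict, which is how nested columns show up in the feature schema. A minimal sketch of a more defensive candidate filter follows. It is not the loader's actual code: the `Value` class is a stand-in for a flat feature type, `find_candidates` is a hypothetical helper, and the column layout is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Value:
    """Stand-in for a flat feature that exposes a dtype (hypothetical)."""
    dtype: str


def find_candidates(dataset_features, available_features, expected_type):
    """Return the names of flat features whose dtype equals expected_type.

    Nested features (plain dicts, lists) have no dtype attribute, so they
    are skipped instead of raising AttributeError.
    """
    return [
        f
        for f in available_features
        if getattr(dataset_features[f], "dtype", None) == expected_type
    ]


# Hypothetical feature layout with one nested column.
features = {
    "question": {"id": Value("string"), "full_text": Value("string")},
    "answer_0": Value("string"),
    "score_0": Value("float32"),
}

candidates = find_candidates(features, list(features), "string")
print(candidates)  # ['answer_0']
```

This only avoids the crash. A dataset whose text lives inside a nested column would still end up with no usable mapping, so the RuntimeError path would remain reachable.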

Anthropic/hh-rlhf

Traceback (most recent call last):
  File "/Users/mykytaalekseiev/Work/cicd/cli.py", line 43, in <module>
    report = runner.run(**runner_kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/pipeline/runner.py", line 35, in run
    gsk_model, gsk_dataset = loader.load_giskard_model_dataset(**kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 53, in load_giskard_model_dataset
    feature_mapping = self._get_feature_mapping(hf_model, hf_dataset)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 128, in _get_feature_mapping
    raise RuntimeError(msg)
RuntimeError: Could not find a suitable mapping for feature for `text`.

Inokinoki commented

Yes, it is indeed more like a regression model. We do not currently support such models; whether to add support is still to be determined.
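
One way a loader could detect this case up front, rather than failing during feature mapping, is to inspect the model config before choosing a dataset layout. The sketch below is only an assumption about how that check might look: `looks_like_regression` is a hypothetical helper, and it relies on the config exposing `num_labels` and `problem_type` the way Hugging Face configuration objects do. `SimpleNamespace` objects stand in for real configs so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def looks_like_regression(config):
    """Heuristic: treat a sequence-classification head with a single
    output, or an explicit regression problem type, as a scoring model."""
    if getattr(config, "problem_type", None) == "regression":
        return True
    return getattr(config, "num_labels", None) == 1


# Stand-in configs (hypothetical values, not read from any real model).
reward_cfg = SimpleNamespace(num_labels=1, problem_type=None)
classifier_cfg = SimpleNamespace(
    num_labels=3, problem_type="single_label_classification"
)

for name, cfg in [("reward", reward_cfg), ("classifier", classifier_cfg)]:
    kind = "regression-like" if looks_like_regression(cfg) else "classification"
    print(f"{name}: {kind}")
```

With a check like this, the runner could report "model type not supported" explicitly instead of surfacing a mapping error for `label` or `text`.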
