Add standard unit of measure support #202

Open
kszlim opened this issue Jul 13, 2023 · 6 comments

@kszlim

kszlim commented Jul 13, 2023

I don't know if it's possible, but having a standard way to thread units of measure through would be great.

Ideally you could implement something like pint-pandas but instead as pint-dataframe and it would interop seamlessly with all dataframe libraries.

@MarcoGorelli
Contributor

I don't think pint would go into the standard itself - but hopefully the standard would enable someone to write a library-agnostic version of pint-pandas!

@kszlim
Author

kszlim commented Jul 14, 2023

Yep, that's what I mean, it'd be good for the dataframe-api to specify a standard mechanism for transmitting unit of measure data (and/or a mechanism for transmitting metadata + a mechanism that determines how that metadata can change across operations on dfs).

@rgommers
Member

and/or a mechanism for transmitting metadata + a mechanism that determines how that metadata can change across operations on dfs

It seems to me like this is related to gh-40, which discussed adding a way to incorporate any kind of metadata beyond what was standardized in the interchange protocol.

The transmitting or storing part is fairly clear, I think. The second part of your suggestion here is less clear to me @kszlim. That seems to suggest some kind of hook that any dataframe library must call after each method it calls. That could be quite expensive to do, I think, and there may be other/simpler alternatives there (if the dataframe object lives in a pint-dataframe type package, I'd expect all the methods and logic to live there too, and wrap a "base dataframe object" somehow).

@kszlim
Author

kszlim commented Jul 20, 2023

Hmm, I see. I'm not sure how a pint-dataframe package would work, would it require wrapping every dataframe library manually or do you see a way that it could work agnostically?

I guess it's pretty hard if not impossible to make it work agnostically without defining a huge space of operations on the dataframe api itself (which I think you guys are trying to avoid?).

@rgommers
Member

Hmm, I see. I'm not sure how a pint-dataframe package would work, would it require wrapping every dataframe library manually or do you see a way that it could work agnostically?

All "base" dataframe objects have the same API, so I imagine you could store it as a private attribute. Something like:

from __future__ import annotations  # lets annotations name types not defined here

from typing import Any


class PintDataFrame:
    def __init__(self, base_dataframe: StandardDataFrame, units_metadata: Any) -> None:
        # StandardDataFrame: any standard-compliant dataframe; metadata type left open
        self._df = base_dataframe
        self.units_metadata = units_metadata

    def sum(self, *, skip_nulls: bool = True) -> PintDataFrame:
        """Reduction returns a 1-row DataFrame."""
        result = self._df.sum(skip_nulls=skip_nulls)
        # If needed, manipulate units metadata here
        result_metadata = self.units_metadata  # or some transformation
        return PintDataFrame(result, units_metadata=result_metadata)

For all methods that don't actually change the units, I imagine there's a way to handle them in an automated/streamlined fashion. And for the ones that do, the custom logic needs to be written once and is independent of what library the base dataframe comes from.
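One way the automated handling could look is sketched below. This is only an illustration under assumptions, not an existing API: `UnitsFrame`, its `_UNIT_PRESERVING` set, the string-based unit mapping, and the `ToyFrame` stand-in are all hypothetical names invented for the example. Unit-preserving methods are forwarded through `__getattr__`, while a method that changes units (`var` here) gets hand-written logic once.

```python
from __future__ import annotations

from typing import Any, Callable


class UnitsFrame:
    """Hypothetical wrapper: a base dataframe plus a column -> unit mapping."""

    # Hypothetical set of method names assumed to leave units unchanged.
    _UNIT_PRESERVING = frozenset({"sum", "min", "max", "mean"})

    def __init__(self, base: Any, units: dict[str, str]) -> None:
        self._df = base
        self.units = dict(units)

    def __getattr__(self, name: str) -> Callable[..., UnitsFrame]:
        # Only reached when normal lookup fails, i.e. for methods not defined here.
        if name not in self._UNIT_PRESERVING:
            raise AttributeError(f"no units rule defined for {name!r}")

        method = getattr(self._df, name)

        def delegated(*args: Any, **kwargs: Any) -> UnitsFrame:
            # Forward the call to the base dataframe and carry units over unchanged.
            return UnitsFrame(method(*args, **kwargs), self.units)

        return delegated

    def var(self, **kwargs: Any) -> UnitsFrame:
        # Variance squares the unit, so it needs custom logic.
        squared = {col: f"({unit})**2" for col, unit in self.units.items()}
        return UnitsFrame(self._df.var(**kwargs), squared)


class ToyFrame:
    """Stand-in for a standard-compliant dataframe, for demonstration only."""

    def __init__(self, columns: dict[str, list[float]]) -> None:
        self.columns = columns

    def sum(self, *, skip_nulls: bool = True) -> ToyFrame:
        return ToyFrame({k: [sum(v)] for k, v in self.columns.items()})

    def var(self, *, skip_nulls: bool = True) -> ToyFrame:
        out = {}
        for k, v in self.columns.items():
            mean = sum(v) / len(v)
            out[k] = [sum((x - mean) ** 2 for x in v) / (len(v) - 1)]
        return ToyFrame(out)


df = UnitsFrame(ToyFrame({"length": [1.0, 2.0, 3.0]}), {"length": "m"})
total = df.sum()     # forwarded automatically, units carried over
spread = df.var()    # custom rule, units squared
```

The wrapper never inspects which library the base object comes from, so the same units rules would apply to any base dataframe exposing these method names. Refusing to forward unlisted methods is a deliberate choice in this sketch: an operation without a units rule fails loudly instead of silently returning wrong units.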

@kszlim
Author

kszlim commented Jul 23, 2023

I see. I guess this will still require a bunch of custom implementations if there are operations that don't delegate to "base" dataframe methods, but I suppose that's probably impossible to avoid altogether.
