Duration/timedelta not supported by dataframe interchange protocol? #329

Open
MarcoGorelli opened this issue Nov 21, 2023 · 2 comments

@MarcoGorelli
Contributor

Timedeltas don't currently seem to be supported by the dataframe interchange protocol. Converting a polars DataFrame with a Duration column to pandas raises NotImplementedError:

In [1]: pd.api.interchange.from_dataframe(pl.DataFrame({'a': [timedelta(1)]}))
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 pd.api.interchange.from_dataframe(pl.DataFrame({'a': [timedelta(1)]}))

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:71, in from_dataframe(df, allow_copy)
     68 if not hasattr(df, "__dataframe__"):
     69     raise ValueError("`df` does not support __dataframe__")
---> 71 return _from_dataframe(
     72     df.__dataframe__(allow_copy=allow_copy), allow_copy=allow_copy
     73 )

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:94, in _from_dataframe(df, allow_copy)
     92 pandas_dfs = []
     93 for chunk in df.get_chunks():
---> 94     pandas_df = protocol_df_chunk_to_pandas(chunk)
     95     pandas_dfs.append(pandas_df)
     97 if not allow_copy and len(pandas_dfs) > 1:

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:150, in protocol_df_chunk_to_pandas(df)
    148     columns[name], buf = string_column_to_ndarray(col)
    149 elif dtype == DtypeKind.DATETIME:
--> 150     columns[name], buf = datetime_column_to_ndarray(col)
    151 else:
    152     raise NotImplementedError(f"Data type {dtype} not handled yet")

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:395, in datetime_column_to_ndarray(col)
    381 # Consider dtype being `uint` to get number of units passed since the 01.01.1970
    383 data = buffer_to_ndarray(
    384     dbuf,
    385     (
   (...)
    392     length=col.size(),
    393 )
--> 395 data = parse_datetime_format_str(format_str, data)  # type: ignore[assignment]
    396 data = set_nulls(data, col, buffers["validity"])
    397 return data, buffers

File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:360, in parse_datetime_format_str(format_str, data)
    357         raise NotImplementedError(f"Date unit is not supported: {unit}")
    358     return data
--> 360 raise NotImplementedError(f"DateTime kind is not supported: {format_str}")

NotImplementedError: DateTime kind is not supported: tDu

Should they be?
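
For context, `tDu` in the error above is the Apache Arrow C data interface format string for a duration in microseconds; the other duration strings are `tDs`, `tDm`, and `tDn`. Below is a minimal sketch of how a consumer might recognise them, using only the standard library. `parse_duration_format_str` and `to_timedeltas` are hypothetical helpers made up for illustration. They are not part of pandas, polars, or the protocol, and null handling is left out.

```python
import re
from datetime import timedelta

# Arrow C data interface duration format strings: "tD" followed by a unit code.
_DURATION_UNITS = {
    "s": "seconds",
    "m": "milliseconds",
    "u": "microseconds",
    "n": "nanoseconds",
}


def parse_duration_format_str(format_str):
    """Return the unit name for an Arrow duration format string, else None."""
    match = re.fullmatch(r"tD([smun])", format_str)
    return _DURATION_UNITS[match.group(1)] if match else None


def to_timedeltas(values, unit):
    """Convert integer counts of `unit` to datetime.timedelta objects."""
    if unit == "nanoseconds":
        # timedelta only has microsecond resolution, so nanosecond counts
        # are floored to whole microseconds in this sketch.
        return [timedelta(microseconds=v // 1000) for v in values]
    return [timedelta(**{unit: v}) for v in values]


unit = parse_duration_format_str("tDu")
one_day_us = 24 * 60 * 60 * 1_000_000
print(unit, to_timedeltas([one_day_us], unit))
```

A real consumer would more likely map the unit onto a fixed-width integer buffer (for example a NumPy `timedelta64` view) than build Python objects; the list comprehension here is only meant to show the unit mapping.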

@kkraus14
Collaborator

I'm +1 on supporting them.

@WillAyd

WillAyd commented Jan 19, 2024

At least between pandas and pyarrow there is some nuance to what these represent. pandas has only a timedelta type, whereas pyarrow has a duration type (for second and finer precision) and a separate interval type (for calendar-based shifting).
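
The distinction can be illustrated with the standard library alone: a duration is a fixed amount of elapsed time, while a calendar interval such as "one month" covers a different number of days depending on the date it is applied to. `add_months` below is a hypothetical helper written for this illustration, not a pandas or pyarrow API, and clamping to the end of the month is just one possible convention.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month's end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start_jan = date(2023, 1, 15)
start_feb = date(2023, 2, 15)

# A duration: the same elapsed time no matter where it starts.
print(start_jan + timedelta(days=30), start_feb + timedelta(days=30))

# A calendar interval: the elapsed days depend on the starting date.
print((add_months(start_jan, 1) - start_jan).days)
print((add_months(start_feb, 1) - start_feb).days)
```

This is why a single timedelta kind in the protocol would map cleanly onto a duration but not necessarily onto an interval.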
