Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding and implementing coordinate generation to geoxarray #12

Open
djhoese opened this issue Jan 16, 2024 · 4 comments
Open

Understanding and implementing coordinate generation to geoxarray #12

djhoese opened this issue Jan 16, 2024 · 4 comments

Comments

@djhoese
Copy link

djhoese commented Jan 16, 2024

I was pointed here by @maxrjones a long time ago as part of the NASA ACCESS19 Pangeo project. The main point of the conversation (over email) was that the coordinate generation done by this project could be something that the geoxarray project could do. I'm looking at implementing this but I was hoping I could get some more information to guide me. I'm trying to keep interfaces similar, if not exactly the same, with rioxarray, but also trying to avoid the rasterio/GDAL dependencies.

Looking at the main coordinate function here:

import imagecodecs.numcodecs
import numpy as np
imagecodecs.numcodecs.register_codecs()
height, width = shape[-2:]
xscale, yscale, zscale = attrs["ModelPixelScale"][:3]
x0, y0, z0 = attrs["ModelTiepoint"][3:6]
out = {}
out["x"] = np.arange(width) * xscale + x0 + xscale / 2
out["y"] = np.arange(height) * -yscale + y0 - yscale / 2
return out

  1. Why do imagecodecs have to be loaded here?
  2. What creates attrs? GDAL's python interfaces don't use the geotiff ModelPixelScale/ModelTiepoint names but refer to numbers like these as the geotransform. While I would definitely have support for the geotransform in geoxarray I'd be fine having support for the ModelPixelScale naming too, I'd just like to know what to reference if there is a standard tool generating this.

I might have more questions later, but any info toward the above would be a big help.

@maxrjones
Copy link
Member

Hi @djhoese, thanks for following up about this! If the coordinate generation doesn't go into geoxarray, I think an appropriate place would be the new xarray backend for kerchunk (xref fsspec/kerchunk#372). That would be better than a standalone accessor like xrefcoord IMO because the coordinates generated from the GeoTiff metadata will only be correct if there's no slicing done first, which motivates generating them while opening the dataset. Let me know if you have thoughts about which is better.

  1. Why do imagecodecs have to be loaded here?

Good question, it doesn't (e.g., #13).

  1. What creates attrs? GDAL's python interfaces don't use the geotiff ModelPixelScale/ModelTiepoint names but refer to numbers like these as the geotransform. While I would definitely have support for the geotransform in geoxarray I'd be fine having support for the ModelPixelScale naming too, I'd just like to know what to reference if there is a standard tool generating this.

Kerchunk puts the GeoTiff metadata into .zattrs, which is loaded by the Zarr backend into attrs. Kerchunk uses Tifffile to extract the GeoTiff metadata, which is done here: https://github.com/cgohlke/tifffile/blob/5ced58caa74936ecc53f58ade94938e251e83138/tifffile/tifffile.py#L9841-L9922

@djhoese
Copy link
Author

djhoese commented Jan 30, 2024

Just curious, is this tifffile loaded tiff -> DataArray in a separate library at all? Does tifffile have an xarray engine to load files with xr.open_dataset(..., engine="tifffile")? Or is this all in kerchunk currently?

@djhoese
Copy link
Author

djhoese commented Jan 30, 2024

Oh and I see when I load a geotiff with tifffile it includes a PCSCitationGeoKey in the .geotiff_metadat. Any idea if this is placed somewhere in the generated DataArray? If so, I could load/create a pyproj CRS object from it (most likely).

@maxrjones
Copy link
Member

maxrjones commented Jan 30, 2024

Just curious, is this tifffile loaded tiff -> DataArray in a separate library at all? Does tifffile have an xarray engine to load files with xr.open_dataset(..., engine="tifffile")? Or is this all in kerchunk currently?

I don't know of a tifffile xarray backend. I use rioxarray.open_rasterio() for loading the geotiffs without kerchunk and the kerchunk backend for loading the reference files.

Oh and I see when I load a geotiff with tifffile it includes a PCSCitationGeoKey in the .geotiff_metadat. Any idea if this is placed somewhere in the generated DataArray? If so, I could load/create a pyproj CRS object from it (most likely).

It would be in .zattrs in the reference file. Here's an example from https://projectpythia.org/kerchunk-cookbook/notebooks/generating_references/GeoTIFF.html:

    ".zattrs": "{\"multiscales\":[{\"datasets\":[{\"path\":\"0\"},{\"path\":\"1\"},{\"path\":\"2\"},{\"path\":\"3\"},{\"path\":\"4\"},{\"path\":\"5\"}],\"metadata\":{},\"name\":\"\",\"version\":\"0.1\"}],\"GDAL_METADATA\":\"<GDALMetadata>\\n<Item name="Observation time" format="YYYYMMDDhhmm">202301010100<\\/Item>\\n<Item name="Quantity" unit="mm">Precipitation accumulation<\\/Item>\\n<Item name="Gain">0.010000<\\/Item>\\n<Item name="Offset">0.000000<\\/Item>\\n<Item name="Nodata">65535<\\/Item>\\n<Item name="Undetect">0<\\/Item>\\n<Item name="Accumulation time" unit="h">24<\\/Item>\\n<Item name="Temporal type">Past<\\/Item>\\n<Item name="Z=AR^B relation parameter A">200.000000<\\/Item>\\n<Item name="Z=AR^B relation parameter B">1.600000<\\/Item>\\n<\\/GDALMetadata>\\n\",\"KeyDirectoryVersion\":1,\"KeyRevision\":1,\"KeyRevisionMinor\":0,\"GTModelTypeGeoKey\":1,\"GTRasterTypeGeoKey\":1,\"GTCitationGeoKey\":\"ETRS89 \\/ TM35FIN(E,N)\",\"GeogCitationGeoKey\":\"ETRS89\",\"GeogAngularUnitsGeoKey\":9102,\"GeogTOWGS84GeoKey\":[0.0,0.0,0.0],\"ProjectedCSTypeGeoKey\":3067,\"ProjLinearUnitsGeoKey\":9001,\"ModelPixelScale\":[1169.2930568410832,1168.8701637541064,0.0],\"ModelTiepoint\":[0.0,0.0,0.0,-118331.36640835612,7907751.537263516,0.0]}",

Note this includes ProjectedCSTypeGeoKey rather than PCSCitationGeoKey - I think it'd be important to support both GeoTIFF key flavors.

When the kerchunk backend used to create a xarray dataset, the metadata gets loaded into .attrs:

storage_options = {
    "remote_protocol": "s3",
    "skip_instance_cache": True,
}  # options passed to fsspec
open_dataset_options = {"chunks": {}}  # opens passed to xarray

ds = xr.open_dataset(
    "references/202301012300.json",
    engine="kerchunk",
    storage_options=storage_options,
    open_dataset_options=open_dataset_options,
)
ds.attrs
{'multiscales': [{'datasets': [{'path': '0'},
    {'path': '1'},
    {'path': '2'},
    {'path': '3'},
    {'path': '4'},
    {'path': '5'}],
   'metadata': {},
   'name': '',
   'version': '0.1'}],
 'GDAL_METADATA': '<GDALMetadata>\n<Item name="Observation time" format="YYYYMMDDhhmm">202301012300</Item>\n<Item name="Quantity" unit="mm">Precipitation accumulation</Item>\n<Item name="Gain">0.010000</Item>\n<Item name="Offset">0.000000</Item>\n<Item name="Nodata">65535</Item>\n<Item name="Undetect">0</Item>\n<Item name="Accumulation time" unit="h">24</Item>\n<Item name="Temporal type">Past</Item>\n<Item name="Z=AR^B relation parameter A">200.000000</Item>\n<Item name="Z=AR^B relation parameter B">1.600000</Item>\n</GDALMetadata>\n',
 'KeyDirectoryVersion': 1,
 'KeyRevision': 1,
 'KeyRevisionMinor': 0,
 'GTModelTypeGeoKey': 1,
 'GTRasterTypeGeoKey': 1,
 'GTCitationGeoKey': 'ETRS89 [/](https://file+.vscode-resource.vscode-cdn.net/) TM35FIN(E,N)',
 'GeogCitationGeoKey': 'ETRS89',
 'GeogAngularUnitsGeoKey': 9102,
 'GeogTOWGS84GeoKey': [0.0, 0.0, 0.0],
 'ProjectedCSTypeGeoKey': 3067,
 'ProjLinearUnitsGeoKey': 9001,
 'ModelPixelScale': [1169.2930568410832, 1168.8701637541064, 0.0],
 'ModelTiepoint': [0.0, 0.0, 0.0, -118331.36640835612, 7907751.537263516, 0.0]}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants