Geospatial data down selection function #79

Open
MDKempe opened this issue Mar 14, 2024 · 1 comment
MDKempe commented Mar 14, 2024

We need to create a geospatial down-selection function that will be comprehensive.

The input would be a list of GID, lat/lon, and altitude. The output would be the shortened list with just the GID or another identifier, as appropriate/selected.

We want to be able to select the number of points to include in the final list.

To get more useful data, we would want to account for topography, so that points in or next to mountains are preferentially selected. This could be accomplished by a nearest neighbor search in which each point gets a weight based on the altitude difference between it and its nearest neighbors. Points are then randomly included with a probability proportional to that weight. This method does rely on there being a statistically large enough number of data points.
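A minimal sketch of the altitude-weighted selection described above, not a definitive implementation. The function name `altitude_weighted_sample`, the neighbor count `k`, and the small weight floor (so that flat regions keep a nonzero chance) are all assumptions made for illustration. Distances are computed by brute force in degrees, which is only reasonable for small inputs; a KD-tree would replace that step on a real grid.

```python
import numpy as np

def altitude_weighted_sample(gids, lat, lon, alt, n_keep, k=4, seed=None):
    """Pick n_keep GIDs, favoring points whose altitude differs from their neighbors."""
    gids = np.asarray(gids)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    alt = np.asarray(alt, dtype=float)

    # Brute-force pairwise distances in degrees (assumption: small input).
    coords = np.column_stack([lat, lon])
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of every point.
    nn = np.argsort(dist, axis=1)[:, :k]

    # Weight = mean absolute altitude difference to those neighbors,
    # plus a small floor (hypothetical choice) so flat areas are not excluded.
    relief = np.abs(alt[nn] - alt[:, None]).mean(axis=1)
    weights = relief + 0.05 * relief.max() + 1e-9
    prob = weights / weights.sum()

    # Weighted random draw without replacement.
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(gids), size=n_keep, replace=False, p=prob)
    return gids[np.sort(chosen)]
```

Calling it with `n_keep` set to the desired list length returns only the identifiers, which matches the output described earlier in the issue.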

We would also want to identify perimeter locations, or locations near an ocean or large lake, and make sure the final list gives a good outline. This could be done through a point search that looks for a direction in which there are no data points within a ~150 degree cone out to a specified distance. That distance would be determined from the typical spacing (e.g. 4 km), estimated from a few random nearest-neighbor tests, multiplied by a factor of, say, 10. You would then, for example, look for points that have a direction with nothing for 40 km. All the edge points then go into a sublist that is down-selected at half the rate of exclusion.
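One way to sketch the empty-cone test: for each point, collect the bearings to all neighbors inside the search radius and check whether the largest angular gap between consecutive bearings reaches the cone width. The names `find_edge_points` and `keep_mask`, the equirectangular projection, and the reading of "half the rate of exclusion" as halving the drop probability for edge points are assumptions for illustration only.

```python
import numpy as np

KM_PER_DEG = 111.32  # approximate length of one degree of latitude

def find_edge_points(lat, lon, radius_km, cone_deg=150.0):
    """Flag points that have an empty cone of at least cone_deg within radius_km."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Equirectangular projection about the mean latitude (assumption:
    # regional extent, away from the poles).
    x = lon * KM_PER_DEG * np.cos(np.radians(lat.mean()))
    y = lat * KM_PER_DEG

    is_edge = np.zeros(len(lat), dtype=bool)
    for i in range(len(lat)):
        dx, dy = x - x[i], y - y[i]
        d = np.hypot(dx, dy)
        near = (d > 0) & (d <= radius_km)
        if not near.any():
            is_edge[i] = True  # isolated point: every direction is empty
            continue
        # Sorted bearings to neighbors; the last gap wraps around 360 degrees.
        bearings = np.sort(np.degrees(np.arctan2(dy[near], dx[near])) % 360.0)
        gaps = np.diff(np.append(bearings, bearings[0] + 360.0))
        is_edge[i] = gaps.max() >= cone_deg
    return is_edge

def keep_mask(is_edge, exclusion_rate, seed=None):
    """Randomly drop points; edge points are dropped at half the rate."""
    rng = np.random.default_rng(seed)
    drop_p = np.where(is_edge, exclusion_rate / 2.0, exclusion_rate)
    return rng.random(len(is_edge)) >= drop_p
```

The Python loop is quadratic in the number of points; on a large dataset the neighbor lookup would go through a spatial index instead.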

These calculations may take some time, but the resulting lists should make the subsequent calculations much more useful.

@tobin-ford tobin-ford self-assigned this May 29, 2024
@tobin-ford
I have implemented a simple version of this on dev_scenario geospatial. It can select for coastline, mountains, and rivers from the geospatial metadata dictionary. The nearest neighbor search uses sklearn KD-trees (scipy has them too, but sklearn is much faster).
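For readers unfamiliar with the KD-tree lookup, here is a small sketch using `sklearn.neighbors.KDTree`, applied to the typical-spacing estimate from the original proposal. It is not the code on the branch mentioned above; the function name `typical_spacing_km`, the sample size, and the flat-earth projection are assumptions.

```python
import numpy as np
from sklearn.neighbors import KDTree

KM_PER_DEG = 111.32  # approximate length of one degree of latitude

def typical_spacing_km(lat, lon, n_samples=50, seed=None):
    """Median nearest-neighbor distance (km) over a random sample of points."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # Project to km with an equirectangular approximation (assumption).
    x = lon * KM_PER_DEG * np.cos(np.radians(lat.mean()))
    y = lat * KM_PER_DEG
    pts = np.column_stack([x, y])

    tree = KDTree(pts)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pts), size=min(n_samples, len(pts)), replace=False)

    # k=2 because the closest hit for each query point is the point itself.
    dist, _ = tree.query(pts[idx], k=2)
    return float(np.median(dist[:, 1]))
```

Multiplying the returned spacing by a factor such as 10 gives the search distance used for the perimeter test.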
