Running list of TODOs #9

Open
orianac opened this issue Feb 12, 2021 · 0 comments
orianac commented Feb 12, 2021

For MVP for Washington:

  • basic form of training dataset:
    • All GLAS shots translated to biomass using one allometric equation (Cindy) [done]
    • Look up sampling strategy of GLAS and allometric equation assumptions wrt leaf conditions (Cindy/Ori) [done]
    • Calculate a seasonal average for each year from Landsat as a spatially continuous map for WA (Ori + Joe). Relatedly, decide on the Landsat data structure: snap to a uniform Hansen 30 m grid, one layer per year.
    • Extract Landsat variables to use into a tabular format (all raw bands)
  • set up ML model for training (Cindy)
    • random forest + XGBoost!
    • set up inference function
  • Set up inference inputs
    • extract the same Landsat variables into tabular format for all of Washington
  • Plotting function from ML model output (Altair) (Ori) (lat/lon/time)
    • spatial maps
    • time series
  • Set up validation dataset
    • Find 4 well-respected datasets
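
A rough sketch of the seasonal-average and tabular-extraction steps above, using only the standard library. The function name, the record layout, and the season months (June to September) are all assumptions for illustration; the issue does not specify which season or data structure to use.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical growing-season months; the issue does not say which season to use.
SEASON_MONTHS = {6, 7, 8, 9}


def seasonal_means(observations, season_months=SEASON_MONTHS):
    """Average each band per (pixel, year) over the chosen season.

    observations: iterable of (pixel_id, date, {band_name: value}) tuples.
    Returns a list of flat row dicts, one per (pixel_id, year), which is
    the tabular shape a model would train on.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for pixel_id, obs_date, bands in observations:
        if obs_date.month not in season_months:
            continue  # skip out-of-season scenes
        for band, value in bands.items():
            grouped[(pixel_id, obs_date.year)][band].append(value)

    rows = []
    for (pixel_id, year), bands in sorted(grouped.items()):
        row = {"pixel_id": pixel_id, "year": year}
        row.update({band: mean(values) for band, values in bands.items()})
        rows.append(row)
    return rows
```

The same function could serve both training (pixels under GLAS shots) and inference (all pixels in Washington), so the two tables share identical columns.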

To expand to global:

  • Transforming the Harris et al. spreadsheet into Python
    • Mask of column 2 (ecoregion + NLCD) -> allometric equation
    • allometric equation = dictionary of functions
    • height metrics = another dictionary of functions [done]
    • parameter to indicate whether to preprocess (whether input is smooth or raw)
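
One possible shape for the "dictionary of functions" idea, as a minimal sketch. The ecoregion name, the function names, and every coefficient are placeholders, not values from the Harris et al. spreadsheet; the NLCD codes (41, 42) are used only as example keys.

```python
# Hypothetical allometric equations. The coefficients are placeholders,
# NOT values taken from the Harris et al. spreadsheet.
def evergreen_biomass(metrics):
    return 10.0 * metrics["mean_height"]


def deciduous_biomass(metrics):
    return 5.0 * metrics["mean_height"] + 2.0 * metrics["h50"]


# Keys are (ecoregion, NLCD class) pairs, mirroring the "mask of column 2"
# lookup. "ecoregion_a" is a made-up name.
ALLOMETRIC_EQUATIONS = {
    ("ecoregion_a", 42): evergreen_biomass,
    ("ecoregion_a", 41): deciduous_biomass,
}


def estimate_biomass(ecoregion, nlcd, metrics):
    """Pick the equation for this (ecoregion, NLCD) pair and apply it.

    Returns None when no equation is registered for the pair, so callers
    can decide how to handle uncovered regions.
    """
    equation = ALLOMETRIC_EQUATIONS.get((ecoregion, nlcd))
    if equation is None:
        return None
    return equation(metrics)
```

Keeping the height metrics in a plain dict means each equation only reads the metrics it needs, and new equations can be registered without touching the lookup code.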

Improvements by April:
GLAS/biomass:

  • apply GLAS filtering based on Harris et al. (Cindy) [done]
  • double check how GLAS elevation should be calculated from GLAH14 data
  • decide whether we should use the smoothed or raw waveform for height metric calculations
  • Double check terrain calculations by reading Duncanson et al more closely
  • potentially rename the raw extracted GLAS data to use the original variable names
  • interpolate between bins (currently at 15cm intervals)
  • double check that compression ratio does not change during the valid signal part (between sig beg and sig end)
  • Figure out which allometric equations can be used for leaf-off conditions. Allometric equations are trained predominantly on leaf-on conditions, so we should determine whether estimates for leaf-off conditions are valid. This is relevant for our reporting/updating interval. Proposal: update twice a year, after the end of the growing season in each hemisphere (September and March(?)).
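
For the "interpolate between bins" item, a minimal sketch of linear interpolation on a cumulative waveform profile. Only the 15 cm bin spacing comes from the list above; the function name, the input layout (a non-decreasing cumulative profile starting at the first bin), and the choice of linear interpolation are assumptions.

```python
BIN_SIZE_M = 0.15  # bin spacing mentioned in the list above


def interpolated_height(cumulative, target, bin_size=BIN_SIZE_M):
    """Height in metres above the first bin where a non-decreasing
    cumulative profile first reaches `target`, linearly interpolated
    between the two neighbouring bins.

    Returns None if the profile is empty or never reaches the target.
    """
    if not cumulative or target > cumulative[-1]:
        return None
    if target <= cumulative[0]:
        return 0.0
    for i in range(1, len(cumulative)):
        lo, hi = cumulative[i - 1], cumulative[i]
        if hi >= target:
            # lo < target <= hi holds here, so hi - lo is never zero.
            frac = (target - lo) / (hi - lo)
            return (i - 1 + frac) * bin_size
    return None
```

Without interpolation, every height metric would be quantized to multiples of 15 cm; this returns a value between bin edges instead.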

Landsat

  • Masking clouds (potentially via https://github.com/ubarsc/python-fmask or using the *_BQA.TIF files in the Landsat archive)
  • Smoothing Landsat images using CCDC
  • Grabbing multiple Landsat pixels for each GLAS record? The GLAS footprint has a 70 m diameter and Landsat pixels are 30 m, so we could use 4 Landsat pixels? Or a bounding box of all Landsat pixels?
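
To get a feel for how many pixels a footprint touches, here is a small sketch that lists the grid pixels overlapping a circular footprint. It assumes projected coordinates in metres and a grid whose pixel (0, 0) has its lower-left corner at the origin; the function name and the "any overlap" rule are illustrative choices, not a decision made in this issue.

```python
import math


def pixels_in_footprint(x, y, diameter=70.0, pixel=30.0):
    """Return (col, row) indices of grid pixels that overlap a circular
    footprint centred at (x, y).

    Assumes projected coordinates in metres and a grid whose pixel (0, 0)
    has its lower-left corner at the origin.
    """
    r = diameter / 2.0
    col_min = math.floor((x - r) / pixel)
    col_max = math.floor((x + r) / pixel)
    row_min = math.floor((y - r) / pixel)
    row_max = math.floor((y + r) / pixel)

    hits = []
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            # Closest point of the pixel rectangle to the footprint centre.
            nearest_x = min(max(x, col * pixel), (col + 1) * pixel)
            nearest_y = min(max(y, row * pixel), (row + 1) * pixel)
            if math.hypot(nearest_x - x, nearest_y - y) <= r:
                hits.append((col, row))
    return hits
```

Under these assumptions, a footprint centred on a pixel centre overlaps a full 3 x 3 block, so the number of pixels depends on how the shot aligns with the grid. A stricter rule, such as keeping only pixels whose centre falls inside the footprint, would return fewer.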

ML model

  • Training different model for each ecoregion
  • Incorporating a climate dataset into model training (others have used WorldClim, though we could use TerraClimate)
  • out-of-sample validation
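
A sketch of how per-ecoregion training and out-of-sample validation could share one split, using only the standard library. The record layout (dicts with an "ecoregion" key), the function name, and the 20% hold-out fraction are assumptions; the model itself is left out.

```python
import random
from collections import defaultdict


def split_by_ecoregion(records, test_fraction=0.2, seed=0):
    """Return {ecoregion: (train, test)} with a held-out subset per ecoregion.

    records: list of dicts that each carry an "ecoregion" key.
    Groups with a single record keep it for training and get an empty test set.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["ecoregion"]].append(record)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {}
    for ecoregion, group in sorted(grouped.items()):
        shuffled = group[:]
        rng.shuffle(shuffled)
        if len(shuffled) > 1:
            n_test = max(1, round(len(shuffled) * test_fraction))
        else:
            n_test = 0
        splits[ecoregion] = (shuffled[n_test:], shuffled[:n_test])
    return splits
```

Each ecoregion's model would be fit on its own `train` list and scored on its `test` list. A purely random split can be optimistic when neighbouring shots are spatially correlated, so a spatial hold-out may be worth considering.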
@tcchiao changed the title from "Running list of things we're putting aside for MVP but should revisit" to "Running list of TODOs" Feb 12, 2021
@jhamman jhamman added the v1 label Sep 15, 2021
@jhamman jhamman added this to To do in data-engineering Sep 28, 2021