Running list of TODOs #9

orianac · 2021-02-12T00:52:42Z

For MVP for Washington:

basic form of training dataset:
- All GLAS shots translated to biomass using one allometric equation (Cindy) [done]
- Look up sampling strategy of GLAS and allometric equation assumptions wrt leaf conditions (Cindy/Ori) [done]
- Calculate the seasonal average for each year from Landsat as a spatially continuous map for WA (Ori + Joe). Relatedly, decide on the Landsat data structure (snap to a uniform Hansen 30 m grid, annually)
- Extract Landsat variables to use into a tabular format (all raw bands)
Set up ML model for training (Cindy)
- random forest + XGBoost!
- set up inference function
Set up inference inputs
- extract the same Landsat variables into tabular format for all of Washington
Plotting function from ML model output (Altair) (Ori) (lat/lon/time)
- spatial maps
- time series
Set up validation dataset
- Find 4 well-respected datasets

To expand to global:

Transforming the Harris et al. spreadsheet into Python
- Mask of column 2 (ecoregion + NLCD) -> allometric equation
- allometric equation = dictionary of functions
- height metrics = another dictionary of functions [done]
- parameter to indicate whether to preprocess (whether input is smooth or raw)
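A minimal sketch of the "dictionary of functions" idea, assuming the lookup key is an (ecoregion, land cover) pair. The key names, function names, and coefficients below are hypothetical placeholders for illustration only; they are not taken from the Harris et al. spreadsheet.

```python
from typing import Callable, Dict, Tuple


# Hypothetical placeholder equations: the coefficients are illustrative,
# not values from the Harris et al. spreadsheet.
def _placeholder_conifer(height_m: float) -> float:
    return 2.0 * height_m ** 1.5


def _placeholder_broadleaf(height_m: float) -> float:
    return 3.0 * height_m


# Maps (ecoregion, land cover class) to the allometric equation to apply.
# The key values here are made-up examples.
ALLOMETRIC_EQUATIONS: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("example_ecoregion_a", "conifer"): _placeholder_conifer,
    ("example_ecoregion_a", "broadleaf"): _placeholder_broadleaf,
}


def biomass_from_height(ecoregion: str, land_cover: str, height_m: float) -> float:
    """Look up the equation for this ecoregion/land cover pair and apply it."""
    key = (ecoregion, land_cover)
    if key not in ALLOMETRIC_EQUATIONS:
        raise KeyError(f"no allometric equation registered for {key}")
    return ALLOMETRIC_EQUATIONS[key](height_m)
```

Keeping the mask-to-equation mapping in one dictionary means adding a region is a one-line change, and a missing combination fails loudly instead of silently falling back to a default equation.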

Improvements by April:
GLAS/biomass:

Apply GLAS filtering based on Harris et al. (Cindy) [done]
Double check how GLAS elevation should be calculated from GLAH14 data
Decide whether we should use the smoothed or raw waveform to make height metric calculations
Double check terrain calculations by reading Duncanson et al. more closely
Potentially rename the raw extracted GLAS data to the original variable names
Interpolate between bins (currently at 15 cm intervals)
Double check that the compression ratio does not change during the valid signal part (between sig beg and sig end)
Figure out which allometric equations can be used for leaf-off conditions
Allometric equations are trained predominantly on leaf-on conditions, so we should determine whether estimates for leaf-off conditions are valid. This is relevant for our reporting/updating interval. Proposal: update twice a year, after the end of the growing season in each hemisphere (September and March(?)).
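For the "interpolate between bins" item, one possible approach is linear interpolation on the cumulative waveform energy, so that a height metric is not forced onto a 15 cm step. This is only a sketch: the function name and the choice of a cumulative-energy threshold are assumptions, not the current implementation.

```python
BIN_SIZE_M = 0.15  # bin spacing from the note above (15 cm)


def interpolated_crossing(cumulative, threshold, bin_size_m=BIN_SIZE_M):
    """Distance (m) from the first bin at which `cumulative` reaches `threshold`.

    `cumulative` is assumed to be a non-decreasing sequence with one value
    per bin. The crossing is located by linear interpolation between the two
    bins that bracket the threshold.
    """
    if not cumulative:
        raise ValueError("cumulative must not be empty")
    if threshold <= cumulative[0]:
        return 0.0
    for i in range(1, len(cumulative)):
        lo, hi = cumulative[i - 1], cumulative[i]
        if lo < threshold <= hi:
            # Fraction of the way from bin i-1 to bin i.
            frac = (threshold - lo) / (hi - lo)
            return (i - 1 + frac) * bin_size_m
    raise ValueError("threshold is never reached")
```

With a cumulative profile of `[0.0, 0.2, 0.6, 1.0]`, a threshold of 0.4 falls halfway between the second and third bins, so the crossing lands between bin edges instead of snapping to one.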

Landsat

Masking clouds (potentially via https://github.com/ubarsc/python-fmask or potentially using *_BQA.TIF files in the Landsat archive)
Smoothing Landsat images using CCDC
Grabbing multiple Landsat pixels for each GLAS record? The GLAS footprint is 70 m in diameter and Landsat pixels are 30 m, so we could use 4 Landsat pixels? Or the bounding box of all Landsat pixels that overlap the footprint?
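To help decide how many Landsat pixels to grab per GLAS record, here is a rough sketch that lists the 30 m pixels overlapping a 70 m diameter circular footprint. It assumes projected coordinates in meters and a grid whose origin is at (0, 0); the function name and grid layout are hypothetical, not the project's actual grid.

```python
import math

PIXEL_SIZE_M = 30.0        # Landsat pixel size
FOOTPRINT_RADIUS_M = 35.0  # half of the 70 m GLAS footprint diameter


def pixels_touching_footprint(x, y, radius=FOOTPRINT_RADIUS_M, pixel=PIXEL_SIZE_M):
    """Return (col, row) indices of grid pixels that overlap the footprint circle.

    Assumes pixel (c, r) covers [c*pixel, (c+1)*pixel] in x and
    [r*pixel, (r+1)*pixel] in y, with the grid origin at (0, 0).
    """
    c_min = math.floor((x - radius) / pixel)
    c_max = math.floor((x + radius) / pixel)
    r_min = math.floor((y - radius) / pixel)
    r_max = math.floor((y + radius) / pixel)
    hits = []
    for c in range(c_min, c_max + 1):
        for r in range(r_min, r_max + 1):
            # Nearest point of the pixel square to the footprint center.
            nx = min(max(x, c * pixel), (c + 1) * pixel)
            ny = min(max(y, r * pixel), (r + 1) * pixel)
            if (nx - x) ** 2 + (ny - y) ** 2 <= radius ** 2:
                hits.append((c, r))
    return hits
```

Under these assumptions, a shot centered on a pixel overlaps a 3 x 3 block (9 pixels), and a shot centered on a pixel corner overlaps 12, so 4 pixels would cover only part of the footprint.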

ML model

Training different model for each ecoregion
Incorporating a climate dataset into the training of the model (others have used WorldClim, though we could use TerraClimate)
Out-of-sample validation

tcchiao changed the title ~~Running list of things we're putting aside for MVP but should revisit~~ Running list of TODOs Feb 12, 2021

jhamman added the v1 label Sep 15, 2021

jhamman added this to To do in data-engineering Sep 28, 2021