How Predictions Work
Canopi predicts the probability that a tree of a specific species, planted at a specific location with a specific planting method, will be alive 1 year, 3 years, and 5 years after planting.
What a survival probability means
When Canopi returns survival_probability: 0.82 at a 5-year horizon, it means: based on historical patterns of tree survival across thousands of sites with similar soil, climate, terrain, and planting conditions, approximately 82% of trees are expected to survive to year five at this location.
This is a statistical probability, not a guarantee. Forests are complex systems influenced by weather extremes, pest outbreaks, fire, and human activity that no model can predict with certainty. A probability is honest about that uncertainty.
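One way to read the number is as an expected count across a planting. The sketch below is illustrative only: the function name and the planting size of 1,000 trees are hypothetical, and the result is an expectation, not a range of likely outcomes.

```python
def expected_survivors(trees_planted: int, survival_probability: float) -> float:
    """Expected number of trees still alive at the prediction horizon."""
    if trees_planted < 0:
        raise ValueError("trees_planted must be non-negative")
    if not 0.0 <= survival_probability <= 1.0:
        raise ValueError("survival_probability must be between 0 and 1")
    return trees_planted * survival_probability


# Hypothetical planting of 1,000 trees, using the 5-year probability from the text.
estimate = expected_survivors(1000, 0.82)
print(f"Expected survivors at year five: about {estimate:.0f} of 1000")
```

Actual outcomes at a single site can differ noticeably from this expectation, because trees at the same site share the same weather, pests, and fire exposure.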
How the prediction is generated
- You send coordinates, a species, and a planting method. These define the prediction question: “What happens if I plant this species here, this way?”
- Canopi finds the nearest data point. Our prediction network covers approximately 18,000 sites across Oregon and Washington. The API locates the closest site with pre-computed predictions and reports the match distance so you know how close the data is to your actual location.
- Pre-computed predictions are returned. Canopi’s XGBoost model has already evaluated every site-species-method-horizon combination. The predictions account for 17 environmental features including soil composition, climate patterns, terrain, and the planting method itself.
- Risk factors are extracted. Each prediction includes the top environmental factors influencing survival at that site, derived from SHAP (SHapley Additive exPlanations) analysis. These aren’t generic: they’re specific to this site, this species, and this horizon.
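The nearest-site match and the pre-computed lookup can be sketched as follows. This is a minimal illustration, not Canopi's implementation: the Site class, the site IDs, the species and method strings, the lookup-table layout, and the use of great-circle (haversine) distance are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass(frozen=True)
class Site:
    site_id: str  # hypothetical identifier
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_site(lat: float, lon: float, sites: list[Site]) -> tuple[Site, float]:
    """Return (closest site, distance in km) using a simple linear scan."""
    if not sites:
        raise ValueError("sites must not be empty")
    distances = [(haversine_km(lat, lon, s.lat, s.lon), s) for s in sites]
    distance_km, site = min(distances, key=lambda pair: pair[0])
    return site, distance_km


# Hypothetical sites and pre-computed predictions, keyed by
# (site_id, species, method, horizon_years). All values are placeholders.
SITES = [Site("site-a", 45.52, -122.68), Site("site-b", 47.61, -122.33)]
PREDICTIONS = {("site-a", "douglas-fir", "manual", 5): 0.82}

site, match_distance_km = nearest_site(45.50, -122.60, SITES)
probability = PREDICTIONS[(site.site_id, "douglas-fir", "manual", 5)]
print(site.site_id, round(match_distance_km, 1), probability)
```

A linear scan is fast enough for roughly 18,000 sites; a spatial index would be the usual choice for a much larger network.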
The features behind the prediction
The model evaluates 17 features for every prediction:
Soil characteristics — Organic matter content, available water capacity, clay percentage, and drainage class. These determine how well a site retains and delivers moisture to roots.
Climate patterns — Precipitation, maximum temperature, mean temperature, dew point temperature, and maximum vapor pressure deficit. These capture the atmospheric conditions that drive water stress — the primary killer of seedlings.
Terrain — Elevation. Combined with climate features, this captures the temperature and moisture gradients that define where species can thrive.
Tree characteristics — Crown ratio, height, and diameter at the baseline measurement. These reflect the condition and vigor of trees at similar sites in the training data.
Planting method — Manual versus drone-seeded. The model learns how method choice interacts with site conditions to affect survival outcomes.
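The site-specific risk factors mentioned earlier are derived from SHAP values over these features. The sketch below shows one plausible way to pick the top factors, assuming they are ranked by the absolute size of their SHAP contribution; the ranking rule, the feature names, and the values are all hypothetical.

```python
def top_risk_factors(shap_values: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k features with the largest absolute SHAP contribution."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical SHAP values for a single site-species-method-horizon prediction.
example_shap = {
    "max_vapor_pressure_deficit": -0.21,
    "available_water_capacity": 0.12,
    "elevation": -0.03,
    "clay_percentage": 0.05,
    "planting_method": -0.09,
}

for name, value in top_risk_factors(example_shap):
    # A positive SHAP value pushes the prediction up, a negative one pushes it down.
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {direction} predicted survival ({value:+.2f})")
```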
Time horizons
Canopi currently predicts across three horizons:
- 1 year: First-year establishment. This is when the highest mortality occurs. Seedlings face transplant shock, drought stress, frost, and competition from established vegetation. First-year survival is the most actionable prediction because it is the horizon where intervention (irrigation, weed control, replanting) is most feasible.
- 3 years: Mid-term survival. Trees that survive the first year face ongoing climate stress and competition. The 3-year mark is a common milestone for reforestation success assessment.
- 5 years: Establishment threshold. In most reforestation programs and carbon credit methodologies, trees surviving to year five are considered established. This is the horizon most relevant to financial decisions such as carbon credit forward contracts, project insurance, and investment underwriting.
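Because each horizon is measured from planting, two horizons can be combined to estimate survival between them: the chance that a tree alive at the earlier horizon is still alive at the later one is the ratio of the two probabilities. The sketch below uses hypothetical values and assumes the later probability never exceeds the earlier one, which the text above does not guarantee for every prediction.

```python
def conditional_survival(p_earlier: float, p_later: float) -> float:
    """P(alive at later horizon | alive at earlier horizon) = p_later / p_earlier."""
    if not 0.0 < p_earlier <= 1.0:
        raise ValueError("p_earlier must be in (0, 1]")
    if not 0.0 <= p_later <= p_earlier:
        raise ValueError("p_later must be between 0 and p_earlier")
    return p_later / p_earlier


# Hypothetical predictions for one site, species, and method.
p1, p3, p5 = 0.90, 0.85, 0.82

print(f"Year 1 to year 3: {conditional_survival(p1, p3):.3f}")
print(f"Year 3 to year 5: {conditional_survival(p3, p5):.3f}")
```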
Match distance
Every response includes match_distance_km, the distance between your requested coordinates and the nearest data point Canopi has predictions for. It is reported for transparency, so you can judge how well the matched site represents your actual location.
- Under 2 km: Very close match. Predictions are highly site-relevant.
- 2-5 km: Good match. Environmental conditions are likely similar.
- 5-10 km: Moderate match. Terrain and microclimate may differ from the matched site.
- Over 10 km: Weak match. Use predictions directionally rather than precisely.
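A client can turn these bands into a simple check before acting on a prediction. The helper below is a hypothetical sketch; the function name and labels are illustrative, and because the bands above share their endpoints, it assumes that exactly 2 km counts as "good", exactly 5 km as "moderate", and exactly 10 km as "moderate".

```python
def classify_match(match_distance_km: float) -> str:
    """Map match_distance_km to the match-quality bands described above."""
    if match_distance_km < 0:
        raise ValueError("match_distance_km must be non-negative")
    if match_distance_km < 2:
        return "very close"
    if match_distance_km < 5:
        return "good"
    if match_distance_km <= 10:
        return "moderate"
    return "weak"


# Hypothetical distances, as they might appear in responses.
for distance in (0.8, 3.4, 7.9, 14.2):
    print(f"{distance} km: {classify_match(distance)} match")
```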