habitat_suitability_modeling_with_small_sample_size [2014/03/17 17:06]
andre [Resources]
habitat_suitability_modeling_with_small_sample_size [2018/05/16 02:19] (current)
hermann [Resources]
Line 1: Line 1:
-====== ​ Evaluating Weights of Evidence method for habitat suitability modeling: a comparison ​to Maximum Entropy ​in case of few presence records ======  +====== ​ Evaluating Weights of Evidence method for habitat suitability modeling: a comparison ​with Maximum Entropy ​for a case of few presence records ======  
-**Andre Carvalho Silveira, ​Daniel Fernandes Mamede Teixeira Lopes and Britaldo Silveira Soares Filho**+**Daniel Fernandes Mamede Teixeira Lopes, Andre Carvalho Silveira ​and Britaldo Silveira Soares Filho**
 \\ \\
 \\ \\
 ==== Abstract ==== ==== Abstract ====
  
-Studies on habitat suitability and species distribution aided by spatially explicit modeling are widely applied in ecological fields by enabling exploratory analysis of the relationship between species and its environmental context, thus predicting the likelihood of species occurrence (Guisan & Zimmermann, 2000). Therefore, suitability maps can be seen as an operational application of the niche theory, using environmental variables to indicate high suitable areas for presence and absence of a species within a quantitative range (Hirzel & Le Lay, 2008). This is a useful tool for ecological dynamics investigations, even for species without pseudo absence records and with only a few presence records available (Pearson et al., 2007). This article reports the application of two modeling methods providing suitability maps for //Cotinga maculata// (Cotingidae). The results comparison reaffirms the maximum entropy as an efficient approach to cases with only few records of presence (Pearson et al., 2007), as well results show that the weights of evidence method presents satisfactory performance based on ROC analysis and similarity index. The availability of the weights of evidence method in the free platform Dinamica-EGO (Soares-Filho et al., 2013) offers an alternative regarding habitat suitability. +Spatial modeling is widely applied to map habitat suitability and species distribution, thus enabling exploratory analysis of species' geographical distribution. Suitability models predict the likelihood of species occurrence based on niche theory translated into a set of environmental variables that indicate the suitability for presence or absence of a species (Guisan & Zimmermann 2000; Hirzel & Le Lay 2008). These tools are applied to map the ecological distribution of species based on only a few presence records and without records of absence (Pearson et al. 2007).
Here we report the application of two modeling methods that produce suitability maps for Cotinga maculata (Cotingidae). The comparison of results indicates that both Maximum Entropy and the Weights of Evidence method perform satisfactorily for cases with only a few presence records, based on ROC analysis and fuzzy similarity comparison. The Weights of Evidence method available in Dinamica-EGO (Soares-Filho et al. 2013) performed on par with Maxent, thus offering an additional means to map habitat suitability.
  
 ==== Keywords ==== ==== Keywords ====
 Habitat suitability,​ Cotinga maculata, Maximum Entropy, Weights of Evidence, Maxent, Dinamica EGO, ROC analysis. Habitat suitability,​ Cotinga maculata, Maximum Entropy, Weights of Evidence, Maxent, Dinamica EGO, ROC analysis.
 ==== Introduction ==== ==== Introduction ====
-Spatially explicit models are computational representations of systems with expression in space (Wu & David, 2002). In terms of environmental requirements the ecological niche of determined species is a natural object proper for spatial modeling (Guisan & Zimmermann, 2000; Hirzel & Le Lay, 2008). Based on the relationship between different environmental variables and registers of species occurrence, it is possible to establish a spatial model for habitat suitability. In theory, such models enable a review about the knowledge of the species’ potential spatial distribuition (Franklin, 2011), even in areas still not visited. For that it is used projections of habitat suitability provided by modeling (Pearson et al., 2007). +
 +Spatially explicit models are computational representations of environmental systems (Wu & David 2002), such as the ecological niche of species (Guisan & Zimmermann 2000; Hirzel & Le Lay 2008). Based on the relationship between different environmental variables and records of species occurrence, it is possible to establish a spatial model for habitat suitability. In theory, such models enable identifying species’ potential spatial distribution (Franklin 2011), even in areas without sampling.
 \\ \\
 \\ \\
-//Cotinga maculata// is a species from Order Passeriformes, Family Cotingidae, endemic to narrow remnants of the Brazilian Atlantic Forest between south of Bahia state and Rio de Janeiro state. The species occurs in lowland rainforest, up to 200 meters of altitude, primary vegetation or in advanced regeneration stage. Eventually the species can visit little forest patches looking for small fruits to compose its staple food. Considered rare by experts this species is difficult to observe due long immobile and quiet staying on trees. The few occurrence registers available concentrate at Conservation Units from the south Bahia state and north of Espírito Santo state (MMA, 2008). In this article it was used 18 registers from Conservation International Brazil database. +//Cotinga maculata// is a species of the Order Passeriformes, Family Cotingidae. It is endemic to small remnants of the Brazilian Atlantic Forest between the south of Bahia and Rio de Janeiro states. The species occurs in lowland rainforest up to 200 meters, in primary vegetation or vegetation at an advanced regrowth stage. The species visits small forest patches searching for the fruits that compose its staple food. Considered rare by experts, this species is difficult to observe due to long immobile and quiet perching periods. The few occurrence records available are concentrated in conservation units in the south of Bahia state and the north of Espírito Santo state (MMA, 2008). In this study, we used 18 records from the Conservation International Brazil database.
  
 ==== Methods ==== ==== Methods ====
-=== Maximum entropy ​modeling ​=== +=== Maximum entropy === 
-The maximum entropy method proposes inferences from incomplete information defining a probability distribution that accepts all the constraints imposed by a given dataset, and also avoids any yielding for a specific constraint, ie, keeping the maximum entropy of the data. The method application assumes that there are features expressed by environmental variables distributed in a dataset (raster grid), the constraints that will drive these variables derive from the crossing with species occurrence points (organized in raster grid). Thus the entropy that can be understood as a measure of “inner amount of choice”, being maximized stochastically returns in a result that attends to the major number of constraints possible. In this sense the method avoids taking any unknown assumption (Phillips et al., 2006). The final product is a map that indicates the suitability for species occurrence in the area relative to each raster cell. +
 +The maximum entropy method infers, from incomplete knowledge, a probability distribution function that satisfies all the constraints of a given dataset. It aims to maintain the maximum entropy of the data. The method assumes that constraints are obtained by overlaying selected spatial variables with species occurrence points (organized in a raster grid). Entropy represents a measure of “inner amount of choice”, and thus it is stochastically maximized to encompass the largest number of constraints. As a result, the method avoids assuming anything that is not known (Phillips et al. 2006). The final product is a map that indicates the suitability for species occurrence.
 + 
 +=== Weights of evidence ===
  
-=== Weights of evidence modeling === +The Weights of Evidence method consists of a Bayesian approach that calculates the influence of explanatory variables on the spatial prediction of the response variable (Bonham-Carter 1994, Soares-Filho et al. 2004). This approach employs categorical and binary explanatory variables to assess how attractive or repulsive these variables are to species occurrence (response variable). Continuous variables must be categorized, and each variable category is evaluated in terms of its association or disassociation with the species occurrence. Calculation of Weights of Evidence is performed using the Dinamica EGO platform.
-The weights of evidence method applies Bayesian probability to ponder the influence of each explanatory variable in respect to behavior of the response variable (Bonham-Carter 1994, Soares-Filho et al. 2004). The approach uses categorical and binary explanatory variables to assess how attractive or repulsive they are in relation to species occurrence (response variable). Thereby if the study includes continuous variables, they are categorized and each defined category is evaluated in terms of attractiveness (positive weight) and repulsiveness (negative weight) to the species occurrence. The suitability map produced assimilates how suitable is the environmental context of each portion of area to the species occurrence. The weights of each variable are explicit and can be manipulated through the Dinamica EGO platform.
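The weighting step described above can be illustrated with a minimal sketch. It is not the Dinamica EGO implementation: the function names, the toy data and the additive smoothing term are assumptions made for illustration. It uses the standard positive weight W+ = ln[P(category | presence) / P(category | no presence)] and adds the weights to the prior log-odds to obtain a posterior probability.

```python
import math
from collections import Counter

def weights_of_evidence(categories, presence, smoothing=0.5):
    """Return {category: W+} for one categorized variable.

    categories: category label of each raster cell
    presence:   1 if the species was recorded in the cell, else 0
    smoothing:  assumed additive term that avoids log(0) when a
                category holds no presence (or no absence) cells
    """
    with_d = Counter(c for c, p in zip(categories, presence) if p)
    without_d = Counter(c for c, p in zip(categories, presence) if not p)
    total_d = sum(with_d.values())
    total_nd = sum(without_d.values())
    cats = sorted(set(categories))
    k = len(cats)
    weights = {}
    for c in cats:
        p_cat_given_d = (with_d[c] + smoothing) / (total_d + smoothing * k)
        p_cat_given_nd = (without_d[c] + smoothing) / (total_nd + smoothing * k)
        weights[c] = math.log(p_cat_given_d / p_cat_given_nd)
    return weights

def posterior_probability(prior, cell_weights):
    """Combine the prior with the weights of the categories found in a cell."""
    logit = math.log(prior / (1.0 - prior)) + sum(cell_weights)
    return 1.0 / (1.0 + math.exp(-logit))

# Toy example: 20 cells, elevation split into two hypothetical ranges.
elevation_class = ["0-200m"] * 10 + ["200m+"] * 10
records = [1, 1, 1, 1] + [0] * 16   # all four records fall in the lowland class
w = weights_of_evidence(elevation_class, records)
prior = sum(records) / len(records)
lowland_suitability = posterior_probability(prior, [w["0-200m"]])
upland_suitability = posterior_probability(prior, [w["200m+"]])
```

A positive weight marks a category that attracts the species and a negative weight one that repels it, so the lowland cells end up above the prior and the upland cells below it.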
 \\ \\
 \\ \\
-The explanatory variables selected initially were: altitude, annual precipitation, maximum, minimum and mean annual temperature, all obtained online from WorldClim database (Hijmans et al., 2005). Every raster relative to these variables were resampled in 1000x1000 meters of spatial resolution. All variables and its intervals were submitted to statistical significance test. The variables considered in the study were the same used on both platforms: Maxent (for maximum entropy method) and Dinamica EGO (for weights of evidence method). +The explanatory variables initially selected were elevation, annual precipitation, and maximum, minimum, and mean annual temperature, all obtained from the WorldClim database (Hijmans et al. 2005). Raster grids of these variables were resampled to 1000×1000 meters. All variables and their intervals were evaluated for statistical significance. The same variables were used as input for both software packages: Maxent (for the Maximum Entropy method) and Dinamica EGO (for the Weights of Evidence method).
  
-=== Suitability maps, congruence ​and divergence ​===+=== Suitability maps, similarity ​and disagreement ​===
  
-The figure 1 shows the suitability maps obtained by both methods. It is possible to note the maps convergence over the areas with higher suitability to the species in analysis. The coastal area on extreme northeast of study area is the main region that concentrated high values of suitability. However it is possible identify traces of each approach in the respective produced maps. A substantial difference between the methods is the fact that the maximum entropy treats directly continuous variables. On other hand, the weights of evidence method categorizes all the continuous variables and treats each category as a binary secondary variable. Thus the gradient of the map produced by weights of evidence presents nuances that correspond to the categories created previously. This is the main feature that differentiates both obtained gradients. +Figure 1 shows the suitability maps obtained from both methods. Areas with higher suitability match on both maps. The coastal area in the northeast of the study area is the main region with high suitability values. However, a substantial difference between the methods is that Maximum Entropy treats continuous variables directly, whereas the Weights of Evidence method categorizes continuous variables and treats each category as a binary secondary variable. Thus, the map produced by Weights of Evidence presents shades of gray that correspond to the ranges created in the categorization process.
 \\ \\
 \\ \\
Line 34: Line 37:
 \\ \\
 \\ \\
-One way to explore the concordance between different methods of building a suitability surface is generate congruence and divergence maps. Thereby it is possible observe spatially areas predicted suitable by both methods, areas predicted suitable exclusively by one method, and also the concordance by ranges of suitability. Furthermore, both maps can also be evaluated by the Dinamica EGO reciprocal similarity functor, as ilustrated by figure 2. This functor calculates a two-way fuzzy similarity index between two maps ([[calc_reciprocal_similarity_map|Calc Reciprocal Similarity Map]]). +Similarity and disagreement maps are one way to explore the agreement between different methods for calculating a suitability map. They make it possible to observe areas predicted as suitable by both methods, areas predicted exclusively by one method, and the agreement across ranges of suitability. Furthermore, both maps can also be evaluated using a reciprocal similarity metric, as illustrated in figure 2. This method calculates a two-way fuzzy similarity index between a pair of maps ([[calc_reciprocal_similarity_map|Calc Reciprocal Similarity Map]]).
 \\ \\
 \\ \\
-{{ :​images:​habitatmod_figure02congruence.png |Figure 02: Congruence ​and divergence ​maps comparing Maximum Entropy and Weights of Evidence methods + similarity index.}} +{{ :​images:​habitatmod_figure02congruence.png |Figure 02: Similarity ​and disagreement ​maps comparing Maximum Entropy and Weights of Evidence methods + similarity index.}} 
-{{ :​images:​habitatmod_figure02divergence.png |Figure 02: Congruence ​and divergence maps comparing Maximum Entropy and Weights of Evidence methods + similarity index.}} +{{ :​images:​habitatmod_figure02divergence.png |Figure 02: Similarity ​and disagreement ​comparing Maximum Entropy and Weights of Evidence methods + similarity index.}} 
-Figure 02: Congruence ​and divergence ​maps comparing Maximum Entropy and Weights of Evidence methods + similarity index.+Figure 02: Similarity ​and disagreement ​maps comparing Maximum Entropy and Weights of Evidence methods + similarity index.
 \\ \\
 \\ \\
 === ROC performance evaluation === === ROC performance evaluation ===
-The Receiver Operating Characteristic (ROC) is a method to evaluate image similarity considering a prefixed binary pattern. ROC ponders true positive rate and false positive rate through incremental binary classifications (Mas et. al, 2013a). Despite the method has been applied to many study fields, ROC is commonly used in GIS to evaluate predictions provided by modeling versus observed data. Thus this work uses ROC metrics to evaluate the performance of each method individually, as well as to compare predictions between the both methods. +The Receiver Operating Characteristic (ROC) evaluates map similarity against a reference binary pattern. ROC compares the number of true positive and false positive cells through an incremental binary classification (Mas et al. 2013a). ROC is commonly used in GIS to evaluate spatial predictions versus observed data. In this work, we used ROC metrics to evaluate the performance of each method individually, as well as to compare predictions between the two methods.
 \\ \\
 \\ \\
Line 50: Line 53:
 \\ \\
 \\ \\
-The main ROC metrics used to evaluate the results were the area under curve (AUC) and the partial area under curve (pAUC). Figure 03 presents the standard ROC chart contrasting true positive rate and false positive rate. The red diagonal curve represents a low-skilled prediction, ie, a hypothetical model that predicts how much hits as much false alarms. The suitability maps are interpreted on ROC as predictions to be compared with the fixed diagonal. Each suitability map evaluated generates a new curve for the same chart. Any superposition of the prediction in analysis over the fixed diagonal is interpreted as performance gain. The final gain offered by the prediction analyzed (relative to the suitability map) is summarized by the AUC measure. The same reading can be applied for a restricted range of hit rate or error rate, this partial measure is called pAUC, as illustrated in the figure 03. +The main ROC metrics used to evaluate the results are the area under the curve (AUC) and the partial area under the curve (pAUC). Figure 03 presents the standard ROC graph of true positive rate versus false positive rate. The red diagonal represents a hypothetical model that predicts the same proportion of hits and false alarms. On the ROC graph, suitability maps are interpreted as prediction curves compared with the fixed diagonal. Each suitability evaluation generates a new curve on the graph. Curves above the fixed diagonal represent models that perform better than a random model. The final gain (relative to the suitability map) is summarized by the AUC measure. An equivalent metric can be computed over a restricted range of hit rate or error rate; this partial measure is called pAUC, as illustrated in figure 03.
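The curve construction and the two area metrics can be sketched as follows. This is an illustrative sketch only, not the tool used in the study: the function names and the toy data are hypothetical, suitability scores are assumed to lie in 0..1, thresholds are assumed to move in equal steps, and pAUC is taken here over a restricted false positive range.

```python
def roc_points(scores, labels, steps=10):
    """ROC points from incremental binary classifications.

    scores: suitability per cell, assumed to lie in 0..1
    labels: 1 for observed presence, 0 otherwise
    steps:  number of thresholds (10 gives 10% increments)
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for i in range(1, steps + 1):
        threshold = 1.0 - i / steps   # from strict to lenient
        tp = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
        fp = sum(1 for s, l in zip(scores, labels) if not l and s >= threshold)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points, max_fpr=1.0):
    """Trapezoidal area under the ROC points, up to max_fpr (pAUC if < 1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data: four presence cells and four background cells.
scores = [0.95, 0.85, 0.75, 0.35, 0.65, 0.25, 0.15, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
curve = roc_points(scores, labels)
auc = area_under_curve(curve)                  # full area
pauc = area_under_curve(curve, max_fpr=0.25)   # area up to 25% false positives
diagonal = area_under_curve([(0.0, 0.0), (1.0, 1.0)])
```

The diagonal alone yields an area of 0.5, which is why any curve lying above it represents a gain over a random model.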
  
 ==== Results and Discussions ==== ==== Results and Discussions ====
-The suitability maps generated by maximum entropy and weights of evidence were compared by sampling due to allow a feasible analyses in terms of computational effort. The comparison process more costly took around 15 hours to be concluded on a computer with 64 GB of memory RAM. There were executed 469 bootstraps, each one generating a curve based in binary classifications incremented by 10% (ie, 10 points to compose the ROC curve). The methods were compared considering all the area under curve (AUC), and also considering partial area under curve (pAUC). +The suitability maps generated by Maximum Entropy and Weights of Evidence were compared using sampling. The comparison procedure took about 15 hours to complete on a computer with 64 GB of RAM and 32 processors. The procedure executed 469 bootstraps, each one generating a curve based on binary classifications incremented by 10% (i.e., 10 points to compose the ROC curve). The methods were compared considering the entire area under the curve (AUC) and the partial area under the curve (pAUC).
 \\ \\
 \\ \\
-The maximum entropy method has reached AUC 0.92, while the weights of evidence method has reached AUC = 0.81. The comparison between the methods through multiple sampling has generated a p-value 0,030. The comparison restricted to high hit indices, conform suggested by Pearson (2007), resulted in a p-value = 0,045. To the partial area under curve comparison were used 50 bootstraps in order of computational limitations. The p-value of 0,030 obtained by comparison between both methods points a statistical correlation between both projections. This fact indicates that weights of evidence method has enough skill for habitat suitability modeling, even in cases of small size samples. Being the maximum entropy a method considered high skilled for these cases. +The Maximum Entropy method reached an AUC of 0.92, while the Weights of Evidence method reached 0.81. The comparison between the methods through multiple sampling generated a p-value of 0.030. The comparison constrained to high hit indices, as suggested by Pearson et al. (2007), resulted in a p-value of 0.045. The comparison of partial area under the curve used 50 bootstraps due to the computational time required. The p-value of 0.030 obtained from comparing the two methods indicates a statistical correlation between both predictions. This result shows that the Weights of Evidence method performs well for modeling habitat suitability, even in cases of small sample sizes. Such performance is comparable with that of Maximum Entropy, a method considered highly suitable for these cases.
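The text does not detail how the bootstrap p-value was computed. One common approach, shown here only as a sketch under that assumption, resamples the cells with replacement, recomputes the AUC difference for each replicate, and compares the observed difference with the spread of the bootstrap differences using a normal approximation. All names and the toy data are hypothetical.

```python
import math
import random

def auc(scores, labels):
    """Rank-based AUC: share of presence/background pairs ordered correctly."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_test(scores_a, scores_b, labels, n_boot=200, seed=0):
    """Return (observed AUC difference, two-sided p-value)."""
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(scores_a, labels) - auc(scores_b, labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if not 0 < sum(lab) < n:
            continue  # a replicate needs both presence and background cells
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(a, lab) - auc(b, lab))
    mean = sum(diffs) / n_boot
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n_boot - 1))
    if sd == 0.0:
        return observed, (1.0 if observed == 0.0 else 0.0)
    z = observed / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return observed, p_value

# Toy data: two hypothetical suitability maps sampled at the same eight cells.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
map_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
map_b = [0.9, 0.3, 0.5, 0.6, 0.4, 0.8, 0.2, 0.1]
same_diff, same_p = bootstrap_auc_test(map_a, map_a, labels)
diff, p = bootstrap_auc_test(map_a, map_b, labels)
```

Because both maps are resampled with the same cell indices, each replicate keeps the pairing between the two predictions, which is what makes the difference comparable across replicates.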
 \\ \\
 \\ \\
-{{ :​images:​habitatmod_figure04.png |Figure 04: ROC curve and p-value for AUC comparison between ​maximum entropy ​and weights ​of evidence.}} +{{ :​images:​habitatmod_figure04.png |Figure 04: ROC curve and p-value for AUC comparison between ​Maximum Entropy ​and Weights ​of Evidence.}} 
-Figure 04: ROC curve and p-value for AUC comparison between ​maximum entropy ​and weights ​of evidence.+Figure 04: ROC curve and p-value for AUC comparison between ​Maximum Entropy ​and Weights ​of Evidence.
 \\ \\
 \\ \\
-{{ :​images:​habitatmod_figure05.png |Figure 05: ROC curve and p-value for pAUC comparison between ​maximum entropy ​and weights ​of evidence.}} +{{ :​images:​habitatmod_figure05.png |Figure 05: ROC curve and p-value for pAUC comparison between ​Maximum Entropy ​and Weights ​of Evidence.}} 
-Figure 05: ROC curve and p-value for pAUC comparison between ​maximum entropy ​and weights ​of evidence.+Figure 05: ROC curve and p-value for pAUC comparison between ​Maximum Entropy ​and Weights ​of Evidence.
 \\ \\
 \\ \\
-Besides direct comparison, the suitability maps of both methods were normalized to 0:100 range and then compared by ROC. In this case the p-value was 0.117. Meanwhile this result was obtained by a less exhaustive analysis in reason of computational limitations: 50 bootstraps and 10% of increment. +In addition, the suitability maps of both methods were normalized to a 0:100 range and then compared using ROC. In this case, the p-value was 0.117. This result was obtained using 50 bootstraps and 10% increments.
 \\ \\
 \\ \\
-Results show that performance of weights of evidence is enough close to maximum entropy one. Once the maximum entropy is recognized as appropriated approach to model habitat suitability in low sampling cases (Pearson et al., 2007), weights of evidence emerges as an alternative for such studies. Considering the availability of weights of evidence method in the spatially explicit environment Dinamica EGO, it turns up an alternative to the commercial software Maxent, the main framework used to apply maximum entropy method. Still it is important to note that both modeling methods could be improved by more sophisticated configurations (heuristic searching, knowledge-driven adjustments, etc.). In this work both methods were compared assuming only calibration direct by sampling. +In sum, our results show that Weights of Evidence performed on par with Maximum Entropy. It is important to note that both modeling methods could be further improved by fine-tuning (heuristic searching, knowledge-driven adjustments in Maxent), finer ranges, or applying a genetic algorithm in the case of Weights of Evidence.
  
 ==== References ==== ==== References ====
Line 88: Line 91:
 ==== Resources ==== ==== Resources ====
  
-The links below are relative to models and available datasets used in this article: 
  
-  * [[http://​www.csr.ufmg.br/​dinamica|Inputs]] +[[http://​www.csr.ufmg.br/​dinamica_utils/download/files/resources.zip|Download models and datasets used in this example]]
-  * [[http://www.csr.ufmg.br/dinamica|Models]] +
-  * [[http://​www.csr.ufmg.br/​dinamica|Outputs]]+