{{ :logo_logo.png?400 |}}
\\
\\
\\
====== Heuristic calibration of models by using Genetic Algorithm ======
    
  * How to get model parameter coefficients from a gene
  * How to pass the model fitness to the GA tool
  * Functors:\\ - //[[:Genetic Algorithm Tool|Genetic Algorithm (GA) Tool]]//\\ - //[[:Create Lookup Table Group]]//\\ - //[[:Extract Lookup Table from Lookup Table Group]]//\\ - //[[:Get Current Individual]]//\\ - //[[:Set Fitness]]//\\
  
  
The //[[:Genetic Algorithm Tool|Genetic Algorithm (GA) Tool]]// provides a powerful means of calibrating environmental models [[http://dx.doi.org/10.1016/j.envsoft.2013.01.010|(Soares Filho et al., 2013)]]. By mimicking the principle of biological evolution (Koza, 1992), the GA tool uses massive computation and heuristics to search for a globally optimal set of model parameters. Dinamica EGO's GA tool is a container: a sequence of functors must be placed inside it, including, in particular, two associated functors, //[[:Get Current Individual]]// and //[[:Set Fitness]]//. Fig. 1 depicts a model calibration scheme using the GA tool. First, one needs to take the coefficients of the model parameters to be calibrated and assemble them in tables. Thus, each model parameter represents an allele in a table, and each table corresponds to a gene.
\\
\\
In turn, these tables are assembled into a group of tables by using //[[:Create Lookup Table Group]]//, forming a chromosome. This group of tables is an input to the GA tool, which spawns a population based on the genotype it describes. Inside the GA tool, //[[:Get Current Individual]]// retrieves the genes of each individual of a generation. Other functors, such as //[[:Extract Lookup Table from Lookup Table Group]]//, are then sequenced to extract the parameter coefficients and pass them to the model, which is executed once per individual. An evaluation function is coupled to the model output, and its result is passed to //[[:Set Fitness]]//, which returns the fitness value to the GA tool for the selection process. The internal sequence of functors is therefore executed a number of times equal to the number of individuals multiplied by the number of generations, both specified in the GA tool's input ports. When the GA tool terminates, it outputs the fitness of the overall best individual as well as the group of tables that comprises its genes.
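The calibration loop described above can be sketched in plain Python as follows. This is a minimal illustration of a generic genetic algorithm, not Dinamica EGO's implementation: the table-based data model, the operators (gene-level uniform crossover, Gaussian mutation), the truncation selection, the rates, and the names `calibrate`, `mutate`, `crossover` and `toy_fitness` are all hypothetical. What it does mirror is the structure of the scheme: the genotype is a group of tables, every individual is scored once per generation, and the number of model evaluations equals individuals × generations.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: a lookup table maps a key (e.g. the lower bound of a
# distance range) to a coefficient; each coefficient plays the role of an allele.
Table = Dict[float, float]
# A chromosome is a group of tables, one table (gene) per model variable.
Chromosome = List[Table]


def mutate(chrom: Chromosome, rate: float, scale: float, rng: random.Random) -> Chromosome:
    """Add Gaussian noise (std `scale`) to each allele with probability `rate`."""
    return [
        {k: (v + rng.gauss(0.0, scale) if rng.random() < rate else v) for k, v in table.items()}
        for table in chrom
    ]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Uniform crossover at gene level: each whole table comes from one parent."""
    return [dict(ta) if rng.random() < 0.5 else dict(tb) for ta, tb in zip(a, b)]


def calibrate(
    genotype: Chromosome,
    fitness: Callable[[Chromosome], float],
    individuals: int,
    generations: int,
    seed: int = 0,
) -> Tuple[float, Chromosome, int]:
    """Return (best fitness, best chromosome, number of fitness evaluations)."""
    rng = random.Random(seed)
    # Initial population: the original genotype plus mutated copies of it.
    population = [genotype] + [mutate(genotype, 0.5, 0.1, rng) for _ in range(individuals - 1)]
    best_fit, best, evaluations = float("-inf"), genotype, 0
    for _ in range(generations):
        scored = []
        for individual in population:  # role of Get Current Individual
            f = fitness(individual)     # run model + evaluation; role of Set Fitness
            evaluations += 1
            scored.append((f, individual))
            if f > best_fit:
                best_fit, best = f, individual
        # Truncation selection: the better half of the generation become parents.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        parents = [ind for _, ind in scored[: max(2, individuals // 2)]]
        population = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), 0.2, 0.05, rng)
            for _ in range(individuals)
        ]
    return best_fit, best, evaluations


def toy_fitness(chrom: Chromosome) -> float:
    """Hypothetical stand-in for 'run the model and score it': best when all alleles are 1.0."""
    return -sum((v - 1.0) ** 2 for table in chrom for v in table.values())


if __name__ == "__main__":
    start = [{0.0: 0.0, 100.0: 0.5}, {0.0: -0.3}]
    best_fit, best, n = calibrate(start, toy_fitness, individuals=10, generations=5)
    print(n, round(best_fit, 3))
```

With 10 individuals and 5 generations the fitness function is called 50 times, matching the individuals × generations rule. Because the original genotype is evaluated in the first generation, the returned best fitness can never be worse than the starting point.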
  
    
The example provided applies the GA tool to the calibration of a deforestation model. This exercise aims to demonstrate both the potential of the GA tool for calibrating any type of model, as long as there is sufficient computing power to run the model together with its evaluation function numerous times, and the overspecialization problem that may arise when using hard predictors such as artificial neural networks and the GA itself.
  
First, load the model ''Cal_Reciprocal_fitness1x1.ego'' from ''\Examples\Genetic_Algorithm\WEofE''. This model calculates the reciprocal fitness of a deforestation model calibrated with the Weights of Evidence method, a soft predictor. Run the model and inspect its output, ''Reciprocal_fitness1x1.csv'', either in a spreadsheet or by enabling the table viewer on the output port of Set Key 1. The concept behind this validation measure is explained in the sixth and seventh chapters of the lesson [[tutorial:building_a_land-use_and_land-cover_change_simulation_model|Building a Land use and Land-cover Change Simulation Model]]. The fitness obtained for this model is 0.2060. Open the Weights of Evidence tables in ''originals\tables'' and their corresponding maps in ''originals\maps'' under the folder ''Genetic_Algorithm''.
In this model, //[[:Calculate Map]]// replaces //[[:calc_w._of_e._probability_map|Calc W. E. Probability Map]]//. Open it to see the equation that integrates the weights of evidence to produce the transition probability map. The weights of evidence coefficients are input as separate tables, so they can be assembled into a group of tables and thereby form the gene.
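As a rough illustration of what such an equation does, the standard Weights of Evidence combination adds the weights of all evidence variables for a cell (optionally to a prior log-odds term) and converts the sum S back to a probability with the logistic function, P = e^S / (1 + e^S). The sketch below assumes that form; the exact expression inside this model's //Calculate Map// may differ. The lower-bound lookup convention, the zero weight for values below every range, and the names `lookup`, `logistic` and `transition_probability` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical weights table: lower bound of a variable range -> weight of evidence.
WeightsTable = Dict[float, float]


def lookup(table: WeightsTable, value: float) -> float:
    """Weight of the range whose lower bound is the largest key <= value.

    Assumption: a value below every key carries no evidence (weight 0.0).
    """
    keys = [k for k in table if k <= value]
    return table[max(keys)] if keys else 0.0


def logistic(s: float) -> float:
    """Numerically stable e**s / (1 + e**s)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def transition_probability(
    cell_values: List[float], tables: List[WeightsTable], prior_log_odds: float = 0.0
) -> float:
    """Sum the weights of all evidence variables for one cell, then map to a probability."""
    s = prior_log_odds + sum(lookup(t, v) for t, v in zip(tables, cell_values))
    return logistic(s)


if __name__ == "__main__":
    # Hypothetical distance-to-deforestation weights: positive near, negative far.
    distance_weights = {0.0: 1.0, 100.0: -1.0}
    print(transition_probability([50.0], [distance_weights]))
    print(transition_probability([150.0], [distance_weights]))
```

Because each coefficient enters the sum directly, perturbing the entries of these tables (as the GA does with each gene) shifts the resulting probability map cell by cell.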
  
Now, open ''GAReciprocal_fitness1x1.ego'' from ''Genetic_Algorithm\GAknn\Reciprocal_fitness1x1''. Compare the structure of this model with the diagram in Fig. 1. Open the GA tool. This container encloses three //[[:Group|Groups]]// and two functors. //[[:Get Current Individual]]// obtains the gene of an individual in the current generation and passes it to a sequence of functors that extract the lookup tables composing the gene. A land change simulation model receives those tables as input, and its results are passed to a fitness function that assesses its spatial performance. This function, in turn, returns the fitness measure, which //[[:Set Fitness]]// captures and passes back to the GA tool.
  
{{ :tutorial:ga_2.jpg |}}
{{ :tutorial:ga_3.jpg |}}
  
Run this model. It may take a while, depending on the capabilities of the computer system used. Remember that, as a hard predictor, the **GA tool** uses massive computation, which in principle would demand a high-performance computer system. Nevertheless, Dinamica EGO 2 is designed to take advantage of available computing power, such as multiple processors, extended virtual memory, and pre-compilation of algebraic and logical equations, making it feasible to run this GA tool model even on a laptop computer. Follow the log report and open the file ''Reciprocal_fitness_of the_overall_best_individual.csv''. The overall best individual attained a fitness of 0.3439, an increase of about 67% with respect to the Weights of Evidence result (0.2060). Now, let's validate this model.
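The relative gain can be checked from the two reciprocal fitness values, 0.2060 for the Weights of Evidence model and 0.3439 for the overall best GA individual:

```latex
\frac{0.3439 - 0.2060}{0.2060} = \frac{0.1379}{0.2060} \approx 0.669
```

that is, an increase of about 66.9% in calibration fitness.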
  
<note>**Note**: Model fitness can be increased by lowering the **Prune Factor** in [[:Patcher]]. This reduces model stochasticity but often makes the simulated patterns less realistic.</note>
  
Open ''validation_fitness1x1.ego'' from ''Genetic_Algorithm\GAknn\Reciprocal_fitness1x1'' and run it. What is the fitness now? Surprised? What happened?
  
The answer is simple. The GA tool attained a very high fitness in the calibration process through overspecialization. That is, it adapted so closely to the specific situation given by the map of changes from 1997 to 2000 that the probability map obtained from the GA tool results became useless when applied to model validation, which uses the deforestation from 2000 to 2003. Thus, although the GA tool can achieve high scores in the calibration process, it will become overspecialized if the formation of genes in the reproduction process (mostly through mutation and crossover) is not constrained within a certain range of variation from the characteristics of the primeval individual.
Dinamica EGO's GA tool provides a means to overcome overspecialization by enabling the user to specify an envelope of maximum variation for the new genes formed during the reproduction process. Instead of default values, this envelope is delimited by two group tables, consisting, respectively, of the upper and lower gene bounds.
  
Open the model ''GAReciprocal_fitness1x1.ego'' from ''Genetic_Algorithm\GALimitedRanges120\Reciprocal_fitness1x1'' and examine the //[[:Group]]// "Lower Bound Envelope". Open it. Each //[[:Calculate Lookup Table]]// inside it establishes a lower bound for one weights of evidence table by using the following equation:
  
**t1[line] - t1[line] * 1.2**
  
Likewise, the //[[:Group]]// "Upper Bound Envelope" will set an upper bound as follows:
  
**t1[line] + t1[line] * 1.2**
  
Using these tables as gene bounds constrains the new genes to an envelope of ±1.2 times (that is, ±120% of) the original weights of evidence coefficients, thus providing a trend around which the global optimum must be found. Although this constraint results in lower calibration scores, it tames the //GA tool// engine, allowing it to improve on the Weights of Evidence result when applied to a general prediction process (Figs. 2 and 3). To check this, run ''validation_fitness1x1.ego'' located in ''GALimitedRanges120%\Reciprocal_fitness1x1''.
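The two bound expressions can be sketched outside Dinamica EGO as below, assuming each weights of evidence table is a simple key-to-coefficient mapping. Note the arithmetic for a negative coefficient t: t - 1.2t = -0.2t is then greater than t + 1.2t = 2.2t, so the "lower" and "upper" expressions swap roles. The sketch orders each pair with min/max; this is a choice made for the illustration, not a description of how the GA tool treats such bounds. The names `envelope` and `clamp`, and clamping as the way a new gene is kept inside the envelope, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical weights table: key (e.g. range lower bound) -> coefficient.
Table = Dict[float, float]


def envelope(table: Table, factor: float = 1.2) -> Tuple[Table, Table]:
    """Lower/upper bound tables from t - t*factor and t + t*factor.

    For a negative coefficient the two expressions swap roles, so each pair
    is ordered explicitly with min/max.
    """
    lower: Table = {}
    upper: Table = {}
    for key, t in table.items():
        a, b = t - t * factor, t + t * factor
        lower[key], upper[key] = min(a, b), max(a, b)
    return lower, upper


def clamp(gene: Table, lower: Table, upper: Table) -> Table:
    """Hypothetical enforcement step: pull every allele back inside its bounds."""
    return {k: min(max(v, lower[k]), upper[k]) for k, v in gene.items()}


if __name__ == "__main__":
    weights = {0.0: 0.5, 100.0: -0.8}
    lo, hi = envelope(weights)
    print(lo, hi)
    print(clamp({0.0: 5.0, 100.0: -5.0}, lo, hi))
```

Calling `envelope(weights, factor=0.8)` gives the narrower 80% envelope that Fig. 3e compares with the 120% one.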
  
In conclusion, hard predictors like the GA tool must incorporate prior knowledge in order to overcome overspecialization. When this is taken into consideration, the GA tool can really push the envelope of model optimization. Use your expertise to develop other GA tool calibration processes following the scheme of Fig. 1 and the examples provided.
[{{ :tutorial:ga_ 4.jpg |Fig. 2 – GA tool performance with respect to the Weights of Evidence method (baseline). GA limited-range runs use an envelope of variation around the characteristics of the primeval individual.}}]
    
[{{ :tutorial:ga_ 5.1.jpg |Fig. 3 – a) Transition probability map from Weights of Evidence (blue: low probability, red: high probability). b) Simulated changes (red) using the map from a) over historical changes from 1997 to 2000 (black). c) Transition probability map using weights output from the //[[:Genetic Algorithm Tool]]// limited within a 120% envelope. d) Simulated changes (red) using the map from c) over historical changes from 1997 to 2000 (black). e) Weights of evidence of deforestation as a function of distance to previously deforested areas (WofE: Weights of Evidence method; GAknn: GA-only method; others: GA with values limited, respectively, within 80% and 120% envelopes of the original weights of evidence values). f) Simulated changes (red) using the map from c) over historical changes from 2000 to 2003 (black).}}]
  
[[tutorial:dinamica_ego_script_language_and_console_launcher|Next Session]]