Example Problems

mikefenton edited this page Jun 23, 2017 · 15 revisions

Seven example problems are currently provided:

String-match
Regression
Classification
Pymax
Integer sequence match
Program synthesis
Genetic Improvement of Regex Runtime Performance

A brief description of each problem is given below, along with the command-line arguments needed to run it. The developers of PonyGE2 encourage users to try out the operators and options available within PonyGE2 on these example problems in order to gain an appreciation of how they work.

String-match

The grammar specifies words as lists of vowels and consonants along with special characters. The aim is to match a target string.

To use it, specify the following command-line argument:

--parameters string_match.txt

The default string match target is Hello world!, but this can be changed with the --target argument.
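
The idea of matching a target string can be sketched as a simple mismatch count. The function below is only an illustration, not PonyGE2's own fitness function: the name `string_mismatch` and its scoring rule are assumptions made for this example.

```python
def string_mismatch(guess: str, target: str = "Hello world!") -> int:
    """Count how far a guess is from the target (lower is better).

    Hypothetical scoring rule, for illustration only.
    """
    # Characters that differ at positions both strings share.
    mismatches = sum(1 for g, t in zip(guess, target) if g != t)
    # Every missing or surplus character also counts as one error.
    return mismatches + abs(len(guess) - len(target))


print(string_mismatch("Hello world!"))  # 0: perfect match
print(string_mismatch("Hello wurld"))   # 2: one wrong letter, one missing
```

A score of zero means the phenotype reproduces the target exactly, so a search using this measure would minimise it.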

Regression

The grammar generates a symbolic function composed of standard mathematical operations and a set of variables. This function is then evaluated using a pre-defined set of inputs, given in the datasets folder. Each problem suite has a unique set of inputs. The aim is to minimise some error between the output of the function and the desired output specified in the datasets. This is the default problem for PonyGE2.

To try this problem, specify the following command-line argument:

--parameters regression.txt

The default regression problem is Vladislavleva4, but this can be changed with the --grammar_file, --dataset_train and --dataset_test arguments.
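
As a rough sketch of the evaluation described above, the snippet below evaluates a symbolic expression on a few input rows and computes the mean squared error against target values. The data, the phenotype string, and the helper names (`evaluate_expression`, `mean_squared_error`) are all made up for this example and do not come from PonyGE2 or its datasets.

```python
def evaluate_expression(expression, inputs):
    """Evaluate an expression string once per input row.

    Inside the expression, x[i] refers to the i-th input variable.
    """
    results = []
    for row in inputs:
        # Built-ins are hidden so the expression only sees its variables.
        results.append(eval(expression, {"__builtins__": {}}, {"x": row}))
    return results


def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared) / len(squared)


# Hypothetical data: two input variables per row and one target per row.
inputs = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
targets = [1.0, 3.0, 7.0]

phenotype = "x[0] * x[0] + x[1]"  # stand-in for a generated function
error = mean_squared_error(targets, evaluate_expression(phenotype, inputs))
print(error)  # 0.0, since this expression reproduces every target
```

Using `eval` keeps the sketch short; it should only ever be applied to expressions from a trusted source.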

Classification

Classification can be considered a special case of symbolic regression but with a different error metric. As with regression, the grammar generates a symbolic function composed of standard mathematical operations and a set of variables. This function is then evaluated using a pre-defined set of inputs, given in the datasets folder. Each problem suite has a unique set of inputs. The aim is to minimise some classification error between the output of the function and the desired output specified in the datasets.

To try this problem, specify the following command-line argument:

--parameters classification.txt

The default classification problem is Banknote, but this can be changed with the --grammar_file, --dataset_train and --dataset_test arguments.
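
One way to picture the "different error metric" is to threshold the function's numeric output into a class label and count the fraction of wrong labels. This is a minimal sketch under that assumption; the threshold rule, the function name `classification_error`, and the sample values are hypothetical and are not taken from PonyGE2.

```python
def classification_error(labels, outputs, threshold=0.0):
    """Fraction of rows whose thresholded output disagrees with its label."""
    # Assumed rule: outputs above the threshold are class 1, otherwise class 0.
    predicted = [1 if out > threshold else 0 for out in outputs]
    wrong = sum(1 for y, p in zip(labels, predicted) if y != p)
    return wrong / len(labels)


# Hypothetical raw outputs of an evolved function and the true class labels.
outputs = [-1.5, 0.2, 3.0, -0.1]
labels = [0, 1, 1, 1]
print(classification_error(labels, outputs))  # 0.25: one of four rows is wrong
```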

Pymax

One of the strongest aspects of a grammatical mapping approach such as PonyGE2 is the ability to generate executable computer programs in an arbitrary language [O'Neill & Ryan, 2003]. In order to demonstrate this in the simplest way possible, we have included an example Python programming problem.

The Pymax problem is a traditional maximisation problem, where the goal is to produce as large a number as possible. However, instead of encoding the grammar in a symbolic manner and evaluating the result, we have encoded the grammar for the Pymax problem as a basic Python programming example. The phenotypes generated by this grammar are executable Python functions, whose outputs represent the fitness value of the individual. Users are encouraged to examine the pymax.bnf grammar and the resultant individual phenotypes to gain an understanding of how grammars can be used to generate such arbitrary programs.

To try this problem, specify the following command-line argument:

--parameters pymax.txt
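
To illustrate how an executable phenotype can supply its own fitness value, the sketch below runs a small piece of Python source and uses the returned number as the fitness. The phenotype shown, the function name `p`, and the helper `run_phenotype` are assumptions made for this example; the programs actually produced by pymax.bnf may look different.

```python
# A hand-written stand-in for a generated phenotype (hypothetical).
phenotype = """
def p():
    x = 0.0
    for _ in range(3):
        x = x + 0.5
        x = x * 2.0
    return x
"""


def run_phenotype(source: str) -> float:
    """Execute the phenotype source and return the value of its function p."""
    namespace = {}
    # exec defines p inside namespace; only run code you trust this way.
    exec(source, namespace)
    return namespace["p"]()


fitness = run_phenotype(phenotype)
print(fitness)  # 7.0
```

Since the goal is maximisation, a phenotype that returns a larger number would count as fitter under this sketch.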

Integer sequence match

In the sequence-match problem, we're given an integer sequence target, say [0, 5, 0, 5, 0, 5], and we try to synthesize a program (loops, if-statements, etc.) which will yield that sequence, one item at a time. There are several components to the provided fitness function, which are weighted by numerical parameters. We can specify the target sequence and weights using parameters on the command line or in a parameters file.

To try this problem, specify the following command-line argument:

--parameters sequence_match.txt

NOTE that the sequence match dependencies are currently outside of Anaconda.
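
The notion of a weighted, multi-component fitness for sequence matching can be sketched as follows. The two components used here (value error and length error), the weight names `w_value` and `w_length`, and the hand-written generators are all assumptions for illustration; they are not the components of the fitness function shipped with PonyGE2.

```python
from itertools import islice


def candidate():
    """Hand-written stand-in for a synthesised program: yields 0, 5, 0, 5, ..."""
    while True:
        yield 0
        yield 5


def short_candidate():
    """A second stand-in that stops early and gets one value wrong."""
    yield from [0, 5, 1]


def sequence_error(program, target, w_value=1.0, w_length=1.0):
    """Weighted sum of two hypothetical error components (lower is better)."""
    # Take at most as many items as the target has, one item at a time.
    produced = list(islice(program(), len(target)))
    # Component 1: how far each produced item is from its target item.
    value_error = sum(abs(p - t) for p, t in zip(produced, target))
    # Component 2: how many target items the program failed to produce.
    length_error = len(target) - len(produced)
    return w_value * value_error + w_length * length_error


target = [0, 5, 0, 5, 0, 5]
print(sequence_error(candidate, target))        # 0.0
print(sequence_error(short_candidate, target))  # 4.0
```

Changing the weights shifts the balance between getting values right and producing a sequence of the right length.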

Program synthesis

The General Program Synthesis Benchmark Suite is available in PonyGE2. Grammars and datasets have been provided by HeuristicLab.CFGGP. The individuals produce executable Python code. Note: multicore is currently not supported for this type of problem.

To try this problem, specify the following command-line argument:

--parameters progsys.txt

Genetic Improvement of Regex Runtime Performance

An example of Genetic Improvement for software engineering is given in the form of program improvement for regular expressions (regexes). The given examples improve existing regexes by seeding them into the population. The fitness function measures the runtime and functionality of the regexes.

The fitness function for the regex performance improvement problem has a number of sub-modules and programs which are used for generating test cases and for timing the execution of candidate programs. This is an example of how a fitness function can be extended in PonyGE2 with customised modules and classes to perform complex tasks.

To try this problem, run the following command:

python scripts/seed_PonyGE2.py --parameters regex_improvement.txt

Adding New Problems

Adding new problems to PonyGE2 has been made as simple as possible. To add a new problem, you will need:

a new grammar file specific to your problem,
a new fitness function (if you don't want to use a previously existing one), and
if you are doing supervised learning then you may also need to add some new datasets.

A template for new fitness functions is provided in fitness.base_ff_classes.ff_template. This template allows new fitness functions to be created quickly in a compatible and extensible manner.

NOTE that it may be beneficial to create a new parameters file for any new problem.
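
The general shape of a custom fitness function can be sketched in standalone form as below. This is not a copy of the template in fitness.base_ff_classes.ff_template: the classes `BaseFitness`, `VowelCount` and the minimal `Individual` are hypothetical stand-ins so that the example runs on its own, and the real template should be consulted for the actual interface.

```python
class BaseFitness:
    """Stand-in base class (hypothetical, not PonyGE2's own)."""

    maximise = False  # assumed default: lower fitness is better

    def __call__(self, ind):
        # The fitness function is called with the individual as its only input.
        return self.evaluate(ind)

    def evaluate(self, ind):
        raise NotImplementedError


class VowelCount(BaseFitness):
    """Toy fitness: reward phenotypes that contain many vowels."""

    maximise = True  # higher counts are better for this toy problem

    def evaluate(self, ind):
        return sum(1 for ch in ind.phenotype.lower() if ch in "aeiou")


class Individual:
    """Minimal stand-in for an individual that carries a phenotype string."""

    def __init__(self, phenotype):
        self.phenotype = phenotype


fitness_function = VowelCount()
print(fitness_function(Individual("Hello world!")))  # 3
```

Keeping the problem-specific logic inside a single `evaluate` method means a new problem only needs a new subclass, along with its grammar and any datasets.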

Editing Code to enable new problems

Finally, depending on the problem itself, you may need to edit representation.individual.Individual.evaluate to fully integrate the new problem into PonyGE2. individual.evaluate is where PonyGE2 specifies the inputs needed for fitness evaluation.

NOTE that it may not be necessary to edit individual.evaluate if you only pass in the individual to be evaluated, as the call argument for each fitness function only has one input by default.