
Estimation Of Distribution Algorithm based on Markov Random Fields
Siddhartha Shakya
School of Computing, The Robert Gordon University


Outline
- From GAs to EDAs
- Probabilistic Graphical Models in EDAs
  - Bayesian networks
  - Markov Random Fields
- Fitness modelling approach to estimating and sampling MRF in EDA
  - Gibbs distribution, energy function and modelling the fitness
  - Estimating parameters (fitness modelling approach)
  - Sampling MRF (several different approaches)
- Conclusion


Genetic Algorithms (GAs)
- Population-based optimisation technique
- Based on Darwin's theory of evolution
- A solution is encoded as a set of symbols known as a chromosome
- A population of solutions is generated
- Genetic operators are then applied to the population to produce the next generation, which replaces the parent population
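One generation of this loop can be sketched in Python; the OneMax fitness, the operator choices (tournament selection, 1-point crossover, bit-flip mutation) and all parameter values below are illustrative, not the presenter's code:

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility

def fitness(x):                      # OneMax: count of 1-bits
    return sum(x)

def one_generation(pop, p_mut=0.02):
    """One GA generation: tournament selection, 1-point crossover, mutation."""
    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    nxt = []
    while len(nxt) < len(pop):
        p1, p2 = tournament(), tournament()
        cut = rng.randrange(1, len(p1))                      # 1-point crossover
        child = p1[:cut] + p2[cut:]
        child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
        nxt.append(child)
    return nxt

pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(25):
    pop = one_generation(pop)
best = max(map(fitness, pop))
```

Note how the new generation replaces the parent population wholesale, as described above.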


Simple GA simulation


GA to EDA


Simple EDA simulation
[Figure: selected bitstrings and the univariate marginal probability vector (values such as 0.5, 0.5, 1.0, 1.0, 0.5) estimated from them]


Joint Probability Distribution (JPD)
- A solution is viewed as a set of random variables x = (x_1, x_2, ..., x_n)
- Joint probability distribution: p(x) = p(x_1, x_2, ..., x_n)
- Its full table is exponential in the number of variables, therefore not feasible to calculate in most cases
- Needs simplification!


Factorisation of JPD
- Univariate model: no interaction between variables; the simplest model: p(x) = p(x_1) p(x_2) ... p(x_n)
- Bivariate model: pair-wise interaction; each variable is conditioned on at most one other variable
- Multivariate model: interaction of more than two variables


Typical estimation and sampling of JPD in EDAs
- Learn the interactions between variables in the solution
- Learn the probabilities associated with the interacting variables
- This specifies the JPD: p(x)
- Sample the JPD (i.e. the learned probabilities)
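For the univariate model these steps reduce to a UMDA-style EDA loop, which can be sketched as follows; the OneMax problem, the population sizes and the generation count are illustrative assumptions:

```python
import random

rng = random.Random(1)

def fitness(x):
    return sum(x)  # OneMax

n, pop_size, n_sel = 20, 50, 25
pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

for _ in range(30):
    # 1. select the fittest half of the population
    sel = sorted(pop, key=fitness, reverse=True)[:n_sel]
    # 2. learn the univariate marginals p(x_i = 1) from the selected set
    p = [sum(x[i] for x in sel) / n_sel for i in range(n)]
    # 3. sample the factorised JPD p(x) = prod_i p(x_i) to get the next population
    pop = [[int(rng.random() < p[i]) for i in range(n)] for _ in range(pop_size)]

best = max(map(fitness, pop))
```

Since no interactions are learned, steps 2-3 touch only a length-n probability vector instead of an exponential table.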


Probabilistic Graphical Models
- An efficient tool to represent the factorisation of the JPD
- A marriage between probability theory and graph theory
- Consist of two components: structure and parameters
- Two types of PGM: directed PGM (Bayesian networks) and undirected PGM (Markov Random Fields)


Directed PGM (Bayesian networks)
- Structure: Directed Acyclic Graph (DAG)
- Independence relationship: a variable is conditionally independent of the rest of the variables given its parents
- Parameters: conditional probabilities


Bayesian networks
The factorisation of the JPD encoded in terms of conditional probabilities is
p(x) = p(x_1 | pa_1) p(x_2 | pa_2) ... p(x_n | pa_n), where pa_i denotes the parents of x_i.


Estimating a Bayesian network
- Estimate the structure
- Estimate the parameters
- This completely specifies the JPD
- The JPD can then be sampled


BN based EDAs
1. Initialise parent solutions
2. Select a set from the parent solutions
3. Estimate a BN from the selected set (estimate structure, then estimate parameters)
4. Sample the BN to generate a new population
5. Replace the parents with the new set and go to 2 until the termination criterion is satisfied


How to estimate and sample BN in EDAs
- Estimating structure: score + search techniques; conditional independence tests
- Estimating parameters: trivial in EDAs, since the dataset is complete; estimate the probabilities of parents before children
- Sampling: Probabilistic Logic Sampling (sample parents before children)
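Probabilistic Logic Sampling can be illustrated on a hypothetical two-variable network x1 → x2; the probability tables below are made up for the example, the point is only the parents-before-children sampling order:

```python
import random

rng = random.Random(7)

# hypothetical network: x1 -> x2
p_x1 = 0.7                         # p(x1 = 1)
p_x2_given = {0: 0.2, 1: 0.9}      # p(x2 = 1 | x1)

def sample_one():
    """Probabilistic Logic Sampling: sample each parent before its children."""
    x1 = int(rng.random() < p_x1)
    x2 = int(rng.random() < p_x2_given[x1])
    return (x1, x2)

samples = [sample_one() for _ in range(10000)]
freq_x1 = sum(s[0] for s in samples) / len(samples)
```

Because x1 is drawn first, the conditional table for x2 can always be indexed by an already-known parent value.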


BN based EDAs
- A well-established approach in EDAs: BOA, EBNA, LFDA, MIMIC, COMIT, BMDA
- References: Larrañaga and Lozano 2002; Pelikan 2002


Markov Random Fields (MRF)
- Structure: undirected graph
- Local independence: a variable is conditionally independent of the rest of the variables given its neighbours
- Global independence: two sets of variables are conditionally independent of each other if there is a third set that separates them
- Parameters: potential functions defined on the cliques
[Figure: example undirected graph over variables X1, X2, X3, X4, X5, X6]


Markov Random Field
The factorisation of the JPD encoded in terms of potential functions over the maximal cliques is
p(x) = (1/Z) Π_c ψ_c(x_c), where Z is the partition function (normalising constant).


Estimating a Markov Random field
- Estimate the structure from data
- Estimate the parameters: requires the potential functions to be numerically defined
- This completely specifies the JPD
- The JPD can then be sampled: there is no specific sampling order (an MRF is not a DAG), so this is a bit problematic


MRF in EDA
- Has recently been proposed as an estimation of distribution technique in EDAs
- Shakya et al. 2004, 2005; Santana et al. 2003, 2005


MRF based EDA
1. Initialise parent solutions
2. Select a set from the parent solutions
3. Estimate an MRF from the selected set (estimate structure, then estimate parameters)
4. Sample the MRF to generate a new population
5. Replace the parents with the new solutions and go to 2 until the termination criterion is satisfied


How to estimate and sample MRF in EDA
- Learning structure:
  - Conditional independence tests (MN-EDA, MN-FDA)
  - Linkage detection algorithm (LDFA)
- Learning parameters:
  - Junction tree approach (FDA)
  - Junction graph approach (MN-FDA)
  - Kikuchi approximation approach (MN-EDA)
  - Fitness modelling approach (DEUM)
- Sampling:
  - Probabilistic Logic Sampling (FDA, MN-FDA)
  - Probability vector approach (DEUMpv)
  - Direct sampling of Gibbs distribution (DEUMd)
  - Metropolis sampler (Is-DEUMm)
  - Gibbs sampler (Is-DEUMg, MN-EDA)


Fitness modelling approach
- Hammersley-Clifford theorem: the JPD of any MRF follows a Gibbs distribution:
  p(x) = e^(-U(x)/T) / Z   (a)
- The energy of the Gibbs distribution is the sum of the potential functions over the cliques:
  U(x) = Σ_c u_c(x_c)
- Assuming the probability of a solution is proportional to its fitness:
  p(x) = f(x) / Σ_y f(y)   (b)
- From (a) and (b), a model of the fitness function, the MRF fitness model (MFM), is derived
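Equating (a) and (b) makes the derivation of the MFM explicit (written here with the temperature absorbed into U, i.e. T = 1):

```latex
\frac{f(x)}{\sum_{y} f(y)} \;=\; \frac{e^{-U(x)}}{Z}
\qquad\Longrightarrow\qquad
U(x) \;=\; -\ln f(x) \;+\; \ln\frac{\sum_{y} f(y)}{Z}
```

The second term is a constant independent of x, so up to that constant the MFM reads -ln f(x) = U(x), and minimising the energy U(x) is equivalent to maximising the fitness f(x).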


MRF fitness model (MFM)
- Properties:
  - Completely specifies the JPD for the MRF
  - Negative relationship between fitness and energy, i.e. minimising energy = maximising fitness
- Tasks:
  - Find the structure for the MRF
  - Numerically define the clique potential functions


MRF fitness model (MFM)
- Let us start with the simplest model, the univariate model: this eliminates structure learning :)
- For the univariate model there are n singleton cliques
- For each singleton clique, assign a potential function α_i x_i
- Corresponding MFM: -ln f(x) = U(x) = α_1 x_1 + α_2 x_2 + ... + α_n x_n (in DEUM each bit is typically encoded as x_i ∈ {-1, +1})
- In terms of the Gibbs distribution: p(x) = e^(-U(x)/T) / Z


Estimating MRF parameters using MFM
- Each chromosome gives us one linear equation
- Applying this to a set of selected solutions gives a system of linear equations
- Solving it gives an approximation to the MRF parameters
- Knowing the MRF parameters completely specifies the JPD
- The next step is to sample the JPD
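This estimation step can be sketched as an ordinary least-squares fit of the univariate MFM -ln f(x) = Σ_i α_i s_i, with the bipolar encoding s_i ∈ {-1, +1} assumed. The helper name and the toy fitness are illustrative; the fitness is deliberately chosen so that -ln f(x) is exactly linear in the bipolar bits, so the fit recovers the true parameters:

```python
import itertools
import math

def estimate_univariate_mfm(samples, fitness):
    """Least-squares fit of -ln f(x) = sum_i alpha_i * s_i, s_i in {-1, +1}."""
    n = len(samples[0])
    A = [[2 * b - 1 for b in x] for x in samples]   # bipolar-encoded chromosomes
    y = [-math.log(fitness(x)) for x in samples]    # one linear equation each
    # normal equations: (A^T A) alpha = A^T y
    M = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
         for i in range(n)]
    v = [sum(A[k][i] * y[k] for k in range(len(A))) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= fac * M[col][c]
            v[r] -= fac * v[col]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (v[i] - sum(M[i][j] * alpha[j] for j in range(i + 1, n))) / M[i][i]
    return alpha

true_alpha = [-0.5, 0.2, -0.1]   # hypothetical MRF parameters
def f(x):
    s = [2 * b - 1 for b in x]
    return math.exp(-sum(a * si for a, si in zip(true_alpha, s)))

population = list(itertools.product([0, 1], repeat=3))
alpha = estimate_univariate_mfm(population, f)
```

With a full enumeration of the 3-bit space the system is overdetermined but consistent, so the solver returns the true parameters exactly; on a real selected set the solution is only an approximation, as the slide says.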


General DEUM framework
Distribution Estimation Using MRF algorithm (DEUM)
1. Initialise parent population P
2. Select a set D from P (can use D = P!)
3. Build an MFM, fit it to D, and estimate the MRF parameters
4. Sample the MRF to generate a new population
5. Replace P with the new population and go to 2 until the termination criterion is satisfied


How to sample MRF
- Probability vector approach
- Direct sampling of Gibbs distribution
- Metropolis sampling
- Gibbs sampling


Probability vector approach to sample MRF
- Minimise U(x) to maximise f(x)
- To minimise U(x), each term α_i x_i should be at its minimum
- This suggests: if α_i is negative, the corresponding x_i should be positive
- We could therefore read an optimum chromosome for the current population directly from the signs of α
- However, the current population does not always contain enough information to generate the optimum
- Instead, the sign of each α_i is used to update a vector of probabilities


DEUM with probability vector (DEUMpv)


Updating Rule
- Uses the sign of each MRF parameter to direct the search towards the value of the respective variable that minimises the energy U(x)
- A learning rate controls convergence
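One plausible form of such a rule (the exact DEUMpv rule may differ) moves each p_i a step of size λ towards 1 when α_i is negative and towards 0 otherwise:

```python
def update_probability_vector(p, alpha, lr=0.1):
    """Nudge each p_i towards the bit value that lowers the energy U(x).

    A negative alpha_i favours x_i = 1, so p_i moves towards 1; a positive
    alpha_i moves it towards 0. lr is the learning rate that controls how
    fast the vector converges.
    """
    return [pi + lr * ((1.0 if ai < 0 else 0.0) - pi)
            for pi, ai in zip(p, alpha)]

p = update_probability_vector([0.5, 0.5], [-1.0, 2.0], lr=0.1)
# p[0] moves up towards 1; p[1] moves down towards 0
```

Only the sign of α_i enters the rule; its magnitude is ignored, which motivates the direct Gibbs sampling approach discussed later in the deck.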


Simulation of DEUMpv
[Figure: worked example showing a population of bitstrings with their fitness values and the probability vector being updated step by step]


Results: OneMax problem


Results: F6 function optimisation


Results: Trap-5 function
- Deceptive problem
- No solution found


Sampling MRF
- Probability vector approach
- Direct sampling of Gibbs distribution
- Metropolis sampling
- Gibbs sampling


Direct Sampling of Gibbs distribution
- In the probability vector approach, only the sign of each MRF parameter is used
- However, one could sample directly from the Gibbs distribution and make use of the values of the MRF parameters
- A temperature coefficient can also be used to manipulate the probabilities


Direct Sampling of Gibbs distribution
For the univariate model, each variable can be sampled independently from its marginal under the Gibbs distribution; with bits encoded as x_i ∈ {-1, +1}:
p(x_i = 1) = e^(-α_i/T) / (e^(-α_i/T) + e^(α_i/T)) = 1 / (1 + e^(2α_i/T))


Direct Sampling of Gibbs distribution
- The temperature coefficient plays an important role
- Decreasing T cools each probability towards 1 or 0, depending on the sign and value of α_i
- This forms the basis of the DEUM variant based on direct sampling of the Gibbs distribution (DEUMd)


DEUM with direct sampling (DEUMd)
1. Generate an initial population, P, of size M
2. Select the N fittest solutions, N ≤ M
3. Calculate the MRF parameters
4. Generate M new solutions by sampling the univariate distribution
5. Replace P with the new population and go to 2 until the termination criterion is satisfied
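Step 4 can be sketched for the univariate model. Assuming the bipolar encoding x_i ∈ {-1, +1} commonly used in DEUM, each variable's marginal under the Gibbs distribution has a closed form, and the temperature T controls how strongly it saturates; the parameter values below are illustrative:

```python
import math
import random

rng = random.Random(3)

def marginal_p1(alpha_i, temp):
    """p(x_i = 1) under the univariate Gibbs distribution, assuming bits are
    encoded as x_i in {-1, +1}:
        p(+1) = e^(-a/T) / (e^(-a/T) + e^(+a/T)) = 1 / (1 + e^(2a/T))
    """
    return 1.0 / (1.0 + math.exp(2.0 * alpha_i / temp))

def sample_population(alpha, temp, m):
    """Draw m new solutions by sampling each bit from its marginal."""
    return [[int(rng.random() < marginal_p1(a, temp)) for a in alpha]
            for _ in range(m)]

alpha = [-0.8, 0.3, -0.1]            # illustrative MRF parameters
hot = marginal_p1(-0.8, temp=10.0)   # high T: probability stays near 0.5
cold = marginal_p1(-0.8, temp=0.1)   # low T: probability saturates towards 1
pop = sample_population(alpha, temp=1.0, m=20)
```

Unlike the probability vector approach, both the sign and the magnitude of each α_i shape the sampling distribution, and cooling T sharpens it towards the energy-minimising chromosome.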


DEUMd simulation
[Figure: worked example showing a population of bitstrings with their fitness values being sampled by DEUMd]


Experimental results: OneMax problem


F6 function optimization


Plateau Problem (n=180)


Checker Board Problem (n=100)


Trap function of order 5 (n=60)


Ex