
Efficiency of surrogates for higher dim #106

Closed
agrayver opened this issue Jan 25, 2015 · 49 comments

@agrayver

I took examples/surrogates/sample-code-surrogate-rsvm.cc and set dim=30, alg=acmaes, and a fixed seed=12345. Then, comparing runs with set_exploit(true) and set_exploit(false), one sees (figure attached) that exploitation not only fails to help, but essentially breaks convergence. For low dimensions, e.g. dim=5-15, it works well, however. Any idea how to make it efficient in higher dimensions?
[figure "surr": convergence with set_exploit(true) vs. set_exploit(false), dim=30]
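For reference, the setup described above corresponds roughly to the following minimal C++ sketch, modeled on the wiki's surrogate example; the exact CMAParameters constructor and the ESOptimizer/RSVMSurrogateStrategy template signature may differ between versions, and the starting point x0 is illustrative:

#include "surrcmaes.h" // pulls in the RSVM surrogate headers
#include <vector>

using namespace libcmaes;

// fsphere stands in for the benchmark objective used by the sample code.
FitFunc fsphere = [](const double *x, const int N)
{
  double val = 0.0;
  for (int i = 0; i < N; i++)
    val += x[i] * x[i];
  return val;
};

int main()
{
  const int dim = 30;
  std::vector<double> x0(dim, 1.0);              // illustrative starting point
  CMAParameters<> cmaparams(x0, 1.0, -1, 12345); // sigma0 = 1, default lambda, seed = 12345
  cmaparams.set_algo(aCMAES);                    // same as -alg acmaes
  ESOptimizer<RSVMSurrogateStrategy<CMAStrategy,CovarianceUpdate>,CMAParameters<>> optim(fsphere, cmaparams);
  optim.set_exploit(true);                       // false reproduces the no-exploitation run
  optim.optimize();
  return 0;
}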

@beniz
Collaborator

beniz commented Jan 25, 2015

Are you running

./sample_code_surrogate_rsvm -dim 30 -alg acmaes

? (which implicitly runs on fsphere btw).

First try a higher initial sigma, such as:

./sample_code_surrogate_rsvm -dim 30 -alg acmaes -sigma0 1

which does the trick for me here, and beats the algorithm without surrogates and with the same initial sigma.

I guess you have looked at it, but just in case, see https://github.com/beniz/libcmaes/wiki/Using-Surrogates-for-expensive-objective-functions as there's a dedicated Python script that, in addition to the regular useful values, plots both the train and test error of the surrogate.

Also, look at additional options with

./sample_code_surrogate_rsvm --help

A potentially useful flag is -l, which specifies the number of sample points the surrogate trains on. As dimension increases, you may expect a higher (possibly exponential) requirement in sample points.
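For instance (the -l value of 400 is purely illustrative, not a recommendation):

./sample_code_surrogate_rsvm -dim 30 -alg acmaes -sigma0 1 -l 400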

Finally, to my knowledge the science of surrogates with CMA-ES is still pretty young. The most informed person here is probably @loshchil; he has studied related techniques that are not yet available in the lib, such as #76.

@beniz
Collaborator

beniz commented Jan 25, 2015

@nikohansen there's the obvious question here of TPA + surrogates :)

@nikohansen
Collaborator

What exactly does set_exploit(true) imply? Generally, I would consider failure of convergence on the sphere to be a bug (if not in the implementation, then in the algorithm), no matter what.

@beniz
Collaborator

beniz commented Jan 25, 2015

set_exploit activates exploitation of the surrogate, that is, it replaces (some) objective function calls with calls to the surrogate. It could be a bug; as a first step, I'd need to dig back into the papers and compare against published results to check for discrepancies.
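To illustrate the idea only, here is a self-contained toy, not libcmaes's actual control flow: the dummy random search, the 1-nearest-neighbor "surrogate", and the period k are all stand-ins.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int main()
{
  auto ftrue = [](double x) { return x * x; };            // stand-in expensive objective
  std::vector<std::pair<double,double>> archive;          // (x, true f(x)) pairs
  auto fsurr = [&archive](double x) {                     // toy surrogate: nearest archived point
    auto it = std::min_element(archive.begin(), archive.end(),
        [x](const std::pair<double,double> &a, const std::pair<double,double> &b)
        { return std::abs(a.first - x) < std::abs(b.first - x); });
    return it->second;
  };

  std::mt19937 gen(12345);
  std::normal_distribution<double> noise(0.0, 0.5);
  double mean = 5.0;
  const int k = 3;                                        // true evals only every k-th generation

  for (int g = 0; g < 60; ++g)
  {
    std::vector<double> pop(8);
    for (double &x : pop) x = mean + noise(gen);

    const bool true_gen = (g % k == 0) || archive.empty();
    double bestx = pop[0], bestf = 1e300;
    for (double x : pop)
    {
      const double f = true_gen ? ftrue(x) : fsurr(x);    // exploitation: surrogate ranks the population
      if (true_gen) archive.emplace_back(x, f);           // archived true evals feed surrogate refits
      if (f < bestf) { bestf = f; bestx = x; }
    }
    mean = bestx;                                         // crude elitist mean update
  }
  std::printf("final mean: %g (true f = %g)\n", mean, ftrue(mean));
}

In the real strategy the surrogate is a ranking SVM and the CMA distribution update replaces the crude mean move, but the alternation between true generations and surrogate-only generations is the core of exploitation.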

@beniz
Collaborator

beniz commented Jan 25, 2015

I would consider failure of convergence on the sphere to be a bug

Understood, a good rule for the lib to follow. FTR, below is a run of

./sample_code_surrogate_rsvm -dim 30 -alg acmaes -ftarget 1e-8

The run terminates on 'conditionCov':

[figure "acmaes_surr_30d_fsphere": 30-D fsphere run terminating on conditionCov]

@nikohansen
Collaborator

It looks like a problem connected to step-size control though, because we see standard deviations and eigenvalues increase systematically and far above 1. This suggests that the step-size is (far) too small, which should not happen under normal circumstances.

@agrayver
Author

As far as I understand, the problem is in the library, isn't it? If not, is there anything a user can do about controlling step-size?

@nikohansen
Collaborator

Agreed, the user can't really do anything to improve step-size control. What we see is probably a bug in the library, or a (subtle) problem from the coupling of surrogate and CMA.

@beniz
Collaborator

beniz commented Jan 26, 2015

What we see is probably a bug in the library, or a (subtle) problem from the coupling of surrogate and CMA.

I wouldn't rule out a bug indeed. It will take me some time to re-run comparisons against the few reference runs I have from publications. For some reason, the cluster I used to rely on is now significantly slower than my laptop, and I am trying to work around it.

@beniz beniz self-assigned this Jan 27, 2015
@beniz
Collaborator

beniz commented Jan 27, 2015

Just an update on this issue: I can confirm this is due to several intertwined bugs. It was a long fight, but they should be fixed now, along with better default settings for a few parameters.

FYI, what I would now consider the main bug was messing up the final ranking of candidates before the update.

I plan to update the original ticket #57 with new reports from runs vs published literature, along with commits.

The code still has a few rough edges, so I will mention it here once it is ready to use.

Here is the fixed run with active-CMA-ES and surrogate in 30-D on fsphere:
[figure "surr_acmaes_fsphere_30d_fix": fixed run with active-CMA-ES and surrogate, 30-D fsphere]

@agrayver
Author

Great news! Looking forward to testing the new version.

beniz added a commit that referenced this issue Jan 28, 2015
…alues + rank available in candidate object, ref #57, #106
@beniz
Collaborator

beniz commented Jan 28, 2015

Fix has been pushed,

./sample_code_surrogate_rsvm -dim 30 -alg acmaes

now yields the desired behavior.

Thanks for reporting the initial problem; it did indeed lead to much-needed bug fixes and improvements.

@beniz
Collaborator

beniz commented Jan 29, 2015

FYI, not all runs appear to converge equally well, and this is something I am investigating, along with other improvements.

@agrayver
Author

Thanks for the information. I have not tested it yet. Please let me know when you're done, and I will update and run it here.

@beniz
Collaborator

beniz commented Jan 29, 2015

Well, in fact there was no problem, just an old version of the program in the path on my laptop. On the bright side, this had me check on three different machines and compilers, with many runs at 100% success, so definitely a false alarm!

@agrayver
Author

When I do make install, it seems the following files are missing from $PREFIX:

surrogates/rankingsvm.hpp
surrogates/rsvm_surr_strategy.hpp
opti_err.h

@agrayver
Author

For my application, I still do not see any improvement when using surrogates. I run a 40-D problem and my f-target is ~3000.

Here are results with exploitation (just used RSVMSurrogateStrategy and set_exploit, all parameters are default):
[figure "exploit": 40-D run with surrogate exploitation]

Why does it show 41-D btw?

Here are results without surrogates:
[figure "no_exploit": 40-D run without surrogates]

It converges to the target quite fast.
Any idea?

@beniz
Collaborator

beniz commented Jan 29, 2015

Why does it show 41-D btw?

Try using tests/cma_multiplt_surr.py to plot the results; it is required for plotting surrogate data output.
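For instance (assuming the script takes the -fplot output file as its single argument, like the regular cma_multiplt.py):

python tests/cma_multiplt_surr.py output.dat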

@beniz
Collaborator

beniz commented Jan 29, 2015

For my application, I still do not see any improvement when using surrogates. I run a 40-D problem and my f-target is ~3000.

First, can you confirm that the previous example with fsphere and acmaes does work properly? (Just making sure the proper version of the code is running, to get this out of the way.)

My understanding is that you are pioneering surrogates with CMA on 'unknown' (i.e., outside-of-benchmark) functions, so difficulties may arise there in addition to potential bugs.

From there, and with the graph from the right Python script above, we'll be able to move forward.

@nikohansen
Collaborator

try using tests/cma_multiplt_surr.py to plot the results. This is required to plot surrogate data output.

somewhat unfortunate user interface ;-)

For my application, I still do not see any improvement when using surrogates.

I have seen this before. Surrogates are no guarantee of an improvement, but the shown results are not (yet) conclusive. Just to cross-check: are you sure the true function values are shown in the surrogate case?

Commenting on the results without the surrogate:

  • It hasn't converged yet, I guess not even close (same with the surrogate).
  • It would be quite helpful to see the difference from the best function value, in order to see the current changes in function value (as implemented in the original plotting functions). This would reveal that there are still changes, and I would expect even more significant improvements after some more adaptation has taken place.
  • We can nicely observe that a bunch of parameters are correlated, as they all move up alike at around 11000 f-evaluations. This movement corresponds to the emergence of a long axis in the eigenvalues plot. It might be insightful from the application viewpoint to see which parameters this corresponds to. It is quite likely that we see the same in the surrogate plot, where the standard deviations of a few parameters increase in lockstep at the very end.
  • It might be useful to find out which parameter corresponds to the lowest line in the lower right plot (for this reason, variable annotations are quite practical). Rescaling this parameter accordingly should reduce the time to find a better solution (this is exactly what CMA is doing, but it takes some time to learn); a sketch of such a rescaling follows below.
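A hypothetical illustration of such a rescaling, using libcmaes's FitFunc type (the variable index, the factor of 100, and my_expensive_objective are all illustrative stand-ins):

#include <vector>
#include "cmaes.h"

using namespace libcmaes;

// Hypothetical stand-in for the user's expensive objective.
double my_expensive_objective(const double *x, const int N)
{
  double v = 0.0;
  for (int i = 0; i < N; i++)
    v += x[i] * x[i];
  return v;
}

// Optimize in a scaled space y with x[5] = 100 * y[5]: CMA then starts from a
// better-conditioned problem along that coordinate instead of having to learn
// the scale itself.
FitFunc scaled_objective = [](const double *y, const int N)
{
  std::vector<double> x(y, y + N);
  x[5] *= 100.0; // index 5 and factor 100 are illustrative
  return my_expensive_objective(x.data(), N);
};

The corresponding component of x0 (and of the reported solution) must of course be scaled back accordingly.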

@agrayver
Author

Thanks for your help. I will redo all the tests in a more systematic way, but maybe solving this installation problem first is a good idea, because it is a potential source of problems for users:

#106 (comment)

I had to copy these files manually and could have made a mistake.

@beniz
Collaborator

beniz commented Jan 30, 2015

I had to copy these files manually and could make a mistake.

This has been fixed. It is recommended that you run

make uninstall

between two versions.

@beniz
Collaborator

beniz commented Jan 30, 2015

Just to cross-check: are you sure the true function values are shown in the surrogate case?

Certainly not; this is due to not using the dedicated script (https://github.com/beniz/libcmaes/wiki/Using-Surrogates-for-expensive-objective-functions). I will work on making a single script instead, but basically the default output function differs depending on whether the surrogate is active or not. For now, running the correct script on the existing output should generate a proper plot.

@beniz
Collaborator

beniz commented Jan 30, 2015

(for this reason variable annotations are quite practical)

btw, well noted, part of #91, still a lot to do.

@nikohansen
Collaborator

Re annotations: one relatively simple way to get all the goodies from the existing plotting routines would be to convert the output file to a format those routines can read (one file for each subfigure). This should be relatively straightforward.

The first two rows look like this (the remaining rows contain just more data):

File outcmaesfit.dat (upper left subfigure):

% # columns="iteration, evaluation, sigma, axis ratio, bestever, best, median, worst objective function value, further objective values of best", seed=608288, Thu Jan 29 12:55:04 2015
1 8 0.969230056957 1.0000400008 58211.48782 5.8211487820003007e+04 1718697.45296 6204457.23137    ...

File outcmaesxrecentbest.dat (upper right subfigure):

% # iter+eval+sigma+0+fitness+xbest, seed=608288, Thu Jan 29 12:55:04 2015
1 8 0.969230056957 0 58211.48782 1.91572226397 1.96813672849 2.85336974742 0.995993538345 -0.136285409354    ...

File outcmaesxmean.dat (upper right subfigure, alternatively):

% # columns="iteration, evaluation, void, void, void, xmean", seed=608288, Thu Jan 29 12:55:04 2015 # scaling_of_variables: 1, typical_x: 0
1 8 0 0.0 nan 1.14545721227 1.49619800601 1.86033379793 1.29455358941 0.0164016479173
...

File outcmaesaxlen.dat (lower left subfigure):

%  columns="iteration, evaluation, sigma, max axis length,  min axis length, all principle axes lengths  (sorted square roots of eigenvalues of C)", seed=608288, Thu Jan 29 12:55:04 2015
1 8 0.969230056957 1.0000400008 1.0 1.0 1.00001000005 1.0000200002 1.00003000045 1.0000400008
...

File outcmaesstddev.dat (lower right subfigure):

% # columns=["iteration, evaluation, sigma, void, void,  stds==sigma*sqrt(diag(C))", seed=608288, Thu Jan 29 12:55:04 2015
 1 8 0.969230056957 0 0 0.942502802172 0.949838194587 0.998376907781 0.938075234269 0.988418591891
...

The first two columns are always iteration and evaluation, and the variable-length data start from the 5th column. outcmaes is the default recognized name, but it is an argument to the plotting function.
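A minimal sketch of such a converter step in C++, producing the outcmaesfit.dat layout shown above (the function names are hypothetical; how the values are obtained, e.g. by parsing libcmaes's -fplot output, is left open):

#include <cstdio>

// Write the header line of outcmaesfit.dat (layout copied from above).
void write_fit_header(std::FILE *f, long seed)
{
  std::fprintf(f, "%% # columns=\"iteration, evaluation, sigma, axis ratio, "
                  "bestever, best, median, worst objective function value, "
                  "further objective values of best\", seed=%ld\n", seed);
}

// Append one data row; the remaining outcmaes*.dat files follow the same pattern.
void append_fit_row(std::FILE *f, long iter, long evals, double sigma,
                    double axratio, double bestever, double best,
                    double median, double worst)
{
  std::fprintf(f, "%ld %ld %.12g %.12g %.12g %.17g %.12g %.12g\n",
               iter, evals, sigma, axratio, bestever, best, median, worst);
}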

@beniz
Collaborator

beniz commented Jan 30, 2015

@agrayver OK, so the last fix does the trick; results are now on par with the literature. Sorry for the inconvenience, this was a nasty and well-hidden one. It should now both run faster and perform better.

@beniz
Collaborator

beniz commented Jan 30, 2015

@nikohansen good idea, let's work this as #110

@agrayver
Author

I cannot compile the freshly installed RSVM-related code. In surrcmaes.h, these includes do not exist:

#include "surrogates/rankingsvm.hpp"
#include "surrogates/rsvm_surr_strategy.hpp"

changing to

#include "rankingsvm.hpp"
#include "rsvm_surr_strategy.hpp"

works however.

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver OK, thanks for the heads up, I know where this comes from, assuming you are using make install.

EDIT: fixed. Not sure why the commit does not show up here: 7545a3f

@agrayver
Author

Could you please confirm that what I get below is the correct behaviour?

 ./sample_code_surrogate_rsvm -dim 40 -alg acmaes -fplot output.dat -no_exploit

[figure "fsphere_40d_noexp": 40-D fsphere without exploitation]

 ./sample_code_surrogate_rsvm -dim 40 -alg acmaes -fplot output.dat

[figure "fshpere_40d": 40-D fsphere with exploitation]

@agrayver
Author

For the 10-D Rosenbrock, surrogates do not seem to help:

-dim 10 -alg acmaes -fplot output.dat -fname rosenbrock -no_exploit

[figure "rosenbrock_10d_noexp": 10-D Rosenbrock without exploitation]

-dim 10 -alg acmaes -fplot output.dat -fname rosenbrock

[figure "rosenbrock_10d": 10-D Rosenbrock with exploitation]

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver yes, that is the correct behavior. Down to 1e-8, I get 5565 f-evals without surrogates, and 2815 with surrogate exploitation.
EDIT: on the sphere example

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver below are the results for Rosenbrock (which I still need to add to #57): average f-evals over 10 runs, with standard deviation and the number of successful runs in parentheses:

running surrogates on rosenbrock
D       fevals_avg
2       573.667 +/- 357.795 (9)
4       952.667 +/- 360.211 (9)
5       1072.38 +/- 594.202 (8)
8       1384.1 +/- 111.025 (10)
10      1895.67 +/- 221.716 (9)
16      5556.33 +/- 1014.05 (9)
20      8150.4 +/- 681.513 (10)
32      27046 +/- 1550.77 (9)
40      41545 +/- 2852.78 (9)

This uses the tests/surr_test executable, whose default parameters are not the same as in the sample code.

@agrayver
Author

@beniz would you mind elaborating on which parameters you think need to be tweaked to make the sample code more efficient?

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver try setting the same x0 on the runs you want to compare; for Rosenbrock, you can use -l with a value of ~70 * sqrt(N), i.e. ~210 in 10-D.

EDIT: 70 instead of 40

@beniz
Collaborator

beniz commented Jan 31, 2015

Typically, compare (no exploitation)

./sample_code_surrogate_rsvm -ftarget 1e-8 -dim 10 -alg acmaes -fname rosenbrock -fplot rosen_noex.dat -l 210 -x0 2 -no_exploit

to (exploitation)

./sample_code_surrogate_rsvm -ftarget 1e-8 -dim 10 -alg acmaes -fname rosenbrock -fplot rosen_ex.dat -l 210 -x0 2
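(For reference, the arithmetic behind -l 210 above: 70 * sqrt(10) ≈ 221, rounded to ~210.)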

@agrayver
Author

agrayver commented Feb 1, 2015

Thank you for the help. Using -l ~ 70 * sqrt(N) seems to work: I see a significant reduction (1.5-3x) in function calls for different functions and dimensions (I still have not tested my real application). At the moment I am testing the 100-D Rosenbrock, and it takes a very long time: it has been running for more than 5 hours on my fairly modern laptop and is still far from finishing. I'm hence wondering:

  1. What is the complexity of the RSVM with respect to N?
  2. What exactly takes so much time, and is there potential for optimization?

@beniz
Collaborator

beniz commented Feb 1, 2015

Try lowering the number of iterations of the RSVM algorithm, 'rsvm_iter', to something between 150000 and 1M (1M being the current, very conservative default).
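For instance (the -rsvm_iter flag name is assumed from its usage further down, and -l 700 follows the ~70 * sqrt(N) rule for N = 100):

./sample_code_surrogate_rsvm -dim 100 -alg acmaes -fname rosenbrock -l 700 -rsvm_iter 200000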

@beniz
Collaborator

beniz commented Feb 2, 2015

What is the complexity of the RSVM with respect to N?

Roughly speaking, O(niter * (l-1)) for the algorithm, with l ~ 70 * sqrt(N), and O(N * l^2) for the kernel computation.

For thorough details, see https://www.lri.fr/~ilya/phd.html page 81 (section 4.1.1)
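Plugging in the formulas above gives a rough idea of the scaling: in 10-D, l ≈ 70 * sqrt(10) ≈ 221, so one kernel computation costs on the order of N * l^2 ≈ 10 * 221^2 ≈ 5e5 operations, while in 100-D, l ≈ 700 and N * l^2 ≈ 100 * 700^2 ≈ 5e7, roughly a hundredfold increase per surrogate build, on top of the niter-proportional training cost.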

@agrayver
Author

agrayver commented Feb 2, 2015

Thank you, I got reasonable performance with the test function. Now, switching to my actual problem, I still see that surrogates deteriorate convergence. Here are the results with no exploitation, 40-D:

[figure "mt_40d_noexp": 40-D application run without exploitation]

Here RSVM was used (with -l 500 and rsvm_iter 200000):

[figure "mt_40d_exp": 40-D application run with RSVM exploitation]

I used the same starting guess and seed for both.
Any idea?

@beniz
Collaborator

beniz commented Feb 2, 2015

At first glance, I'd suggest you try increasing rsvm_iter and see if at least it improves convergence.

@nikohansen
Collaborator

It looks like the initial step-size is at least a factor of 1000 (in the previous plot, a factor of 1e5) too small, and the mechanism that prevents one eigenvalue from increasing massively doesn't work out here. The second point should be checked in the lib.

Addition: it can be useful to use a small initial step-size to check where the best solution will be found locally. In this case, I would ideally expect an initial increase of the step-size by a factor of ten or so.

@beniz
Collaborator

beniz commented Feb 5, 2015

The second point should be checked in the lib.

This is not entirely clear to me; could you give me more details? Thanks!

@nikohansen
Collaborator

Sorry, yes, this is the factor hsig, which "modulates" the learning rate for the update of the evolution path pc used in the covariance matrix update. Looking at the graphs carefully reveals, however, that the step-size sigma increases very quickly only in the beginning, where hsig seems to be effective. That means the cause is not likely a mistaken computation of hsig, but rather a too-slow increase of sigma. It doesn't look like ideal strategy behavior, though.
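For reference, the standard definitions, as in Hansen's CMA-ES tutorial (the lib's exact constants may differ): hsig stalls the rank-one update whenever the step-size evolution path is unusually long,

$$
h_\sigma = \begin{cases}
1 & \text{if } \dfrac{\|p_\sigma\|}{\sqrt{1-(1-c_\sigma)^{2(t+1)}}} < \left(1.4 + \dfrac{2}{n+1}\right) \mathbb{E}\|\mathcal{N}(0,I)\|,\\
0 & \text{otherwise,}
\end{cases}
\qquad
p_c \leftarrow (1-c_c)\,p_c + h_\sigma \sqrt{c_c(2-c_c)\,\mu_{\mathrm{eff}}}\,\frac{m_{t+1}-m_t}{\sigma_t},
$$

so when hsig = 0, the rank-one contribution is stalled and individual eigenvalues should not blow up while sigma is still catching up.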

@beniz beniz removed the bug label Feb 11, 2015
@beniz
Collaborator

beniz commented Feb 20, 2015

Closing after 15 days of inactivity. Can be reopened as needed.

@beniz beniz closed this as completed Feb 20, 2015
andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016