
Efficiency of surrogates for higher dim #106

Closed
agrayver opened this issue Jan 25, 2015 · 49 comments

@agrayver

I took examples/surrogates/sample-code-surrogate-rsvm.cc and set dim=30, alg=acmaes, and a fixed seed=12345. Then, comparing runs with set_exploit(true) and set_exploit(false), one sees (figure attached) that exploitation not only fails to help, but essentially breaks convergence. For low dimensions, e.g. dim=5-15, it works well, however. Any idea how to make it efficient in higher dimensions?
[figure "surr": convergence with set_exploit(true) vs. set_exploit(false), dim=30]
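For reference, the setup described above corresponds roughly to the following minimal C++ sketch, modeled on the wiki's surrogate example; the exact CMAParameters constructor and the ESOptimizer/RSVMSurrogateStrategy template signature may differ between versions, and the starting point x0 is illustrative:

#include "surrcmaes.h" // pulls in the RSVM surrogate headers
#include <vector>

using namespace libcmaes;

// fsphere stands in for the benchmark objective used by the sample code.
FitFunc fsphere = [](const double *x, const int N)
{
  double val = 0.0;
  for (int i = 0; i < N; i++)
    val += x[i] * x[i];
  return val;
};

int main()
{
  const int dim = 30;
  std::vector<double> x0(dim, 1.0);              // illustrative starting point
  CMAParameters<> cmaparams(x0, 1.0, -1, 12345); // sigma0 = 1, default lambda, seed = 12345
  cmaparams.set_algo(aCMAES);                    // same as -alg acmaes
  ESOptimizer<RSVMSurrogateStrategy<CMAStrategy,CovarianceUpdate>,CMAParameters<>> optim(fsphere, cmaparams);
  optim.set_exploit(true);                       // false reproduces the no-exploitation run
  optim.optimize();
  return 0;
}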

@beniz
Collaborator

beniz commented Jan 25, 2015

Are you running

./sample_code_surrogate_rsvm -dim 30 -alg acmaes

? (which implicitly runs on fsphere btw).

First try a higher initial sigma, such as:

./sample_code_surrogate_rsvm -dim 30 -alg acmaes -sigma0 1

which does the trick for me here, and beats the algorithm without surrogates and with the same initial sigma.

I guess you have looked at it, but just in case, see https://github.com/beniz/libcmaes/wiki/Using-Surrogates-for-expensive-objective-functions as there's a dedicated Python script that, in addition to the regular useful values, plots both the train and test error of the surrogate.

Also, look at additional options with

./sample_code_surrogate_rsvm --help

A potentially useful flag is -l, which specifies the number of sample points the surrogate trains on. As dimension increases, you may expect a higher (possibly exponential) requirement in sample points.
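For instance (the -l value of 400 is purely illustrative, not a recommendation):

./sample_code_surrogate_rsvm -dim 30 -alg acmaes -sigma0 1 -l 400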

Finally, to my knowledge the science of surrogates with CMA-ES is still pretty young. The most informed person here is probably @loshchil; he has studied related techniques that are not yet available in the lib, such as #76.

@beniz
Collaborator

beniz commented Jan 25, 2015

@nikohansen there's the obvious question here of TPA + surrogates :)

@nikohansen
Collaborator

What exactly does set_exploit(true) imply? Generally, I would consider failure of convergence on the sphere to be a bug (if not in the implementation, then in the algorithm), no matter what.

@beniz
Collaborator

beniz commented Jan 25, 2015

set_exploit activates exploitation of the surrogate, that is, it replaces (some) objective function calls with calls to the surrogate. It could be a bug; as a first step, I'd need to dig back into the papers and compare against published results to check for discrepancies.
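To illustrate the idea only, here is a self-contained toy, not libcmaes's actual control flow: the dummy random search, the 1-nearest-neighbor "surrogate", and the period k are all stand-ins.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int main()
{
  auto ftrue = [](double x) { return x * x; };            // stand-in expensive objective
  std::vector<std::pair<double,double>> archive;          // (x, true f(x)) pairs
  auto fsurr = [&archive](double x) {                     // toy surrogate: nearest archived point
    auto it = std::min_element(archive.begin(), archive.end(),
        [x](const std::pair<double,double> &a, const std::pair<double,double> &b)
        { return std::abs(a.first - x) < std::abs(b.first - x); });
    return it->second;
  };

  std::mt19937 gen(12345);
  std::normal_distribution<double> noise(0.0, 0.5);
  double mean = 5.0;
  const int k = 3;                                        // true evals only every k-th generation

  for (int g = 0; g < 60; ++g)
  {
    std::vector<double> pop(8);
    for (double &x : pop) x = mean + noise(gen);

    const bool true_gen = (g % k == 0) || archive.empty();
    double bestx = pop[0], bestf = 1e300;
    for (double x : pop)
    {
      const double f = true_gen ? ftrue(x) : fsurr(x);    // exploitation: surrogate ranks the population
      if (true_gen) archive.emplace_back(x, f);           // archived true evals feed surrogate refits
      if (f < bestf) { bestf = f; bestx = x; }
    }
    mean = bestx;                                         // crude elitist mean update
  }
  std::printf("final mean: %g (true f = %g)\n", mean, ftrue(mean));
}

In the real strategy the surrogate is a ranking SVM and the CMA distribution update replaces the crude mean move, but the alternation between true generations and surrogate-only generations is the core of exploitation.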

@beniz
Collaborator

beniz commented Jan 25, 2015

I would consider failure of convergence on the sphere to be a bug

Understood, a good rule for the lib to follow. FTR, below is a run of

./sample_code_surrogate_rsvm -dim 30 -alg acmaes -ftarget 1e-8

The run terminates on 'conditionCov':

[figure "acmaes_surr_30d_fsphere": 30-D fsphere run terminating on conditionCov]

@nikohansen
Collaborator

It looks like a problem connected to step-size control though, because we see standard deviations and eigenvalues increase systematically and far above 1. This suggests that the step-size is (far) too small, which should not happen under normal circumstances.

@agrayver
Author

As far as I understand, the problem is in the library, isn't it? If not, is there anything a user can do about controlling step-size?

@nikohansen
Collaborator

Agreed, the user can't really do anything to improve step-size control. What we see is probably a bug in the library, or a (subtle) problem from the coupling of surrogate and CMA.

@beniz
Collaborator

beniz commented Jan 26, 2015

What we see is probably a bug in the library, or a (subtle) problem from the coupling of surrogate and CMA.

I wouldn't rule out a bug indeed. It will take me some time to re-run comparisons against the few reference runs I have from publications. For some reason, the cluster I used to rely on is now significantly slower than my laptop, and I am trying to work around it.

@beniz beniz self-assigned this Jan 27, 2015
@beniz
Collaborator

beniz commented Jan 27, 2015

Just an update on this issue: I can confirm this is due to several intertwined bugs. It was a long fight, but they should be fixed now, along with better default settings for a few parameters.

FYI, what I would now consider the main bug was messing up the final ranking of candidates before the update.

I plan to update the original ticket #57 with new reports from runs vs published literature, along with commits.

The code still has a few rough edges, so I will mention it here once it is ready to use.

Here is the fixed run with active-CMA-ES and surrogate in 30-D on fsphere:
[figure "surr_acmaes_fsphere_30d_fix": fixed run with active-CMA-ES and surrogate, 30-D fsphere]

@agrayver
Author

Great news! Looking forward to testing the new version.

beniz added a commit that referenced this issue Jan 28, 2015
…alues + rank available in candidate object, ref #57, #106
@beniz
Collaborator

beniz commented Jan 28, 2015

Fix has been pushed,

./sample_code_surrogate_rsvm -dim 30 -alg acmaes

now yields the desired behavior.

Thanks for reporting the initial problem; it did indeed lead to much-needed bug fixes and improvements.

@beniz
Collaborator

beniz commented Jan 29, 2015

FYI, not all runs appear to converge equally well, and this is something I am investigating, along with other improvements.

@agrayver
Author

Thanks for the information. I have not tested it yet. Please let me know when you're done, and I will update and run it here.

@beniz
Collaborator

beniz commented Jan 29, 2015

Well, in fact there was no problem, just an old version of the program in the path on my laptop. On the bright side, this had me check on three different machines and compilers, with many runs at 100% success, so definitely a false alarm!

@agrayver
Author

When I do make install, it seems the following files are missing from $PREFIX:

surrogates/rankingsvm.hpp
surrogates/rsvm_surr_strategy.hpp
opti_err.h

@agrayver
Author

For my application, I still do not see any improvement when using surrogates. I run a 40-D problem and my f-target is ~3000.

Here are results with exploitation (just used RSVMSurrogateStrategy and set_exploit, all parameters are default):
[figure "exploit": 40-D run with surrogate exploitation]

Why does it show 41-D btw?

Here are results without surrogates:
[figure "no_exploit": 40-D run without surrogates]

It converges to the target quite fast.
Any idea?

@beniz
Collaborator

beniz commented Jan 29, 2015

Why does it show 41-D btw?

Try using tests/cma_multiplt_surr.py to plot the results; it is required for plotting surrogate data output.
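For instance (assuming the script takes the -fplot output file as its single argument, like the regular cma_multiplt.py):

python tests/cma_multiplt_surr.py output.dat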

@beniz
Collaborator

beniz commented Jan 29, 2015

For my application, I still do not see any improvement when using surrogates. I run a 40-D problem and my f-target is ~3000.

First, can you confirm that the previous example with fsphere and acmaes does work properly? (Just making sure the proper version of the code is running, to get this out of the way.)

My understanding is that you are pioneering surrogates with CMA on 'unknown' (i.e., outside-of-benchmark) functions, so difficulties may arise there in addition to potential bugs.

From there, and with the graph from the right Python script above, we'll be able to move forward.

@nikohansen
Collaborator

try using tests/cma_multiplt_surr.py to plot the results. This is required to plot surrogate data output.

somewhat unfortunate user interface ;-)

For my application, I still do not see any improvement when using surrogates.

I have seen this before. Surrogates are no guarantee of an improvement, but the shown results are not (yet) conclusive. Just to cross-check: are you sure the true function values are shown in the surrogate case?

Commenting on the results without the surrogate:

  • It hasn't converged yet, I guess not even close (same with the surrogate).
  • It would be quite helpful to see the difference from the best function value, in order to see the current changes in function value (as implemented in the original plotting functions). This would reveal that there are still changes, and I would expect even more significant improvements after some more adaptation has taken place.
  • We can nicely observe that a bunch of parameters are correlated, as they all move up alike at around 11000 f-evaluations. This movement corresponds to the emergence of a long axis in the eigenvalues plot. It might be insightful from the application viewpoint to see which parameters this corresponds to. It is quite likely that we see the same in the surrogate plot, where the standard deviations of a few parameters increase in lockstep at the very end.
  • It might be useful to find out which parameter corresponds to the lowest line in the lower right plot (for this reason, variable annotations are quite practical). Rescaling this parameter accordingly should reduce the time to find a better solution (this is exactly what CMA is doing, but it takes some time to learn); a sketch of such a rescaling follows below.
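A hypothetical illustration of such a rescaling, using libcmaes's FitFunc type (the variable index, the factor of 100, and my_expensive_objective are all illustrative stand-ins):

#include <vector>
#include "cmaes.h"

using namespace libcmaes;

// Hypothetical stand-in for the user's expensive objective.
double my_expensive_objective(const double *x, const int N)
{
  double v = 0.0;
  for (int i = 0; i < N; i++)
    v += x[i] * x[i];
  return v;
}

// Optimize in a scaled space y with x[5] = 100 * y[5]: CMA then starts from a
// better-conditioned problem along that coordinate instead of having to learn
// the scale itself.
FitFunc scaled_objective = [](const double *y, const int N)
{
  std::vector<double> x(y, y + N);
  x[5] *= 100.0; // index 5 and factor 100 are illustrative
  return my_expensive_objective(x.data(), N);
};

The corresponding component of x0 (and of the reported solution) must of course be scaled back accordingly.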

@agrayver
Author

Thanks for your help. I will redo all the tests in a more systematic way, but maybe solving this installation problem first is a good idea, because it is a potential source of problems for users:

#106 (comment)

I had to copy these files manually and could have made a mistake.

@beniz
Collaborator

beniz commented Jan 30, 2015

I had to copy these files manually and could make a mistake.

This has been fixed. It is recommended that you run

make uninstall

between two versions.

@beniz
Collaborator

beniz commented Jan 30, 2015

Just to cross-check: are you sure the true function values are shown in the surrogate case?

Certainly not; this is due to not using the dedicated script (https://github.com/beniz/libcmaes/wiki/Using-Surrogates-for-expensive-objective-functions). I will work on making a single script instead, but basically the default output function differs depending on whether the surrogate is active or not. For now, running the correct script on the existing output should generate a proper plot.

@beniz
Collaborator

beniz commented Jan 30, 2015

(for this reason variable annotations are quite practical)

btw, well noted, part of #91, still a lot to do.

@nikohansen
Collaborator

Re annotations: one relatively simple way to get all the goodies from the existing plotting routines would be to convert the output file to a format those routines can read (one file for each subfigure). This should be relatively straightforward.

The first two rows look like this (the remaining rows contain just more data):

File outcmaesfit.dat (upper left subfigure):

% # columns="iteration, evaluation, sigma, axis ratio, bestever, best, median, worst objective function value, further objective values of best", seed=608288, Thu Jan 29 12:55:04 2015
1 8 0.969230056957 1.0000400008 58211.48782 5.8211487820003007e+04 1718697.45296 6204457.23137    ...

File outcmaesxrecentbest.dat (upper right subfigure):

% # iter+eval+sigma+0+fitness+xbest, seed=608288, Thu Jan 29 12:55:04 2015
1 8 0.969230056957 0 58211.48782 1.91572226397 1.96813672849 2.85336974742 0.995993538345 -0.136285409354    ...

File outcmaesxmean.dat (upper right subfigure, alternatively):

% # columns="iteration, evaluation, void, void, void, xmean", seed=608288, Thu Jan 29 12:55:04 2015 # scaling_of_variables: 1, typical_x: 0
1 8 0 0.0 nan 1.14545721227 1.49619800601 1.86033379793 1.29455358941 0.0164016479173
...

File outcmaesaxlen.dat (lower left subfigure):

%  columns="iteration, evaluation, sigma, max axis length,  min axis length, all principle axes lengths  (sorted square roots of eigenvalues of C)", seed=608288, Thu Jan 29 12:55:04 2015
1 8 0.969230056957 1.0000400008 1.0 1.0 1.00001000005 1.0000200002 1.00003000045 1.0000400008
...

File outcmaesstddev.dat (lower right subfigure):

% # columns=["iteration, evaluation, sigma, void, void,  stds==sigma*sqrt(diag(C))", seed=608288, Thu Jan 29 12:55:04 2015
 1 8 0.969230056957 0 0 0.942502802172 0.949838194587 0.998376907781 0.938075234269 0.988418591891
...

The first two columns are always iteration and evaluation, and the variable-length data start from the 5th column. outcmaes is the default recognized name, but it is an argument to the plotting function.
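A minimal sketch of such a converter step in C++, producing the outcmaesfit.dat layout shown above (the function names are hypothetical; how the values are obtained, e.g. by parsing libcmaes's -fplot output, is left open):

#include <cstdio>

// Write the header line of outcmaesfit.dat (layout copied from above).
void write_fit_header(std::FILE *f, long seed)
{
  std::fprintf(f, "%% # columns=\"iteration, evaluation, sigma, axis ratio, "
                  "bestever, best, median, worst objective function value, "
                  "further objective values of best\", seed=%ld\n", seed);
}

// Append one data row; the remaining outcmaes*.dat files follow the same pattern.
void append_fit_row(std::FILE *f, long iter, long evals, double sigma,
                    double axratio, double bestever, double best,
                    double median, double worst)
{
  std::fprintf(f, "%ld %ld %.12g %.12g %.12g %.17g %.12g %.12g\n",
               iter, evals, sigma, axratio, bestever, best, median, worst);
}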

@beniz
Collaborator

beniz commented Jan 30, 2015

@agrayver OK, so the last fix does the trick; results are now on par with the literature. Sorry for the inconvenience, this was a nasty and well-hidden one. It should now both run faster and perform better.

@beniz
Collaborator

beniz commented Jan 30, 2015

@nikohansen good idea, let's work this as #110

@agrayver
Author

I cannot compile the freshly installed RSVM-related code. In surrcmaes.h, these includes do not exist:

#include "surrogates/rankingsvm.hpp"
#include "surrogates/rsvm_surr_strategy.hpp"

changing to

#include "rankingsvm.hpp"
#include "rsvm_surr_strategy.hpp"

works however.

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver OK, thanks for the heads up, I know where this comes from, assuming you are using make install.

EDIT: fixed. Not sure why the commit does not show up here: 7545a3f

@agrayver
Author

Could you please confirm that what I get below is the correct behaviour?

 ./sample_code_surrogate_rsvm -dim 40 -alg acmaes -fplot output.dat -no_exploit

[figure "fsphere_40d_noexp": 40-D fsphere without exploitation]

 ./sample_code_surrogate_rsvm -dim 40 -alg acmaes -fplot output.dat

[figure "fshpere_40d": 40-D fsphere with exploitation]

@agrayver
Author

For the 10-D Rosenbrock, surrogates do not seem to help:

-dim 10 -alg acmaes -fplot output.dat -fname rosenbrock -no_exploit

[figure "rosenbrock_10d_noexp": 10-D Rosenbrock without exploitation]

-dim 10 -alg acmaes -fplot output.dat -fname rosenbrock

[figure "rosenbrock_10d": 10-D Rosenbrock with exploitation]

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver yes, that is the correct behavior. Down to 1e-8, I get 5565 f-evals without surrogates, and 2815 with surrogate exploitation.
EDIT: on the sphere example

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver below are the results for Rosenbrock (which I still need to add to #57): average f-evals over 10 runs, with standard deviation and the number of successful runs in parentheses:

running surrogates on rosenbrock
D       fevals_avg
2       573.667 +/- 357.795 (9)
4       952.667 +/- 360.211 (9)
5       1072.38 +/- 594.202 (8)
8       1384.1 +/- 111.025 (10)
10      1895.67 +/- 221.716 (9)
16      5556.33 +/- 1014.05 (9)
20      8150.4 +/- 681.513 (10)
32      27046 +/- 1550.77 (9)
40      41545 +/- 2852.78 (9)

This uses the tests/surr_test executable, whose default parameters are not the same as in the sample code.

@agrayver
Author

@beniz would you mind elaborating on which parameters you think need to be tweaked to make the sample code more efficient?

@beniz
Collaborator

beniz commented Jan 31, 2015

@agrayver try setting the same x0 on the runs you want to compare; for Rosenbrock, you can use -l with a value of ~70 * sqrt(N), i.e. ~210 in 10-D.

EDIT: 70 instead of 40

@beniz
Collaborator

beniz commented Jan 31, 2015

Typically, compare (no exploitation)

./sample_code_surrogate_rsvm -ftarget 1e-8 -dim 10 -alg acmaes -fname rosenbrock -fplot rosen_noex.dat -l 210 -x0 2 -no_exploit

to (exploitation)

./sample_code_surrogate_rsvm -ftarget 1e-8 -dim 10 -alg acmaes -fname rosenbrock -fplot rosen_ex.dat -l 210 -x0 2
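(For reference, the arithmetic behind -l 210 above: 70 * sqrt(10) ≈ 221, rounded to ~210.)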

@agrayver
Author

agrayver commented Feb 1, 2015

Thank you for the help. Using -l ~ 70 * sqrt(N) seems to work: I see a significant reduction (1.5-3x) in function calls for different functions and dimensions (I still have not tested my real application). At the moment I am testing the 100-D Rosenbrock, and it takes a very long time: it has been running for more than 5 hours on my fairly modern laptop and is still far from finishing. I'm hence wondering:

  1. What is the complexity of the RSVM with respect to N?
  2. What exactly takes so much time, and is there potential for optimization?

@beniz
Collaborator

beniz commented Feb 1, 2015

Try lowering the number of iterations of the RSVM algorithm, 'rsvm_iter', to something between 150000 and 1M (1M being the current, very conservative default).
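For instance (the -rsvm_iter flag name is assumed from its usage further down, and -l 700 follows the ~70 * sqrt(N) rule for N = 100):

./sample_code_surrogate_rsvm -dim 100 -alg acmaes -fname rosenbrock -l 700 -rsvm_iter 200000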

@beniz
Collaborator

beniz commented Feb 2, 2015

What is the complexity of the RSVM with respect to N?

Roughly speaking, O(niter * (l-1)) for the algorithm, with l ~ 70 * sqrt(N), and O(N * l^2) for the kernel computation.

For thorough details, see https://www.lri.fr/~ilya/phd.html page 81 (section 4.1.1)
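Plugging in the formulas above gives a rough idea of the scaling: in 10-D, l ≈ 70 * sqrt(10) ≈ 221, so one kernel computation costs on the order of N * l^2 ≈ 10 * 221^2 ≈ 5e5 operations, while in 100-D, l ≈ 700 and N * l^2 ≈ 100 * 700^2 ≈ 5e7, roughly a hundredfold increase per surrogate build, on top of the niter-proportional training cost.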

@agrayver
Author

agrayver commented Feb 2, 2015

Thank you, I got reasonable performance with the test function. Now, switching to my actual problem, I still see that surrogates deteriorate convergence. Here are the results with no exploitation, 40-D:

[figure "mt_40d_noexp": 40-D application run without exploitation]

Here RSVM was used (with -l 500 and rsvm_iter 200000):

[figure "mt_40d_exp": 40-D application run with RSVM exploitation]

I used the same starting guess and seed for both.
Any idea?

@beniz
Collaborator

beniz commented Feb 2, 2015

At first glance, I'd suggest you try increasing rsvm_iter and see if at least it improves convergence.

@nikohansen
Collaborator

It looks like the initial step-size is at least a factor of 1000 (in the previous plot, a factor of 1e5) too small, and the mechanism that prevents one eigenvalue from increasing massively doesn't work out here. The second point should be checked in the lib.

Addition: it can be useful to use a small initial step-size to check where the best solution will be found locally. In this case, I would ideally expect an initial increase of the step-size by a factor of ten or so.

@beniz
Collaborator

beniz commented Feb 5, 2015

The second point should be checked in the lib.

This is not entirely clear to me; could you give me more details? Thanks!

@nikohansen
Collaborator

Sorry, yes, this is the factor hsig, which "modulates" the learning rate for the update of the evolution path pc used in the covariance matrix update. Looking at the graphs carefully reveals, however, that the step-size sigma increases very quickly only in the beginning, where hsig seems to be effective. That means the cause is not likely a mistaken computation of hsig, but rather a too-slow increase of sigma. It doesn't look like ideal strategy behavior, though.
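For reference, the standard definitions, as in Hansen's CMA-ES tutorial (the lib's exact constants may differ): hsig stalls the rank-one update whenever the step-size evolution path is unusually long,

$$
h_\sigma = \begin{cases}
1 & \text{if } \dfrac{\|p_\sigma\|}{\sqrt{1-(1-c_\sigma)^{2(t+1)}}} < \left(1.4 + \dfrac{2}{n+1}\right) \mathbb{E}\|\mathcal{N}(0,I)\|,\\
0 & \text{otherwise,}
\end{cases}
\qquad
p_c \leftarrow (1-c_c)\,p_c + h_\sigma \sqrt{c_c(2-c_c)\,\mu_{\mathrm{eff}}}\,\frac{m_{t+1}-m_t}{\sigma_t},
$$

so when hsig = 0, the rank-one contribution is stalled and individual eigenvalues should not blow up while sigma is still catching up.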

@beniz beniz removed the bug label Feb 11, 2015
@beniz
Collaborator

beniz commented Feb 20, 2015

Closing after 15 days of inactivity. Can be reopened as needed.

@beniz beniz closed this as completed Feb 20, 2015
andrewsali pushed a commit to andrewsali/libcmaes that referenced this issue Jan 31, 2016