Version 2.8.0: Test fails when building with GCC 13 #563

Closed
badshah400 opened this issue Aug 10, 2024 · 11 comments

Comments

@badshah400
Contributor

When building nlopt version 2.8.0 for openSUSE Tumbleweed, where the default C/C++ compiler is GCC 13, we find that running ctest gives the following "buffer overflow" errors:

[   57s] 54/62 Test #54: testopt_algo26_obj1 ..............   Passed    0.00 sec
[   57s] 55/62 Test #55: testopt_algo27_obj0 ..............Subprocess aborted***Exception:   0.00 sec
[   57s] *** buffer overflow detected ***: terminated
[   57s] 
[   57s]       Start 58: testopt_algo28_obj1
[   57s]       Start 59: testopt_algo29_obj0
[   57s] 56/62 Test #56: testopt_algo27_obj1 ..............Subprocess aborted***Exception:   0.00 sec
[   57s] *** buffer overflow detected ***: terminated
[   57s] 
[   57s] 57/62 Test #57: testopt_algo28_obj0 ..............   Passed    0.00 sec
[   57s]       Start 60: testopt_algo29_obj1
[   57s]       Start 61: test_python
[   57s] 58/62 Test #58: testopt_algo28_obj1 ..............   Passed    0.00 sec
[   57s] 59/62 Test #59: testopt_algo29_obj0 ..............   Passed    0.00 sec
[   57s]       Start 62: test_octave
[   57s] 60/62 Test #60: testopt_algo29_obj1 ..............   Passed    0.00 sec
[   57s] 61/62 Test #61: test_python ......................   Passed    0.11 sec
[   57s] 62/62 Test #62: test_octave ......................   Passed    0.15 sec
[   57s] 
[   57s] 97% tests passed, 2 tests failed out of 62
[   57s] 
[   57s] Total Test time (real) =   0.22 sec
[   57s] 
[   57s] The following tests FAILED:
[   57s] 	 55 - testopt_algo27_obj0 (Subprocess aborted)
[   57s] 	 56 - testopt_algo27_obj1 (Subprocess aborted)
[   57s] Errors while running CTest
[   57s] error: Bad exit status from /var/tmp/rpm-tmp.SNBeIL (%check)
[   57s] 

We do not see these issues with the previous version of NLopt (2.7.1) using the same compiler, nor indeed when using older GCC (version 7) --- as we do for openSUSE Leap 15 --- to build NLopt 2.8.0.

Thanks.

@stevengj
Owner

stevengj commented Aug 10, 2024

That's odd — algo27 should be newuoa.c, which hasn't changed in this release.

Can you run it in a debugger and get a stacktrace?

@stevengj
Owner

stevengj commented Aug 10, 2024

I tried running valgrind test/testopt -r 0 -a 27 -o 0 and it ran with no errors in nlopt (using gcc 14).

(valgrind gives some warnings deep into a stacktrace for printf, but that looks like an unrelated libc false positive; it happens independent of the NLopt algorithm choice.)

Can you try test/testopt -r 0 -a 27 -o 0 specifically, to make sure I'm looking at the right thing? If you build with cmake -DCMAKE_BUILD_TYPE=Debug, you can also try running this in the debugger.

@badshah400
Contributor Author

Yes this is it:

~> test/testopt -r 0 -a 27 -o 0
~> -----------------------------------------------------------
~> Optimizing Rosenbrock function (2 dims) using Bound-constrained optimization via NEWUOA-based quadratic models (local, no-derivative) algorithm
~> lower bounds at lb = [ -2 -2]
~> upper bounds at ub = [ 2 2]
~> Starting guess x = [ 0.097627 0.430379]
~> Starting function value = 18.5256
~> *** buffer overflow detected ***: terminated
~> /var/tmp/rpm-tmp.3ExlYR: line 34:  2328 Aborted                 test/testopt -r 0 -a 27 -o 0

I managed to get a backtrace, but I do not know how useful this is:

Starting program: /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/build/test/testopt -r 0 -a 27 -o 0
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.39-9.1.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGABRT, Aborted.
0x00007ffff7c949dc in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-14.2.0+git10526-1.1.x86_64 libstdc++6-debuginfo-14.2.0+git10526-1.1.x86_64
#0  0x00007ffff7c949dc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7c41176 in raise () from /lib64/libc.so.6
#2  0x00007ffff7c28917 in abort () from /lib64/libc.so.6
#3  0x00007ffff7c297e8 in __libc_message_impl.cold () from /lib64/libc.so.6
#4  0x00007ffff7d20bdb in __fortify_fail () from /lib64/libc.so.6
#5  0x00007ffff7d20506 in __chk_fail () from /lib64/libc.so.6
#6  0x00007ffff7f6dc13 in memset (__len=<optimized out>, __ch=<optimized out>, __dest=<optimized out>, __dest=<optimized out>, __ch=<optimized out>, __len=<optimized out>) at /usr/include/bits/string_fortified.h:59
#7  trsapp_ (ub=<optimized out>, lb=<optimized out>, xbase=<optimized out>, crvmin=<synthetic pointer>, hs=0x5555555726c8, hd=0x5555555726b8, g=0x5555555726a8, d__=0x555555572698, step=<optimized out>, delta=0x7fffffffdd58, pq=<optimized out>, hq=<optimized out>, gq=<optimized out>, xpt=<optimized out>, xopt=0x5555555724a8, npt=0x7fffffffdd38, n=0x7fffffffdd3c) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/algs/newuoa/newuoa.c:184
#8  newuob_ (w=0x555555572698, vlag=0x555555572660, d__=<optimized out>, ndim=0x7fffffffdd40, zmat=0x5555555725d8, bmat=0x555555572558, pq=0x555555572568, hq=0x555555572550, gq=0x555555572540, fval=0x555555572518, xpt=0x5555555724a0, xnew=0x5555555724b8, xopt=0x5555555724a8, xbase=<optimized out>, calfun_data=0x555555572320, calfun=0x7ffff7f7b970 <f_noderiv(int, double const*, void*)>, minf=0x7fffffffe048, stop=0x7fffffffdec0, ub=0x555555572440, lb=0x555555572420, rhobeg=<synthetic pointer>, x=0x5555555712a8, npt=0x7fffffffdd38, n=0x7fffffffdd3c) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/algs/newuoa/newuoa.c:1858
#9  newuoa (n=<optimized out>, npt=<optimized out>, x=0x5555555712a8, lb=0x555555572420, ub=0x555555572440, rhobeg=1, stop=0x7fffffffdec0, minf=0x7fffffffe048, calfun=0x7ffff7f7b970 <f_noderiv(int, double const*, void*)>, calfun_data=0x555555572320) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/algs/newuoa/newuoa.c:2571
#10 0x00007ffff7f881c4 in nlopt_optimize_ (minf=0x7fffffffe048, x=<optimized out>, opt=0x555555572320) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/api/optimize.c:718
#11 nlopt_optimize (opt=opt@entry=0x555555572320, x=x@entry=0x5555555712b0, opt_f=opt_f@entry=0x7fffffffe048) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/src/api/optimize.c:890
#12 0x0000555555557077 in test_function (ifunc=<optimized out>) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/test/testopt.c:241
#13 main (argc=7, argv=0x7fffffffe1c8) at /home/abuild/rpmbuild/BUILD/nlopt-2.8.0/test/testopt.c:362
quit

@stevengj
Owner

stevengj commented Aug 10, 2024

Thanks, it's failing at this line:

memset(&step[1], 0, sizeof(double) * *n);

where step is a pointer to an array passed in from &d__[1] on this line:

delta, &d__[1], &w[1], &w[np], &w[np + *n], &w[np + (*n << 1)], &

which is passed in from the &w[id] parameter on this line:

&ndim, &w[id], &w[ivl], &w[iw]);

(gotta love these f2c-translated Fortran codes).

I added a printf statement

	printf("DEBUG: iw = %d, n = %d, len = %d\n",
	       id, n, ((npt+13)*(npt+n) + 3*(n*(n+3))/2));

right before the newuob_ call, and in the test case above, it prints out

DEBUG: iw = 56, n = 2, len = 141

which indicates that plenty of space has been allocated (we are looking at 2 elements right in the middle of a length-141 array w, so there shouldn't be a buffer overrun). I also tried adding a couple of printf's to make sure that &step[1] is indeed the same as &w[id], and that checks out:

diff --git a/src/algs/newuoa/newuoa.c b/src/algs/newuoa/newuoa.c
index a3428a6..e82be63 100644
--- a/src/algs/newuoa/newuoa.c
+++ b/src/algs/newuoa/newuoa.c
@@ -181,6 +181,7 @@ static nlopt_result trsapp_(int *n, int *npt, double *xopt,
              if (sub[j] < 0) sub[j] = 0;
              xtol[j] = 1e-7 * *delta; /* absolute x tolerance */
         }
+        printf("DEBUG 2: &step[1] = %p, n = %d\n", &step[1], *n);
         memset(&step[1], 0, sizeof(double) * *n);
         opt = nlopt_create(NLOPT_LD_MMA, *n);
         nlopt_set_min_objective(opt, quad_model, &qmd);
@@ -2564,6 +2565,9 @@ nlopt_result newuoa(int n, int npt, double *x,
     if (!w) return NLOPT_OUT_OF_MEMORY;
     --w;
 
+
+        printf("DEBUG 1: &w[id] = %p, n = %d\n", &w[id], n);
+
 /* The above settings provide a partition of W for subroutine NEWUOB. */
 /* The partition requires the first NPT*(NPT+N)+5*N*(N+3)/2 elements of */
 /* W plus the space that is needed by the last array of NEWUOB. */

As I said, I can't reproduce this problem with gcc 14 and valgrind, so I'm not sure what the problem could be. Can you try with gcc 14?

@badshah400
Contributor Author

Crashes with GCC 14 too, unfortunately. Perhaps you could try compiling with the following additional GCC flags used by default on openSUSE during compilation to see if you can reproduce the issue:

-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type  -g -ffp-contract=off -O2 -g -DNDEBUG -fPIC -MD -MT

In case it helps, here is a full log for the failing build: _log.zip

@badshah400
Contributor Author

Managed to narrow it down to the use of -D_FORTIFY_SOURCE=3. If I change this to -D_FORTIFY_SOURCE=2, the tests all pass. However, according to this, this does suggest a bug in the code that is missed by -D_FORTIFY_SOURCE=2, if I understand correctly.

@stevengj
Owner

stevengj commented Aug 10, 2024

It's possible that FORTIFY_SOURCE doesn't like the Fortran-style 1-based indexing that is generated by f2c, which is perfectly safe (if used correctly) but may look odd to the compiler.

In particular, in order to implement 1-based indexing in C code, f2c takes all of the array pointers and decrements them by 1 at the beginning of each function — that's why you'll see lines like --step. Then the first element becomes step[1] (rather than C's usual step[0]), but this may confuse gcc's "fortification" since *step itself points to an invalid location (8 bytes before the beginning of the array)?

In which case you should just turn off the FORTIFY_SOURCE option and ignore this. I'm inclined to think that this is a bug in the FORTIFY_SOURCE mode — it's getting confused about valid pointer dereferences due to the weird way that buffers are managed in newuoa.c.

(If valgrind, which is much more rigorous, passes, and FORTIFY_SOURCE fails, then it seems likely that's a bug in FORTIFY_SOURCE. On my machine, however, it succeeds even with the -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 flags.)
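
A minimal sketch of the 1-based indexing idiom described above, assuming the sizes from the debug output earlier in the thread (n = 2, a slice starting at index 56 of a 141-element workspace). The function names are hypothetical and this is not the NLopt source. It only illustrates that a decremented pointer combined with 1-based access touches exactly the caller's n elements. Whether a given fortified build accepts this pattern is not something the sketch demonstrates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* f2c-style callee: after the decrement, step[1] is the caller's first element. */
static void zero_step(const int *n, double *step)
{
    --step;  /* shift to 1-based indexing, as f2c-generated code does */
    memset(&step[1], 0, sizeof(double) * (size_t)*n);
}

/*
 * Hypothetical driver (not NLopt code): fill a workspace of `len` doubles
 * with ones, zero the n-element slice that starts at 1-based index `id`,
 * and return how many ones remain.
 */
static int ones_left_after_zeroing(int n, int id, int len)
{
    double *w0;
    int i, ones = 0;

    /* The slice w[id..id+n-1] must lie inside w[1..len]; requiring id >= 2
       also keeps the callee's decremented pointer inside the allocation. */
    assert(n >= 0 && id >= 2 && id + n - 1 <= len);

    w0 = malloc(sizeof(double) * (size_t)len);
    if (!w0) return -1;
    for (i = 0; i < len; ++i) w0[i] = 1.0;

    zero_step(&n, &w0[id - 1]);  /* same address as the 1-based w[id] */

    for (i = 0; i < len; ++i)
        if (w0[i] == 1.0) ++ones;
    free(w0);
    return ones;
}
```

Calling ones_left_after_zeroing(2, 56, 141) clears two elements and leaves the other 139 untouched, which matches the reasoning that the memset stays well inside the workspace.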

@badshah400
Contributor Author

All right, I built the package using -D_FORTIFY_SOURCE=2 instead of -D_FORTIFY_SOURCE=3 and submitted it. Many thanks for the discussion, your suggestions and advice. Feel free to close this issue at your convenience.

@bkmgit

bkmgit commented Sep 25, 2024

Similar error on Fedora 42

60/64 Test #55: testopt_algo27_obj0 ..............Subprocess aborted***Exception:   0.15 sec
*** buffer overflow detected ***: terminated
61/64 Test #56: testopt_algo27_obj1 ..............Subprocess aborted***Exception:   0.15 sec
*** buffer overflow detected ***: terminated

BuildLog

@bkmgit

bkmgit commented Sep 25, 2024

There are a few warnings for the Nelder-Mead algorithm

nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/api -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -DNDEBUG -std=gnu++11 -fPIC -MD -MT CMakeFiles/nlopt.dir/src/algs/ags/solver.cc.o -MF CMakeFiles/nlopt.dir/src/algs/ags/solver.cc.o.d -o CMakeFiles/nlopt.dir/src/algs/ags/solver.cc.o -c /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/ags/solver.cc
/builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/redhat-linux-build/nlopt.hpp:121: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
/builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/redhat-linux-build/nlopt.hpp:125: Warning 401: Nothing known about base class 'std::runtime_error'. Ignored.
In file included from /usr/include/string.h:548,
                 from /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/neldermead/nldrmd.c:25:
In function ‘memset’,
    inlined from ‘nldrmd_minimize_’ at /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/neldermead/nldrmd.c:205:4:
/usr/include/bits/string_fortified.h:59:10: warning: ‘memset’ specified bound between 18446744056529682432 and 18446744073709551608 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   60 |                                  __glibc_objsize0 (__dest));

and

In function ‘memset’,
    inlined from ‘nldrmd_minimize_’ at /builddir/build/BUILD/NLopt-2.8.0_202409167cdebfe-build/nlopt-7cdebfe5f777b12d3c5b0788c38fe595444d69c6/src/algs/neldermead/nldrmd.c:205:4:
/usr/include/bits/string_fortified.h:59:10: warning: ‘__builtin_memset’ specified bound between 18446744056529682432 and 18446744073709551608 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ^
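
One possible reading of the bounds in these warnings, offered as an interpretation rather than something established in this thread: with an 8-byte double and a 64-bit size_t, 18446744073709551608 equals 2^64 - 8 and 18446744056529682432 equals 2^64 - 8 * 2^31. That is the range a byte count of the form sizeof(double) * n takes when a 32-bit int n is negative. The sketch below uses hypothetical helper names and is not the code at nldrmd.c:205. It only shows the wraparound and one way to guard against it.

```c
#include <stddef.h>

/* Byte count as an unguarded `sizeof(double) * n` would compute it:
   the int is converted to size_t, so a negative n wraps around. */
static size_t unchecked_bytes(int n)
{
    return sizeof(double) * (size_t)n;
}

/* Guarded variant: reject negative counts before converting.
   Returns 1 and stores the byte count on success, 0 otherwise. */
static int checked_bytes(int n, size_t *out)
{
    if (n < 0) return 0;
    *out = sizeof(double) * (size_t)n;
    return 1;
}
```

If this reading is right, the compiler is reporting that it cannot rule out a negative count at that call site, and an explicit check like the one in checked_bytes is one way to make the valid range visible to it.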

@jschueller
Collaborator

Hi, I can reproduce this with GCC 12/13 on Ubuntu without any special flags.
I propose working around it by compiling newuoa with -U_FORTIFY_SOURCE.
@badshah400 could you check whether the master branch works for you?
