
Why is Signals volatile? #431

Closed
lucasart opened this issue Sep 19, 2015 · 9 comments

@lucasart

So long as they are only read/written under lock protection, we are safe from unwanted compiler optimization.

Also, the bool members of SignalsType would need to be std::atomic_flag or std::atomic<bool> if we wanted to modify them outside lock protection.

Am I missing something?
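For concreteness, the two alternatives mentioned above could be sketched like this (a sketch only; names such as signal_stop and try_begin_search are hypothetical, not Stockfish's API):

```cpp
#include <atomic>

// std::atomic<bool>: can be loaded and stored atomically from any thread.
std::atomic<bool> stop{false};

void signal_stop()    { stop.store(true); }   // safe without a lock
bool stop_requested() { return stop.load(); } // safe without a lock

// std::atomic_flag: the leaner primitive, but it only supports
// test_and_set() and clear(), so it suits "claim/release" patterns.
std::atomic_flag searching = ATOMIC_FLAG_INIT;

bool try_begin_search() { return !searching.test_and_set(); }
void end_search()       { searching.clear(); }
```

std::atomic_flag is guaranteed lock-free by the standard; std::atomic<bool> almost always is too, but that is implementation-defined.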

@mcostalba

I don't see that signals are lock protected; for instance, 'stop' is not.

Regarding atomic: why do you think we should use atomic instead of a bool in this case? (Please give a specific answer if possible, no general stuff.)

@lucasart
Author

As far as I know, and correct me if I'm wrong:

  • the C standard does not even guarantee that a 4-byte variable will be written atomically. On x86 and x86-64 the actual 32-bit mov instruction does the write atomically, but a standard-conforming compiler is perfectly allowed to compile the code as four 1-byte writes in shuffled order instead. That seems stupid and crazy, but it's not forbidden. Who knows? There may be some weird platform or weird case where it makes sense for a compiler to do that. Or maybe today there are no such cases, but tomorrow there will be.
  • the compiler or hardware can do any kind of reordering. If you set signals.stop = true and then signals.stoponponderhit = true, the current thread sees these values change in one order, but another thread may see them change in a different order, due to cache coherency effects.
  • there is no guarantee that once a thread sets a variable (even if you make it volatile) to some value, the new value is visible to everyone after any given amount of time. I think the C++ standard has loose wording like "after a reasonable amount of time" or something like that.
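The reordering point can be shown in the other direction: with std::atomic and the default seq_cst ordering, the observed order is guaranteed. A minimal sketch (names hypothetical):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> stop{false};
std::atomic<bool> stopOnPonderHit{false};

// Returns the value of `stop` observed once `stopOnPonderHit` became
// visible. With seq_cst atomics this is guaranteed to be true; with
// plain (or volatile) bools the standard gives no such guarantee.
bool observe_order() {
    std::thread writer([] {
        stop.store(true);              // seq_cst: ordered before...
        stopOnPonderHit.store(true);   // ...this store, for all threads
    });
    while (!stopOnPonderHit.load()) {} // spin until second write is visible
    bool stopSeen = stop.load();       // must already be true
    writer.join();
    return stopSeen;
}
```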

In practice, perhaps the SF code is such that all the scenarios this can produce cause no problem and have been thought through correctly. But if that's the case, it makes the code much harder to understand (and easier to break by seemingly trivial modifications that fail to grasp the whole logic). It takes a lot more brain power to envisage all the possibilities and be sure that each of them is handled correctly.

With lock/unlock or atomics, the standard guarantees your expectations, and you get peace of mind and a future-proof codebase.

Is there a measurable performance penalty in imposing the following constraints?

  • signals is an atomic
  • bit 0 means stop, bit 1 means stopOnPonderHit, etc.
  • signals can only be written under lock protection, and this must be done only once per critical section (i.e. calculate the new value and set signals = newValue exactly once).
  • signals can be read outside lock protection. This is the hot path, where you don't want the lock to slow you down. Because modifications are atomic and done in one shot, you can never read a half-baked signals value where the writing thread has modified signals.stop but not yet signals.stopOnPonderHit (or has, from its own point of view, but not yet from the reading thread's point of view).
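A minimal sketch of these constraints, with hypothetical names (raise_signals, stop_requested) rather than Stockfish's actual code:

```cpp
#include <atomic>
#include <mutex>

// All signals packed as bits of one atomic word.
enum Signal : unsigned { STOP = 1u << 0, STOP_ON_PONDER_HIT = 1u << 1 };

std::atomic<unsigned> signals{0};
std::mutex signalsMutex;

// Writers serialize on the mutex and publish the new value with a
// single store, so readers never see a half-baked combination.
void raise_signals(unsigned mask) {
    std::lock_guard<std::mutex> lk(signalsMutex);
    unsigned v = signals.load(std::memory_order_relaxed);
    signals.store(v | mask, std::memory_order_release); // one-shot publish
}

// Lock-free read on the hot path.
bool stop_requested() {
    return signals.load(std::memory_order_acquire) & STOP;
}
```

A lock-free fetch_or on the writer side would also work; the mutex here just follows the "write under lock, read lock-free" constraint stated above.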

Edit: The "after a reasonable amount of time" is only for atomics. For volatile, it seems that no guarantee is given.

@mcostalba

mcostalba commented Sep 29, 2015 via email

@mstembera
Contributor

Marco, I'm not familiar with this code, but your explanation makes complete sense to me. Thank you! All I want to add is that if atomic were used for self-documenting purposes and I were reading the code on my own, without your explanation, it would actually be harder for me to follow, because I would assume atomic was in fact necessary.

@lucasart
Author

@mcostalba: By default, atomics use memory_order_seq_cst. You could use memory_order_relaxed when reading the signal (on the hot path). But of course, signals.load(memory_order_relaxed) is more verbose than just signals. At least C++ gives you a couple of guarantees that volatile does not:

  • Reads and writes are atomic (in practice it would be the case anyway, but now it's guaranteed by the standard), so you can't have a torn write observed half-way through by another thread. I don't know if there are platforms where bool is not atomic, but I do know there are platforms where 1-byte reads/writes are non-atomic...
  • Writes become visible to other threads after a reasonable amount of time.
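A sketch of the relaxed hot-path read described above (stopFlag, should_stop and request_stop are hypothetical names):

```cpp
#include <atomic>

std::atomic<bool> stopFlag{false};

// Hot-path read: memory_order_relaxed is enough when only the flag's
// own value matters, not its ordering relative to other memory.
bool should_stop() {
    return stopFlag.load(std::memory_order_relaxed);
}

// Writer uses the default (seq_cst) store; this path is cold, so its
// cost doesn't matter.
void request_stop() {
    stopFlag.store(true);
}
```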

@lucasart
Author

Anyway, I'm closing this, because it's not a bug, it's a feature.

@lucasart
Author

@mcostalba: Atomics with memory_order_acquire/release are completely free on x86: no memory barriers. It's only if you use the default memory_order_seq_cst that you get memory barriers. If you don't believe me, write a trivial program and look at the generated assembly. It's only on ARM that you will see some memory barriers.

I submitted tests with and without memory barriers on fishtest.

My point is two-fold:

  • the theoretical cost of a memory barrier is not even measurable in practice (and that's on ARM, where we don't currently test).
  • correctness can be achieved on x86 without barriers, and without locks either.
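Following the suggestion above, here is a hypothetical trivial program to inspect: compiling it with g++ -O2 -S on x86-64 generally shows plain mov instructions for the acquire load and the release store, while the seq_cst store emits an xchg (a full barrier).

```cpp
#include <atomic>

std::atomic<int> x{0};

// On x86-64: compiles to a plain mov (acquire is free).
int load_acquire() {
    return x.load(std::memory_order_acquire);
}

// On x86-64: compiles to a plain mov (release is free).
void store_release(int v) {
    x.store(v, std::memory_order_release);
}

// On x86-64: compiles to xchg (or mov + mfence), a full barrier.
void store_seq_cst(int v) {
    x.store(v, std::memory_order_seq_cst);
}
```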

@lucasart lucasart reopened this Oct 12, 2015
@vondele
Member

vondele commented Oct 16, 2015

Actually, there is now a good tool to detect data races in code: please try -fsanitize=thread with a recent gcc (using 5.2 here) on Linux. It triggers often in Stockfish, with quite clean output.
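A minimal, hypothetical example of the kind of race the tool reports: compiled with g++ -fsanitize=thread -g, ThreadSanitizer flags the unsynchronized increments below.

```cpp
#include <thread>

int counter = 0;                        // plain int: unsynchronized shared state

// Two threads increment the counter without any lock or atomic: a
// textbook data race. ThreadSanitizer reports it; without the
// sanitizer the program still runs, but the result is 1 or 2
// depending on the interleaving.
int run_race() {
    auto bump = [] { ++counter; };      // racy read-modify-write
    std::thread a(bump), b(bump);
    a.join();
    b.join();
    return counter;
}
```

Making counter a std::atomic<int> (or guarding it with a mutex) silences the report.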

For a good discussion of data races by the author of the gcc implementation, see:
https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
and by the Firefox team:
https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/

So, I think performance is one aspect, but code correctness is another, and as mentioned by lucasart, on x86 atomics are partially free depending on the memory_order.

For example, the first race found by the tool (on stockfish benchmark with 1 thread):

WARNING: ThreadSanitizer: data race (pid=46204)
Write of size 1 at 0x7d1c0000dff9 by thread T2:
#0 MainThread::think() /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/search.cpp:302 (stockfish+0x00000041b8b9)
#1 MainThread::idle_loop() /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/thread.cpp:157 (stockfish+0x00000043738b)
#2 void std::_Mem_fn_base<void (MainThread::*)(), true>::operator()<, void>(MainThread*) const /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:600 (stockfish+0x00000043d876)
#3 void std::_Bind_simple<std::_Mem_fn<void (MainThread::*)()> (MainThread*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1531 (stockfish+0x00000043d876)
#4 std::_Bind_simple<std::_Mem_fn<void (MainThread::*)()> (MainThread*)>::operator()() /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1520 (stockfish+0x00000043d876)
#5 _M_run /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/thread:115 (stockfish+0x00000043d876)
#6 execute_native_thread_routine (libstdc++.so.6+0x0000000f6fcd)

Previous read of size 1 at 0x7d1c0000dff9 by thread T1 (mutexes: write M79):
#0 TimerThread::idle_loop() /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/thread.cpp:108 (stockfish+0x000000437627)
#1 void std::_Mem_fn_base<void (TimerThread::*)(), true>::operator()<, void>(TimerThread*) const /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:600 (stockfish+0x00000043d7ee)
#2 void std::_Bind_simple<std::_Mem_fn<void (TimerThread::*)()> (TimerThread*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1531 (stockfish+0x00000043d7ee)
#3 std::_Bind_simple<std::_Mem_fn<void (TimerThread::*)()> (TimerThread*)>::operator()() /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1520 (stockfish+0x00000043d7ee)
#4 _M_run /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/thread:115 (stockfish+0x00000043d7ee)
#5 execute_native_thread_routine (libstdc++.so.6+0x0000000f6fcd)

This one can be fixed by using
std::atomic<bool> run = ATOMIC_VAR_INIT(false);
in struct TimerThread.

Leading to the next:

WARNING: ThreadSanitizer: data race (pid=46785)
Write of size 8 at 0x00000065b268 by thread T2:
#0 init /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/timeman.cpp:105 (stockfish+0x000000419789)
#1 MainThread::think() /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/search.cpp:229 (stockfish+0x000000419789)
#2 MainThread::idle_loop() /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/thread.cpp:157 (stockfish+0x0000004373a3)
#3 void std::_Mem_fn_base<void (MainThread::*)(), true>::operator()<, void>(MainThread*) const /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:600 (stockfish+0x00000043d87e)
#4 void std::_Bind_simple<std::_Mem_fn<void (MainThread::*)()> (MainThread*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1531 (stockfish+0x00000043d87e)
#5 std::_Bind_simple<std::_Mem_fn<void (MainThread::*)()> (MainThread*)>::operator()() /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1520 (stockfish+0x00000043d87e)
#6 _M_run /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/thread:115 (stockfish+0x00000043d87e)
#7 execute_native_thread_routine (libstdc++.so.6+0x0000000f6fcd)

Previous read of size 8 at 0x00000065b268 by thread T1:
#0 TimeManagement::elapsed() const /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/timeman.h:36 (stockfish+0x0000004377af)
#1 check_time /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/search.cpp:1551 (stockfish+0x0000004377af)
#2 TimerThread::idle_loop() /data/vjoost/gnu/tst/Stockfish-atomic_signals/src/thread.cpp:113 (stockfish+0x0000004377af)
#3 void std::_Mem_fn_base<void (TimerThread::*)(), true>::operator()<, void>(TimerThread*) const /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:600 (stockfish+0x00000043d7f6)
#4 void std::_Bind_simple<std::_Mem_fn<void (TimerThread::*)()> (TimerThread*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1531 (stockfish+0x00000043d7f6)
#5 std::_Bind_simple<std::_Mem_fn<void (TimerThread::*)()> (TimerThread*)>::operator()() /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/functional:1520 (stockfish+0x00000043d7f6)
#6 _M_run /data/vjoost/gnu/gcc-libstd/gcc-5.2.0/install/include/c++/5.2.0/thread:115 (stockfish+0x00000043d7f6)
#7 execute_native_thread_routine (libstdc++.so.6+0x0000000f6fcd)

Location is global 'Time' of size 32 at 0x00000065b260 (stockfish+0x00000065b268)

@zamar

zamar commented Oct 25, 2015

Signals are now atomic.

Closing this issue.

@zamar zamar closed this as completed Oct 25, 2015
niklasf pushed a commit to niklasf/Stockfish that referenced this issue Nov 16, 2017