Is it normal for GCC 9.1 to be built with stripped flags? #297

Open
jiblime opened this issue May 7, 2019 · 31 comments

Comments

@jiblime
Contributor

jiblime commented May 7, 2019

Something I've noticed with compiling 9.1.0 vs <= 8.3.0 is that it is compiled with these flags:

-march=native -pipe -O2

This is in the initial log when emerging:

 * strip-flags: CFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: CXXFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: FFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: FCFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32' to '-march=native -pipe -O2'
 * strip-flags: LDFLAGS: changed '-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32 -Wl,--hash-style=gnu' to '-march=native -pipe -Wl,--hash-style=gnu -O2'

And these are my USE flags:

USE="cxx fortran graphite (multilib) nls nptl objc openmp pch pgo (pie) sanitize ssp vtv (-altivec) -d -debug -doc (-fixed-point) -go (-hardened) (-jit) (-libssp) -objc++ -objc-gc -systemtap -test -vanilla"

This is not true for other packages, and was not fixed with a recompile and exporting defined flags from make.conf. Is this expected behavior?

@InBetweenNames
Owner

Yup, it is normal. It's strip-flags from flag-o-matic.eclass at work. You can override it by emerging sys-config/ltoize with USE=override-flagomatic. There's a manual-enable set in ltoworkarounds.conf for GCC to force its use though, as I've found it won't build using LTO at all right now. It used to, though!
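To make the log output above easier to read, here is a toy model of allow-list stripping. It is only a sketch: the function name `strip_flags_sketch` and the short allow-list are assumptions made for this example, not the real list in flag-o-matic.eclass.

```shell
#!/bin/sh
# Toy model of allow-list flag stripping.
# The allow-list is a hypothetical subset picked for this example.
strip_flags_sketch() {
    kept=""
    for flag in $1; do
        case "$flag" in
            -march=*|-mtune=*|-pipe|-O|-O1|-O2|-Os|-Wl,*)
                kept="${kept:+$kept }$flag" ;;
        esac
    done
    # If no optimization level survived, fall back to -O2.
    case " $kept " in
        *" -O"*) ;;
        *) kept="${kept:+$kept }-O2" ;;
    esac
    printf '%s\n' "$kept"
}

strip_flags_sketch "-march=native -O3 -fgraphite-identity -floop-nest-optimize -fno-semantic-interposition -flto= -fipa-pta -fuse-linker-plugin -pipe -falign-functions=32"
```

With this allow-list, everything except -march=native and -pipe is dropped, and -O2 is appended because -O3 did not survive.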

@nivedita76

nivedita76 commented May 7, 2019

GCC doesn't need the override.

The trick is to use an env file. Here's my gcc.conf:
EXTRA_ECONF='--with-build-config=bootstrap-lto'
BOOT_CFLAGS="-march=native ${OPT} ${FALIGN} ${GRAPHITE}"

EXTRA_ECONF configures GCC to build with LTO, and BOOT_CFLAGS, which the Gentoo ebuild doesn't strip, tells it to bootstrap with those flags (OPT etc. are defined in my make.conf, but you get the idea). Put this file in /etc/portage/env, and add a file in /etc/portage/package.env containing "sys-devel/gcc gcc.conf" to use it.
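The two files described above could be created roughly like this. This is a sketch under assumptions: `demo_root` is a scratch directory so nothing on a real system is touched (on a real install the paths would sit directly under /), the package.env file name `gcc` is arbitrary, and the flag values are placeholders standing in for OPT, FALIGN and GRAPHITE.

```shell
#!/bin/sh
# demo_root is a hypothetical scratch prefix for this example.
demo_root="${demo_root:-$(mktemp -d)}"
mkdir -p "$demo_root/etc/portage/env" "$demo_root/etc/portage/package.env"

# Env file: placeholder flag values instead of ${OPT} ${FALIGN} ${GRAPHITE}.
cat > "$demo_root/etc/portage/env/gcc.conf" <<'EOF'
EXTRA_ECONF='--with-build-config=bootstrap-lto'
BOOT_CFLAGS="-march=native -O3 -falign-functions=32 -fgraphite-identity"
EOF

# package.env entry: maps the package atom to the env file above.
printf '%s\n' 'sys-devel/gcc gcc.conf' > "$demo_root/etc/portage/package.env/gcc"

printf 'wrote files under %s\n' "$demo_root"
```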

@barolo

barolo commented May 7, 2019

@nivedita76 you don't have to mention it in package.env, just put them in /etc/portage/env/sys-devel/gcc and emerge will pick it up

@nivedita76

My real config has it only applied for >gcc-8 (so I have a "stable" compiler version to fall back on just in case). Would it pick it up if I did env/sys-devel/gcc-9, or does it have to be the full version number then?
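On the file-name question, the sketch below prints the per-package file names that Portage is generally understood to look for under /etc/portage/env/CATEGORY: PN, PN:SLOT, P and PF. Treat that list as an assumption to verify against your Portage version. The function name `env_candidates` and the SLOT value passed in are made up for this example. If the list holds, a bare gcc-9 would not match, and a version range would instead need an atom such as >=sys-devel/gcc-9 in package.env.

```shell
#!/bin/sh
# Prints candidate per-package env file names.
# Arguments: CATEGORY PN SLOT P PF (all supplied by the caller, nothing is queried).
env_candidates() {
    base="/etc/portage/env/$1"
    printf '%s\n' "$base/$2" "$base/$2:$3" "$base/$4" "$base/$5"
}

# SLOT 9.1.0 is an assumed value for illustration.
env_candidates sys-devel gcc 9.1.0 gcc-9.1.0 gcc-9.1.0-r1
```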

@InBetweenNames
Owner

@nivedita76 Thank you! I just tested it and it seems to be working! I'll look into adding a USE=lto to sys-devel/gcc. I also tried BOOT_CFLAGS and noticed the flags are indeed passing through, at least in the first stage of compilation.

@InBetweenNames
Owner

PR created upstream: gentoo/gentoo#11943

@nivedita76

Thanks! They are used to build the compiler itself but not the startup libraries (libgcc etc) I think.

@jiblime
Contributor Author

jiblime commented May 9, 2019

@nivedita76 thank you for the helpful tip. Do you happen to have any resources on the differences between BOOT_CFLAGS and CFLAGS?

@nivedita76

nivedita76 commented May 9, 2019

BOOT_CFLAGS are what are used for building stage2/stage3 compilers, i.e. what eventually gets installed. I'm actually not 100% sure whether CFLAGS gets included in that by default or it only uses it for stage1 or something.

https://gcc.gnu.org/install/

There's some info in the Building section here but it doesn't mention what happens to regular CFLAGS. It does suggest a way of passing custom CFLAGS to the libgcc etc as well (CFLAGS_FOR_TARGET)
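For a plain, non-Portage build, the two variables discussed above are passed on the make command line. The sketch below only prints the command (a dry run). The helper name `gcc_make_command` and the flag values are placeholders for this example.

```shell
#!/bin/sh
# Dry run: prints the make invocation instead of running it.
# $1 = flags for the stage2/stage3 compilers, $2 = flags for target libraries (libgcc etc.).
gcc_make_command() {
    printf "make BOOT_CFLAGS='%s' CFLAGS_FOR_TARGET='%s' CXXFLAGS_FOR_TARGET='%s' bootstrap\n" "$1" "$2" "$2"
}

gcc_make_command "-O2 -march=native" "-O2 -march=native"
```

Keeping the two arguments separate makes it easy to use aggressive flags for the compiler binaries while leaving the runtime libraries on conservative ones.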

@barolo

barolo commented May 10, 2019

@nivedita76 correct me if I'm wrong, but doesn't using EXTRA_ECONF='--with-build-config=bootstrap-lto' prevent the PGO build, which uses profiledbootstrap?

@nivedita76

Nope. The make target is profiledbootstrap which is what makes it use pgo.

@nivedita76

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=config/bootstrap-lto.mk;hb=HEAD

The configure argument basically adds these into the Makefile. Looking at the snippet, I'm actually surprised that an LTO build works without it as well. The documentation for the -frandom-seed argument says:
This option provides a seed that GCC uses in place of random numbers in generating certain symbol names that have to be different in every compiled file. It is also used to place unique stamps in coverage data files and the object files that produce them. You can use the -frandom-seed option to produce reproducibly identical object files.

I would have thought not having that would mess up the build's test -- it checks that it can compile itself reproducibly to make sure you don't end up with a horribly broken compiler.

@nivedita76

Hm I think it doesn't do the comparisons if it's doing a pgo build rather than a regular one.

@barolo

barolo commented May 10, 2019

@nivedita76 Thanks! I stand corrected

@InBetweenNames
Owner

InBetweenNames commented May 13, 2019

Upstream bug is here: https://bugs.gentoo.org/685634

Quote:

> >>>>>> Please mention in both bootstrap-lto-lean.mk and the documentation
> >>>>>> that the intended make target for this config is profiledbootstrap
> >>>>>> since for non-profiledbootstrap it ends up not using LTO at all.  A 
> >>>>>> "lean"
> >>>>>> mode for non-profiledbootstrap would need to set up things to
> >>>>>> use LTO only for stage3 which means not doing a bootstrap comparison
> >>>>>> which means we could "skip" stage2 as well here.

From: https://www.mail-archive.com/[email protected]/msg210066.html

It appears that this is the intended way to build GCC with LTO and PGO. You use the build config option at configure time and then make profiledbootstrap at build time. I plan on optionally feeding CFLAGS into BOOT_CFLAGS when using override-flagomatic or perhaps introducing another USE like optimize-gcc, with sys-config/ltoize.
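Outside of an ebuild, the two steps described above would look roughly like the following. This is a dry-run sketch that only prints the commands. The function name `lto_pgo_steps` and the source directory path are placeholders for this example.

```shell
#!/bin/sh
# Dry run: prints the configure and make steps for an LTO + PGO bootstrap.
# $1 = path to the GCC source tree (placeholder value below).
lto_pgo_steps() {
    printf '%s\n' "$1/configure --with-build-config=bootstrap-lto"
    printf '%s\n' "make profiledbootstrap"
}

lto_pgo_steps ../gcc-src
```

The build config selects LTO at configure time, while the make target is what turns on the profiled (PGO) bootstrap.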

@InBetweenNames
Owner

Patch was accepted upstream!

InBetweenNames added a commit that referenced this issue May 14, 2019
Reference #297

Signed-off-by: Shane Peelar <[email protected]>
@InBetweenNames
Owner

Now that we can use both LTO and PGO in conjunction, I'd like to also support users injecting their CFLAGS into BOOT_CFLAGS, minus -flto (since that is handled internally). Since we use bootstrap-lto as the configuration for GCC, comparisons are made between stage 2 and stage 3 binaries as a test to ensure the final GCC is sane. I'll be testing out all optimizations on my own rig for a few weeks and depending how that goes, it would be nice to have an opt-in for users to do this.
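The "CFLAGS into BOOT_CFLAGS, minus -flto" idea can be sketched as a small filter. The function name `drop_lto_flags` and the sample CFLAGS are assumptions for this example; this is not the code that ended up in sys-config/ltoize.

```shell
#!/bin/sh
# Removes every word starting with -flto from a flags string.
drop_lto_flags() {
    out=""
    for flag in $1; do
        case "$flag" in
            -flto*) ;;  # LTO is handled by the bootstrap-lto build config
            *) out="${out:+$out }$flag" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Sample flags, chosen for illustration only.
CFLAGS="-march=native -O3 -flto=4 -fuse-linker-plugin -pipe"
BOOT_CFLAGS=$(drop_lto_flags "$CFLAGS")
printf 'BOOT_CFLAGS=%s\n' "$BOOT_CFLAGS"
```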

@nivedita76

@InBetweenNames If you use pgo no comparison is done.

@nivedita76

I added this to package.cflags/gcc
>=sys-devel/gcc-9 *FLAGS-=-flto* BOOT_CFLAGS='"${CFLAGS} ${OPTCFLAGS}"'

@InBetweenNames
Owner

Are you sure no comparison is done? I checked the bootstrap-lto.mk config here:

https://github.com/gcc-mirror/gcc/blob/master/config/bootstrap-lto.mk

do-compare = $(SHELL) $(srcdir)/contrib/compare-lto $$f1 $$f2
extra-compare = gcc/lto1$(exeext)

It seems it compares two stages at least. Does PGO skip over do-compare and extra-compare?

@nivedita76

Yes, it will compare if you do only LTO, but the PGO bootstrap has no compare targets. It already builds 4 stages; I guess they felt building a 5th for the comparison was just too much.

@nivedita76

I patched it to add one, and with my options it does check out, fwiw.

@InBetweenNames
Owner

Excellent! Do you think we should integrate your patch here? It might ease some users' minds about applying LTO + PGO + BOOT_CFLAGS optimizations to their GCC.

@nivedita76

gcc-full-pgo.txt

Attaching the current state. This will actually do a 6-stage bootstrap. It uses the profile from the stage built using profile-use (normally the last stage) to do another build, idea was to collect better profiling information about the passes that only get enabled with profile-use. It then does a compare of that final product, so 6 stages total. I've tested with bootstrap-lto though not with the -lean variant.

@InBetweenNames
Owner

@nivedita76 one more question -- I notice you use OPTCFLAGS as well, do you have those defined somewhere?

@nivedita76

@InBetweenNames I have that in my make.conf. The bashrc-mv overlay appends those to CFLAGS. So what I did was have CFLAGS be safe defaults and set all the extra flags in OPTCFLAGS. This is what the flags section of my make.conf looks like. (note some of the stuff is unused)

source make.conf.lto.defines
FALIGN="-falign-functions=32"
# RETPOLINE="-mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register"
RETPOLINE=""
FTLS="-mtls-dialect=gnu2"
LOOPPAR="-floop-parallelize-all -ftree-loop-parallelize=4"
NOPLT="-fno-plt"
FVISIBILITY="-fvisibility-inlines-hidden"
OPT="-O3 -fira-loop-pressure -flive-range-shrinkage"
OPTCFLAGS="${OPT} ${FASTMATH} ${GRAPHITE} ${IPA} ${FLTO} ${SEMINTERPOS} ${FTLS}"
OPTCXXFLAGS="${OPTCFLAGS} -fdevirtualize-at-ltrans ${FVISIBILITY}"
# DEBUGFLAGS="-ggdb"
DEBUGFLAGS=""
SAFEFLAGS="-pipe -march=native -O2 ${FALIGN} ${NOPLT}"
CFLAGS="${SAFEFLAGS} ${DEBUGFLAGS} ${RETPOLINE}"
CFLAGS_x86="${CFLAGS_x86} -mfpmath=sse"
CXXFLAGS="${CFLAGS}"
RUSTFLAGS="-C target-cpu=native -C opt-level=2"
LDFLAGS="${LDFLAGS}"
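The layering in that make.conf (safe defaults in CFLAGS, extras kept separately in OPTCFLAGS) can be tried out in a plain shell. The values below are placeholders: the real GRAPHITE and FLTO definitions live in make.conf.lto.defines and are not shown in this thread, and `FULL_CFLAGS` is a name invented for this example.

```shell
#!/bin/sh
# Placeholder definitions standing in for make.conf.lto.defines.
GRAPHITE="-fgraphite-identity"
FLTO="-flto"
FALIGN="-falign-functions=32"
OPT="-O3"

SAFEFLAGS="-pipe -march=native -O2 ${FALIGN}"
OPTCFLAGS="${OPT} ${GRAPHITE} ${FLTO}"

# Default: conservative flags only.
CFLAGS="${SAFEFLAGS}"
# What a package sees once the extras are appended (hypothetical variable name).
FULL_CFLAGS="${CFLAGS} ${OPTCFLAGS}"

printf 'safe: %s\nfull: %s\n' "$CFLAGS" "$FULL_CFLAGS"
```

Because GCC uses the last -O level on the command line, appending OPTCFLAGS lets -O3 override the -O2 from the safe set without editing it.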

@jiblime
Contributor Author

jiblime commented Nov 13, 2019

I was able to edit the GCC ebuild and push in my own flags, which BOOT_CFLAGS inherited (if only I knew). The difference though is that my compile time was cut in half (?!). I'm willing to bet you can add EXTRA_ECONF='STAGE1_CFLAGS="-O2 -pipe"' to your package.env/ file instead of going through this trouble. But Gentoo is about choices!

sys-devel/gcc: 1:26:34   -- LTO/PGO
sys-devel/gcc: 34′04″    -- No LTO/PGO so I can test -flto=auto patch without waiting
sys-devel/gcc: 34′18″    -- No LTO/PGO for the same reason, different implicit multithread -flto patch
sys-devel/gcc: 46′17″    -- LTO/PGO tested with -flto auto and injected stage 1 flags, compile time reduced by >30min ^^

The -flto patch now automatically detects the number of CPUs I have so I no longer need to define a number. This was backported from GCC 10 and is right here. All that needs fixing is the Changelog.


The ebuild to use custom flags:

# Copyright 1999-2019 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2

EAPI="7"

PATCH_VER="3"

inherit toolchain

KEYWORDS="~alpha amd64 ~arm arm64 ~hppa ~ia64 ~m68k ~mips ~ppc ppc64 ~riscv s390 ~sh sparc x86"
IUSE+="custom-cflags"

RDEPEND=""
DEPEND="${RDEPEND}
	elibc_glibc? ( >=sys-libs/glibc-2.13 )
	>=${CATEGORY}/binutils-2.20"

if [[ ${CATEGORY} != cross-* ]] ; then
	PDEPEND="${PDEPEND} elibc_glibc? ( >=sys-libs/glibc-2.13 )"
fi

# Since all the ebuild does is source its environment from the toolchain eclass (and its inherits and so on)
# all that needs to be done for custom CFLAGS is to redefine strip-flags and replace-flags

# sys-config/ltoize[override-flagomatic] does this but removes all the flag-o-matic functions,
# most of which are workarounds for older GCC versions but also the essential filters for
# funky flags and substitution for architecture definitions in GCC.

# Originally I thought I needed to copy entirely and redefine the gcc_do_filter_flags function
# but it doesn't matter since strip-flags and replace-flags aren't used anywhere else

check_em() {
	for eclass in eutils fixheadtails gnuconfig libtool multilib pax-utils toolchain-funcs prefix ; do
		grep 'strip-flags\|replace-flags' "$(portageq eclass_path "${SYSROOT:-/}" gentoo "${eclass}")"
	done

	# Ideally use this function to test for nonzero output and fail if so, since that would mean
	# *something* has changed and requires either of these functions. For now, whatever.
}

pkg_setup() {
	if use custom-cflags ; then
		strip-flags() {
			ewarn "Flags were not stripped for sanity. You might be interested in using quickpkg on GCC if this goes horribly wrong"
		}

		replace-flags() {
			elog "Sometimes -O2 is prefixed to the compiler flags. Any -O level that follows will replace it. -flto* flags will be replaced as long as USE lto is active"
		}
		# -flto flags need to be filtered or else the stage 1 will need to be LTO'd too.
		# That would increase build time significantly for no performance boost. USE lto will enable LTO for the later stages
		filter-flags '-flto*'

		# Optimize stage 1 a little to shorten the total compile time: https://patchwork.ozlabs.org/patch/766906/
		STAGE1_CFLAGS="-O2 -march=native -pipe"
	fi
}

emerge --info gcc

sys-devel/gcc-9.2.0-r3::local was built with the following:
USE="custom-cflags (cxx) fortran graphite lto (multilib) nls nptl objc openmp pch pgo sanitize ssp vtv (-altivec) -d -debug -doc (-fixed-point) -go (-hardened) -jit (-libssp) -objc++ -objc-gc -pie -systemtap -test -vanilla" ABI_X86="(64)"
CFLAGS="-O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin -Wl,-O1 -Wl,--as-needed"
CXXFLAGS="-O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin -Wl,-O1 -Wl,--as-needed"
FEATURES="multilib-strict xattr usersync merge-sync parallel-fetch news strict assume-digests pid-sandbox usersandbox preserve-libs split-log unmerge-logs ipc-sandbox config-protect-if-modified candy unknown-features-warn split-elog binpkg-logs binpkg-docompress ccache protect-owned unmerge-orphans parallel-install sandbox userfetch binpkg-dostrip userpriv network-sandbox distlocks fixlafiles sfperms"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -O3 -march=native -fgraphite-identity -floop-nest-optimize -fdevirtualize-at-ltrans -fno-semantic-interposition -fno-math-errno -fno-trapping-math -malign-data=cacheline -pipe -falign-functions=32 -fuse-ld=gold -fuse-linker-plugin"

I generally use -fuse-ld so I know what I built a package with, but I may opt to just default to gold only after the recent issues. I've also had the most runtime issues with -fipa-pta and -fno-plt, so I no longer use those; -fno-semantic-interposition seems to consistently give the best performance out of all the flags, with the fewest random runtime errors.

@elsandosgrande
Contributor

  1. Does -flto=auto alone reduce the compilation time to such a degree?
  2. What issues have you been having with -fipa-pta and the Gold linker? I have had none that I can think of.

@jiblime
Contributor Author

jiblime commented Nov 14, 2019

-flto=auto is the same as -flto=jobserver, so it would be the same for GCC. I think I got really lucky with ccache with that.

About -fipa-pta, I was mistaken because I had it on my mind, sorry about that

@elsandosgrande
Contributor

  1. Does -flto=jobserver alone reduce the compilation time to such a degree? It is unclear to me.
  2. All right. I have also seen that you had -fno-plt issues. What were they?

@jiblime
Contributor Author

jiblime commented Nov 17, 2019

  1. I appear to have been unclear in my explanation. When -flto is called with =jobserver, linking is parallelized according to the MAKEOPTS that you've specified. If you have MAKEOPTS="-j4" in your make.conf, -flto=jobserver should mean -flto=4. But this should only be true for plain make/gmake. Other build systems like ninja apparently do not recognize the jobserver argument. You would be better off specifying the number of threads that -flto will use based on the number of threads your processor has.

  2. -fno-plt problems are random, and that is the problem I have with it. I can't track when it is causing an issue and don't really care for it. This is equal to removing -flto from all flags just because I am too lazy to figure out workarounds for the problems that -flto causes. So I am just too lazy to figure it out.
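On point 1, an explicit thread count for -flto can be derived from MAKEOPTS, falling back to the CPU count. This is a sketch only: the helper name `lto_jobs` is made up for this example, and it only understands the short -jN form.

```shell
#!/bin/sh
# Extracts N from a -jN option in a MAKEOPTS-style string.
# Falls back to the number of online CPUs (or 1) when no -jN is present.
lto_jobs() {
    n=$(printf '%s\n' "$1" | sed -n 's/.*-j *\([0-9][0-9]*\).*/\1/p')
    if [ -z "$n" ]; then
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    fi
    printf '%s\n' "$n"
}

MAKEOPTS="-j4"
printf '%s\n' "-flto=$(lto_jobs "$MAKEOPTS")"
```

An explicit number like this does not depend on the build system passing a jobserver down to the link step.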

Note:

The only benefit of the GCC 10 -flto auto-parallelization backport I've used is convenience. The only case I know of where I would benefit from it is if I had decided to configure Python --with-lto. But that would be a bad idea because Python's configure.ac specifies LTOFLAGS="-flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none", and none is not optimal afaik because you want partitioning.
