Get visibilities from shared objects to use them in filtering #89
@woodruffw could you take a look? I still have to do the Windows stuff, but I'd welcome a first opinion, especially on all of the Mach-O spaghetti. EDIT:
I think this is ready now.
Thanks! I'm going to be a bit busy this week, but I'll try and do a review here tomorrow or Wednesday.
On Apr 15, 2024, at 10:48 AM, Nicholas Junge wrote:
> I think this is ready now.
Notes and observations:
Meaning that dockerfile regressed on Linux, but improved on Mac. Now, flagging the module entry point on Linux is wrong, but it seems that is marked as either weak or global in the extension (I don't have the FWIW, google_benchmark and nanobind_example (for which I wrote the C/C++ build system configs) are green on either branch.
Anything I can do to support you @woodruffw? I've been thinking about a CI setup that regression-tests some "golden" wheels against the current PR, although that presents the problem of hand-curating "good" wheels.
That would be helpful, thank you! I think the (I'm sorry for the delay here -- I'll try and find some time to review tomorrow.)
Force-pushed from a03ea38 to 28831db.
I added a complete test audit of the latest
What do you say @woodruffw? Should we just flat-out ignore all symbols starting with
Yeah, that sounds right to me -- I think in practice a single
Nice! Now cryptography audits are green on this branch, too.
Thanks @nicholasjng, LGTM! Needs update/rebase but otherwise should be good to go.
(Sorry for the delay here -- I've been on PTO.)
This is the final step in selective auditing: Obtaining the symbol visibility info, and then deciding if this symbol is an ABI3 violation with the found visibility. The rationale is that the non-ABI3-tagged symbols which we know can be legal are all static inline, which means they should show up as local in a symbol table.
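The decision described above can be sketched roughly as follows. This is only an illustration, not abi3audit's actual code: the Symbol class, the ABI3_ALLOWED set, the prefix tuple, and the symbol names used in the usage lines are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real list of limited-API (abi3) symbols.
ABI3_ALLOWED = frozenset({"PyLong_FromLong", "PyModule_Create2"})


@dataclass(frozen=True)
class Symbol:
    name: str
    visibility: str  # assumed to be one of "local", "global", "weak"


def is_abi3_violation(sym: Symbol) -> bool:
    """Return True if the symbol looks like an ABI3 violation."""
    # Only CPython API symbols are of interest (prefixes are an assumption).
    if not sym.name.startswith(("Py", "_Py")):
        return False
    # Symbols that are part of the limited API are always fine.
    if sym.name in ABI3_ALLOWED:
        return False
    # Non-abi3 symbols are tolerated only when local, which is how
    # static inline helpers are expected to show up in a symbol table.
    return sym.visibility != "local"
```

With these assumptions, is_abi3_violation(Symbol("_Py_hypothetical_helper", "local")) is False, while the same name with "global" visibility is flagged.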
To the best of my knowledge, taken from derivative works on the Mach-O documentation found via Google Search. One nice side effect of these bit hacks is that we can address the TODO in the loop, because the external marker check is nothing more than seeing if the lowest bit of n_type (the N_EXT flag) is set.
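For reference, a minimal sketch of that kind of bit check. The mask values are the ones I understand Apple's mach-o/nlist.h to define; the function name and the "local"/"global"/"weak" labels are my own assumptions and not necessarily what this PR uses.

```python
# Masks for the n_type and n_desc fields, as I understand mach-o/nlist.h.
N_EXT = 0x01         # external symbol bit in n_type
N_WEAK_DEF = 0x0080  # weak definition bit in n_desc


def macho_visibility(n_type: int, n_desc: int = 0) -> str:
    """Classify a Mach-O symbol as "local", "weak", or "global" (hypothetical helper)."""
    # Without the external bit, the symbol is not visible outside its image.
    if not n_type & N_EXT:
        return "local"
    # External symbols can additionally be marked as weak definitions.
    if n_desc & N_WEAK_DEF:
        return "weak"
    return "global"
```

Under these assumptions, an n_type of 0x0E comes out as local and 0x0F as global.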
Also simplify the startswith() check for Python symbols.
Also, Windows symbol visibilities are all assumed global, because of the way Windows library exports work.
Downloads wheels from PyPI, caching them for local development, and audits them across platforms. For one version only, currently v42.0.7.
These are global, but do not have predetermined names - a shared object's entry point symbol is always named PyInit_$EXTNAME, with $EXTNAME being the name of the C/C++ extension module. As such, we exclude them in a separate if-branch in the iteration over all symbols. Since there should only be one entry point per extension, we expect to find at most one match.
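A rough sketch of the entry point exclusion, assuming the extension name is already known and that symbol names are plain strings without platform-specific prefixes. The function name and signature are hypothetical.

```python
def partition_symbols(names: list[str], ext_name: str) -> tuple[list[str], list[str]]:
    """Split symbol names into (entry points, everything else)."""
    # The module entry point is PyInit_<extension name>.
    entry_point = f"PyInit_{ext_name}"
    entry_points = [n for n in names if n == entry_point]
    others = [n for n in names if n != entry_point]
    # One extension should define at most one entry point.
    if len(entry_points) > 1:
        raise ValueError(f"expected at most one {entry_point}, found {len(entry_points)}")
    return entry_points, others
```

Only the second list would then go through the ABI3 check, so a global PyInit_ symbol is never reported as a violation.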
Force-pushed from 2d8863d to 102cdba.
Rebased. My pleasure, hope you had a great time! Happy to see this go through. The CPython upstream bit of work for this is now also on my plate, and I hope to get to it soon.
Thanks again @nicholasjng! If you'd like a release for this, I can do one today or tomorrow. Just let me know.
This is the final step in selective auditing: Obtaining the symbol visibility info, and then deciding if this symbol is an ABI3 violation with the found visibility.
The rationale is that the non-ABI3-tagged symbols which we know can be legal are all static inline, which means they should show up as local in a symbol table.
Currently only Linux - I took the map values from this SO thread quoting man elf.
Mach-O is harder. It gives me a bunch of binary values as ints, and I'm not sure how to interpret them. The returned type for symbol is NList64, which is a Python version of this struct: https://llvm.org/doxygen/structllvm_1_1MachO_1_1nlist__64.html
According to https://en.wikipedia.org/wiki/Mach-O#LINKEDIT_Symbol_table, the visibility should be found in the symbol type value (n_type) - I tested on the dockerfile wheels mentioned in the abi3audit blog post, and got an example value of 14 (or 0b1110) for n_type, not sure what this means.
This is very much work in progress.
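As a hedged pointer on the value 14: going by the masks in Apple's mach-o/nlist.h as I understand them, n_type packs several bit fields, and 0b1110 would be N_SECT (defined in a section) with the external bit cleared, i.e. a local symbol. A small decoding sketch, where decode_n_type and the dictionary keys are names made up for illustration:

```python
# n_type bit masks, as I understand mach-o/nlist.h.
N_STAB = 0xE0  # any bit set: debugging (stab) entry
N_PEXT = 0x10  # private external bit
N_TYPE = 0x0E  # mask for the type bits
N_EXT = 0x01   # external bit

# Values of the masked type bits.
TYPE_NAMES = {
    0x0: "N_UNDF",  # undefined
    0x2: "N_ABS",   # absolute
    0xA: "N_INDR",  # indirect
    0xC: "N_PBUD",  # prebound undefined
    0xE: "N_SECT",  # defined in a section
}


def decode_n_type(n_type: int) -> dict:
    """Break an n_type byte into its component fields (illustrative helper)."""
    return {
        "stab": bool(n_type & N_STAB),
        "private_external": bool(n_type & N_PEXT),
        "type": TYPE_NAMES.get(n_type & N_TYPE, "unknown"),
        "external": bool(n_type & N_EXT),
    }


print(decode_n_type(14))
# {'stab': False, 'private_external': False, 'type': 'N_SECT', 'external': False}
```

By the same reading, a value of 15 (0b1111) would be the same kind of symbol with the external bit set.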