Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--bootstrap_impl=script breaks pkg_tar, bazel-lib tar and py_binary with py_package + py_wheel #2489

Open
ewianda opened this issue Dec 9, 2024 · 14 comments

Comments

@ewianda
Copy link
Contributor

ewianda commented Dec 9, 2024

🐞 bug report

Affected Rule

py_package + py_wheel

Is this a regression?

Yes, related commit #2409

Description

Using the resulting py_binary with py_package+py_wheel fails to find the interpreter symlink. This also breaks pkg_tar from rules_pkg

🔬 Minimal Reproduction

diff --git a/examples/wheel/BUILD.bazel b/examples/wheel/BUILD.bazel
index 58a43015..f06f6ee8 100644
--- a/examples/wheel/BUILD.bazel
+++ b/examples/wheel/BUILD.bazel
@@ -17,6 +17,7 @@ load("@bazel_skylib//rules:write_file.bzl", "write_file")
 load("//examples/wheel/private:wheel_utils.bzl", "directory_writer", "make_variable_tags")
 load("//python:packaging.bzl", "py_package", "py_wheel")
 load("//python:pip.bzl", "compile_pip_requirements")
 load("//examples/wheel/private:wheel_utils.bzl", "directory_writer", "make_variable_tags")
 load("//python:packaging.bzl", "py_package", "py_wheel")
 load("//python:pip.bzl", "compile_pip_requirements")
+load("//python:py_binary.bzl", "py_binary")
 load("//python:py_library.bzl", "py_library")
 load("//python:py_test.bzl", "py_test")
 load("//python:versions.bzl", "gen_python_config_settings")
@@ -414,3 +415,29 @@ py_console_script_binary(
     pkg = "@pypiserver//pypiserver",
     script = "pypi-server",
 )
+
+py_binary(
+    name = "main_binary",
+    srcs = ["main.py"],
+    main = "main.py",
+    deps = [
+        "//examples/wheel/lib:module_with_data",
+        "//examples/wheel/lib:simple_module",
+    ],
+)
+
+py_package(
+    name = "binary_package",
+    # Only include these Python packages.
+    packages = ["examples"],
+    deps = [":main_binary"],
+)
+
+py_wheel(
+    name = "minimal_binary",
+    distribution = "requires_files",
+    version = "0.1.0",
+    deps = [
+        ":binary_package",
+    ],
+)

🔥 Exception or Error

    runpy.run_path(main_filename, run_name="__main__")
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ewianda/.cache/bazel/_bazel_ewianda/87bd9802a0763f5c88728552ef326f75/sandbox/linux-sandbox/14/execroot/_main/bazel-out/k8-opt-exec-ST-d57f47055a04/bin/tools/wheelmaker.runfiles/_main/tools/wheelmaker.py", line 631, in <module>
    main()
  File "/home/ewianda/.cache/bazel/_bazel_ewianda/87bd9802a0763f5c88728552ef326f75/sandbox/linux-sandbox/14/execroot/_main/bazel-out/k8-opt-exec-ST-d57f47055a04/bin/tools/wheelmaker.runfiles/_main/tools/wheelmaker.py", line 541, in main
    maker.add_file(package_filename, real_filename)
  File "/home/ewianda/.cache/bazel/_bazel_ewianda/87bd9802a0763f5c88728552ef326f75/sandbox/linux-sandbox/14/execroot/_main/bazel-out/k8-opt-exec-ST-d57f47055a04/bin/tools/wheelmaker.runfiles/_main/tools/wheelmaker.py", line 304, in add_file
    self._whlfile.add_file(package_filename, real_filename)
  File "/home/ewianda/.cache/bazel/_bazel_ewianda/87bd9802a0763f5c88728552ef326f75/sandbox/linux-sandbox/14/execroot/_main/bazel-out/k8-opt-exec-ST-d57f47055a04/bin/tools/wheelmaker.runfiles/_main/tools/wheelmaker.py", line 156, in add_file
    with open(real_filename, "rb") as fsrc:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'bazel-out/k8-fastbuild/bin/examples/wheel/_main_binary.venv/bin/python3'
Target //examples/wheel:minimal_binary failed to build
@rickeylev
Copy link
Collaborator

I think this is somewhat WAI. A plain py_binary can't really be given as something for py_package to process, and the plain py_binary isn't meant to be redistributable.

The first order problem is that zip doesn't support symlinks. There's some extensions to allow it to store them, though. However, that might be tricky because most of files it is given are going to be symlinks to something else, so we'd need some way to tell "dereference these symlinks, but not these symlinks". Maybe by reading the symlink and see if its non-absolute? IDK.

If you really want to package the whole binary, then you're probably better off packaging the zip file version of the binary. That has special code in its startup to handle the case of coming from a zip file that couldn't store symlinks.

Why do you want to pass a py_binary to py_package?

@aignas
Copy link
Collaborator

aignas commented Dec 10, 2024

Is this also affecting the tar rule from the aspect for usage in rules_oci?

@chowder
Copy link
Contributor

chowder commented Dec 10, 2024

Yeah it's the same problem.

The problem as I see it, is that the venv is created both in the runfiles directory, but also in the directory containing the runfiles directory (is there a name for this?).

Only the former is required, the latter isn't functional (broken symlink) and never gets invoked anyway; stage-1 bootstrap runs the former.

I didn't see an easy way to fix this - usually I would've tried to use runfiles symlinks if I wanted to create a symlink only inside the runfiles directory, but this doesn't let you create arbitrary relative symlinks.

Personally I see this as a deficiency of the related tar rules, but rickylev's already explained why this is a bit tricky.

@ewianda
Copy link
Contributor Author

ewianda commented Dec 10, 2024

Is this also affecting the tar rule from the aspect for usage in rules_oci?

I don't know, It is affecting pkg_tar . Should be related to bazelbuild/rules_pkg#115

@ewianda
Copy link
Contributor Author

ewianda commented Dec 10, 2024

Why do you want to pass a py_binary to py_package?

This is a very old usecase for us where we build wheels for Apache-beam application for Dataflow jobs

@aignas
Copy link
Collaborator

aignas commented Dec 10, 2024

I personally think the py_binary -> py_package -> py_wheel is definitely not the intended usage. It might be better to have py_library -> py_package -> py_wheel + extra entry_points.txt METADATA so that you get a console script when you install a package.

Do I understand correctly that the py_package will pull all of the third party deps and bundle them into the wheel built by bazel?

As for pkg_tar maybe there is also an easy way to tell it to create a particular symlink needed for stage2 bootstrap...


That said, I think the current py_binary behaviour with the --bootstrap=script is something that we have to have for #2161 to work... unless we fully solve #2156 and do our own venvs as part of creating py_binary. Are there other better options?

@ewianda
Copy link
Contributor Author

ewianda commented Dec 10, 2024

@aignas just tested with tar from aspect precisely https://github.com/aspect-build/bazel-examples/tree/main/oci_python_image

ianda/00aeeab5df0e41ab9e87aa07d33c6a73/sandbox/linux-sandbox/15/execroot/_main -w /tmp -S /home/ewianda/.cache/bazel/_bazel_ewianda/00aeeab5df0e41ab9e87aa07d33c6a73/sandbox/linux-sandbox/15/stats.out -D 

/home/ewianda/.cache/bazel/_bazel_ewianda/00aeeab5df0e41ab9e87aa07d33c6a73/sandbox/linux-sandbox/15/debug.out -- external/aspect_bazel_lib~~toolchains~bsd_tar_linux_amd64/tar --create --file bazel-out/k8-fastbuild/bin/benchsci/buildtools/containers/py_image_example.interpreter_layer.tar @bazel-out/k8-fastbuild/bin/benchsci/buildtools/containers/py_image_example.interpreter_tar_manifest.spec)

tar: Error reading archive bazel-out/k8-fastbuild/bin/benchsci/buildtools/containers/py_image_example.interpreter_tar_manifest.spec: Can't open bazel-out/k8-fastbuild/bin/benchsci/buildtools/containers/_archive.venv/bin/python3

@ewianda
Copy link
Contributor Author

ewianda commented Dec 10, 2024

Do I understand correctly that the py_package will pull all of the third party deps and bundle them into the wheel built by bazel?

I think py_package filters which first-party source code goes into the wheel. I think third party deps have to be specified manually.

@ewianda
Copy link
Contributor Author

ewianda commented Jan 9, 2025

If you really want to package the whole binary, then you're probably better off packaging the zip file version of the binary. That has special code in its startup to handle the case of coming from a zip file that couldn't store symlinks.

Is there any downside to containerizing the zip file, then?

OR

can we maintain a function similar to create_zip_file for creating tar files that can be consumed by rules_oci ?

@groodt
Copy link
Collaborator

groodt commented Jan 9, 2025

I think it would help if you explained your use-case a bit more thoroughly. This sounds like a deficiency in the workflow or the way that Apache Beam is consuming runtime code for execution.

Building a "fat wheel" (like a "fat jar" in Java) isn't the remit of py_package and py_wheel rules. The closest thing to a "fat jar" is the Python zip support via output groups. But it's unclear to me if this is supported by the target runtime (Dataflow?) that you're using.

@ewianda
Copy link
Contributor Author

ewianda commented Jan 9, 2025

I think it would help if you explained your use-case a bit more thoroughly. This sounds like a deficiency in the workflow or the way that Apache Beam is consuming runtime code for execution.

Building a "fat wheel" (like a "fat jar" in Java) isn't the remit of py_package and py_wheel rules. The closest thing to a "fat jar" is the Python zip support via output groups. But it's unclear to me if this is supported by the target runtime (Dataflow?) that you're using.

Sorry @groodt , I was referring to building docker images and not wheels since --bootstrap_impl=script is broken for both pkg_tar and tar from bazel-lib

We have migrated most of our dataflow jobs to docker images, just a couple still use wheels, so that is less of a problem.

@rickeylev
Copy link
Collaborator

can we maintain a function similar to create_zip_file for creating tar files that can be consumed by rules_oci ?

Probably? Ultimately, I'd like to remove all the zip stuff from py_binary itself:

  • Replace --build_python_zip with py_zipapp_binary. It creates a self-executable, zipapp-based, zip file based upon an input py_binary.
  • Replace the python_zip_file output group with py_zipapp. It creates a zipapp-based zip file, but isn't self-executable, based upon an input py_binary.

Presumably, if one can create e.g. a zip from a py_binary, then using a different format, e.g. tar, would be fairly simple.

To clarify, though -- the output is a zipapp-based thing. That is not quite the same as just putting all of a py_binary into a tar file (or equiv). The former means deriving a slightly different "runnable thing" from the py_binary. The latter means simply putting all the py_binary files into a tar file (essentially tar bazel-bin/foo.runfiles).

If you want the latter, then you're probably better off using e.g. rules_pkg, since that has various facilities to tar up an arbitrary binary and its runfiles.

@ewianda
Copy link
Contributor Author

ewianda commented Jan 9, 2025

I came to this line of thinking becuase python_zip_file maintains the absolute path to the Python interpreter, not the symlink created in the .venv. My understanding so far is that it is not trivial ( or impossible given the current state) for any of the tar tools to know and create the correct symlink. I guess one option is to manipulate mtree created in the the case of bazel-lib tar since that might be easier compared to modifying or patching rules_pkg

Unfortunately I cannot use --bootstrap=system because of Argument List too long error.

@rickeylev
Copy link
Collaborator

python_zip_file has the absolute path to the interpreter

I think that's just incidental. The zip file doesn't contain the venv bin/python symlinks at all. Instead they get created after the zip file is extracted. I don't remember doing anything special when creating those symlinks, so yeah, they probably have an absolute path stored in them. It'd be fine to have them write a relative path instead. It should work the same (that is how the non-zip based invocations work)

non-trivial/impossible for tar to know the correct symlink

Ah, hm, yeah. Because the input is the bazel-bin symlink forest, so you can't distinguish a "real" symlink from a "convenient" symlink?

Actually, maybe File.is_symlink would allow solving that. Basically, that can tell us which files are supposed to be symlinks. So then it's just a matter of telling the tool to not dereference paths where File.is_symlink is true.

Looking at the CLI of tar, this looks rather tedious, but possible. I think what you'd have to do is one invocation with --dereference (pass it all files for which is_symlink=False), then a second invocation with --no-dereference (pass it all files for which is_symlink=True).

@ewianda ewianda changed the title --bootstrap_impl=script breaks py_binary with py_package + py_wheel --bootstrap_impl=script breaks pkg_tar, bazel-lib tar and py_binary with py_package + py_wheel Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants