New lock file format #1209
Comments
This WIP PR conda/conda-lock#106 has a bunch of handy things that could also go into metadata |
This lockfile spec should have a version and ideally a reference to some standard jsonschema representation of the structure.
|
Ooh, this is very exciting!!!
My thoughts regarding the lockfile metadata are that I'd like my lockfiles to be self-documenting. For instance, I'd like them to know how they were created, for example with which command. I'd also like to be able to add "comments" as explanation for colleagues. That way by looking at the lockfile, it'll be obvious what it is, where it came from, and how to update it.
I think it's useful to be able to choose which fields to include or exclude. For instance, some people may find it useful to include the timestamp and the username, but others might find the timestamp annoying with git, or might not want to leak their username.
I have also realized that the feature I'd really like is to be able to stick my "non-explicit" dependencies in the metadata so that I can run a command like
I was not bold enough to propose a new format, so I have been working with a header consisting of commented yaml. Obviously a proper YAML format like this would be better.
I have things mostly implemented; I'm just working on a system for versioning the metadata generation process so that it can be extended or modified in the future. I wanted to finish it this weekend but I ended up not having enough time. |
Yeah, we could make this format a super set of the existing YAML environment files. So that you could have
then some micromamba command could automatically update the lockfiles + certain metadata keys. |
EDIT: Or was it |
hmm, I don't think micromamba would complain about extra keys. With mamba, we're just using |
For a normal environment file it seems to produce a warning.
For an explicit lockfile, I'm getting:
|
Yeah, in explicit lockfiles you need to use a comment |
@maresb yeah, none of this exists yet, we're designing what we want from a lockfile format for conda. |
Here is a summary of what I came up with in my PR:
conda-lock-metadata:
  about: This lockfile was generated by conda-lock to ensure reproducibility.
  comment: |-
    Run the following command to update this project's dependencies.
  command: conda-lock -f environment.yml --metadata=all
  command_with_path: /root/conda/envs/conda-lock/bin/conda-lock -f environment.yml --metadata=all
  conda_lock_version: 0.11.3.dev0+gf2ba8d4.d20210904
  created_by: root
  input_hash: f15a045753a401da73dd7c1693fd031e0ad41c0b4c9ca8545c0a8ab56c21d16c
  platform: win-64
  timestamp: 2021-09-05 23:43:18+02:00
  metadata_version: v1
dependencies:
  - mamba
  - conda-lock
On the command line, you should specify |
Thinking a bit through some of the human-consumable parts for this, would we want something like this instead?
metadata:
  spec: explicit-1.0
  description: ... # optional
  channels:
    - conda-forge
  name: myenv
packages:
  # probably will be alphabetically ordered
  xyz:
    linux-64:
      version: 0.15.0
      resolved: https://conda.anaconda.org/conda-forge/linux-64/xyz-0.15.0-had123.tar.bz2
      sha256: 123123123123123sjadalkjdlkajsk
      signature: ... ? # we need to also have certain metadata to validate signatures, though
    osx-64:
      version: 0.15.0
      resolved: https://conda.anaconda.org/conda-forge/osx-64/xyz-0.15.0-had123.tar.bz2
      sha256: 123123123123123sjadalkjdlkajsk
  pip:
    linux-64:
      version: 1.15.0
      resolved: https://conda.anaconda.org/conda-forge/noarch/pip-1.15.0-had123.tar.bz2
      sha256: 123123123123123sjadalkjdlkajsk
  abc:
    osx-64:
      version: 0.16.0
      resolved: https://conda.anaconda.org/conda-forge/osx-64/abc-0.16.0-had123.tar.bz2
      sha256: 123jk1lk23j1kl2j3kj12k3jlj1lk2
install_order:
  linux-64:
    - xyz
    - pip
  osx-64:
    - xyz
    - abc
Noarch packages will still be repeated per platform, since there may be a platform-specific variant of something that is usually noarch. Grouping the packages together makes it easier to review updates to lockfiles: when you relock, you can check that all the versions you expect to move, move. |
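To illustrate the review argument, here is a minimal sketch of how a grouped layout could be compared across two relocks. It assumes the `packages` mapping has already been parsed into nested dictionaries (package name, then platform, then fields); the `version_changes` helper and the sample data are hypothetical and not part of any existing tool.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed form of the "packages" mapping:
# package name -> platform -> {"version": ..., "resolved": ..., "sha256": ...}
Packages = Dict[str, Dict[str, Dict[str, str]]]


def version_changes(old: Packages, new: Packages) -> List[Tuple[str, str, str, str]]:
    """Return (package, platform, old_version, new_version) for every entry that moved.

    Entries that were added or removed are reported with "-" on the missing side.
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        old_platforms = old.get(name, {})
        new_platforms = new.get(name, {})
        for platform in sorted(set(old_platforms) | set(new_platforms)):
            before = old_platforms.get(platform, {}).get("version", "-")
            after = new_platforms.get(platform, {}).get("version", "-")
            if before != after:
                changes.append((name, platform, before, after))
    return changes


# Made-up data: only xyz on linux-64 moves between the two locks.
old_lock = {
    "xyz": {"linux-64": {"version": "0.15.0"}, "osx-64": {"version": "0.15.0"}},
    "abc": {"osx-64": {"version": "0.16.0"}},
}
new_lock = {
    "xyz": {"linux-64": {"version": "0.16.0"}, "osx-64": {"version": "0.15.0"}},
    "abc": {"osx-64": {"version": "0.16.0"}},
}

for change in version_changes(old_lock, new_lock):
    print(change)
```

With this grouping, a package that moved on one platform but not on another shows up as a single, easy-to-spot entry.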
I would also like to make it a superset of a regular yaml environment file, by the way. |
you could also consider blake3 instead of sha256, I think it's much faster (parallel/multithreaded). |
I don't think the hashing speed matters so much here. One nice thing about sha256 is that we can directly pull from a OCI registry with it. |
I'm -1 on this. Lockfiles are generated by machines. The sources are generated by humans. When both a human and a machine edit the same file you're asking for trouble. |
@mariusvniekerk I know it's a bit dangerous, and I've been debating this point with myself for a while.
The fundamental problem that I'm trying to solve is as follows: I'm working on some project, generate a lockfile for it, but then I forget to document how I generated the lockfile. I move on to something else. Some weeks/months later, I return to the project. I'd like to update the lockfile, but I forgot the exact
Put in a different way, a lockfile is supposed to guarantee reproducibility of the environment. I think it would be great if the lockfile could also guarantee its own |
Oh, I'm 100% for stuffing as much metadata into the lockfile as possible for reproducibility, but I do not want the ability to accidentally use a lockfile for something it's not for. Basically every language community that has lockfiles as a core concept makes the output of the locking process a separate file with its own dedicated format (cargo, go.mod, yarn, etc.) |
I'd like to put the environment file into the lockfile's metadata. As soon as this has been done, it becomes extremely tempting to edit that and/or use that copy of the environment file as a new basis for generating the lockfile. Where do we draw the line? Do we say that we can include a copy of the environment file, but we refuse to acknowledge that copy as machine-readable? |
All for dumping the source files into the metadata for the lock. It can even be machine readable. But once you make that editable by a human, bad things will happen. |
I'm not sure I understand your -1 then... Let's say we define our new lockfile format which includes the source Unless we somehow provide some deterrent, people will then naturally delete the original You say that bad things will now happen. What specifically, data corruption? How can we prevent/discourage those bad things? |
What happens is that users will just edit the user-editable part of the lock file and not update it. At that point the lock is entirely a lie. |
Thanks! Now I understand. One potential mitigation for this problem could be to include a checksum based on the dependencies from which the lock was generated. Any program which installs a lockfile should verify this checksum. In case it doesn't match, scream "These dependencies are a lie!!!" and refuse to do anything until the lockfile is updated. This would require the cooperation of any program which can install a lockfile. The programs I'm aware of are Conda, Mamba, Micromamba, and conda-lock. Among the two of you, we have pretty good coverage in here! 🤣 |
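A minimal sketch of that mitigation, under stated assumptions: the lockfile is already parsed into a dictionary, the requested specs live under a `dependencies` key, and the stored checksum lives under `metadata.input_hash`. The key layout and both function names are hypothetical; only the idea of hashing the source dependencies and refusing to install on mismatch comes from the comment above.

```python
import hashlib
import json
from typing import List


def dependencies_hash(dependencies: List[str]) -> str:
    """Hash the requested specs in a canonical form (stripped and sorted)."""
    canonical = json.dumps(sorted(spec.strip() for spec in dependencies))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_lock(lock: dict) -> None:
    """Refuse to proceed if the embedded dependencies no longer match the stored hash."""
    expected = lock["metadata"]["input_hash"]  # hypothetical key layout
    actual = dependencies_hash(lock["dependencies"])
    if actual != expected:
        raise ValueError(
            "These dependencies are a lie: regenerate the lockfile before installing."
        )


# A freshly generated lock verifies; a hand-edited one would not.
lock = {
    "metadata": {"input_hash": dependencies_hash(["mamba", "conda-lock"])},
    "dependencies": ["mamba", "conda-lock"],
}
verify_lock(lock)
```

Sorting the specs before hashing is a design choice made here so that reordering the list does not invalidate the lock; a stricter tool might hash the raw text instead.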
I've just started to work on a |
I found my way here via a hint from @wolfv at PackagingCon, and would also like to see a richer, structured, multi-platform lock file. Here are some more fields that would be useful to include for each package, mostly to support extensions to To support optional subsets of packages that need to be mutually compatible, but that you may not want to install in some contexts (e.g. installing dev dependencies in CI, but only required dependencies in production):
(This is mostly relevant in the context of requirements parsed out of a To support
(In pip mode, the url would point to a wheel or sdist rather than a conda package) |
I think the format should be human-readable. To me, one of the key requirements for the lock file format should be that it's diffable. Lock files tend to become difficult to grasp, but resolving conflicts in them should still be possible. Cargo went through a similar process; I think we can learn from that! |
That's a really good point, thanks for pointing to the Cargo discussion! FYI, here's an example of (and model for) what we settled on for
Are there any other considerations we missed? |
@jvansanten @mariusvniekerk one small nitpick I have would be that maybe instead of Or alternatively it could be Or
|
@jvansanten Yes thanks! Quick side question: Does |
From this comment in the source code, what I've gleaned is that:
What I still don't understand is what happens when other packages are not compatible with the newly updated package. I see a few things that could be done:
I don't think there's an obvious best way to do this. It's a hard problem that has definitely been grappled with before, and there are some pretty convoluted solutions. For example, Ruby's package manager Bundler updates to the latest if the spec isn't compatible OR there are no transitive dependencies. I see |
My understanding of @maresb's request for including the source The original If the user was able to update the lockfile from the original requested spec ( |
You could then also allow adding new packages to the requested deps or specifying other constraints when updating the lockfile - e.g.
|
What is the status of this feature? #1577 seems to have implemented them but my attempts to use
result in
Micromamba 0.25 |
@shughes-uk I think the file has to end with |
Mamba hits me with
micromamba hits me with
|
Here's the lockfile (renamed conda-lock.txt from conda-lock.lock so github would let me upload it) |
Should've taken a proper look before -- |
I was using |
What platform are you on? E.g. I tried your lockfile on an M1 Mac, and nothing got done (because there are no packages in the lockfile for this platform). |
Ahhh that would do it. Thank you!! Perhaps a nice fix here would be to have the lack of a relevant platform section produce an error code instead? |
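A sketch of what such a check might look like, assuming the lockfile exposes a per-platform install list like the `install_order` mapping proposed earlier in this thread. `MissingPlatformError` and `packages_for_platform` are hypothetical names, not Micromamba's actual behaviour.

```python
from typing import Dict, List


class MissingPlatformError(Exception):
    """Raised when a lockfile contains no packages for the requested platform."""


def packages_for_platform(install_order: Dict[str, List[str]], platform: str) -> List[str]:
    """Return the install list for `platform`, failing loudly instead of installing nothing."""
    packages = install_order.get(platform)
    if not packages:
        available = ", ".join(sorted(install_order)) or "none"
        raise MissingPlatformError(
            f"lockfile has no packages for {platform!r} (available: {available})"
        )
    return packages


# Made-up lock that only covers linux-64.
install_order = {"linux-64": ["xyz", "pip"]}
print(packages_for_platform(install_order, "linux-64"))
```

A command-line tool could catch this exception and exit with a non-zero status, so that an install on an unsupported platform does not look like a success.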
I've been trying to make serious use of the new lock file format lately. I've encountered various issues with Micromamba's implementation, all of which I've reported. I think I've also discovered a logical oversight in the specification itself, namely with the category field:
category: str = "main"
Problem: This construction requires that each package belongs to a single category, but packages should be able to belong to multiple categories. (For instance, I want
Suggestion: Convert this to
categories: list[str] = ["main"]
Background: As a refresher, we can have multiple environment files, for instance
Let's suppose I'm developing a containerized app. I have a devcontainer where I install both
As a workaround, I have to remove
Question: Does my suggestion make sense, or am I somehow thinking about this in the wrong way? Thanks! |
One way to solve for dev-only, prod-only, and both-dev-and-prod would be to have 3 different categories. But that will require 2^n categories, generally speaking. I agree that it would be nice to be able to attach multiple categories. |
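To make the multi-category idea concrete, here is a small sketch of selection with a list-valued `categories` field. The package records and the `select` helper are hypothetical; the point is only that overlapping sets avoid one synthetic category per combination.

```python
from typing import Dict, List, Set

# Hypothetical package records using the proposed multi-valued field.
packages: List[Dict] = [
    {"name": "python", "categories": ["main", "dev"]},
    {"name": "pytest", "categories": ["dev"]},
    {"name": "gunicorn", "categories": ["main"]},
]


def select(records: List[Dict], wanted: Set[str]) -> List[str]:
    """Names of packages that belong to at least one of the wanted categories."""
    return [r["name"] for r in records if wanted & set(r["categories"])]


print(select(packages, {"main"}))  # production install
print(select(packages, {"dev"}))   # devcontainer install
```

Here `python` is listed once and still lands in both installs, which a single-valued `category` field cannot express without duplicating the entry or inventing a combined category.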
|
I also agree that multiple categories would be handy. All the dependency solvers I'm aware of demand that the total solution is self-consistent, i.e. if you install all packages, you should get no conflicts. The same should be true of any subset, and there's no reason those subsets need to be disjoint, other than the fact that some solvers like poetry treat them that way. |
Definitely. Luckily there's https://github.com/conda-incubator/conda-lock/blob/7d9bd6d67d59fdd30d92a2ace3ee344aafa6a2b1/conda_lock/src_parser/lockfile.py#L79 |
Thanks @jvansanten, I just noticed that. (I was looking in the |
I was rather surprised that my lock file needed to end with
A better error message or more intelligent parsing of the yaml file to detect it is a lock file would be useful. |
@wholtz, there have also been complaints about this on the conda-lock side (conda/conda-lock#280). Probably the strategy there will be to first attempt to parse as yaml (for new style) and on failure fall back to parsing as explicit ( = old style). |
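As an illustration of detecting the format from content rather than from the file name, here is a simpler heuristic than the parse-then-fall-back strategy described above: it looks for the `@EXPLICIT` marker that conda's explicit files carry before the package URLs. The function name and return values are hypothetical, and a real tool would still need to validate the YAML case.

```python
def detect_lock_format(text: str) -> str:
    """Classify lockfile content as "explicit", "yaml", or "unknown" without using the filename."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # The first meaningful line decides: explicit files start with the marker.
        return "explicit" if stripped == "@EXPLICIT" else "yaml"
    return "unknown"  # empty file or comments only


explicit_text = "# platform: linux-64\n@EXPLICIT\nhttps://example.invalid/pkg-1.0-0.tar.bz2\n"
yaml_text = "metadata:\n  spec: explicit-1.0\n"
print(detect_lock_format(explicit_text), detect_lock_format(yaml_text))
```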
I was surprised recently in the opposite respect: when I try to install from a standard environment YAML formatted "lock file" (without hashes) generated with
I work around this by renaming the file, but it was still surprising not to be able to install from a file following the standard environment YAML format if it has a certain naming pattern. |
It would be great to have a new lockfile format.
The current conda lockfile format (explicit env format) has quite a few shortcomings: it's a weird ad-hoc format and only supports MD5 sums (and not even by default, I think; SHA256 is much better).
The command to export an explicit environment in conda is
conda list --explicit [--md5]
Micromamba already improves on this by changing the command to
micromamba env export --explicit [--no-md5]
(i.e. it uses the env subcommand and defaults to adding --md5 hashes).
I am thinking it would be nice to replace this with a proper YAML-based format.
I am proposing something like:
The explicit packages would contain a list per (supported) subdir. The list would be the full env resolution (including noarch packages) and in the correct order for installation (as in current lockfiles).
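For comparison, the existing explicit format criticised above can be read with a few lines of code, which also shows how little structure it carries. This is a rough sketch assuming the usual layout: comment lines, an `@EXPLICIT` marker, then one URL per line with an optional `#<md5>` suffix. `parse_explicit` is a hypothetical helper and the URLs are made up.

```python
from typing import List, Optional, Tuple


def parse_explicit(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (url, md5 or None) pairs, in file order, from explicit-format content."""
    entries: List[Tuple[str, Optional[str]]] = []
    seen_marker = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no package data
        if line == "@EXPLICIT":
            seen_marker = True
            continue
        if not seen_marker:
            raise ValueError("missing @EXPLICIT marker before package URLs")
        # Everything after the first '#' is treated as the checksum.
        url, _, digest = line.partition("#")
        entries.append((url, digest or None))
    return entries


sample = (
    "# platform: linux-64\n"
    "@EXPLICIT\n"
    "https://example.invalid/linux-64/xyz-0.15.0-0.tar.bz2#0123456789abcdef0123456789abcdef\n"
    "https://example.invalid/noarch/pip-1.15.0-0.tar.bz2\n"
)
for url, md5 in parse_explicit(sample):
    print(url, md5)
```

Note what is absent: there is no place for multiple platforms, stronger hashes, or metadata, which is the gap a structured format would fill.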