Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort output of S3_File.list and Enso_File.list #11929

Merged
merged 25 commits into from
Jan 10, 2025
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -156,3 +156,11 @@ type Decomposed_S3_Path
if self.parts.is_empty then Nothing else
new_parts = self.parts.drop (..Last 1)
Decomposed_S3_Path.Value new_parts self.go_to_root

type S3_Path_Comparator
compare x y = Ordering.compare [x.bucket, x.key] [y.bucket, y.key]

hash x = S3_Path_Comparator.hash_builtin x
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin"

Comparable.from that:S3_Path = Comparable.new that S3_Path_Comparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Comparable.from that:S3_Path = Comparable.new that S3_Path_Comparator
## PRIVATE
Comparable.from that:S3_Path = Comparable.new that S3_Path_Comparator

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;-)

It would be easier to change the IDE to not display from methods in the component browser than adding this ## PRIVATE at every conversion method occurrence.

Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,7 @@ type S3_File
S3_File.Value (S3_Path.Value bucket key) self.credentials
files = pair.second . map key->
S3_File.Value (S3_Path.Value bucket key) self.credentials
sub_folders + files
sub_folders + files . sort on=.s3_path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this just sort files but not sub_folders?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we could just sort on .path and then we wouldn't need to implement the custom comparator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to not use a comparator? I was thinking of adding a comparator for S3_File as well -- if a user might have a vector of them, then they might want to sort it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it sorted the files and sub-folders properly:

s3://enso-data-samples/Store Data.xlsx
s3://enso-data-samples/data_2500_rows.csv
s3://enso-data-samples/examples/
s3://enso-data-samples/locations.json
s3://enso-data-samples/products.csv
s3://enso-data-samples/sample.xml
s3://enso-data-samples/spreadsheet.xls
s3://enso-data-samples/spreadsheet.xlsx
s3://enso-data-samples/tableau/
s3://enso-data-samples/transactions.csv

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still add parentheses, the precedence of + vs . is not something that everyone knows immediately, so I think making it clear with parentheses will make it better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to not use a comparator? I was thinking of adding a comparator for S3_File as well -- if a user might have a vector of them, then they might want to sort it.

Hmm, I guess we could do that yeah. But still I'd rely on comparing the path as Text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


## ALIAS load bytes, open bytes
ICON data_input
Expand Down
23 changes: 23 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import project.Data.Filter_Condition.Filter_Condition
import project.Data.Index_Sub_Range.Index_Sub_Range
import project.Data.List.List
import project.Data.Numbers.Integer
import project.Data.Ordering.Ordering
import project.Data.Pair.Pair
import project.Data.Range.Range
import project.Data.Sort_Direction.Sort_Direction
Expand All @@ -20,6 +21,7 @@ import project.Errors.Problem_Behavior.Problem_Behavior
import project.Errors.Wrapped_Error.Wrapped_Error
import project.Function.Function
import project.Internal.Array_Like_Helpers
import project.Internal.Ordering_Helpers.Default_Comparator
import project.Math
import project.Meta
import project.Nothing.Nothing
Expand Down Expand Up @@ -1563,3 +1565,24 @@ type No_Wrap

## PRIVATE
Wrapped_Error.from (that : Map_Error) = Wrapped_Error.Value that that.inner_error

type Vector_Comparator
compare x y =
min_length = x.length.min y.length
when_prefixes_equal =
## At this point, if the vectors are the same length, then they are
identical; otherwise, the shorter one is lesser.
if x.length == y.length then Ordering.Equal else
Ordering.compare x.length y.length
go i =
if i >= min_length then when_prefixes_equal else
x_elem = x.at i
y_elem = y.at i
Ordering.compare x_elem y_elem . and_then <|
@Tail_Call go (i + 1)
k = go 0
k

hash x = Default_Comparator.hash_builtin x

Comparable.from (that : Vector) = Comparable.new that Vector_Comparator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this just re-implements Vector_Lexicographic_Order.compare?

I think we did not register it into Comparable on purpose - comparing vectors is not unambiguously defined operation - there's multiple orderings that may seem right depending on context. So we did not want to provide any default one to avoid user confusion and require to choose one explicitly (we implement only lexicographic one but for example an (incomplate) ordering could be the point-wise order.

Of course we can decide that it is beneficial to include a default < for vectors. But I think the original decision was not wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside to making the comparable is that if a user accidentally nests vectors, and sorts them, they won't know they made a mistake. But it will be easy to see that the data is nested. Without a comparator, they'll get an error.

The upside is that they'll be able to sort vectors containing more things.

@JaroslavTulach You asked if modern languages allow this; Haskell does:

 main = do
   msp $ [1, 2] < [3, 4]
=> True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the Vector_Comparator since it's redundant and it's a separate question.

Original file line number Diff line number Diff line change
Expand Up @@ -444,10 +444,11 @@ type Enso_File
if self.is_directory.not then Error.throw (Illegal_Argument.Error "Cannot `list` a non-directory.") else
# Remove secrets from the list - they are handled separately in `Enso_Secret.list`.
assets = list_assets self . filter f-> f.asset_type != Enso_Asset_Type.Secret
assets.map asset->
results = assets.map asset->
file = Enso_File.Value (self.enso_path.resolve asset.name)
Asset_Cache.update file asset
file
results.sort on=.enso_path

## GROUP Output
ICON folder_add
Expand Down
Original file line number Diff line number Diff line change
@@ -1,12 +1,15 @@
private

import project.Data.Ordering.Comparable
import project.Data.Ordering.Ordering
import project.Data.Text.Text
import project.Data.Vector.Vector
import project.Enso_Cloud.Enso_File.Enso_File
import project.Enso_Cloud.Enso_User.Enso_User
import project.Error.Error
import project.Errors.Illegal_Argument.Illegal_Argument
import project.Errors.Unimplemented.Unimplemented
import project.Internal.Ordering_Helpers.Default_Comparator
import project.Internal.Path_Helpers
import project.Nothing.Nothing
from project.Data.Boolean import Boolean, False, True
Expand Down Expand Up @@ -72,3 +75,11 @@ normalize segments =
## We need to call normalize again, because technically a path `enso://a/../~/../../~` is a valid path
that points to the user home and it should be correctly normalized, but requires numerous passes to do so.
@Tail_Call normalize new_segments

type Enso_Path_Comparator
compare x y = Ordering.compare x.path_segments y.path_segments

hash x = Enso_Path_Comparator.hash_builtin x
hash_builtin x = @Builtin_Method "Default_Comparator.hash_builtin"

Comparable.from that:Enso_Path = Comparable.new that Enso_Path_Comparator
6 changes: 6 additions & 0 deletions test/AWS_Tests/src/S3_Spec.enso
Original file line number Diff line number Diff line change
Expand Up @@ -261,6 +261,12 @@ add_specs suite_builder =
r3.should_be_a Vector
r3.map .name . should_contain object_name

group_builder.specify "list should sort its output" <|
r = root.list
r.should_be_a Vector
r . should_equal (r.sort on=.s3_path)
r . should_equal (r.sort on=(x-> x.s3_path.to_text))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, .s3_path is private and implementation detail that should be a black box in the tests, use path that returns Text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to .path


group_builder.specify "will fail if no credentials are provided and no Default credentials are available" pending=(if AWS_Credential.is_default_credential_available then "Default AWS credentials are defined in the environment and this test has no way of overriding them, so it is impossible to test this scenario in such environment.") <|
root_without_credentials = S3_File.new "s3://"+bucket_name+"/"
r = root_without_credentials.list
Expand Down
38 changes: 38 additions & 0 deletions test/Base_Tests/src/Data/Vector_Spec.enso
Original file line number Diff line number Diff line change
Expand Up @@ -887,6 +887,44 @@ type_spec suite_builder name alter = suite_builder.group name group_builder->
expected = "abet".utf_8
input.sort . should_equal expected

group_builder.specify "should be comparable for sorting" <|
([1, 2] < [1, 3]) . should_be_true
([1, 2] > [1, 3]) . should_be_false
([1, 2] == [1, 3]) . should_be_false

([1, 3] < [1, 2]) . should_be_false
([1, 3] > [1, 2]) . should_be_true
([1, 3] == [1, 2]) . should_be_false

([1, 2] < [1, 2]) . should_be_false
([1, 2] > [1, 2]) . should_be_false
([1, 2] == [1, 2]) . should_be_true

Ordering.compare [1] [1, 2] . should_equal Ordering.Less
Ordering.compare [] [1, 2] . should_equal Ordering.Less

Ordering.compare [1, 2] [1] . should_equal Ordering.Greater
Ordering.compare [1, 2] [] . should_equal Ordering.Greater

Ordering.compare [1, 2] [1, 2] . should_equal Ordering.Equal

Ordering.compare [] [] . should_equal Ordering.Equal
Ordering.compare [] [1] . should_equal Ordering.Less
Ordering.compare [1] [] . should_equal Ordering.Greater

group_builder.specify "vectors of vectors can be sorted" <|
v = [[1, 2], [3, -4], [1, 1], [3, -5], [3, -2, 4], [3, -2], [1, -1], [1], [], [1, 0]]
expected = [[], [1], [1, -1], [1, 0], [1, 1], [1, 2], [3, -5], [3, -4], [3, -2], [3, -2, 4]]
v.sort . should_equal expected

v2 = [['b', 'c'], ['a', 'c'], ['a', 'b'], ['b', 'a'], ['b', 'b'], ['b', 'd'], ['d', 'b']]
expected2 = [['a', 'b'], ['a', 'c'], ['b', 'a'], ['b', 'b'], ['b', 'c'], ['b', 'd'], ['d', 'b']]
v2.sort . should_equal expected2

v3 = [[[1, 2], [4, 4]], [[1, 2], [3, 5]], [[1, 2], [3, 4]], [[], [3, 4]], [[1], [3, 4]], [[1, 3], [3, 4]], [[], [2, 4]], [[2, 3], [3, 4]], [[2], [3, 4]]]
expected3 = [[[], [2, 4]], [[], [3, 4]], [[1], [3, 4]], [[1, 2], [3, 4]], [[1, 2], [3, 5]], [[1, 2], [4, 4]], [[1, 3], [3, 4]], [[2], [3, 4]], [[2, 3], [3, 4]]]
v3.sort . should_equal expected3

group_builder.specify "should report only a limited number of warnings for incomparable values" <|
gen x = case (x % 10) of
0 -> Nothing
Expand Down
6 changes: 6 additions & 0 deletions test/Base_Tests/src/Network/Enso_Cloud/Enso_File_Spec.enso
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,12 @@ add_specs suite_builder setup:Cloud_Tests_Setup = suite_builder.group "Enso Clou
Data.list test_root.get . map .name . should_contain "test_file.json"
Data.list test_root.get.path . map .name . should_contain "test_file.json"

group_builder.specify "list should sort its output" <|
r = Enso_File.home.list
r.should_be_a Vector
r . should_equal (r.sort on=.path)
r . should_equal (r.sort on=.enso_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enso_path is private, I don't think the second check can work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


group_builder.specify "should allow to create and delete a directory" <|
my_name = "my_test_dir-" + (Random.uuid.take 5)
my_dir = (test_root.get / my_name).create_directory
Expand Down
Loading