Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partition vector algorithms test: move out lex compare family #5063

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

AlexGuteniev
Copy link
Contributor

VSO_0000000_vector_algorithms tends to take longer with more algorithms vectorized.

Think we need to partition it.

Starting with lex compare family. They are taking about 30% of total run time,
Also the lex compare direction looks complete, and unlikely to have overlaps with other algorithms.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner November 3, 2024 14:04
@StephanTLavavej StephanTLavavej self-assigned this Nov 3, 2024
@StephanTLavavej StephanTLavavej added the test Related to test code label Nov 3, 2024
@StephanTLavavej
Copy link
Member

Would it be better to keep everything in a single file, but controlled via macros?

RUNALL_CROSSLIST
* PM_CL="/DTEST_INPUT"
* PM_CL="/DTEST_FORWARD"
* PM_CL="/DTEST_BIDIRECTIONAL"
* PM_CL="/DTEST_RANDOM"

With the variant_msvc tests, I permanently split them into a separate file because they're developed totally separately from the LLVM-derived tests. With the vector algorithms tests, I think keeping them in a single file might be better, because that way we can easily re-group how much we run each time.

@AlexGuteniev
Copy link
Contributor Author

Would it be better to keep everything in a single file, but controlled via macros?

Is there an easy way to select a group using some extra parameter python tests\utils\stl-lit\stl-lit.py ..\..\tests\std\tests\VSO_0000000_vector_algorithms -v command?

@AlexGuteniev
Copy link
Contributor Author

I'm not yet concerned about run time of a single configuration. The total run time seems too long.

@AlexGuteniev
Copy link
Contributor Author

Also, partitioning algorithms by algorithm type seems more natural than partitioning views by iterator type.
Algorithms implementations are unrelated (not going to split where they are related).

@StephanTLavavej
Copy link
Member

We could use tags (like with ASAN), but since the algorithms are unrelated and there's no commonality between the test support machinery, having a separate test does make sense. Let's keep this as-is, thanks!

@AlexGuteniev
Copy link
Contributor Author

there's no commonality between the test support machinery

There is no new commonality extracted during this separation, as all the commonality, which is mostly the randomness initialization, was previously extracted in #4734 into /tests/std/include/test_vector_algorithms_support.hpp to support separate floating minmax testing.

This specific commonality, I think, is not an indication that these tests should be together.
Instead, it seems to be something potentially reusable in more separate tests.
<charconv> test mentioned in #933 is a candidate:

void initialize_randomness(mt19937_64& mt64, const int argc, char** const argv) {
constexpr size_t n = mt19937_64::state_size;
constexpr size_t w = mt19937_64::word_size;
static_assert(w % 32 == 0);
constexpr size_t k = w / 32;
vector<uint32_t> vec(n * k);
puts("USAGE:");
puts("test.exe : generates seed data from random_device.");
puts("test.exe filename.txt : loads seed data from a given text file.");
if (argc == 1) {
random_device rd;
generate(vec.begin(), vec.end(), ref(rd));
puts("Generated seed data.");
} else if (argc == 2) {
const char* const filename = argv[1];
ifstream file(filename);
if (!file) {
printf("ERROR: Can't open %s.\n", filename);
abort();
}
for (auto& elem : vec) {
file >> elem;
if (!file) {
printf("ERROR: Can't read seed data from %s.\n", filename);
abort();
}
}
printf("Loaded seed data from %s.\n", filename);
} else {
puts("ERROR: Too many command-line arguments.");
abort();
}
puts("SEED DATA:");
for (const auto& elem : vec) {
printf("%u ", elem);
}
printf("\n");
seed_seq seq(vec.cbegin(), vec.cend());
mt64.seed(seq);
puts("Successfully seeded mt64. First three values:");
for (int i = 0; i < 3; ++i) {
// libc++ uses long for 64-bit values.
printf("0x%016llX\n", static_cast<unsigned long long>(mt64()));
}
}

@AlexGuteniev
Copy link
Contributor Author

AlexGuteniev commented Nov 3, 2024

Actually I'm floating the separation idea with this PR.

When thinking of search_n I concluded that comprehensive enough test would have a cubic run time: O(H*N*F) where H is haystack length, N is needle length and F is the frequency of a match, will gradually increment all three. And it has no relationship to anything else. So I think it deserves the whole separate .cpp

@StephanTLavavej
Copy link
Member

When thinking of search_n I concluded that comprehensive enough test would have a cubic run time

If a comprehensive test would have a long run time, then I'd recommend having an off-by-default "EXHAUSTIVE" mode that allows the whole space to be manually run, but having automated runs select a randomized subset of cases.

Expensive tests are problematic both from a machine resourcing and timeout perspective, so we need to be careful.

@StephanTLavavej StephanTLavavej removed their assignment Nov 7, 2024
@StephanTLavavej StephanTLavavej self-assigned this Nov 7, 2024
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 2e5c251 into microsoft:main Nov 8, 2024
39 checks passed
@StephanTLavavej
Copy link
Member

Thanks for keeping an eye on this ever-growing test! 🐣 🐤 🐔

@AlexGuteniev AlexGuteniev deleted the partition branch November 8, 2024 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Related to test code
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants