DaCe + Likwid #475

Open
lukastruemper opened this issue Jul 25, 2022 · 3 comments

@lukastruemper

lukastruemper commented Jul 25, 2022

Hello,

I just wanted to let you know that we integrated Likwid into the codegen of our parallel programming framework DaCe. This means that users can now instrument their DaCe programs by just setting a flag on the SDFG (our intermediate representation). We're working on new documentation for DaCe, but you might want to check out our sample.

The whole integration is still experimental and was just merged into main. We would be very happy to receive comments/feedback. Thanks for your awesome tool!

Cheers,
Lukas

@TomTheBear
Member

Hi Lukas,

thanks for the great news!

  • Why InstrumentationType.LIKWID_Counters and not just InstrumentationType.LIKWID?
  • Can there be a handier way to get the list of supported groups for the platform? Looking up the list in the LIKWID repo might not be enough for a given system, which might provide more or fewer groups. Also, custom groups ($HOME/.likwid/groups/) are not listed there.
  • Are multiple groups possible in the LIKWID_EVENTS env variable? (just out of curiosity, no need to support that!)
  • Manual access to the report seems quite complicated. How do I know these (sdfg_id, state_id, node_id) in a more complicated code? Is "state_0_0_-1" not a derivative of (sdfg_id, state_id, node_id)?
  • Since you work with groups (the default case), I would show how to get the list of metrics out of the report. First a list of available metrics and secondly how to access one metric for a single core and all cores.
  • Does the report look like the default LIKWID MarkerAPI tables or is it a custom format? If it is a custom format, I would show an excerpt in the comments.
  • Does OMP_NUM_THREADS need to be set "from the outside"? Is there no way to specify parallelism inside the code? Can I set LIKWID_EVENTS in the code or is it read at startup?

Thanks for your efforts to integrate LIKWID into such an important framework.

Best regards,
Thomas

@lukastruemper
Author

Thanks for the feedback!

> Why InstrumentationType.LIKWID_Counters and not just InstrumentationType.LIKWID?

We already had a flag PAPI_Counters, so we kept it consistent. But we're considering changing it to make it cleaner.

> Can there be a handier way to get the list of supported groups for the platform? Looking up the list in the LIKWID repo might not be enough for a given system, which might provide more or fewer groups. Also, custom groups ($HOME/.likwid/groups/) are not listed there.

Good point, I'll add a method to retrieve the list of available groups from Python.
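
A minimal sketch of what such a helper could look like, assuming that groups are stored as text files named <GROUP>.txt inside a per-architecture subdirectory (the layout assumed here for $HOME/.likwid/groups/). The function name list_perf_groups and the architecture string in the usage note are hypothetical and not part of DaCe or LIKWID:

```python
from pathlib import Path


def list_perf_groups(root, arch):
    """Return sorted group names found as <root>/<arch>/<GROUP>.txt.

    Assumption: one text file per group, grouped by architecture directory.
    """
    arch_dir = Path(root).expanduser() / arch
    if not arch_dir.is_dir():
        # Unknown architecture or no custom groups defined for it.
        return []
    return sorted(p.stem for p in arch_dir.glob("*.txt"))
```

Calling list_perf_groups("~/.likwid/groups", "skylakeX") would then return only the user's custom groups for that architecture; the groups shipped with the LIKWID installation would have to be collected from its own install directory in the same way.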

> Are multiple groups possible in the LIKWID_EVENTS env variable? (just out of curiosity, no need to support that!)

We're currently just passing the variable on to Likwid (as in the internal-markerAPI example). According to my tests, it doesn't support this right now.

> Since you work with groups (the default case), I would show how to get the list of metrics out of the report. First a list of available metrics and secondly how to access one metric for a single core and all cores.

This is indeed the most critical feature that we want to support. I guess that, since we only support a single group right now, it is not too complex (the currently active group is the only one measured). It would be cool to see how we can get this through perfmon calls.

> Does the report look like the default LIKWID MarkerAPI tables or is it a custom format? If it is a custom format, I would show an excerpt in the comments.

Good point, I will add a figure to the documentation that we're currently creating and a minimal excerpt to the sample.

> Does OMP_NUM_THREADS need to be set "from the outside"? Is there no way to specify parallelism inside the code? Can I set LIKWID_EVENTS in the code or is it read at startup?

LIKWID_EVENTS is read at code generation time (sdfg.compile in the Python code), so you can set it in Python before calling .compile. We currently convert the maps in our intermediate representation to loops with OMP pragmas, but we can only set the number of threads globally with OMP_NUM_THREADS. We're working on supporting more dynamic schemes in the future.
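
As a rough illustration, both variables can be set from Python right before compilation. The likwid_env helper below is hypothetical (it is not part of DaCe), and the `sdfg` object in the commented usage is assumed to exist already. Whether an OMP_NUM_THREADS value set inside the process takes effect depends on when the OpenMP runtime is initialized, so setting it in the shell remains the safe option:

```python
import os
from contextlib import contextmanager


@contextmanager
def likwid_env(events, num_threads):
    """Temporarily set LIKWID_EVENTS and OMP_NUM_THREADS, then restore them."""
    new = {"LIKWID_EVENTS": events, "OMP_NUM_THREADS": str(num_threads)}
    old = {key: os.environ.get(key) for key in new}
    os.environ.update(new)
    try:
        yield
    finally:
        # Restore the previous state, removing variables that were unset before.
        for key, value in old.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


# Hypothetical usage with an existing SDFG object named `sdfg`:
# with likwid_env("FLOPS_DP", 4):
#     compiled = sdfg.compile()  # LIKWID_EVENTS is read during this call
```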

Whenever you run a DaCe program that calls .compile, you can inspect the generated code in the .dacecache/src/*.cpp file of your current working directory.
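
A small sketch for locating those files from Python. The helper name generated_sources is made up for illustration, and it searches the whole .dacecache tree recursively so that it does not depend on the exact subdirectory layout:

```python
from pathlib import Path


def generated_sources(workdir="."):
    """Return all .cpp files found anywhere below <workdir>/.dacecache."""
    cache = Path(workdir) / ".dacecache"
    if not cache.is_dir():
        # Nothing has been compiled from this working directory yet.
        return []
    return sorted(cache.rglob("*.cpp"))


if __name__ == "__main__":
    for path in generated_sources():
        print(path)
```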

@TomTheBear
Member

> > Are multiple groups possible in the LIKWID_EVENTS env variable? (just out of curiosity, no need to support that!)

> We're currently just passing the variable on to Likwid (as in the internal-markerAPI example). According to my tests, it doesn't support this right now.

The internal-markerAPI example contains '|' between groups/eventsets. So it's already possible to specify them but DaCe has to switch between them.
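
To make the '|' convention concrete, here is a sketch of how a frontend could enumerate the groups/eventsets in such a string before deciding how to switch between them. The function split_event_sets is hypothetical, is not something DaCe or LIKWID provides, and only handles the string splitting, not the switching itself:

```python
def split_event_sets(likwid_events):
    """Split a LIKWID_EVENTS-style string into its '|'-separated entries."""
    return [part.strip() for part in likwid_events.split("|") if part.strip()]


if __name__ == "__main__":
    # Each entry would correspond to one group or custom eventset.
    for index, entry in enumerate(split_event_sets("FLOPS_DP|L2")):
        print(index, entry)
```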

> > Since you work with groups (the default case), I would show how to get the list of metrics out of the report. First a list of available metrics and secondly how to access one metric for a single core and all cores.

> This is indeed the most critical feature that we want to support. I guess that, since we only support a single group right now, it is not too complex (the currently active group is the only one measured). It would be cool to see how we can get this through perfmon calls.

After you read in the MarkerAPI file, you can use the common functions:

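/* Load the results written by the MarkerAPI; a negative return value signals an error. */
int err = perfmon_readMarkerFile(getenv("LIKWID_FILEPATH"));
/* NUM_THREADS is the number of threads that were measured. */
for (int t = 0; t < NUM_THREADS; t++) {
  for (int i = 0; i < perfmon_getNumberOfRegions(); i++) {
    /* Each region belongs to the group that was active when it was measured. */
    int gid = perfmon_getGroupOfRegion(i);
    for (int k = 0; k < perfmon_getNumberOfMetrics(gid); k++) {
      char* metric_name = perfmon_getMetricName(gid, k);
      /* Value of metric k in region i for thread t. */
      double metric_value = perfmon_getMetricOfRegionThread(i, k, t);
    }
  }
}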

see https://github.com/RRZE-HPC/likwid/blob/master/examples/C-internalMarkerAPI.c#L436
