Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Weighted correlations #508

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from
Open

Conversation

calum-chamberlain
Copy link
Member

What does this PR do?

Adds the ability to weight cross-correlations. Weights are currently expected to be in tr.stats.extra.weight for all stream correlation functions. This PR should be considering in tandem with #506

Why was it initiated? Any relevant Issues?

#445

PR Checklist

  • develop base branch selected?
  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGES.md.
    - [ ] First time contributors have added your name to CONTRIBUTORS.md.

@flixha
Copy link
Collaborator

flixha commented Aug 8, 2022

@calum-chamberlain From the current commits, it reads to me that you are right now intending to give one weight to each template, rather than weight each trace in the template during the correlation - is that right? Or were you also intending to let traces in each template be weighted, e.g. like fast_matched_filter can do?
Maybe it's too early to ask because it's still work in progress; I did a few tests with trace-weights in FMF and I think it would be promising to define trace weights based on the SNR of each trace in the template.

@calum-chamberlain
Copy link
Member Author

I definitely intend to weight individual channels of individual templates in the same way that fast_matched_filter can. This is supported by these commits (I think!).

The array correlation functions that take multiple templates and one "stream" work on single-channel data, so only one weight per template is passed. The weights are extracted from the templates in the _get_array_dicts function here and are in the same shape as the pads (one for each trace of each template).

There are a lot of small changes to the correlation code to support this, and I might keep this PR separate from another PR that would actually open up the interface to make templates with weights.

@flixha
Copy link
Collaborator

flixha commented Jan 5, 2023

Right now it's possible to set trace weights greater 1, and then one can run into an AssertionError when the detect_val is greater than the number of channels in

assert abs(self.detect_val) <= self.no_chans

To solve that, here are some options:

  1. do not allow weights to be larger than 1 / smaller than -1
  2. allow detect_val to become greater than number of channels.
  3. for each template, normalize weights according to the number of nonzero weights (i.e., weights for real channels, excluding nan-channels etc.).

All options have an effect on the detection results with threshold_type='absolute' and 'av_chan_corr' that may be a bit hard to follow / interpret. With MAD-threshold the results should be independent of the normalization, except for very small weights below the stability threshold that matters in some correlation backends. I prefer option 3 because I think that the absolute detection values are still somewhat interpretable compared compared to the other options.

Code to normalize weights according to number of nonzero weights in eqcorrscan.utils.correlate._get_array_dicts:

    # Normalize weights for each template so that CC sum stays smaller than
    # number of channels (else: AssertionError in Detection).
    weights = np.array([template_weights
                        for seed_id, template_weights in weight_dict.items()])
    weights = weights * (np.count_nonzero(weights, axis=0, keepdims=True) /
                         weights.sum(axis=0, keepdims=True))
    # weights = np.float32(weights)  # Required for FFTW backend
    Logger.info('Setting weights from trace-stats, minimum weight: %s, '
                'maximum weight %s, total sum of weights: %s',
                np.min(weights), np.max(weights), np.sum(weights))
    # Update weights in weight-dict with normalized values:
    for seed_id, normalized_template_weights in zip(seed_ids, weights):
        weight_dict.update({seed_id: normalized_template_weights})```

@calum-chamberlain
Copy link
Member Author

Good point @flixha - I agree that interpreting the weighted absolute and av_chan correlation results will be less intuitive. For that reason I prefer to enforce that the user set weights between +/-1 rather than us normalize them for them. I can imagine the case where someone thinks they know what they are doing and sets a threshold and weights carefully which we then normalize resulting in unexpected results.

If we do enforce that then we should check the weights as soon as possible to avoid wasting people's time (e.g. raise an error as soon as possible, and maybe suggest some normalized weights that they can paste into their code).

@calum-chamberlain
Copy link
Member Author

TODO:

  • Add weights check - provide option to accept normalised weights from command line (e.g. accept_weights = input("Weights outside +/-1 bound, use normalised weights? [Y/n]"))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants