About implementing specific custom primitives #32

Open
breakds opened this issue Sep 24, 2023 · 7 comments

@breakds

breakds commented Sep 24, 2023

Sorry for using GitHub issues to ask questions again. After switching to the cpp20 branch I was able to run the examples successfully. Thanks!

Now I would like to discuss whether (and how) the following can be achieved. My dataset is an ordered time series. This means that, in theory, primitives such as "shift 1", "shift -1", or "rolling mean" are valid primitives that can operate in batch on each of the original variables and intermediate values. After briefly reading the code, mainly functions.hpp, I have a few questions on how to implement the above:

  1. It seems that there is an upper limit on the number of primitives because NodeType is a 32-bit integer. If I would like to add more primitives, should I extend it to uint64_t?
  2. It seems that each time a primitive in function.cpp is called, it is called on a batch of a variable or an intermediate value. Is it possible to have it called across the whole dataset (along the data point dimension, which in the time series case is the time dimension)?
  3. If the above two can be resolved, I think I can probably come up with a solution. Is there any other approach that you would recommend?
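
To make the request concrete, here is a minimal sketch of the two kinds of primitives described above, written against plain `std::vector<double>` columns. `Shift` and `RollingMean` are hypothetical helper names, not part of Operon, and the choice of fill value and of a trailing window is an assumption. The point is only that each output value depends on neighbouring rows, so the function needs the whole time-ordered column rather than an isolated batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helpers, not part of Operon: each takes a full, time-ordered
// column and returns a column of the same length.

// Shift a series by `lag` steps. lag > 0 reads from the past
// (out[t] = in[t - lag]); lag < 0 reads from the future.
// Positions without a source value receive `fill`.
std::vector<double> Shift(std::vector<double> const& in, std::ptrdiff_t lag,
                          double fill = std::numeric_limits<double>::quiet_NaN())
{
    auto const n = static_cast<std::ptrdiff_t>(in.size());
    std::vector<double> out(in.size(), fill);
    for (std::ptrdiff_t t = 0; t < n; ++t) {
        auto const src = t - lag;
        if (src >= 0 && src < n) {
            out[static_cast<std::size_t>(t)] = in[static_cast<std::size_t>(src)];
        }
    }
    return out;
}

// Trailing rolling mean over `window` samples. The first window - 1 entries
// average over the samples seen so far (an assumption of this sketch).
std::vector<double> RollingMean(std::vector<double> const& in, std::size_t window)
{
    assert(window > 0);
    std::vector<double> out(in.size());
    double sum = 0.0;
    for (std::size_t t = 0; t < in.size(); ++t) {
        sum += in[t];
        if (t >= window) { sum -= in[t - window]; } // drop the sample leaving the window
        auto const count = std::min(t + 1, window);
        out[t] = sum / static_cast<double>(count);
    }
    return out;
}
```

With the series 1, 2, 3, 4, `Shift(x, 1, 0.0)` starts with the fill value followed by 1, 2, 3, and `RollingMean(x, 2)` averages each value with its predecessor.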

Thank you!

@foolnotion
Member

foolnotion commented Sep 24, 2023

Hi, yes this is theoretically possible. This is what the Dynamic node type is supposed to do. My plan is to eventually get rid of all the hard-coded function types and only rely on functions registered at runtime.

The idea is to define a custom function and register it in the dispatch table. Here is an example:

The atomic potentials code relies on an older version of Operon but you should be able to make it work with the latest cpp20 head.
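
As a rough illustration of the "register a function at runtime" idea, the sketch below uses a toy table keyed by a 64-bit hash. It is not Operon's `DispatchTable` API: `ToyDispatchTable`, `Primitive`, `Lag1` and `kLag1Hash` are all hypothetical names, and the real registration call and callable signature should be taken from the cpp20 sources. It only shows why a single dynamic node kind plus a hash lookup avoids needing one enum value per primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a dispatch table; NOT Operon's DispatchTable.
using Column = std::vector<double>;
using Primitive = std::function<Column(Column const&)>;

class ToyDispatchTable {
public:
    // Associate a callable with a hash; a later registration overwrites an earlier one.
    void Register(std::uint64_t hash, Primitive fn)
    {
        assert(fn != nullptr);
        table_[hash] = std::move(fn);
    }

    bool Contains(std::uint64_t hash) const { return table_.count(hash) != 0; }

    // Look up the callable for a node's hash; throws if nothing was registered.
    Primitive const& Get(std::uint64_t hash) const
    {
        auto it = table_.find(hash);
        if (it == table_.end()) {
            throw std::out_of_range("no primitive registered for this hash");
        }
        return it->second;
    }

private:
    std::unordered_map<std::uint64_t, Primitive> table_;
};

// Arbitrary identifier chosen for this example only.
constexpr std::uint64_t kLag1Hash = 0x4C414731ULL;

// Hypothetical custom primitive: previous value of the series, 0 at t = 0.
Column Lag1(Column const& in)
{
    Column out(in.size(), 0.0);
    for (std::size_t t = 1; t < in.size(); ++t) { out[t] = in[t - 1]; }
    return out;
}
```

A caller would register the function once with `table.Register(kLag1Hash, Lag1)` and later resolve it by hash with `table.Get(kLag1Hash)`.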

If you run into issues or find bugs please let me know.

@breakds
Author

breakds commented Sep 24, 2023

Thanks a lot for the detailed explanation and links to the files. I understand the code better now. Let me try to implement the idea and report back. I appreciate the prompt response!

@breakds
Author

breakds commented Oct 2, 2023

I have been slowly learning the concepts, and there are a few more questions, if you don't mind.

  1. I do not understand the if branch here. What do symbolic == true and symbolic == false mean? How is this related to the template argument being int or Scalar?
  2. There are a few options for TreeCreator. Is there a rule of thumb to pick from those candidates before digging into the implementation?
  3. On the cpp20 branch, creating an Interpreter requires a dispatch table, a dataset and a tree. I am not sure which tree I should supply. My current vague understanding is that trees are "symbolic formulas" generated as candidates for evaluation during the solving phase of the algorithm. Because of this (probably wrong) understanding, I find it confusing how and why I should provide a tree when constructing the Interpreter, which happens before the algorithm starts running.

Thanks a lot! Again, sorry if some of the questions seem dumb; I have not had time to fully go through all the detailed code yet.

@foolnotion
Member

I do not understand the if branch here. What do symbolic == true and symbolic == false mean? How is this related to the template argument being int or Scalar?

The symbolic boolean flag was meant to configure the algorithm in a way that promotes "nice" models (formulas):

  • only integer coefficients (during the run and during initialization)
  • mutation operator configured to only support integer values
  • nonlinear least squares coefficient tuning disabled
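
One way to picture the link between the flag and the `int` versus `Scalar` template argument is the sketch below. It is an illustration under assumptions, not the code from the repository: `SampleCoefficient` and `InitCoefficient` are hypothetical names, `Scalar` is assumed to be `double` here, and the integer range [-5, 5] and the standard normal distribution are arbitrary choices.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <type_traits>

using Scalar = double; // assumption for this sketch

// Draw a coefficient whose underlying type is chosen at compile time.
// An integral T yields whole numbers only; a floating-point T yields real values.
template <typename T>
Scalar SampleCoefficient(std::mt19937& rng)
{
    if constexpr (std::is_integral_v<T>) {
        std::uniform_int_distribution<T> dist(-5, 5); // arbitrary range
        auto const v = dist(rng);
        assert(v >= -5 && v <= 5);
        return static_cast<Scalar>(v);
    } else {
        std::normal_distribution<T> dist(T{0}, T{1}); // arbitrary parameters
        return static_cast<Scalar>(dist(rng));
    }
}

// The runtime flag only selects which instantiation is used.
Scalar InitCoefficient(bool symbolic, std::mt19937& rng)
{
    return symbolic ? SampleCoefficient<int>(rng) : SampleCoefficient<Scalar>(rng);
}
```

With `symbolic == true` every sampled coefficient is a whole number, which is what keeps the resulting formulas "nice"; with `symbolic == false` the coefficients are arbitrary real values.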

There are a few options for TreeCreator. Is there a rule of thumb to pick from those candidates before digging into the implementation?

In general I've noticed that the choice of creator does not make a difference in algorithm performance. I would recommend using the BalancedTreeCreator which imho is a better version of PTC2. It may also be beneficial to limit max tree size during initialization to a smaller limit (5-15 nodes). Keep the max tree size during the run to a larger value.

On the cpp20 branch, creating an Interpreter requires a dispatch table, a dataset and a tree. I am not sure which tree I should supply.

Yes, this was a big change from before, in the interest of making it easier to program the entire tree evaluation / optimization infrastructure and integration with likelihoods.

The tree is kept in the Genotype property of the Individual https://github.com/heal-research/operon/blob/cpp20/include/operon/core/individual.hpp#L18

So normally you'd want to use an interpreter in a context where you already have an individual, so then you'd pass individual.Genotype to the interpreter.

Similar to here: https://github.com/heal-research/operon/blob/cpp20/source/operators/evaluator.cpp#L196

@breakds
Author

breakds commented Oct 2, 2023

Thank you for the explanation! I now understand why int is used in the symbolic case, and more about the tree creators!

One more question about Interpreter if you don't mind.

I am actually creating the Interpreter before I have anything else. This is because (I might be wrong), to create the algorithm instance (e.g. NSGA2), it seems that the following need to be constructed:

Interpreter ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2

If ErrorEvaluator is going to be able to evaluate all sorts of trees, which specific tree do I need to construct to provide to the Interpreter? At this stage the algorithm has yet to be constructed; does that mean I just create an arbitrary tree by hand?

Thanks!

@foolnotion
Member

Hi,

I am actually creating the Interpreter before I have anything else. This is because (I might be wrong), to create the algorithm instance (e.g. NSGA2), it seems that the following need to be constructed:

Normally you shouldn't need to initialize the interpreter yourself.

The flow should be:
DispatchTable ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2

The specific type of interpreter can be passed as a template parameter to the DispatchTable.

If ErrorEvaluator is going to be able to evaluate all sorts of trees, which specific tree do I need to construct to provide to the Interpreter?

The interpreter will know how to evaluate any kind of tree (or, more accurately, any type of node inside the tree) by querying the dispatch table for the appropriate function primitive. The interpreter is meant to be a lightweight, cheap object initialized on the spot whenever a tree needs to be interpreted (so you'd construct an interpreter within an evaluator context when you already have a tree). You do not need to construct an interpreter manually before the algorithm.
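
The pattern described above can be sketched with toy types. None of the classes below are Operon's: `ToyTable`, `ToyNode`, `ToyTree`, `ToyInterpreter` and `ToyEvaluator` are hypothetical, and the tree is reduced to a chain of unary functions over a single input column to keep the example short. What it shows is the ownership split: the evaluator is built once from a table and a dataset, and an interpreter that only holds references is created for each tree at evaluation time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy types for illustration only; none of these are Operon's classes.
using Column = std::vector<double>;
using Unary = std::function<double(double)>;
using ToyTable = std::unordered_map<std::uint64_t, Unary>;

struct ToyNode {
    bool IsVariable;    // true: leaf that reads the input column
    std::uint64_t Hash; // key into the table (ignored for leaves)
};
using ToyTree = std::vector<ToyNode>; // leaf first, then functions applied in order

// Holds references only, so building one per tree is cheap.
class ToyInterpreter {
public:
    ToyInterpreter(ToyTable const& table, Column const& data, ToyTree const& tree)
        : table_(table), data_(data), tree_(tree) {}

    Column Evaluate() const
    {
        assert(!tree_.empty() && tree_.front().IsVariable);
        Column out = data_;
        for (std::size_t i = 1; i < tree_.size(); ++i) {
            auto const& fn = table_.at(tree_[i].Hash); // query the table per node
            for (auto& v : out) { v = fn(v); }
        }
        return out;
    }

private:
    ToyTable const& table_;
    Column const& data_;
    ToyTree const& tree_;
};

// Built once before the run; no tree is needed at construction time.
class ToyEvaluator {
public:
    ToyEvaluator(ToyTable table, Column data, Column target)
        : table_(std::move(table)), data_(std::move(data)), target_(std::move(target))
    {
        assert(!data_.empty() && data_.size() == target_.size());
    }

    // Mean squared error of one candidate tree.
    double operator()(ToyTree const& tree) const
    {
        auto const pred = ToyInterpreter{table_, data_, tree}.Evaluate();
        double sse = 0.0;
        for (std::size_t i = 0; i < pred.size(); ++i) {
            auto const e = pred[i] - target_[i];
            sse += e * e;
        }
        return sse / static_cast<double>(pred.size());
    }

private:
    ToyTable table_;
    Column data_;
    Column target_;
};
```

The evaluator can therefore be handed to the rest of the pipeline before any tree exists; trees only show up as arguments to `operator()`.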

If you show me your code I can assist more.


github-actions bot commented Nov 9, 2023

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Nov 9, 2023