[Question] What is the difference between probabilities output from result.get_probabilities() and result.get_total_causal_effects()? #117

Open
lhl881210 opened this issue Nov 28, 2023 · 6 comments

Comments

@lhl881210

Hi, I am a beginner.
I'm not quite sure about the difference between the probabilities returned by result.get_probabilities() and result.get_total_causal_effects() after bootstrapping, i.e., result = model.bootstrap(data, n_sampling=1000).
I would appreciate it if you could give me more information.

@sshimizu2006
Collaborator

result.get_probabilities() gives the bootstrap probabilities of whether direct effects are non-zero (directed edges exist).
result.get_total_causal_effects() gives the bootstrap probabilities of whether total effects are non-zero (directed paths exist).
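To make the distinction concrete, here is a minimal sketch in plain Python. It does not call lingam; the four adjacency matrices are hypothetical bootstrap draws, and the helper names (matmul, total_effects, nonzero_fraction) are made up for illustration. It assumes a linear acyclic model in which B[i][j] is the direct effect of variable j on variable i, so the total effects are the sum of the powers of B.

```python
# Illustrative only: hypothetical bootstrap draws, not lingam's internal code.
# Convention assumed here: B[i][j] is the direct effect of variable j on variable i.

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def total_effects(B):
    """Return B + B^2 + ... + B^(n-1), the effects summed over all directed paths."""
    n = len(B)
    total = [row[:] for row in B]
    power = [row[:] for row in B]
    for _ in range(n - 2):
        power = matmul(power, B)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

def nonzero_fraction(matrices, i, j):
    """Fraction of bootstrap draws in which entry (i, j) is non-zero."""
    return sum(1 for m in matrices if m[i][j] != 0) / len(matrices)

# Four hypothetical draws over variables x0, x1, x2.
draws = [
    [[0, 0, 0], [0.8, 0, 0], [0, 0.5, 0]],   # x0 -> x1 -> x2
    [[0, 0, 0], [0.7, 0, 0], [0, 0.6, 0]],   # x0 -> x1 -> x2
    [[0, 0, 0], [0.9, 0, 0], [0.3, 0, 0]],   # x0 -> x1 and x0 -> x2
    [[0, 0, 0], [0.8, 0, 0], [0, 0, 0]],     # x0 -> x1 only
]

# Probability that the directed edge x0 -> x2 exists (direct effect non-zero).
edge_prob = nonzero_fraction(draws, 2, 0)

# Probability that a directed path x0 -> ... -> x2 exists (total effect non-zero).
path_prob = nonzero_fraction([total_effects(B) for B in draws], 2, 0)

print(edge_prob)  # 0.25
print(path_prob)  # 0.75
```

In this toy case the edge x0 -> x2 appears in only one draw, while a path from x0 to x2 appears in three, which is why the two probabilities differ.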

@lhl881210
Author

Shimizu-sensei,
Thank you for your answer.
I have a follow-up question.
As far as I understand, the total causal effect along a path can only be calculated with respect to a specific DAG.
However, since the bootstrap outputs multiple DAGs, which DAG are the total causal effects based on?
Thanks.

@sshimizu2006
Collaborator

sshimizu2006 commented Nov 30, 2023

Hi, those total effects in the bootstrap outputs are the medians over the bootstrap samples. You can find all the bootstrap results here: https://lingam.readthedocs.io/en/latest/reference/bootstrap.html
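In other words, no single DAG is singled out: each bootstrap resample yields its own estimate, and the reported effect summarizes them. A minimal sketch of that idea in plain Python follows. The coefficients are hypothetical, the names (draws, total_effect_x0_on_x2) are invented for illustration, and the sketch takes the median over all draws, including zero ones; whether the library does the same is not stated in this thread.

```python
from statistics import median

# Hypothetical coefficients from five bootstrap resamples of a model over
# x0, x1, x2. b10 is the direct effect x0 -> x1, b21 is x1 -> x2, and
# b20 is the direct effect x0 -> x2 (0.0 means the edge was not found).
draws = [
    {"b10": 0.8, "b21": 0.5, "b20": 0.0},
    {"b10": 0.7, "b21": 0.6, "b20": 0.0},
    {"b10": 0.9, "b21": 0.0, "b20": 0.3},
    {"b10": 0.8, "b21": 0.0, "b20": 0.0},
    {"b10": 1.0, "b21": 0.5, "b20": 0.0},
]

def total_effect_x0_on_x2(d):
    """Direct path x0 -> x2 plus the indirect path x0 -> x1 -> x2."""
    return d["b20"] + d["b21"] * d["b10"]

# One total effect per resample, each computed from that resample's own DAG.
effects = [total_effect_x0_on_x2(d) for d in draws]

# Assumption of this sketch: summarize with the median over all draws.
summary = median(effects)
print(summary)
```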

@lhl881210
Author

Thank you very much for your reply.

I have another question.
I have questionnaire data from 6 questions; these are discrete variables collected on a 5-point Likert scale.
I also have 3 types of behavioral data, such as time spent on task; these are continuous variables.
I want to do causal discovery on these 6 discrete variables and 3 continuous variables.

I'm wondering if it's appropriate to use DirectLiNGAM for this kind of data.

I ask because I know that the original LiNGAM, as well as ICA-LiNGAM, requires the data to be continuous variables. But in your DirectLiNGAM tutorial, the requirement for continuous variables has been removed.
https://lingam.readthedocs.io/en/latest/tutorial/lingam.html

Thanks again for your help.

@sshimizu2006
Collaborator

If your discrete variables are collected using a 5-point Likert scale, it would be OK to use DirectLiNGAM, treating them as approximately continuous.

DirectLiNGAM assumes variables are continuous. Error variables are continuous. Their linear sums, i.e., observed variables, are also continuous.
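As a rough illustration of that point, the sketch below generates data from a two-variable linear model with continuous, uniformly distributed errors, and then coarsens one variable into five ordered levels. It uses only the standard library; the coefficient 0.8, the error range, and the to_likert helper are assumptions made for this example, not part of lingam.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def to_likert(value, low, high, points=5):
    """Bin a continuous value in [low, high] into the ordered levels 1..points."""
    width = (high - low) / points
    level = int((value - low) / width) + 1
    return min(max(level, 1), points)

rows = []
for _ in range(200):
    # Continuous, non-Gaussian error variables.
    e0 = random.uniform(-1, 1)
    e1 = random.uniform(-1, 1)
    # Observed variables are linear sums of the errors, so they are continuous too.
    x0 = e0
    x1 = 0.8 * x0 + e1  # lies in [-1.8, 1.8]
    # A 5-point response is a coarse, ordered approximation of the continuous x1.
    rows.append((x0, x1, to_likert(x1, -1.8, 1.8)))

print(rows[:3])
```

The third column takes only the values 1 to 5 but keeps the ordering of x1, which is the sense in which a Likert item can be treated as approximately continuous.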

@lhl881210
Author

Thank you so much for your quick reply!
