
Add Brain Invaders datasets #283

Merged: 4 commits merged into NeuroTechX:develop on Apr 6, 2022


sylvchev
Member

Adding new ERP datasets from GIPSA-lab.

These datasets were already almost supported, as @plcrodrigues wrote repositories for them based on MOABB. However, those repositories are not up to date and require an old MNE version, which is a blocking issue for pyriemann-qiskit.

@jsosulski All datasets have been through a sanity check: you can see that @plcrodrigues has plotted the evoked potentials and run basic classification tests.
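The evoked-potential part of such a sanity check can be sketched in a few lines of NumPy. This is not @plcrodrigues's actual code: it assumes the epochs are already extracted into an array of shape (epochs, channels, samples) with binary labels (1 = target, 0 = non-target), and the 250 to 500 ms search window is an illustrative choice. A dataset that loads correctly should show a clear positive target-minus-non-target peak in that range.

```python
import numpy as np


def erp_sanity_check(epochs, labels, sfreq, tmin=0.0, window=(0.25, 0.5)):
    """Compare target and non-target averages for a binary P300 dataset.

    epochs: array of shape (n_epochs, n_channels, n_times)
    labels: array of shape (n_epochs,), 1 for target, 0 for non-target
    Returns the difference wave (target - non-target) and the latency,
    in seconds, of its largest channel-averaged value inside `window`.
    """
    epochs = np.asarray(epochs, dtype=float)
    labels = np.asarray(labels)
    target_avg = epochs[labels == 1].mean(axis=0)
    nontarget_avg = epochs[labels == 0].mean(axis=0)
    diff = target_avg - nontarget_avg  # shape (n_channels, n_times)

    # Time axis of each epoch, starting at tmin
    times = tmin + np.arange(epochs.shape[-1]) / sfreq
    mask = (times >= window[0]) & (times <= window[1])
    # Average over channels, then locate the most positive sample in the window
    mean_diff = diff.mean(axis=0)
    peak_idx = np.argmax(mean_diff[mask])
    return diff, times[mask][peak_idx]
```

On real data, `diff` can also be plotted per channel to compare against the existing sanity-check plots.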

I added all P300 datasets except py.ALPHA.EEG.2017-GIPSA (alpha waves only), py.PHMDML.EEG.2017-GIPSA (music listening), and py.VR.EEG.2018-GIPSA (recorded during a VR session), which are not P300 datasets.

@sylvchev sylvchev self-assigned this Mar 29, 2022
@sylvchev sylvchev added the dataset Supporting new dataset label Mar 29, 2022
@jsosulski
Collaborator

Great that there are already some plots. Once this is merged, I would still run my scripts to create plots similar to those for the other datasets, and to make sure that adding these datasets to MOABB did not introduce any regression.

OT: Do we want to "officially" host these plots somewhere? Currently my webserver is still working, but at some point I will probably run into traffic limitations 😅

@gcattan
Contributor

gcattan commented Mar 29, 2022

Thank you for your help @sylvchev :) Do you think we could also integrate the other datasets into MOABB in the future?

@toncho11
Contributor

Thank you for your work!

@jsosulski where are the sanity checks (code and plots)?


```diff
 from moabb.datasets import download as dl
 from moabb.datasets.base import BaseDataset
 
 
-BI2013a_URL = "https://zenodo.org/record/1494240/files/"
+BI2012a_URL = "https://zenodo.org/record/2649069/files/"
+BI2013a_URL = "https://zenodo.org/record/2669187/files/"
```
Collaborator

I am currently getting "page not found" errors on Zenodo (I think that's why CI is currently failing as well), so I can't check, but: is there a difference in bi2013a between the old and the new URLs? If so, we probably need to bump the minor version.

Member Author

The files are available on https://zenodo.org/record/1494240/ and https://zenodo.org/record/2649069/
The old link was v2, the new link is v7. The difference is the storage format: it was GDF and it is now CSV + MAT. The data are the same. I will bump the version to include these datasets.

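```python
from scipy.io import loadmat

# file_path and condition are set by the surrounding loader code (not shown)
chtypes = ["eeg"] * 17 + ["stim"]  # 17 EEG channels + 1 stim channel
X = loadmat(file_path)[condition].T  # transpose so that rows are channels
S = X[1:18, :]  # rows 1-17: EEG signals (row 0 is skipped)
stim = (X[18, :] + X[19, :])[None, :]  # rows 18 and 19 merged into one stim row
```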
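Summing two event rows into a single stim channel, as the snippet above does, is only lossless if the rows never carry non-zero values at the same sample; otherwise the sum creates a new, ambiguous code. A minimal sketch of that check, plus onset detection on the merged channel, follows. The helper names `merge_stim_rows` and `find_onsets` are hypothetical and not part of MOABB, and the non-overlap condition is an assumption about the Brain Invaders files, not something confirmed in this thread.

```python
import numpy as np


def merge_stim_rows(row_a, row_b):
    """Sum two event rows into one stim channel, checking they never overlap."""
    row_a = np.asarray(row_a)
    row_b = np.asarray(row_b)
    overlap = (row_a != 0) & (row_b != 0)
    if overlap.any():
        raise ValueError(f"stim rows are both non-zero at {overlap.sum()} samples")
    return (row_a + row_b)[None, :]  # shape (1, n_samples)


def find_onsets(stim):
    """Return (sample, code) pairs where the stim channel steps from 0 to non-zero."""
    stim = np.asarray(stim).ravel()
    prev = np.concatenate(([0], stim[:-1]))  # value at the previous sample
    onsets = np.flatnonzero((prev == 0) & (stim != 0))
    return [(int(i), int(stim[i])) for i in onsets]
```

Keeping the rows separate instead of summing them would also preserve more of the stimulus information discussed in the review comment.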
Collaborator

Probably not necessary for this first merge, which only needs to work with the current binary P300 paradigm. However, if stimulus information is available, we could keep it so that we can classify the letter / stimulus x,y location in the Brain Invaders case.

Member Author

I agree; it is a good thing that the letter information is available in the data. We could use it for P300-speller classification.
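P300-speller classification built on top of a binary target/non-target classifier commonly accumulates the classifier's score over every flash that contained each candidate item, then picks the item with the most evidence. The sketch below shows that generic approach; it is not MOABB code, and `flash_groups` (the set of items lit by each flash) is a hypothetical representation, not necessarily how the Brain Invaders files encode their flashes.

```python
import numpy as np


def select_item(flash_scores, flash_groups, n_items):
    """Pick the item whose flashes received the highest mean target score.

    flash_scores: (n_flashes,) classifier scores, higher = more target-like
    flash_groups: n_flashes collections of item indices that flashed together
    """
    evidence = np.zeros(n_items)
    counts = np.zeros(n_items)
    for score, group in zip(flash_scores, flash_groups):
        for item in group:
            evidence[item] += score
            counts[item] += 1
    # Average per item so that items flashed more often are not favoured;
    # items that never flashed get -inf and can never be selected
    mean_evidence = np.divide(
        evidence, counts, out=np.full(n_items, -np.inf), where=counts > 0
    )
    return int(np.argmax(mean_evidence)), mean_evidence
```

For example, on a 2 x 3 grid flashed row by row and column by column, high scores on the second row and the second column select item 4.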

@sylvchev
Member Author

> OT: Do we want to "officially" host these plots somewhere? Currently my webserver is still working, but at some point I will probably run into traffic limitations 😅

There is ebrains.eu, but hosting data there seems complex. OSF might be a better choice: https://help.osf.io/article/386-project-storage. It seems to allow up to 50 GB of data for public projects.

@jsosulski
Collaborator

jsosulski commented Mar 29, 2022

@toncho11 the code is here. Note that this script will download ALL available P300 datasets in MOABB and create the plots, which takes a) a very long time and b) a lot of disk space.

Currently, I host the plots for all P300 datasets in MOABB here: http://public.jan-sosulski.de/moabb_sanity/master.html
In the long term, I would like to make that hosting more interactive and link to the relevant sections directly from the dataset documentation.
But again, this hosting can (and probably soon will) go offline, as my webserver cannot handle high traffic.

@sylvchev
Member Author

> Do you think we could also integrate the other datasets into MOABB in the future?

Yes, once we have the corresponding paradigms; that is why I could not add them right away.

@jsosulski
Collaborator

jsosulski commented Apr 2, 2022

I just tried loading the dry EEG data from your branch, @sylvchev, and noticed that the data need to be scaled by 1e-6, as they are stored in µV and MNE expects V. It is probably worth checking how the other datasets store their data.
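The fix described here amounts to multiplying the EEG rows by 1e-6 before handing them to MNE, while leaving the stim row untouched (its values are event codes, not voltages). A minimal sketch, assuming the EEG and stim rows have already been extracted as NumPy arrays; `to_mne_arrays` is a hypothetical helper, and "STI 014" is used only as a conventional MNE stim-channel name.

```python
import numpy as np

UV_TO_V = 1e-6  # 1 µV = 1e-6 V


def to_mne_arrays(eeg_uv, stim, eeg_names, stim_name="STI 014"):
    """Scale EEG from µV to V and stack it with a stim row.

    Returns (data, ch_names, ch_types) for use with MNE.
    """
    eeg_v = np.asarray(eeg_uv, dtype=float) * UV_TO_V  # MNE expects volts
    stim = np.atleast_2d(stim)  # event codes: must NOT be scaled
    data = np.vstack([eeg_v, stim])
    ch_names = list(eeg_names) + [stim_name]
    ch_types = ["eeg"] * len(eeg_names) + ["stim"]
    return data, ch_names, ch_types
```

With MNE installed, the result can then be wrapped with `info = mne.create_info(ch_names, sfreq, ch_types)` and `raw = mne.io.RawArray(data, info)`, where `sfreq` is the dataset's sampling rate.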

@gcattan
Contributor

gcattan commented Apr 6, 2022

Hi,
Just realized that [py.VR.EEG.2018-GIPSA](https://github.com/plcrodrigues/py.VR.EEG.2018-GIPSA) is also a P300 paradigm. But it seems that there is already a lot of work in this PR ^^'

@toncho11
Contributor

toncho11 commented Apr 6, 2022

I just tested bi2012 and it works.

@sylvchev
Member Author

sylvchev commented Apr 6, 2022

> Hi, Just realized that [py.VR.EEG.2018-GIPSA](https://github.com/plcrodrigues/py.VR.EEG.2018-GIPSA) is also a P300 paradigm. But it seems that there is already a lot of work in this PR ^^'

Yes, I have seen that, but in Pedro's repo the data are not formatted like the others, so I'm leaving it for another PR.

@sylvchev
Member Author

sylvchev commented Apr 6, 2022

Thanks for your remark, @jsosulski: several datasets were not stored in the expected units (µV instead of V). I checked all datasets and scaled down the ones with incorrect values.
The code is updated, LGTM.
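Checking every dataset for this unit problem can be partly automated with an amplitude heuristic. The sketch below is an assumption, not the check used in this PR: scalp EEG typically has amplitudes of tens of µV (around 1e-5 V), so a median per-channel standard deviation above 1 mV is implausible for data in volts and suggests the values are in µV. The 1 mV threshold is illustrative, and slow drifts or artifacts in unfiltered recordings can inflate the estimate, so flagged datasets still need a manual look.

```python
import numpy as np


def looks_like_microvolts(eeg, threshold_v=1e-3):
    """Heuristic: flag EEG whose typical amplitude is implausibly large for volts.

    eeg: array of shape (n_channels, n_samples)
    The standard deviation already ignores constant per-channel DC offsets.
    """
    eeg = np.asarray(eeg, dtype=float)
    typical = np.median(eeg.std(axis=1))  # robust across a few bad channels
    return bool(typical > threshold_v)
```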

@sylvchev sylvchev merged commit f310573 into NeuroTechX:develop Apr 6, 2022
@sylvchev sylvchev deleted the bi_datasets branch January 3, 2023 09:06
4 participants