Speed up dependencies indexing #243

Merged
merged 28 commits into from
Jun 9, 2021
Merged
Changes from 2 commits
Commits
b48d0b9 attempt speed up layout (Remi-Gau, May 29, 2021)
802613a fix (Remi-Gau, May 29, 2021)
ac8f466 remove unnecessary condition (Remi-Gau, Jun 1, 2021)
6c9949b transfer file duplication into append to layout (Remi-Gau, Jun 2, 2021)
19b4e41 remove extra checks on indexing json files when using schema (Remi-Gau, Jun 2, 2021)
a8e4d37 fix group indexing and refactor (Remi-Gau, Jun 2, 2021)
8c5067f update tests (Remi-Gau, Jun 2, 2021)
73c791c pass verbose flag when copying (Remi-Gau, Jun 1, 2021)
674a726 update timing (Remi-Gau, Jun 2, 2021)
c47d2aa Merge branch 'dev' into speed_up_dependencies_indexing (Remi-Gau, Jun 2, 2021)
77c72c2 improve regex to list files (Remi-Gau, Jun 2, 2021)
69dc2ac simplify prefix parsing (Remi-Gau, Jun 2, 2021)
2752d6d add test (Remi-Gau, Jun 2, 2021)
7d05d12 add warning to append to layout and refactor (Remi-Gau, Jun 2, 2021)
66a3032 update doc (Remi-Gau, Jun 2, 2021)
229ddd6 update tests (Remi-Gau, Jun 2, 2021)
cf1bdea more general way to deal with MEG folder "file format" (Remi-Gau, Jun 2, 2021)
92371dc make pipeline name obligatory when copying file (Remi-Gau, Jun 2, 2021)
e533d67 add comments (Remi-Gau, Jun 2, 2021)
0257a02 change default output folder for copy to derivatives (Remi-Gau, Jun 2, 2021)
e43f2e3 Merge branch 'improve_regex' into speed_up_dependencies_indexing (Remi-Gau, Jun 2, 2021)
09b2428 fix prefix parsing issues (Remi-Gau, Jun 2, 2021)
852f5e2 when possible gunzip directly instead of copy then gunzip (Remi-Gau, Jun 2, 2021)
e5d9cc2 failed attempt at fixing some octave warning (Remi-Gau, Jun 2, 2021)
08ad8ed fix windows bug (Remi-Gau, Jun 2, 2021)
38f6718 remove attempt to copy with matlab if system copy fails (Remi-Gau, Jun 9, 2021)
b997e49 cover OS edge cases and update comments (Remi-Gau, Jun 9, 2021)
3c49dd6 update test bids query due to a change on the bids-example input (Remi-Gau, Jun 9, 2021)
81 changes: 57 additions & 24 deletions +bids/layout.m
@@ -214,6 +214,15 @@

file_list = return_file_list(modality, subject, schema);

% dependency previous file
dep_prev_files = struct( ...
'index_group', 0, ...
'group_base', '', ...
'group_len', 1, ...
'index_data', 0, ...
'data_base', '', ...
'data_len', 1);

for iFile = 1:size(file_list, 1)

[subject, parsing] = bids.internal.append_to_layout(file_list{iFile}, ...
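The initial `dep_prev_files` values act as a sentinel: with empty `group_base` and `data_base` and lengths of 1, both `strncmp` checks in `index_dependencies` should fail for the first file, so it opens a new group. A minimal check of that behaviour, using a hypothetical file name:

```matlab
% Initial state copied from the diff above.
dep_prev_files = struct( ...
    'index_group', 0, ...
    'group_base', '', ...
    'group_len', 1, ...
    'index_data', 0, ...
    'data_base', '', ...
    'data_len', 1);

% Hypothetical first file of a modality folder.
first_file = 'sub-01_T1w.nii.gz';

% An empty base has no character to match, so both comparisons are false.
same_group = strncmp(dep_prev_files.group_base, first_file, dep_prev_files.group_len);
same_data = strncmp(dep_prev_files.data_base, first_file, dep_prev_files.data_len);
```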
@@ -223,7 +232,11 @@

if ~isempty(parsing)

subject = index_dependencies(subject, modality, file_list{iFile});
[subject, dep_prev_files] = index_dependencies(subject, ...
modality, ...
file_list{iFile}, ...
iFile, ...
dep_prev_files);

switch subject.(modality)(end).suffix

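The call now also returns `dep_prev_files`, presumably because MATLAB structs are value types: whatever a function changes in a struct argument is lost unless the caller reassigns the returned value. A small illustration with a hypothetical `start_group` helper built on `setfield`, which returns a modified copy:

```matlab
state = struct('index_group', 0);

% Hypothetical helper: returns a copy of s with index_group set to i.
start_group = @(s, i) setfield(s, 'index_group', i);

start_group(state, 3);          % the returned copy is discarded
unchanged = state.index_group;  % still 0

state = start_group(state, 3);  % reassigning keeps the update
updated = state.index_group;    % now 3
```

This is why the loop writes `[subject, dep_prev_files] = index_dependencies(...)` rather than calling it for `subject` alone.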
@@ -346,7 +359,7 @@

end

function subject = index_dependencies(subject, modality, file)
function [subject, dep_prev_files] = index_dependencies(subject, modality, file, i, dep_prev_files)
%
% Each file structure contains dependencies sub-structure with guaranteed fields:
%
@@ -365,36 +378,56 @@
pth = fullfile(subject.path, modality);
fullpath_filename = fullfile(pth, file);

subject.(modality)(end).metafile = bids.internal.get_meta_list(fullpath_filename);
subject.(modality)(end).dependencies.explicit = {};
subject.(modality)(end).dependencies.data = {};
subject.(modality)(end).dependencies.group = {};
if ~isfield(subject.(modality)(end), 'dependencies') || ...
isempty(subject.(modality)(end).dependencies)
subject.(modality)(end).dependencies.explicit = {};
subject.(modality)(end).dependencies.data = {};
subject.(modality)(end).dependencies.group = {};
end

ext = subject.(modality)(end).ext;
suffix = subject.(modality)(end).suffix;
pattern = strrep(file, ['_' suffix ext], '_[a-zA-Z0-9.]+$');
candidates = bids.internal.file_utils('List', pth, ['^' pattern '$']);
candidates = cellstr(candidates);
if strncmp(dep_prev_files.data_base, file, dep_prev_files.data_len)

for ii = 1:numel(candidates)
% subject.(modality)(end + 1, 1) = subject.(modality)(end, 1);
% subject.(modality)(end, 1).ext = file(dep_prev_files.data_len:end);
% subject.(modality)(end, 1).filename = file;

if strcmp(candidates{ii}, file)
continue
end
dep_fname = fullfile(pth, subject.(modality)(end - 1, 1).filename);
subject.(modality)(end).dependencies.data{end + 1} = dep_fname;

if bids.internal.ends_with(candidates{ii}, '.json')
continue
end
else
subject.(modality)(end).metafile = bids.internal.get_meta_list(fullpath_filename);

match = regexp(candidates{ii}, ['_' suffix '\..*$'], 'match');
% different suffix
if isempty(match)
subject.(modality)(end).dependencies.group{end + 1, 1} = fullfile(pth, candidates{ii});
% same suffix
end

% Checking dependencies
if strncmp(dep_prev_files.group_base, file, dep_prev_files.group_len)
if strncmp(dep_prev_files.data_base, file, dep_prev_files.data_len)
% same data
for di = dep_prev_files.index_data:i - 1
subject.(modality)(di).dependencies.data{end + 1} = fullpath_filename;
end
else
subject.(modality)(end).dependencies.data{end + 1, 1} = fullfile(pth, candidates{ii});
% not same data but same group
dep_prev_files.index_data = i;
dep_prev_files.data_len = find(file == '.', 1);
dep_prev_files.data_base = file(1:dep_prev_files.data_len);

for gi = dep_prev_files.index_group:i
dep_fname = fullfile(pth, subject.(modality)(gi).filename);
subject.(modality)(end).dependencies.group{end + 1} = dep_fname;
subject.(modality)(gi).dependencies.group{end + 1} = fullpath_filename;
end
end

% new group
else
dep_prev_files.index_group = i;
dep_prev_files.group_len = find(file == '_', 1, 'last');
dep_prev_files.group_base = file(1:dep_prev_files.group_len);

dep_prev_files.index_data = i;
dep_prev_files.data_len = find(file == '.', 1);
dep_prev_files.data_base = file(1:dep_prev_files.data_len);
end

end
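The new guard in `index_dependencies` only creates the `dependencies` lists when the field is missing or empty, so lists already filled in for an element are not reset. One way an element ends up with an empty field is growing a struct array, since MATLAB fills the fields of a new element with `[]`. A sketch with a hypothetical two-element array whose field names mirror those in the diff:

```matlab
% Hypothetical file entries; only the field names come from the diff.
empty_deps = struct('explicit', {{}}, 'data', {{}}, 'group', {{}});

files = struct('filename', {}, 'dependencies', {});
files(1).filename = 'sub-01_task-rest_eeg.vhdr';
files(1).dependencies = empty_deps;
files(1).dependencies.data{end + 1} = 'sub-01_task-rest_eeg.eeg';

% Growing the array sets files(2).dependencies to [].
files(2).filename = 'sub-01_task-rest_eeg.eeg';

% Same guard as in the diff: only initialise missing or empty dependencies.
for k = 1:numel(files)
  if ~isfield(files(k), 'dependencies') || isempty(files(k).dependencies)
    files(k).dependencies = empty_deps;
  end
end
```

After the loop, the entry already recorded for `files(1)` survives, and `files(2)` gets fresh empty lists.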
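Taken together, the new code appears to replace the per-file directory listing (`file_utils('List', ...)` with a regex) by a single pass over `file_list`. A file stays in the current group while it shares the group prefix (up to the last `_` of the file that opened the group), and in the current data unit while it shares the data prefix (up to the first `.`). Because only the previous group is remembered, this seems to rely on files of one group being adjacent in `file_list`, for example sorted by name. The sketch below reproduces only that bookkeeping on hypothetical EEG file names, recording for each file the index at which its group and its data unit start; it does not build the `subject` structure.

```matlab
% Hypothetical file names from one modality folder, sorted by name.
file_list = sort({ ...
    'sub-01_task-tone_eeg.eeg'; ...
    'sub-01_task-rest_eeg.vhdr'; ...
    'sub-01_task-rest_channels.tsv'; ...
    'sub-01_task-rest_eeg.vmrk'; ...
    'sub-01_task-rest_eeg.eeg'});

% Same initial state as dep_prev_files in the diff.
state = struct('index_group', 0, 'group_base', '', 'group_len', 1, ...
               'index_data', 0, 'data_base', '', 'data_len', 1);

n = numel(file_list);
group_id = zeros(n, 1);  % index of the first file in each file's group
data_id = zeros(n, 1);   % index of the first file in each file's data unit

for i = 1:n
  file = file_list{i};

  if strncmp(state.group_base, file, state.group_len)
    if ~strncmp(state.data_base, file, state.data_len)
      % Same group, new data unit: restart the data prefix at this file.
      state.index_data = i;
      state.data_len = find(file == '.', 1);
      state.data_base = file(1:state.data_len);
    end
  else
    % New group: group prefix up to the last '_', data prefix up to the first '.'.
    state.index_group = i;
    state.group_len = find(file == '_', 1, 'last');
    state.group_base = file(1:state.group_len);
    state.index_data = i;
    state.data_len = find(file == '.', 1);
    state.data_base = file(1:state.data_len);
  end

  group_id(i) = state.index_group;
  data_id(i) = state.index_data;
end
```

With these names, `group_id` is `[1; 1; 1; 1; 5]` and `data_id` is `[1; 2; 2; 2; 5]`: `channels.tsv` shares the `sub-01_task-rest_` group with the three `_eeg` files but forms its own data unit, while `task-tone` starts a new group.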