grnboost2 TypeError: Must supply at least one delayed object #42

Open
anna4kaa opened this issue Oct 8, 2024 · 3 comments

anna4kaa commented Oct 8, 2024

Hi!

GRNBoost2 fails with an error at the very last step. The same happens when I use GENIE3. It seems to be a problem with Dask, but I could not figure out what is going on.

The code:

import os
import pandas as pd
from distributed import Client, LocalCluster
from arboreto.algo import grnboost2, genie3
from arboreto.utils import load_tf_names

in_file= '/Users/annasve/Desktop/data/transcriptomics/output/PyWGCNA/NBC_00001/log_tpm.csv'
tf_file = '/Users/annasve/Desktop/data/transcriptomics/output/arboreto/output/NBC_00001/tf_list.csv'

ex_matrix = pd.read_csv(in_file, index_col = 0)
tf_names = load_tf_names(tf_file)

network = grnboost2(expression_data=ex_matrix, tf_names=tf_names, verbose = True)

The error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 9
      6 tf_names = load_tf_names(tf_file)
      8 # Run GRNBoost2 with explicitly provided gene_names and tf_names
----> 9 network = grnboost2(expression_data=ex_matrix, tf_names=tf_names, verbose = True)
     11 network.to_csv(out_file)

File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/arboreto/algo.py:39, in grnboost2(expression_data, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
     10 def grnboost2(expression_data,
     11               gene_names=None,
     12               tf_names='all',
   (...)
     16               seed=None,
     17               verbose=False):
     18     """
     19     Launch arboreto with [GRNBoost2] profile.
     20 
   (...)
     36     :return: a pandas DataFrame['TF', 'target', 'importance'] representing the inferred gene regulatory links.
     37     """
---> 39     return diy(expression_data=expression_data, regressor_type='GBM', regressor_kwargs=SGBM_KWARGS,
     40                gene_names=gene_names, tf_names=tf_names, client_or_address=client_or_address,
     41                early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)

File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/arboreto/algo.py:120, in diy(expression_data, regressor_type, regressor_kwargs, gene_names, tf_names, client_or_address, early_stop_window_length, limit, seed, verbose)
    117 if verbose:
    118     print('creating dask graph')
--> 120 graph = create_graph(expression_matrix,
    121                      gene_names,
    122                      tf_names,
    123                      client=client,
    124                      regressor_type=regressor_type,
    125                      regressor_kwargs=regressor_kwargs,
    126                      early_stop_window_length=early_stop_window_length,
    127                      limit=limit,
    128                      seed=seed)
    130 if verbose:
    131     print('{} partitions'.format(graph.npartitions))

File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/arboreto/core.py:450, in create_graph(expression_matrix, gene_names, tf_names, regressor_type, regressor_kwargs, client, target_genes, limit, include_meta, early_stop_window_length, repartition_multiplier, seed)
    448 # gather the DataFrames into one distributed DataFrame
    449 all_links_df = from_delayed(delayed_link_dfs, meta=_GRN_SCHEMA)
--> 450 all_meta_df = from_delayed(delayed_meta_dfs, meta=_META_SCHEMA)
    452 # optionally limit the number of resulting regulatory links, descending by top importance
    453 if limit:

File ~/anaconda3/envs/arboreto/lib/python3.11/site-packages/dask_expr/io/_delayed.py:115, in from_delayed(dfs, meta, divisions, prefix, verify_meta)
    112     dfs = [dfs]
    114 if len(dfs) == 0:
--> 115     raise TypeError("Must supply at least one delayed object")
    117 if meta is None:
    118     meta = delayed(make_meta)(dfs[0]).compute()

TypeError: Must supply at least one delayed object
@nsapoval

Hi,

I ran into the same issue recently while trying to run grnboost2 from a Python 3.12 conda environment with the default versions of dask and distributed.

I found a thread with the same bug in the pySCENIC GitHub issues: #561. It appears to be caused by some recent (?) changes in the dask/distributed packages. The fix proposed in that thread is to install the following versions: dask-expr==0.5.3 distributed==2024.2.1. I tried doing so in a Python 3.12 environment, but that led to the same error as the one you have reported, which is also the one in the pySCENIC thread.

I then rebuilt the environment with Python 3.10.15 and dask-expr==0.5.3 distributed==2024.2.1. With this change the code ran to completion. Hopefully this helps other users who encounter the same issue.

tl;dr: Python 3.10.15 + dask-expr==0.5.3 distributed==2024.2.1 works fine; newer versions of Python, dask, and distributed lead to the bug above.
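To compare an environment against the versions reported as working above, a small check along these lines may help. It is only a sketch: the helper names (`installed_version`, `compare_versions`) are made up for illustration, and the pinned versions are simply the ones quoted in this comment, not a verified compatibility matrix.

```python
import sys
from importlib import metadata

# Versions reported as working in this thread (an assumption for any other setup).
REPORTED_WORKING = {"dask-expr": "0.5.3", "distributed": "2024.2.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package name to (installed, expected, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (found, wanted, found == wanted)
    return report


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    for package, (found, wanted, ok) in compare_versions(REPORTED_WORKING).items():
        status = "ok" if ok else "differs"
        print(f"{package}: installed={found}, reported working={wanted} ({status})")
```

Running this inside the environment used for arboreto shows at a glance whether the pins actually took effect.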

@Yajie-Rong

I encountered the same problem. Following your suggestion, I switched from Python 3.12.3 to 3.10.15, but the same error still occurred. Would it make a difference to create the environment with conda (conda create -n pyscenic python=3.10.15) instead of uninstalling Python 3.12.3 and reinstalling Python 3.10.15?
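When switching Python versions does not change the outcome, it can be worth confirming which interpreter is actually running the code. The following is a minimal sketch; `describe_interpreter` is a hypothetical helper name, and the `CONDA_DEFAULT_ENV` lookup assumes the environment was activated through conda.

```python
import os
import sys


def describe_interpreter():
    """Collect facts that identify the interpreter running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # conda sets CONDA_DEFAULT_ENV on activation; None usually means
        # no conda environment is active in this process.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the reported version or executable path is not the one expected, the notebook or script is still using the old installation, and the pinned packages may have been installed somewhere else.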

@ruoyeruolan

This error is caused by the create_graph function in arboreto.core and has been fixed on GitHub. However, the fix has not yet been included in release 0.1.6. You can modify the code in the installed 0.1.6 package like this: aertslab/pySCENIC#592 (comment)
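The traceback above shows from_delayed raising as soon as it receives an empty list, which is what happens to delayed_meta_dfs in create_graph (presumably because no metadata was requested; create_graph takes an include_meta parameter). The sketch below only illustrates the general shape of a guard for that situation. It does not reproduce the linked patch, and `from_delayed_stub` and `gather_results` are hypothetical stand-ins written so the example runs without dask installed.

```python
def from_delayed_stub(dfs, meta=None):
    """Stand-in that mimics the empty-input check shown in the traceback."""
    if len(dfs) == 0:
        raise TypeError("Must supply at least one delayed object")
    return {"partitions": list(dfs), "meta": meta}


def gather_results(delayed_link_dfs, delayed_meta_dfs):
    """Always gather link results; gather meta results only if any exist."""
    all_links = from_delayed_stub(delayed_link_dfs, meta="GRN_SCHEMA")
    all_meta = None
    if delayed_meta_dfs:  # skip the call entirely for an empty list
        all_meta = from_delayed_stub(delayed_meta_dfs, meta="META_SCHEMA")
    return all_links, all_meta


if __name__ == "__main__":
    links, meta = gather_results(["links_part_0", "links_part_1"], [])
    print(len(links["partitions"]), meta)
```

The point of the guard is that an empty metadata list is a valid state and should simply yield no metadata frame instead of reaching a function that rejects empty input.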
