Docker pyscenic grn gives empty adjacency file[results] #200

Closed
SBata opened this issue Aug 13, 2020 · 1 comment
Labels
results Question about pySCENIC results


SBata commented Aug 13, 2020

Hi there,

I'm running pySCENIC with Docker, following the instructions here (https://pyscenic.readthedocs.io/en/latest/installation.html#docker-and-singularity-images).

The command is the following:

docker run -it --rm \
    -v /Users/myUser/scRNASeq/SCENIC_analysis/scenicdata:/scenicdata \
    aertslab/pyscenic:0.9.18 pyscenic grn \
        --num_workers 6 \
        -o /scenicdata/expr_mat.adjacencies.tsv \
        /scenicdata/expr_mat.csv \
        /scenicdata/mm_mgi_tfs.txt

The head of the input file, showing the first 5 columns:
head expr_mat.csv | awk -v FS="," '{print $1, $2, $3, $4, $5}'

cell_id Mrpl15 Lypla1 Tcea1 Atp6v1h
CTR_CC30_ATCCTATGTCCATCTC-1 0 0 2 0
CTR_CC30_TTGGTTTTCGGTGCAC-1 1 0 2 0
CTR_CC30_TCGCACTAGCAGGCAT-1 1 2 0 0
CTR_CC30_CACAACAGTGTTACTG-1 0 0 0 0
CTR_CC30_CCTGTTGAGGCATCTT-1 0 2 0 0
CTR_CC30_ATTACTCCACGGTAGA-1 0 0 1 0
CTR_CC30_AGTCATGAGCGCCTTG-1 1 0 0 1
CTR_CC30_GGAGAACCAGAGAATT-1 0 1 0 0
CTR_CC30_TTCCGTGGTTGTGGCC-1 0 0 0 0

The TF list comes from here (https://github.com/aertslab/pySCENIC/tree/master/resources).
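
A quick sanity check before the GRN step is to confirm that the gene symbols in the CSV header overlap the TF file at all. The following is a minimal sketch, assuming the expression file is cells x genes with the cell IDs in the first column (as in the head shown above) and that the TF file holds one symbol per line. `tf_overlap` is a hypothetical helper written for this illustration, not part of pySCENIC.

```python
import csv


def tf_overlap(expr_csv_path, tf_list_path):
    """Return (n_genes, n_tfs, n_shared) for a cells-x-genes CSV and a TF list.

    Assumes the first CSV column holds cell IDs and the TF file has one
    gene symbol per line.
    """
    with open(expr_csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    genes = set(header[1:])  # skip the cell ID column

    with open(tf_list_path) as handle:
        tfs = {line.strip() for line in handle if line.strip()}

    return len(genes), len(tfs), len(genes & tfs)
```

Calling `tf_overlap("expr_mat.csv", "mm_mgi_tfs.txt")` on the files from the command above and getting a shared count of zero would suggest a naming mismatch between the two files (for example different identifier types or capitalization) rather than a resource problem.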

The problem is that it reads the input file, creates an empty expr_mat.adjacencies.tsv file, and stops there.
I don't get any errors, and from reading through the pyscenic.py code I can't figure out why.

EDIT:
I found out that the initial matrix was too big (16k cells x 19k genes).

I increased the memory and reduced the expression matrix to 2k cells x 19k genes, but now I have problems with dask:

docker run -it --rm \
    --memory="2g" --memory-swap="4g" \
    -v /Users/myUser/scRNASeq/SCENIC_analysis/scenicdata:/scenicdata \
    aertslab/pyscenic:0.10.0 pyscenic grn \
        --num_workers 6 \
        -o /scenicdata/expr_mat.adjacencies.tsv \
        /scenicdata/expr_mat.csv \
        /scenicdata/mm_mgi_tfs.txt

which gives me:

/opt/venv/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}
preparing dask client
parsing input
/opt/venv/lib/python3.7/site-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  expression_matrix = expression_data.as_matrix()
creating dask graph
4 partitions
computing dask graph
distributed.nanny - WARNING - Worker process 28 was killed by unknown signal
distributed.nanny - WARNING - Worker process 26 was killed by unknown signal
distributed.nanny - WARNING - Worker process 30 was killed by unknown signal
distributed.nanny - WARNING - Worker process 32 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
not shutting down client, client was created externally
finished
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
/opt/venv/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker process 114 was killed by signal 15
distributed.nanny - WARNING - Worker process 116 was killed by signal 15
distributed.nanny - WARNING - Worker process 118 was killed by signal 15
distributed.nanny - WARNING - Worker process 120 was killed by signal 15
Traceback (most recent call last):
  File "/opt/venv/lib/python3.7/site-packages/distributed/client.py", line 1648, in _gather
    exception = st.exception
AttributeError: exception

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 420, in main
    args.func(args)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 72, in find_adjacencies_command
    network = method(expression_data=ex_mtx, tf_names=tf_names, verbose=True, client_or_address=client, seed=args.seed)
  File "/opt/venv/lib/python3.7/site-packages/arboreto/algo.py", line 41, in grnboost2
    early_stop_window_length=early_stop_window_length, limit=limit, seed=seed, verbose=verbose)
  File "/opt/venv/lib/python3.7/site-packages/arboreto/algo.py", line 135, in diy
    .compute(graph, sync=True) \
  File "/opt/venv/lib/python3.7/site-packages/distributed/client.py", line 2758, in compute
    result = self.gather(futures)
  File "/opt/venv/lib/python3.7/site-packages/distributed/client.py", line 1822, in gather
    asynchronous=asynchronous,
  File "/opt/venv/lib/python3.7/site-packages/distributed/client.py", line 753, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/distributed/utils.py", line 331, in sync
    six.reraise(*error[0])
  File "/opt/venv/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/venv/lib/python3.7/site-packages/distributed/utils.py", line 316, in f
    result[0] = yield future
  File "/opt/venv/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/opt/venv/lib/python3.7/site-packages/tornado/gen.py", line 742, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/opt/venv/lib/python3.7/site-packages/distributed/client.py", line 1651, in _gather
    six.reraise(CancelledError, CancelledError(key), None)
  File "/opt/venv/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
concurrent.futures._base.CancelledError: finalize-74981cdf92a00aae1db9a1563fb8a420

I'm running Docker on macOS 10.14.6.

Any suggestions?

Thanks

@SBata SBata added the results Question about pySCENIC results label Aug 13, 2020

cflerin commented Aug 17, 2020

Hi @SBata ,

I can't say what caused your original problem if there were no error messages, but pySCENIC does create the empty output file when it first starts the process, so that part at least can be explained.
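
One plausible mechanism for the empty file, stated here as an assumption rather than a confirmed detail of the pySCENIC CLI: if a command-line tool declares its output argument with `argparse.FileType("w")`, the file is opened for writing while the arguments are parsed, so it exists with zero bytes long before any result is written. A minimal sketch of that behaviour in isolation (the function name and file name are illustrative only):

```python
import argparse
import os
import tempfile


def size_after_parsing(filename="expr_mat.adjacencies.tsv"):
    """Parse `-o <path>` and report whether the file exists, and its size,
    before anything has been written to it."""
    parser = argparse.ArgumentParser()
    # FileType("w") opens the path for writing as soon as arguments are parsed.
    parser.add_argument("-o", "--output", type=argparse.FileType("w"))

    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        args = parser.parse_args(["-o", path])
        exists = os.path.exists(path)
        size = os.path.getsize(path)
        args.output.close()
    return exists, size


print(size_after_parsing())
```

If the process is then killed before it writes anything, for instance by running out of memory, the empty file is all that remains.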

For your second issue, it looks like you're running into a memory limit again. I have never used the --memory="2g" --memory-swap="4g" options, but I assume they simply cap the memory and swap that the container can use. In any case, 2 GB of memory is certainly not enough for your full matrix. For reference, a matrix of 10k cells x 20k genes took ~2 hours to run using 20 workers on our HPC. Memory use peaked at 34 GB for the GRN step (roughly 1.7 GB per worker). It also looks like you might be running this on a laptop, so it could take quite some time to run with "only" 6 workers.
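
The numbers above can be put side by side with a rough back-of-the-envelope calculation. This is only a sketch under stated assumptions: the matrix is held densely at 8 bytes per value, and the 2 GB container cap is imagined as split evenly across the 6 requested workers, which is not necessarily how dask divides memory. The helper name is hypothetical.

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value=8):
    """Size in GiB of a dense n_cells x n_genes matrix (assumed 8-byte values)."""
    return n_cells * n_genes * bytes_per_value / 2**30


# Matrix sizes mentioned in this thread.
full_gib = dense_matrix_gib(16_000, 19_000)
subset_gib = dense_matrix_gib(2_000, 19_000)

# Reference run: 34 GB peak over 20 workers.
reference_gb_per_worker = 34 / 20

# Assumption: a 2 GB cap shared evenly by 6 workers.
capped_gb_per_worker = 2 / 6

print(f"full matrix:   {full_gib:.2f} GiB")
print(f"2k-cell subset: {subset_gib:.2f} GiB")
print(f"reference run: {reference_gb_per_worker:.2f} GB per worker")
print(f"2 GB cap:      {capped_gb_per_worker:.2f} GB per worker")
```

Under these assumptions a single dense copy of the full matrix already takes about 2.3 GiB, more than the 2 GB cap, and even with the 2k-cell subset each worker would have roughly a fifth of the per-worker memory seen in the reference run.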

As a possible solution, I would use the latest pySCENIC image together with the alternate GRN script (see #163 for more info). For example, with Docker:

docker run -it --rm \
    -v /Users/myUser/scRNASeq/SCENIC_analysis/scenicdata:/scenicdata \
    aertslab/pyscenic:0.10.3 arboreto_with_multiprocessing.py \
    --num_workers 6 \
    -o /scenicdata/expr_mat.adjacencies.tsv \
    /scenicdata/expr_mat.csv \
    /scenicdata/mm_mgi_tfs.txt

@cflerin cflerin closed this as completed Aug 17, 2020