Want to get coverage for 3rd party dependencies' code used by my project. #1759

Open
amaranthjinn opened this issue Mar 27, 2024 · 10 comments
Labels
support A support question from a user

Comments

@amaranthjinn

@nedbat

Question: What do I need to do to make coverage measure the 3rd party dependencies installed in the virtualenv's site-packages directory?
I see that 3rd party dependency coverage was removed in 0285af9. However, I want to run coverage against my project code and see which lines are executed in the 3rd party dependencies' code.
The usage I'm imagining is:
I have some app code in my project/src directory, A.py, which imports 3rd party dependencies B and C. B and C are installed in the virtual env's site-packages directory (outside of the project directory).
A_test.py is written to exercise A.py (A_test.py is in the same directory as A.py). Running coverage against A_test.py would show the percentage of code (and the lines of code) executed in B and C.
I see that -L toggles coverage for the stdlib. Is there a similar way to toggle coverage for 3rd party dependencies?

I have the project using Poetry to manage its dependencies.
The project structure:
home/xiaojin/Workspace/p
├── project.toml
├── requirements.txt
├── src
│   ├── p
├── poetry.lock
...

and virtual environment's site packages directory at:
home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages
├── numpy
│   ├── matlib.py
...
My project uses a number of 3rd party dependencies, which are installed in the virtualenv's site-packages.

I ran coverage against some pytest tests in the project folder, but it seems coverage only covers the src and opt directories (by design, it seems: https://nedbatchelder.com/blog/202104/coveragepy_and_thirdparty_code.html; by the way, thank you for the prompt response).

From the Python virtual env, I ran a coverage command in the home/xiaojin/Workspace/p folder:
coverage run -m pytest /usr/.../Workspace/p/xxj_test.py
where xxj_test simply imports numpy, to test whether coverage can measure 3rd party dependencies in the .virtualenvs directory (where numpy is installed).
However, the coverage report includes only the src and opt directories and files under the project p directory.

I tried with -L, and stdlib modules were added.

I tried with --include 'venv/*' and got this error:
===================================================== test session starts =====================================================
platform linux -- Python 3.11.7, pytest-7.1.1, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
PyQt5 5.14.2 -- Qt runtime 5.14.2 -- Qt compiled 5.14.2
rootdir: /usr/local/home/xiaojin/Workspace/p, configfile: project.toml
plugins: benchmark-3.4.1, xdist-2.5.0, forked-1.4.0, pyfakefs-4.3.3, typeguard-2.13.3, dash-2.14.2, anyio-3.5.0, cov-3.0.0, qt-4.0.2, flaky-3.7.0, timeout-2.1.0, asyncio-0.19.0
timeout: 450.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.STRICT
collected 0 items

==================================================== no tests ran in 0.20s ====================================================
/usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/coverage/control.py:793: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")

Other tries:

  1. ran coverage command from within the .virtualenv directory, no difference.
  2. ran coverage debug sys:

-- sys -------------------------------------------------------
coverage_version: 6.3.2
coverage_module: /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/coverage/__init__.py
tracer: -none-
CTracer: unavailable
plugins.file_tracers: -none-
plugins.configurers: -none-
plugins.context_switchers: -none-
configs_attempted: .coveragerc
setup.cfg
tox.ini
pyproject.toml
configs_read: -none-
config_file: None
config_contents: -none-
data_file: -none-
python: 3.11.7 (main, Jan 14 2024, 00:38:57) [GCC 10.2.1 20210110]
platform: Linux-6.6.13-1rodete3-amd64-x86_64-with-glibc2.37
implementation: CPython
executable: /usr/local/../xiaojin/.virtualenvs/p/bin/python3
def_encoding: utf-8
fs_encoding: utf-8
pid: 986639
cwd: /usr/local/../xiaojin/Workspace/p/users/xiaojin
path: /usr/local/../xiaojin/.virtualenvs/p/bin
/usr/local/../xiaojin/Workspace/p/src
/usr/local/buildtools/current/sitecustomize
/opt/python3.11.7/lib/python311.zip
/opt/python3.11.7/lib/python3.11
/opt/python3.11.7/lib/python3.11/lib-dynload
/usr/local/../xiaojin/.virtualenvs/p/lib/python3.11/site-packages
environment: HOME = /usr/local/home/xiaojin
PYTHONPATH = /usr/local/buildtools/current/sitecustomize
VIRTUALENVWRAPPER_PYTHON = /usr/bin/python3
command_line: /usr/local/../xiaojin/.virtualenvs/p/bin/coverage debug sys
sqlite3_version: 2.6.0
sqlite3_sqlite_version: 3.45.0
sqlite3_temp_store: 0
sqlite3_compile_options: ATOMIC_INTRINSICS=1; COMPILER=gcc-13.2.0; DEFAULT_AUTOVACUUM
DEFAULT_CACHE_SIZE=-2000; DEFAULT_FILE_FORMAT=4; DEFAULT_JOURNAL_SIZE_LIMIT=-1
DEFAULT_MMAP_SIZE=0; DEFAULT_PAGE_SIZE=4096; DEFAULT_PCACHE_INITSZ=20
DEFAULT_RECURSIVE_TRIGGERS; DEFAULT_SECTOR_SIZE=4096; DEFAULT_SYNCHRONOUS=2
DEFAULT_WAL_AUTOCHECKPOINT=1000; DEFAULT_WAL_SYNCHRONOUS=2; DEFAULT_WORKER_THREADS=0
DIRECT_OVERFLOW_READ; ENABLE_COLUMN_METADATA; ENABLE_DBSTAT_VTAB
ENABLE_FTS3; ENABLE_FTS3_PARENTHESIS; ENABLE_FTS3_TOKENIZER
ENABLE_FTS4; ENABLE_FTS5; ENABLE_LOAD_EXTENSION
ENABLE_MATH_FUNCTIONS; ENABLE_PREUPDATE_HOOK; ENABLE_RTREE
ENABLE_SESSION; ENABLE_STMTVTAB; ENABLE_UNLOCK_NOTIFY
ENABLE_UPDATE_DELETE_LIMIT; HAVE_ISNAN; LIKE_DOESNT_MATCH_BLOBS
MALLOC_SOFT_LIMIT=1024; MAX_ATTACHED=10; MAX_COLUMN=2000
MAX_COMPOUND_SELECT=500; MAX_DEFAULT_PAGE_SIZE=32768; MAX_EXPR_DEPTH=1000
MAX_FUNCTION_ARG=127; MAX_LENGTH=1000000000; MAX_LIKE_PATTERN_LENGTH=50000
MAX_MMAP_SIZE=0x7fff0000; MAX_PAGE_COUNT=0xfffffffe; MAX_PAGE_SIZE=65536
MAX_SCHEMA_RETRY=25; MAX_SQL_LENGTH=1000000000; MAX_TRIGGER_DEPTH=1000
MAX_VARIABLE_NUMBER=250000; MAX_VDBE_OP=250000000; MAX_WORKER_THREADS=8
MUTEX_PTHREADS; SECURE_DELETE; SOUNDEX
SYSTEM_MALLOC; TEMP_STORE=1; THREADSAFE=1
USE_URI

  3. python -m coverage run -m pytest /usr/local/home/xiaojin/Workspace/p/xxj_test.py --debug=trace

writing pytest debug information to trace
================================================================================================================================ test session starts ================================================================================================================================
platform linux -- Python 3.11.7, pytest-7.1.1, pluggy-1.0.0 -- /usr/local/home/xiaojin/.virtualenvs/p/bin/python
using: pytest-7.1.1
setuptools registered plugins:
pytest-benchmark-3.4.1 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_benchmark/plugin.py
pytest-xdist-2.5.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/xdist/plugin.py
pytest-xdist-2.5.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/xdist/looponfail.py
pytest-forked-1.4.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_forked/__init__.py
pyfakefs-4.3.3 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pyfakefs/pytest_plugin.py
typeguard-2.13.3 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/typeguard/pytest_plugin.py
dash-2.14.2 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/dash/testing/plugin.py
anyio-3.5.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/anyio/pytest_plugin.py
pytest-cov-3.0.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_cov/plugin.py
pytest-qt-4.0.2 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytestqt/plugin.py
flaky-3.7.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/flaky/flaky_pytest_plugin.py
pytest-timeout-2.1.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_timeout.py
pytest-asyncio-0.19.0 at /usr/local/home/xiaojin/.virtualenvs/p/lib/python3.11/site-packages/pytest_asyncio/plugin.py
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
PyQt5 5.14.2 -- Qt runtime 5.14.2 -- Qt compiled 5.14.2
rootdir: /usr/local/home/xiaojin/Workspace/p, configfile: pyproject.toml
plugins: benchmark-3.4.1, xdist-2.5.0, forked-1.4.0, pyfakefs-4.3.3, typeguard-2.13.3, dash-2.14.2, anyio-3.5.0, cov-3.0.0, qt-4.0.2, flaky-3.7.0, timeout-2.1.0, asyncio-0.19.0
timeout: 450.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.STRICT
collected 0 items

  4. coverage run -m --source=numpy pytest /usr/local/google/home/xiaojin/Workspace/pyle/xxj_test.py
    Coverage measured all files in the virtualenv's numpy folder, but doesn't show coverage for xxj_test.py?
amaranthjinn added the needs triage and support (A support question from a user) labels Mar 27, 2024
@amaranthjinn
Author

It seems like the following command is close to what I want to run:
coverage run --source=/usr/local/home/xiaojin/Workspace/pyle,/usr/local/home/xiaojin/.virtualenvs/pyle/lib64/python3.11/site-packages deps_analysis.py --debug=trace

where deps_analysis.py is the following just to test out what coverage can return:
import numpy # test if coverage covers site package directory of 3rd party installations
print(numpy.__file__)

The result does include coverage of files in .virtualenvs/.../site-packages, but I'm not sure how to interpret it:

  1. It includes other .py files in home/xiaojin/Workspace/pyle, while I expected only deps_analysis.py from the repo directory (its only dependency is numpy, which is in .virtualenvs/). Why is coverage measuring those other .py files?
  2. There are many more files from .virtualenvs than expected, many with 0% coverage, so I'm not sure whether they are included because they really are dependencies of numpy, or whether coverage picks up every 3rd party package in .virtualenvs/.

Would appreciate any clarifications on how to use the coverage tool correctly and effectively.

@amaranthjinn
Author

@nedbat, I was able to update coverage to the latest version. However, I'm having the same issue with the report.
With --source specified, the coverage report shows a lot of files with 0 lines (which report 100% coverage) or with no lines covered (0%), and it seems to list every file in the directories.
However, with no site-packages directory specified in the source, the report does not show any coverage for packages in the 3rd party site-packages folder.
I was hoping to see coverage for deps_analysis.py plus its direct and indirect (3rd party) dependencies. What's the recommended way to go about it?

@nedbat
Owner

nedbat commented Apr 10, 2024

Coverage report shows a lot of files with 0 lines which give 100% of coverage, or no lines covered (0%)

You can exclude empty files (skip_empty), and 100% files (skip_covered) if they are distracting you.

I don't understand if you are getting measurement of third-party packages you aren't interested in though?

@amaranthjinn
Author

Yeah, I seem to be getting measurements of packages I'm not interested in, from both project src and 3rd party package directories.
Here's the scenario I'm trying:
I wrote a dummy test in the project src directory that only does import numpy (deployed in the 3rd party package directory), and print.
I ran coverage against the dummy test.
I was expecting coverage of the dummy test, numpy, and maybe some other basic libraries for printing.
Instead, I got coverage for all the files in src directory, and all the files in the 3rd party package directory, regardless of whether they are dependencies of the dummy test or not.

My command is something like:
coverage run --source=/usr/local/google/home/Workspace/p,/usr/local/google/home/.virtualenvs/p/lib64/python3.11/site-packages /usr/local/google/home/Workspace/p/src/dummy_test.py --debug=trace

and then:
coverage html --ignore-errors --skip-empty

The behavior seems to be that, with --source specified, coverage reports on all files in the listed directories?
However, without --source specified, coverage run against the dummy test shows coverage only for the dummy test in the src directory, and nothing from the 3rd party dependency directory (I want to see the coverage for numpy).

I was hoping to use the coverage tool to determine

  1. what are the package dependencies (internal and external/3rd party) of a specific feature/program/component
  2. how much of the dependent package is utilized by this specific feature/program/component.

How should I use the coverage tool to achieve the above goal?
I'm using coverage 7.4.4.

@amaranthjinn
Author

amaranthjinn commented Apr 11, 2024

An update: I think coverage is measuring only the dependencies invoked by the dummy test. It's just that all the files in the directory seem to be listed, most with 0% coverage, so it's confusing to sort through at first glance.
This makes it hard to tell whether a dependency has 0% coverage because it should have been called but wasn't, or because it is simply not in the chain of calls. Or did I misunderstand some usage of the tool?

@amaranthjinn
Author

I was able to use --skip-empty (https://coverage.readthedocs.io/en/latest/cmd.html), and that removed the 0% coverage files. However, for the files that are left, I'm not sure whether they are all triggered as dependencies of my test; there seem to be more than that.
@nedbat, I was hoping dynamic contexts would help me trace the dependencies, but they still don't seem to work for me; see https://stackoverflow.com/questions/78330426/coverages-dynamic-context-usage-returns-no-contexts-were-measured.

@stasos24

stasos24 commented May 6, 2024

Hi @amaranthjinn
Just comment these lines:

.venv/lib/python3.11/site-packages/coverage/inorout.py-16478- #if self.third_match.match(filename) and not self.source_in_third_match.match(filename):
.venv/lib/python3.11/site-packages/coverage/inorout.py:16579: # return "inside --source, but is third-party"

@nedbat
Owner

nedbat commented May 7, 2024

There are a few things I still don't understand:

  • Why do you want to measure the coverage of numpy? Are you testing numpy?
  • How should coverage know which third-party dependencies you are interested in and which you are not?

You should run coverage so that it measures all third-party packages, and then extract the information you want from the JSON report.
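A minimal sketch of that suggestion, assuming the report was produced with `coverage json` and that each entry under `files` carries a `summary` with `covered_lines` and `num_statements` (my understanding of the JSON report layout, not something spelled out in this thread). The helper names `summarize_by_package` and `load_report` are hypothetical, and paths are assumed to be POSIX-style.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_by_package(report, site_packages):
    """Sum covered lines and statements per top-level package under site_packages."""
    root = PurePosixPath(site_packages)
    totals = defaultdict(lambda: {"covered": 0, "statements": 0})
    for filename, data in report["files"].items():
        try:
            relative = PurePosixPath(filename).relative_to(root)
        except ValueError:
            continue  # file is not under site-packages (e.g. project code)
        # "numpy/matlib.py" -> "numpy"; a single-file module "six.py" -> "six"
        package = relative.parts[0].removesuffix(".py")
        summary = data["summary"]
        totals[package]["covered"] += summary["covered_lines"]
        totals[package]["statements"] += summary["num_statements"]
    return dict(totals)


def load_report(path="coverage.json"):
    """Read a report written by `coverage json` (default file name assumed)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example use (paths are placeholders):
# report = load_report()
# for name, t in sorted(summarize_by_package(report, "/path/to/site-packages").items()):
#     print(name, t["covered"], "of", t["statements"], "statements")
```

Grouping by the first path component keeps the output to one row per installed package, which is easier to scan than a per-file report with thousands of entries.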

@amaranthjinn
Author

Sorry for the delayed response, got roped into a different project.

Just to recap, I want to determine

  1. What 3rd party dependencies are used by my project (and % code of the dependency used)?
  2. What legacy 3rd party dependencies are not used by my project (but still sitting in my poetry, requirements config, and taking space)?
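One rough way to approach the second question is to compare the installed distributions against the packages that had at least one line executed. The sketch below assumes a coverage JSON report (from `coverage json`) with a per-file `summary.covered_lines` count, POSIX-style paths, and Python 3.10+ for `importlib.metadata.packages_distributions()`. The function names are hypothetical. Note that merely importing a package executes its module-level lines, so "used" here only means "imported during the run", and a distribution that looks unused may still be needed (for example as a plugin or a command-line tool).

```python
from importlib import metadata
from pathlib import PurePosixPath


def used_top_level_names(report, site_packages):
    """Top-level import names under site_packages with at least one executed line."""
    root = PurePosixPath(site_packages)
    used = set()
    for filename, data in report["files"].items():
        try:
            relative = PurePosixPath(filename).relative_to(root)
        except ValueError:
            continue  # not a third-party file
        if data["summary"]["covered_lines"] > 0:
            used.add(relative.parts[0].removesuffix(".py"))
    return used


def unused_distributions(used_names, name_to_dists=None):
    """Installed distributions none of whose top-level names were used."""
    if name_to_dists is None:
        # Maps import names to distribution names, e.g. "yaml" -> ["PyYAML"].
        name_to_dists = metadata.packages_distributions()
    all_dists = {dist for dists in name_to_dists.values() for dist in dists}
    used_dists = {dist for name in used_names for dist in name_to_dists.get(name, [])}
    return all_dists - used_dists
```

The result is a candidate list to review by hand, not a list that is safe to delete from the Poetry or requirements configuration.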

And I call coverage in my src directory:
coverage run --source=/usr/local/google/home/Workspace/p,/usr/local/google/home/.virtualenvs/p/lib64/python3.11/site-packages -m pytest --debug=trace
coverage html --skip-empty --ignore-errors

@stasos24, thank you. I tried your suggestion, but I don't see any obvious difference from using coverage with the --source argument. I see files in site-packages as well as the project src directory being traced, but no dynamic context information.

@nedbat

  • I created an experiment test class that depends on numpy, to figure out how coverage can give me a report for 3rd party dependencies (in .virtualenvs/../site_packages).

  • I think I had the wrong assumption about how the coverage tool works. I thought coverage would first determine the dependencies of the target (the target I passed to the coverage tool), and then run the target to see how much code in the dependencies is touched during execution. I see now that this assumption is not correct. Coverage is given some directories to monitor and runs the target to see whether any code in those directories is touched? It doesn't know whether touched code should be a dependency, or whether untouched code should not be one? Is this the correct understanding of the tool?

The coverage tool is still useful for answering my two questions, but it seems I need to figure out how to use it.

  1. The report right now has a lot of items with 0% coverage, with dependencies from the project src and 3rd party packages intermixed, a few thousand files in all. It's hard for me to extract a pattern (so I tried to use dynamic contexts to group the dependencies by test method, but haven't succeeded). What are your recommendations on how to organize the results (via command arguments, configuration, or custom code based on the API)?
  2. For the dependencies that have 0% or very low coverage, I want to find out what in the project's source code is or should be calling them (or whether nothing calls them at all). I guess the coverage tool cannot provide that? Do you have any recommendations on what tool can be leveraged to provide this dependency tracing (up the calling chain)?
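For the second point, one option that needs no extra tooling is a static scan of the project's own files with the standard-library `ast` module. The sketch below is only an illustration with hypothetical function names: it finds files that import a given top-level package directly, and it cannot see dynamic imports or transitive use through another dependency.

```python
import ast
from pathlib import Path


def imported_top_level_names(source):
    """Top-level package names imported (absolutely) by a piece of source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import helper"
            names.add(node.module.split(".")[0])
    return names


def files_importing(package, project_root):
    """Project files under project_root that import the given package directly."""
    hits = []
    for path in sorted(Path(project_root).rglob("*.py")):
        try:
            names = imported_top_level_names(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        if package in names:
            hits.append(str(path))
    return hits
```

Calling `files_importing("numpy", "src")` would list the project files to inspect first when a dependency shows little or no coverage.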

@gitgithan

gitgithan commented Jul 9, 2024

I have questions very similar to this issue's title.
I have finished reading all the docs and am struggling to set source correctly in .coveragerc.

I'll explain with a minimal example. I believe the answer to this would partially answer amaranthjinn's issue too.

I'm using coverage to learn how third party libraries like pandas work.
I want to know which lines are executed under the hood when I use their high-level API.
I know I can use a debugger to do the same, but coverage gives me a static report view that highlights which branch was executed for which input.
I don't care about the coverage percentage because I'm not developing pandas, and there are no test files because the purpose is not to test but to learn a pip-installed library.

Here's cov.py

import pandas as pd
from pandas.core import frame
# from helper import add

data = pd.DataFrame({"A": [1, 1, 3], "B": [4, 5, 6]})
grouped_data = data.groupby("A").sum()
print(grouped_data)
# print(add(1, 2))

I run with python -m coverage run cov.py (see Footnote below)

Here's .coveragerc

[run]
source =
    /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py
    /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/groupby
debug=trace
    
[report]
skip_empty = True
omit = 
    **/__init__.py
exclude_lines =
    import

Ideally, I want coverage to automatically only show me files that have their function/method bodies executed.
I don't want to see every single file under pandas just because they were imported.

Since I did import pandas as pd, the __init__.py of every submodule in pandas ran, including all function/class definitions, which bloated the report when I initially added only pandas to source.
Adding those 2 lines to source in .coveragerc was a manual attempt to limit the report length, which partially worked.
It's not ideal because it requires that I already know the paths, which should not be necessary when I'm learning the library.

Problem

I know the DataFrame class is defined in /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py, which I added under source. I expect the report to show def __init__ of class DataFrame(NDFrame, OpsMixin): being run, and frame.py to appear in the report.

Why do I get

/home/hanqi/.local/lib/python3.10/site-packages/coverage/inorout.py:503: CoverageWarning: Module /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py was never imported. (module-not-imported)
  self.warn(f"Module {pkg} was never imported.", slug="module-not-imported")

Adding from pandas.core import frame does not fix this warning either. (An answer to this would be nice to have.)
The warning makes sense in that I never imported frame.py directly, but I know the DataFrame class from this module is used when the code calls pd.DataFrame(), so it would be valuable to see in the report how its def __init__ ran.

Another strange thing is that /home/hanqi/.local/lib/python3.10/site-packages/pandas/core/groupby actually worked to limit the report to only the modules in the groupby folder. This makes me think source only works for folders and not files? I tested by adding my own simple helper.py (commented out in cov.py) and the output still says CoverageWarning: Module helper.py was never imported

I suspect the above issue is due to the docs saying

Only importable files (ones at the root of the tree, or in directories with a __init__.py file) will be considered

/home/hanqi/.local/lib/python3.10/site-packages/pandas/core/groupby is a directory with __init__.py, so it worked.
/home/hanqi/.local/lib/python3.10/site-packages/pandas/core/frame.py is not a file at the root of the tree (I assume this means the working directory where coverage was run), so it failed?
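One possible reading of the warning text ("Module ... was never imported") is that a `source` entry which is not a directory is treated as an importable module name rather than as a file path. That is an assumption about coverage.py's behavior, not something confirmed in this thread. Under that assumption, a dotted name such as `pandas.core.frame` would be the form to try in `source`, and a file path can be converted with a small helper like the hypothetical one below (POSIX-style paths assumed).

```python
from pathlib import PurePosixPath


def module_name_from_path(file_path, site_packages):
    """Convert a path under site_packages into a dotted module name."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(site_packages))
    parts = list(relative.with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py is named after the package itself
    return ".".join(parts)


# Placeholder paths taken from the example above:
site = "/home/hanqi/.local/lib/python3.10/site-packages"
print(module_name_from_path(site + "/pandas/core/frame.py", site))
```

The same helper also maps the `groupby` directory to `pandas.core.groupby`, so both entries could be written in one consistent form.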

skip_empty, omit and exclude_lines were my attempts to reduce the number of files shown in the report.
skip_empty and omit did exclude some files, but many files are still left in the report.
I was hoping that if every line in a file is excluded by exclude_lines, the whole file would not show up in the report either, but that was not the case. I don't want the report to show files in which only imports ran and whose classes/functions were never used.

The ideal

The docs describe skip_empty as "Don’t report files that have no executable code (such as __init__.py files)."
What I want exactly is to hide from the report all files where "effectively" no code was executed (yes, class/function definitions were executed when the file was imported by others, but their bodies were not used), and to show only files where at least some part of their classes/functions was used.

If this is possible, I'll have

  • a clean report without too many files that were never actually used, which are also files that wouldn't be interesting to someone using a debugger to learn the logic (other than import patterns).
  • a report showing files with classes/functions whose bodies were run, and also the parts of the class methods/functions in the same file that were not run (e.g. because of if-else branches responding to various data inputs)
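A post-processing step along these lines can be sketched with the standard library alone. The code below is a hedged illustration, not a coverage.py feature: it assumes a JSON report (from `coverage json`) whose per-file entries include an `executed_lines` list, uses `ast` to find the line ranges of function and method bodies, and keeps a file only if an executed line falls inside one of those bodies. All function names are hypothetical.

```python
import ast


def function_body_lines(source):
    """Line numbers that lie inside the body of any function or method."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Start after the "def" line: that line runs at import time, so a
            # one-line function ("def f(): return 1") is deliberately ignored.
            start = max(node.body[0].lineno, node.lineno + 1)
            lines.update(range(start, node.end_lineno + 1))
    return lines


def ran_function_body(source, executed_lines):
    """True if any executed line is inside a function or method body."""
    return bool(function_body_lines(source) & set(executed_lines))


def files_with_executed_bodies(report):
    """File names from a coverage JSON report where some function body ran."""
    kept = []
    for filename, data in report["files"].items():
        try:
            with open(filename, encoding="utf-8") as handle:
                source = handle.read()
            if ran_function_body(source, data["executed_lines"]):
                kept.append(filename)
        except (OSError, SyntaxError, UnicodeDecodeError):
            continue  # source not readable or not parseable from here
    return kept
```

Files that were only imported have their `def` and `class` lines executed but nothing inside a function body, so they drop out; lambdas and one-line functions are not detected by this approach.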

Footnote

The coverage command is not found even though I expected pip install coverage to make the CLI available. I can't add it to PATH since I don't know where it's supposed to be.
I'm on WSL2 Ubuntu 20.04, Coverage.py, version 7.5.4 with C extension.
pip installed libraries are not installed in any virtual environment.

Others reported a similar issue, and the answer gives no explanation of why the binary is not available after install or where it should be: https://stackoverflow.com/a/69630406/8621823

Similar request for using coverage as runtime inspection tool: https://stackoverflow.com/q/37979365/8621823
