Include CRC32 in FileInfo #135

Closed
myers opened this issue Jun 1, 2020 · 10 comments

Comments


myers commented Jun 1, 2020

Is your feature request related to a problem? Please describe.
I would like to get the crc32 of a file without having to extract it.

Describe the solution you'd like
Add crc32 as a field on FileInfo

Describe alternatives you've considered
I suspect this is already in some deeper structure of py7zr, but I don't understand enough to find it. I know it is listed when doing 7z l -slt file.7z.

@miurahr miurahr added enhancement New feature or request for extraction Issue on extraction, decompression or decryption labels Jun 1, 2020
@miurahr miurahr self-assigned this Jun 1, 2020
Owner

miurahr commented Jun 1, 2020

There are several CRC values in a 7zip archive. Which one should be included?

  • startheadercrc
  • nextheadercrc
  • each archive file crcs[] (optional)
  • each folders(lzma containers) crc[] (optional)

The only mandatory CRCs are the header CRCs, and those would not be a good fit for a FileInfo field.
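For context on the two header CRCs listed above, here is a minimal sketch of where they sit in the 32-byte 7z signature header. It assumes the publicly documented layout (6-byte signature, 2 version bytes, StartHeaderCRC, NextHeaderOffset, NextHeaderSize, NextHeaderCRC, all little-endian). The function names are hypothetical and this is not py7zr code.

```python
import struct
import zlib

MAGIC_7Z = b"7z\xbc\xaf\x27\x1c"  # 6-byte 7z signature


def build_signature_header(next_offset: int, next_size: int, next_crc: int) -> bytes:
    """Build a synthetic 32-byte signature header (illustration only)."""
    tail = struct.pack("<QQL", next_offset, next_size, next_crc)  # 20 bytes
    start_crc = zlib.crc32(tail) & 0xFFFFFFFF  # StartHeaderCRC covers those 20 bytes
    return MAGIC_7Z + bytes([0, 4]) + struct.pack("<L", start_crc) + tail


def parse_signature_header(header: bytes) -> dict:
    """Split the signature header into its fields and check StartHeaderCRC."""
    if len(header) < 32 or header[:6] != MAGIC_7Z:
        raise ValueError("not a 7z signature header")
    (start_crc,) = struct.unpack("<L", header[8:12])
    next_offset, next_size, next_crc = struct.unpack("<QQL", header[12:32])
    return {
        "version": (header[6], header[7]),
        "startheadercrc": start_crc,
        "nextheaderofs": next_offset,
        "nextheadersize": next_size,
        "nextheadercrc": next_crc,
        "startheadercrc_ok": (zlib.crc32(header[12:32]) & 0xFFFFFFFF) == start_crc,
    }


parsed = parse_signature_header(build_signature_header(100, 50, 0x12345678))
print(parsed["startheadercrc_ok"], f"{parsed['nextheadercrc']:08X}")
```

Both values protect the archive's own headers, not the content of any member file, which is why they do not map onto a per-file FileInfo field.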

Owner

miurahr commented Jun 1, 2020

Oh, you wrote "the crc32 of a file".

If "a file" means the '*.7z' file itself, there is no such value in the archive.

@miurahr miurahr added the needs more info Need more information or test data to reproduce label Jun 1, 2020
Author

myers commented Jun 1, 2020

Have you ever used the built-in Python module zipfile or https://pypi.org/project/rarfile/? Both let you read the CRC32 of each file in an archive, because it is stored in the archive's metadata.
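To illustrate the zipfile behaviour being referred to, here is a small sketch that reads each member's CRC32 from the ZIP central directory without extracting anything. The helper names and the in-memory demo archive are made up for the example.

```python
import io
import zipfile
import zlib


def list_crc32(archive_bytes: bytes) -> dict:
    """Map member name -> CRC32 (8 hex digits), read from archive metadata only."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no meaningful CRC
            result[info.filename] = f"{info.CRC:08X}"
    return result


def make_demo_zip(payload: bytes) -> bytes:
    """Create a one-file ZIP archive in memory (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("hello.txt", payload)
    return buf.getvalue()


payload = b"hello world"
crcs = list_crc32(make_demo_zip(payload))
# The stored value matches a CRC32 computed over the uncompressed data.
print(crcs["hello.txt"] == f"{zlib.crc32(payload):08X}")
```

The request in this issue is for the equivalent of `ZipInfo.CRC` on py7zr's FileInfo.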

I was able to find what I needed:

        self._zf = py7zr.SevenZipFile(self.filepath, "r")
        # later in the class
        for fi in self._zf.list():
            if fi.is_directory:
                assert fi.uncompressed == 0
                continue

            fnfo = dict(filepath=fi.filename, size=fi.uncompressed)
            for ff in self._zf.header.files_info.files:
                if ff["filename"] != fi.filename:
                    continue
                crc32_uint32 = ctypes.c_uint32(0)
                crc32_uint32.value = ff["digest"]
                fnfo["crc32"] = f"{crc32_uint32.value:08X}"
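A note on the ctypes round-trip in the snippet above: its only effect is to force the digest into the unsigned 32-bit range before formatting. A plain bit mask does the same thing, as this sketch shows. Whether py7zr can actually hand back a negative or oversized digest is not established in this thread, so treat the negative input as a hypothetical case.

```python
import ctypes


def format_crc32(digest: int) -> str:
    """Format a digest as 8 uppercase hex digits, masking to unsigned 32 bits."""
    return f"{digest & 0xFFFFFFFF:08X}"


def format_crc32_ctypes(digest: int) -> str:
    """Same result via ctypes, mirroring the approach in the snippet above."""
    return f"{ctypes.c_uint32(digest).value:08X}"


# Both helpers agree for ordinary, negative, and zero inputs.
for value in (0, 1, 0xDEADBEEF, -1):
    assert format_crc32(value) == format_crc32_ctypes(value)

print(format_crc32(0xDEADBEEF), format_crc32(-1))
```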

Author

myers commented Jun 1, 2020

Very odd, the code above gives me the CRC32 of the file in the archive. If it's not in the metadata of the archive, where is it getting it from?

Owner

miurahr commented Jun 1, 2020

You may mean

each archive file crcs[] (optional)

Author

myers commented Jun 1, 2020

You may mean

each archive file crcs[] (optional)

Yes, sorry. That is what I'm looking for.

Owner

miurahr commented Jun 1, 2020

Sorry, right now I cannot find where a CRC field is set for each entry in header.files_info.files.

Here is the header decoding block for files_info:

py7zr/py7zr/archiveinfo.py

Lines 682 to 736 in a2971e4

    def _read(self, fp: BinaryIO):
        numfiles = read_uint64(fp)
        self.files = [{'emptystream': False} for _ in range(numfiles)]
        numemptystreams = 0
        while True:
            prop = fp.read(1)
            if prop == Property.END:
                break
            size = read_uint64(fp)
            if prop == Property.DUMMY:
                # Added by newer versions of 7z to adjust padding.
                fp.seek(size, os.SEEK_CUR)
                continue
            buffer = io.BytesIO(fp.read(size))
            if prop == Property.EMPTY_STREAM:
                isempty = read_boolean(buffer, numfiles, checkall=False)
                list(map(lambda x, y: x.update({'emptystream': y}), self.files, isempty))  # type: ignore
                numemptystreams += isempty.count(True)
            elif prop == Property.EMPTY_FILE:
                self.emptyfiles = read_boolean(buffer, numemptystreams, checkall=False)
            elif prop == Property.ANTI:
                self.antifiles = read_boolean(buffer, numemptystreams, checkall=False)
            elif prop == Property.NAME:
                external = buffer.read(1)
                if external == b'\x00':
                    self._read_name(buffer)
                else:
                    dataindex = read_uint64(buffer)
                    current_pos = fp.tell()
                    fp.seek(dataindex, 0)
                    self._read_name(fp)
                    fp.seek(current_pos, 0)
            elif prop == Property.CREATION_TIME:
                self._read_times(buffer, 'creationtime')
            elif prop == Property.LAST_ACCESS_TIME:
                self._read_times(buffer, 'lastaccesstime')
            elif prop == Property.LAST_WRITE_TIME:
                self._read_times(buffer, 'lastwritetime')
            elif prop == Property.ATTRIBUTES:
                defined = read_boolean(buffer, numfiles, checkall=True)
                external = buffer.read(1)
                if external == b'\x00':
                    self._read_attributes(buffer, defined)
                else:
                    dataindex = read_uint64(buffer)
                    # try to read external data
                    current_pos = fp.tell()
                    fp.seek(dataindex, 0)
                    self._read_attributes(fp, defined)
                    fp.seek(current_pos, 0)
            elif prop == Property.START_POS:
                self._read_start_pos(buffer)
            else:
                raise Bad7zFile('invalid type %r' % prop)

Owner

miurahr commented Jun 1, 2020

Maybe around L463.

py7zr/py7zr/py7zr.py

Lines 458 to 467 in a2971e4

    file_info['folder'] = folder
    file_info['maxsize'] = maxsize
    file_info['compressed'] = compressed
    file_info['uncompressed'] = uncompressed
    file_info['packsizes'] = packsize
    if subinfo.digestsdefined[pstat.outstreams]:
        file_info['digest'] = subinfo.digests[pstat.outstreams]
    if folder is None:
        pstat.src_pos += file_info['compressed']
    else:
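The excerpt above explains where the 'digest' key in the earlier workaround comes from: it is copied onto each file entry from the substreams info whenever the matching "defined" flag is set. Here is a simplified, standalone sketch of that idea. The function name, argument names, and sample data are hypothetical, and the real py7zr loop tracks more state than this.

```python
def attach_digests(files, digests_defined, digests):
    """Copy per-stream CRCs onto file entries that have a data stream.

    Assumption for this sketch: entries flagged 'emptystream' (directories,
    empty files) consume no stream index, so they never receive a digest.
    """
    stream_index = 0
    for entry in files:
        if entry.get("emptystream"):
            continue
        if digests_defined[stream_index]:
            entry["digest"] = digests[stream_index]
        stream_index += 1
    return files


files = [
    {"filename": "a.txt", "emptystream": False},
    {"filename": "docs", "emptystream": True},    # directory: no stream
    {"filename": "b.bin", "emptystream": False},  # stream without a stored CRC
]
attach_digests(files, digests_defined=[True, False], digests=[0xCAFEBABE, 0])
for entry in files:
    print(entry["filename"], "digest" in entry)
```

Because the per-file CRC is optional in the format, code that reads it should be prepared for entries that have no digest at all.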

Owner

miurahr commented Jun 1, 2020

I have tried to implement this in #136.
Could you test it and give me feedback?

@miurahr miurahr added feedback Need feedback from issuer and removed needs more info Need more information or test data to reproduce feedback Need feedback from issuer labels Jun 1, 2020
Author

myers commented Jun 2, 2020

This looks good to me!

@miurahr miurahr closed this as completed Jun 2, 2020
@miurahr miurahr added this to the v0.8: Encryption support milestone Jun 13, 2020