Filtering duplicate files by MD5 hash in Python

Original post by zayki, 2021-09-06

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

from collections import defaultdict
from hashlib import md5
from os import getcwd, walk
import os.path


def find_files(filepath):
    """Yield the full path of every file under filepath, recursively."""
    for root, _directories, filenames in walk(filepath):
        for filename in filenames:
            yield os.path.join(root, filename)


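# Map each MD5 digest to the list of paths whose content has that digest.
file_hashes = defaultdict(list)
for path in find_files(getcwd()):
    with open(path, mode='rb') as my_file:
        # Note: this reads the whole file into memory before hashing.
        file_hash = md5(my_file.read()).hexdigest()
        file_hashes[file_hash].append(path)

# Any digest shared by more than one path marks a group of duplicates.
for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')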
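
The script above reads each file fully into memory before hashing, which can be costly for large files. A minimal sketch of a chunked variant follows. The helper names md5_of_file and find_duplicates, the 1 MiB chunk size, and the choice to skip unreadable files are assumptions made for illustration; they are not part of the original script.

```python
from collections import defaultdict
from hashlib import md5
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks.

    chunk_size (1 MiB here) is an arbitrary illustrative default.
    """
    digest = md5()
    with open(path, mode='rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b''.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(top):
    """Group files under top by MD5 digest and return groups of two or more."""
    groups = defaultdict(list)
    for root, _dirs, filenames in os.walk(top):
        for filename in filenames:
            path = os.path.join(root, filename)
            try:
                groups[md5_of_file(path)].append(path)
            except OSError:
                # Assumption: unreadable files (permissions, broken
                # symlinks) are skipped rather than aborting the scan.
                continue
    return [paths for paths in groups.values() if len(paths) > 1]
```

Used like the original, this could be driven with: for paths in find_duplicates(os.getcwd()): print(*paths, sep='\n'). Files with the same MD5 digest are almost certainly identical, but a byte-by-byte comparison would be needed to be fully sure.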
