Recursive directory analyzer with stats, top files and duplicate detection
Directory Analyzer
Directory Analyzer is a simple Python tool that scans a directory recursively and generates a structured report.
Features
- Count total number of files and folders.
- Group files by extension and calculate total size per extension.
- Find top 10 largest files in the directory tree.
- Detect duplicate files by comparing file content (hash).
- Handle file access errors safely (permission denied, missing files).
- Print report to console or save it to a file.
Command line arguments
path (required): path to the target directory.
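The package's own parser code is not shown here, so the following is only a minimal sketch of how a single required `path` argument could be declared with `argparse`. The function name `build_parser` and the help texts are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: the real tool's parser setup is not documented here.
    parser = argparse.ArgumentParser(
        description="Recursively analyze a directory."
    )
    # Positional arguments are required by default in argparse.
    parser.add_argument("path", help="path to the target directory")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["/home/user/documents"])
print(f"Target directory: {args.path}")
```

Because `path` is positional, running the parser without it makes `argparse` print a usage message and exit.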
Requirements
- Python 3.8 or newer.
- Standard library only (os, hashlib, collections, argparse, etc.).
How it works
- The program receives a directory path from the command line.
- It uses os.walk() to recursively traverse all subdirectories and files.
- For each file it:
  - Counts total files and folders.
  - Determines the file extension and adds the file size to extension statistics.
  - Stores the file path and size for later sorting (top 10 largest files).
  - Reads the file in small chunks and calculates an MD5 hash to detect duplicates.
- After scanning, the program:
  - Builds a list of the top 10 largest files.
  - Groups files by hash and keeps only hashes that have more than one file (duplicates).
  - Generates a text report with all collected statistics.
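The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the package's actual source: the function names `file_md5`, `scan` and `duplicate_groups`, the 64 KiB chunk size, and the layout of the returned dictionary are all hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Walk root and collect counts, per-extension sizes, file sizes and hashes."""
    stats = {
        "files": 0,
        "folders": 0,
        "ext_sizes": defaultdict(int),   # extension -> total bytes
        "sizes": [],                     # (size, path) pairs for top-N sorting
        "by_hash": defaultdict(list),    # MD5 digest -> list of paths
        "errors": [],                    # (path, message) for unreadable files
    }
    for dirpath, dirnames, filenames in os.walk(root):
        stats["folders"] += len(dirnames)
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
                checksum = file_md5(full_path)
            except OSError as exc:
                # Covers permission errors and files that vanish mid-scan.
                stats["errors"].append((full_path, str(exc)))
                continue
            stats["files"] += 1
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            stats["ext_sizes"][ext] += size
            stats["sizes"].append((size, full_path))
            stats["by_hash"][checksum].append(full_path)
    return stats


def duplicate_groups(by_hash):
    """Keep only the hashes shared by more than one file."""
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Reading in fixed-size chunks keeps memory use flat for large files. MD5 is adequate for spotting accidental duplicates, but it is not collision-resistant against deliberately crafted files.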
Main functions / methods
- analyze_directory(path) or class method DirectoryAnalyzer.analyze_directory(): scans the directory recursively and fills all statistics.
- get_top_files(n=10): returns a list of the largest files (path and size).
- find_duplicates(): returns groups of files that have identical content.
- generate_report(): returns a text report with all collected statistics.
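As an illustration of the top-N selection that get_top_files(n=10) describes, here is a small standalone sketch using `heapq.nlargest`. The helper name `top_files` and the `(size, path)` tuple layout are assumptions; the package may implement this differently, for example with a full sort.

```python
import heapq


def top_files(sizes, n=10):
    """Return the n largest (size, path) pairs, largest first."""
    # Tuples compare by their first element, so size drives the ordering.
    return heapq.nlargest(n, sizes)


sample = [
    (45678, "notes.txt"),
    (1234567, "images/photo.jpg"),
    (987654, "videos/movie.mp4"),
]
for size, path in top_files(sample, n=2):
    print(f"{size} bytes: {path}")
```

`heapq.nlargest` avoids sorting the whole list when only a few entries are needed, which helps on very large directory trees.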
Example report (simplified)
Analysis of: /home/user/documents
Files: 1247, Folders: 156

Extensions (total size in bytes):
  .jpg: 2456789
  .py: 124567
  .txt: 45678

Top-10 largest files:
  1234567 bytes: images/photo.jpg
  987654 bytes: videos/movie.mp4

Duplicates:
  Group (2 files):
    /home/user/documents/file1.txt
    /home/user/documents/copy/file1_copy.txt
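A report in the layout of the example above could be assembled along these lines. The function `format_report` and its parameters are hypothetical, and the exact indentation and ordering used by the real generate_report() are not documented here.

```python
def format_report(root, files, folders, ext_sizes, top, duplicates):
    """Build a plain-text report from already collected statistics."""
    lines = [f"Analysis of: {root}", f"Files: {files}, Folders: {folders}", ""]

    lines.append("Extensions (total size in bytes):")
    # Assumption: extensions are listed from largest to smallest total size.
    for ext, size in sorted(ext_sizes.items(), key=lambda item: item[1], reverse=True):
        lines.append(f"  {ext}: {size}")
    lines.append("")

    lines.append("Top-10 largest files:")
    for size, path in top:
        lines.append(f"  {size} bytes: {path}")
    lines.append("")

    lines.append("Duplicates:")
    for group in duplicates:
        lines.append(f"  Group ({len(group)} files):")
        for path in group:
            lines.append(f"    {path}")
    return "\n".join(lines)


report = format_report(
    "/home/user/documents",
    files=1247,
    folders=156,
    ext_sizes={".py": 124567, ".jpg": 2456789, ".txt": 45678},
    top=[(1234567, "images/photo.jpg"), (987654, "videos/movie.mp4")],
    duplicates=[[
        "/home/user/documents/file1.txt",
        "/home/user/documents/copy/file1_copy.txt",
    ]],
)
print(report)
```

Returning a string rather than printing directly lets the caller choose between console output and writing the report to a file.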
Download
pip install directory-analyzer==0.1.0
Example of usage
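from directory_analyzer import DirectoryAnalyzer

# Raw string so the backslash in the Windows path is not treated as an escape.
path = r'C:\directory'
analyzer = DirectoryAnalyzer(path)
analyzer.analyze_directory()  # scan the tree and collect statistics
print(analyzer.generate_report())  # print the text report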
Download files
Source Distribution
File details
Details for the file directory_analyzer-0.1.1.tar.gz.
File metadata
- Download URL: directory_analyzer-0.1.1.tar.gz
- Upload date:
- Size: 3.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61fdf5705db6e76ea34f122c499e29bff1611da46315d1099cada4b7d037d6ed |
| MD5 | 300087b8c1e05ebd0226a96f19232c93 |
| BLAKE2b-256 | f372850b2cda2b37da75a9b504146a4a82c939174c1866d82b9e5034e0f260db |