Recursive directory analyzer with stats, top files and duplicate detection

Project description

Directory Analyzer

Directory Analyzer is a simple Python tool that scans a directory recursively and generates a structured report.

Features

  • Count total number of files and folders.
  • Group files by extension and calculate total size per extension.
  • Find top 10 largest files in the directory tree.
  • Detect duplicate files by comparing file content (hash).
  • Handle file access errors safely (permission denied, missing files).
  • Print report to console or save it to a file.

Command line arguments

  • path (required): path to the target directory.
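A single required positional argument like this could be declared with argparse from the standard library. The sketch below is only an illustration: the function name build_parser, the program name, and the help strings are assumptions, not the package's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "path" is the only documented argument; the prog name and the
    # help strings are illustrative assumptions.
    parser = argparse.ArgumentParser(
        prog="directory-analyzer",
        description="Scan a directory recursively and print a report.",
    )
    # A positional argument is required by default in argparse.
    parser.add_argument("path", help="path to the target directory")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["/home/user/documents"])
print(f"Target directory: {args.path}")
```

Because the argument is positional, argparse exits with a usage message when it is missing, so no separate "required" check is needed.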

Requirements

  • Python 3.8 or newer.
  • Standard library only (os, hashlib, collections, argparse, etc.).

How it works

  1. The program receives a directory path from the command line.
  2. It uses os.walk() to recursively traverse all subdirectories and files.
  3. For each file it:
    • Counts total files and folders.
    • Determines the file extension and adds the file size to extension statistics.
    • Stores the file path and size for later sorting (top 10 largest files).
    • Reads the file in small chunks and calculates an MD5 hash to detect duplicates.
  4. After scanning, the program:
    • Builds a list of the top 10 largest files.
    • Groups files by hash and keeps only hashes that have more than one file (duplicates).
    • Generates a text report with all collected statistics.
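The scanning steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's source: the names file_md5 and scan, the 64 KiB chunk size, and the layout of the returned dictionary are all hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it is never loaded whole into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root):
    """Walk root and collect counts, per-extension sizes, file sizes and hashes."""
    stats = {
        "files": 0,
        "folders": 0,
        "ext_sizes": defaultdict(int),  # extension -> total bytes
        "sizes": [],                    # (size, path) pairs for later sorting
        "hashes": defaultdict(list),    # MD5 hex digest -> list of paths
        "errors": [],                   # (path, message) for unreadable files
    }
    for dirpath, dirnames, filenames in os.walk(root):
        stats["folders"] += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                digest = file_md5(path)
            except OSError as exc:
                # PermissionError and FileNotFoundError are OSError subclasses.
                stats["errors"].append((path, str(exc)))
                continue
            stats["files"] += 1
            ext = os.path.splitext(name)[1].lower()
            stats["ext_sizes"][ext] += size
            stats["sizes"].append((size, path))
            stats["hashes"][digest].append(path)
    return stats
```

In this sketch a file that cannot be read is recorded in the error list and skipped, so one inaccessible file does not abort the whole scan.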

Main functions / methods

  • analyze_directory(path), or the method DirectoryAnalyzer.analyze_directory()
    Scans the directory recursively and fills in all statistics.

  • get_top_files(n=10)
    Returns a list of the largest files (path and size).

  • find_duplicates()
    Returns groups of files that have identical content.

  • generate_report()
    Returns a text report with all collected statistics.
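As a rough illustration of the top-files and duplicate-detection steps, the sketch below works on plain data structures. The function names top_files and duplicate_groups, their signatures, and the placeholder hash strings are assumptions made for this example and do not reflect the package's real implementation.

```python
import heapq
from collections import defaultdict


def top_files(sizes, n=10):
    """Return the n largest (size, path) pairs, largest first."""
    return heapq.nlargest(n, sizes)


def duplicate_groups(hash_by_path):
    """Group paths by content hash and keep only groups with more than one file."""
    groups = defaultdict(list)
    for path, digest in hash_by_path.items():
        groups[digest].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


sizes = [
    (1234567, "images/photo.jpg"),
    (45678, "notes.txt"),
    (987654, "videos/movie.mp4"),
]
print(top_files(sizes, n=2))
# [(1234567, 'images/photo.jpg'), (987654, 'videos/movie.mp4')]

# "abc" and "def" stand in for real MD5 digests.
hashes = {"file1.txt": "abc", "copy/file1_copy.txt": "abc", "other.txt": "def"}
print(duplicate_groups(hashes))
# [['copy/file1_copy.txt', 'file1.txt']]
```

heapq.nlargest avoids sorting the full list when only a handful of entries is needed, which can matter for large directory trees.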

Example report (simplified)

Analysis of: /home/user/documents
Files: 1247, Folders: 156

Extensions (total size in bytes):
  .jpg: 2456789
  .py: 124567
  .txt: 45678

Top-10 largest files:
  1234567 bytes: images/photo.jpg
  987654 bytes: videos/movie.mp4

Duplicates:
  Group (2 files):
    /home/user/documents/file1.txt
    /home/user/documents/copy/file1_copy.txt
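A report in this shape could be assembled from the collected statistics along the following lines. The function format_report and its parameters are hypothetical and are meant only to show one way the sections might be rendered; the real generate_report() may differ.

```python
def format_report(root, files, folders, ext_sizes, top, duplicates, top_n=10):
    """Render collected statistics as plain text, one section per block."""
    lines = [f"Analysis of: {root}", f"Files: {files}, Folders: {folders}", ""]

    lines.append("Extensions (total size in bytes):")
    # Largest extensions first; files without an extension get a label.
    for ext, size in sorted(ext_sizes.items(), key=lambda kv: kv[1], reverse=True):
        label = ext or "(no extension)"
        lines.append(f"  {label}: {size}")

    lines += ["", f"Top-{top_n} largest files:"]
    for size, path in top[:top_n]:
        lines.append(f"  {size} bytes: {path}")

    lines += ["", "Duplicates:"]
    for group in duplicates:
        lines.append(f"  Group ({len(group)} files):")
        lines.extend(f"    {path}" for path in group)

    return "\n".join(lines)


report = format_report(
    "/home/user/documents",
    files=1247,
    folders=156,
    ext_sizes={".jpg": 2456789, ".py": 124567, ".txt": 45678},
    top=[(1234567, "images/photo.jpg"), (987654, "videos/movie.mp4")],
    duplicates=[[
        "/home/user/documents/file1.txt",
        "/home/user/documents/copy/file1_copy.txt",
    ]],
)
print(report)
```

Returning the report as a string, rather than printing it directly, leaves the caller free to print it to the console or write it to a file.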

Download

pip install directory-analyzer==0.1.0

Example of usage

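from directory_analyzer import DirectoryAnalyzer

# Raw string so the backslash in the Windows path is not read as an escape.
path = r'C:\directory'

analyzer = DirectoryAnalyzer(path)
analyzer.analyze_directory()  # scan the tree and collect statistics
print(analyzer.generate_report())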

Download files

Source Distribution

directory_analyzer-0.1.1.tar.gz (3.8 kB view details)

Uploaded Source

File details

Details for the file directory_analyzer-0.1.1.tar.gz.

File metadata

  • Download URL: directory_analyzer-0.1.1.tar.gz
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for directory_analyzer-0.1.1.tar.gz
Algorithm Hash digest
SHA256 61fdf5705db6e76ea34f122c499e29bff1611da46315d1099cada4b7d037d6ed
MD5 300087b8c1e05ebd0226a96f19232c93
BLAKE2b-256 f372850b2cda2b37da75a9b504146a4a82c939174c1866d82b9e5034e0f260db
