This tool offers a solution for identifying and managing similar images across multiple directories.

Duplicate Image Finder and Remover

DIFR (Duplicate Image Finder and Remover) identifies and manages similar images across multiple directories. It uses Convolutional Neural Networks for feature extraction, so you can compare images, detect similarities, and manage image datasets efficiently. Key features include:

  • Flexible Directory Comparisons: Specify target and comparison directories to find similar images, ensuring precise dataset management.
  • Feature Caching: Reduce computation time on subsequent runs by caching image features in a specified directory.
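
The idea behind feature caching can be sketched as follows. This is a minimal illustration, not DIFR's actual implementation: the helper names (`file_digest`, `cached_features`, `fake_extract`), the JSON on-disk format, and the choice of keying the cache by a content hash are all assumptions made for the example, and `fake_extract` is a stand-in for a real CNN feature extractor.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash the file contents so the cache key does not depend on the file name."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cached_features(path: Path, cache_dir: Path, extract) -> list:
    """Return features for `path`, calling `extract` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{file_digest(path)}.json"
    if entry.exists():
        return json.loads(entry.read_text())
    features = extract(path)
    entry.write_text(json.dumps(features))
    return features


# Hypothetical stand-in for a CNN; it records calls so cache hits are visible.
calls = []


def fake_extract(path: Path) -> list:
    calls.append(path.name)
    data = path.read_bytes()
    return [float(len(data)), float(sum(data) % 256)]


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "a.jpg"
    image.write_bytes(b"not a real image")
    cache = Path(tmp) / ".difr_cache"
    first = cached_features(image, cache, fake_extract)   # computed and stored
    second = cached_features(image, cache, fake_extract)  # read back from the cache
```

On the second call the extractor is not invoked again, which is the saving that a cache directory is meant to provide on subsequent runs.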

Installing

pip install difr

Usage

Use Case 1: Removing Duplicates within a Single Directory

Detect and remove duplicate images within a single directory (--target_dir_path) to declutter your image collection and save space. Results are saved to --save_dir_path.

Example:

difr --target_dir_path ./path/to/imagedir --save_dir_path ./path/to/savedir -th 0.95 --compare_self

Use Case 2: Comparing Images Across Directories

Identify and remove images in one directory that are duplicates of images in another directory (--compare_dir_path). This is useful for merging collections without retaining duplicates.

Example:

difr --target_dir_path ./path/to/imagedir --save_dir_path ./path/to/savedir -th 0.95 --compare_dir_path comp_dir_1 comp_dir_2
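
Both use cases reduce to comparing feature vectors against a cosine similarity threshold. The sketch below shows one way this could work, under stated assumptions: the function names (`cosine`, `dedupe_within`, `drop_similar`), the greedy keep-the-first strategy, and treating a score at or above the threshold as "similar" are illustrative choices, not a description of DIFR's internals. The feature vectors are made-up two-dimensional values rather than real CNN output.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe_within(target, threshold):
    """Use case 1: keep an image only if it is not similar to one already kept."""
    kept = {}
    for name, feat in target.items():
        if all(cosine(feat, other) < threshold for other in kept.values()):
            kept[name] = feat
    return list(kept)


def drop_similar(target, reference, threshold):
    """Use case 2: keep target images that are not similar to any reference image."""
    return [
        name
        for name, feat in target.items()
        if all(cosine(feat, ref) < threshold for ref in reference.values())
    ]


# Made-up features: a_copy.jpg nearly matches a.jpg, val_b.jpg nearly matches b.jpg.
target = {"a.jpg": [1.0, 0.0], "a_copy.jpg": [0.99, 0.05], "b.jpg": [0.0, 1.0]}
reference = {"val_b.jpg": [0.02, 1.0]}

unique_in_target = dedupe_within(target, 0.95)
not_in_reference = drop_similar(target, reference, 0.95)
```

With these values, `unique_in_target` is `["a.jpg", "b.jpg"]` and `not_in_reference` is `["a.jpg", "a_copy.jpg"]`.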

Advanced Options

  • --cache_dir_path: Specify a directory to cache image features.
  • --batch_size: Set the number of images to process in a single batch when extracting features. A larger batch size can speed up processing by taking advantage of parallel computing capabilities, but it also requires more memory.
  • --workers: Set the number of workers for parallel image loading.
  • --similarity_threshold: Set the cosine similarity score threshold for considering two images as similar. The value must be between 0 (no similarity) and 1 (identical images), where a higher value means a stricter criterion for similarity.
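
To make the --batch_size trade-off concrete, here is a small sketch of how a list of image paths might be split into batches. The `batches` helper and the file names are hypothetical and only illustrate the chunking; how DIFR actually loads and batches images is not shown in this description.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 70 hypothetical image paths processed with a batch size of 32.
paths = [f"img_{i:03d}.jpg" for i in range(70)]
sizes = [len(batch) for batch in batches(paths, 32)]
```

Here `sizes` is `[32, 32, 6]`: fewer, larger batches mean fewer extraction passes, while each batch has to fit in memory at once.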

Example Usage and Output

Command:

difr --target_dir_path ./data/train --save_dir_path ./data/train_unique --compare_dir_path data/val data/test --cache_dir_path .difr_cache --similarity_threshold 0.7 --batch_size 32 --workers 16

Output:
[Image: output1]
