Web crawler and sitemap generator.
Project description
Sitemap generator library for Python. Forked from https://github.com/Haikson/sitemap-generator.
Installing
pip install nix-sitemap-generator
Usage
1. Import crawler from pysitemap, along with the Parser class passed to it in step 2
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser
2. Call crawler()
crawler(
    'https://site.com', out_file='debug/sitemap.xml', exclude_urls=[".pdf", ".jpg", ".zip"],
    http_request_options={"ssl": False}, parser=Parser
)
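The exclude_urls argument lists URL fragments that the crawler should skip. How the library matches them is not stated here, so the following is only a minimal sketch that assumes plain substring matching. The helper name is_excluded is hypothetical and is not part of pysitemap.

```python
from typing import Iterable


def is_excluded(url: str, exclude_urls: Iterable[str]) -> bool:
    """Return True if any exclusion fragment occurs in the URL.

    Assumption: fragments are matched as plain substrings. The real
    crawler may apply a different rule (for example, regular expressions).
    """
    return any(fragment in url for fragment in exclude_urls)


patterns = [".pdf", ".jpg", ".zip"]
candidates = ("https://site.com/about", "https://site.com/files/report.pdf")

# Keep only the URLs that match none of the exclusion fragments.
kept = [url for url in candidates if not is_excluded(url, patterns)]
print(kept)
```

Under this assumption a fragment such as ".pdf" would also match a URL that merely contains ".pdf" in the middle of its path, so more specific fragments may be preferable.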
Example
import sys
import logging
from pysitemap import crawler
from pysitemap.parsers.lxml_parser import Parser

if __name__ == '__main__':
    # Optional: on Windows, pass --iocp to use the proactor (I/O completion port) event loop.
    if '--iocp' in sys.argv:
        from asyncio import events, windows_events
        sys.argv.remove('--iocp')
        logging.info('using iocp')
        el = windows_events.ProactorEventLoop()
        events.set_event_loop(el)

    # root_url = sys.argv[1]
    root_url = 'https://www.haikson.com'
    # Crawl the site and write the sitemap to debug/sitemap.xml.
    crawler(
        root_url, out_file='debug/sitemap.xml', exclude_urls=[".pdf", ".jpg", ".zip"],
        http_request_options={"ssl": False}, parser=Parser
    )
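After a crawl, it can be useful to check which URLs ended up in the output file. The sketch below uses only the standard library and assumes the generated file follows the sitemaps.org layout (a urlset element containing url/loc entries); the exact output of this package is not shown here, so treat that as an assumption. The function name list_sitemap_urls and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text: str) -> list[str]:
    """Return the text of every <loc> element found under <url> entries."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample standing in for the contents of debug/sitemap.xml.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://site.com/</loc></url>
  <url><loc>https://site.com/about/</loc></url>
</urlset>"""

urls = list_sitemap_urls(sample)
print(len(urls), "URLs found")
```

To inspect a real file, read it first, for example with pathlib.Path('debug/sitemap.xml').read_text(), and pass the result to the function.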
Changes
v. 0.10.1
Refactored the code to make it more readable.
Removed print() calls from the code.
Added a verbose mode to crawler().
Added type hints to the crawler() arguments.
Added ValueError handling around the add_signal_handler() call.
Download files
File details
Details for the file nix-sitemap-generator-0.10.1.tar.gz.
File metadata
- Download URL: nix-sitemap-generator-0.10.1.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c448ead092a83ae14c61cfb95b1dfe3f8b627e8e8401a01e69750f91f3fab9b |
| MD5 | 69e8368411be6ccb83b3bf01473e3164 |
| BLAKE2b-256 | a9509a4861b00146dfb984a6bed78c7e5bbc07b127aa2b554a38db6f11d21889 |
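A downloaded archive can be checked against the SHA256 digest listed for it. The following is a minimal sketch using the standard library; the helper name sha256_of and the local file path are assumptions, and the script only prints a result if the file is present in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for the source distribution.
EXPECTED = "5c448ead092a83ae14c61cfb95b1dfe3f8b627e8e8401a01e69750f91f3fab9b"

# Assumed location of the downloaded file; adjust as needed.
archive = Path("nix-sitemap-generator-0.10.1.tar.gz")
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
```

pip can perform a comparable check itself when a requirements file pins hashes and is installed with --require-hashes.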
File details
Details for the file nix_sitemap_generator-0.10.1-py3-none-any.whl.
File metadata
- Download URL: nix_sitemap_generator-0.10.1-py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f28d09d7aafa0e61a23846a1ab5e01b37fe894c442209e99c3f80a56e75ba75 |
| MD5 | d40d552f4a9a8a717adc6f1dbb48f75d |
| BLAKE2b-256 | 03c7762813012d878fb9a59dd0918578327df55eea004034a7b102e12586f9a1 |