
# Marxist Ebook Scraper

Use this script alongside Calibre to pull any number of articles from Marxists.org and convert them into a single ebook.

# NOTE:

This is very much a work in progress and may break frequently. The index_crawler.py script seems much more successful, so I'm working on that now. It does roughly the same thing, but the way it identifies chapters works better across various authors.

Rather than documenting the ever-changing CLI here, for now just run "python3 index_crawler.py --help". Note that some arguments are placeholders for features that are not implemented yet.

## Requirements

In addition to the Python libraries listed in requirements.txt, this script requires Calibre and its add-on EpubMerge. Right now the executables "ebook-merge" and "calibre-debug" must be on your PATH.
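
A quick way to confirm the requirement above before running the scraper is to look the executables up on your PATH. This is only a minimal sketch: the two executable names come from this README, while the helper name `missing_tools` is made up for illustration and is not part of the script.

```python
import shutil

# Executable names as listed in the Requirements section of this README.
REQUIRED_TOOLS = ("ebook-merge", "calibre-debug")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be tested without Calibre
    installed; by default it is the standard library's shutil.which.
    """
    return [name for name in tools if which(name) is None]
```

Calling `missing_tools()` returns an empty list when both executables are found; anything it returns still needs to be installed or added to your PATH.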


## Usage

    python3 marxistbook.py [-h] [-o OUTPUT] [-t TITLE] [-a AUTHOR] url [url ...]

    positional arguments:
      url                   urls to download

    optional arguments:
      -h, --help            show this help message and exit
      -o OUTPUT, --output OUTPUT
                            name of output file
      -t TITLE, --title TITLE
                            set the title manually
      -a AUTHOR, --author AUTHOR
                            set the author manually (currently not working)
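
For reference, an argument parser that produces an interface like the one above could look roughly like this. It is a reconstruction from the help text only, not the actual source of marxistbook.py; the function name `build_parser` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser matching the usage text shown in this README.

    Reconstructed from the help output; the real script may define
    its arguments differently.
    """
    parser = argparse.ArgumentParser(prog="marxistbook.py")
    parser.add_argument("-o", "--output", help="name of output file")
    parser.add_argument("-t", "--title", help="set the title manually")
    parser.add_argument(
        "-a", "--author",
        help="set the author manually (currently not working)",
    )
    # One or more URLs, matching "url [url ...]" in the usage line.
    parser.add_argument("url", nargs="+", help="urls to download")
    return parser
```

With this sketch, `build_parser().parse_args(["-o", "book.epub", "URL1", "URL2"])` collects both URLs into a list under `url` and leaves `title` and `author` as `None`.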

URLs should be one of two types: a table of contents or an actual article. A table of contents is a page like this one. Each chapter will be downloaded individually, the links at the bottom of the page will be removed, and the chapters will be merged into a single book. An article is a page like this, which contains the actual text.

URLs of both types can be combined in any order. Each URL will be downloaded and made into an epub individually, then all of the epubs will be merged into a single book. The merged book will then be either converted or renamed, depending on the output file type.
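
The final "converted or renamed" decision can be sketched as follows. This is an illustration under stated assumptions, not the script's actual logic: it assumes the merged book is an EPUB, that an output name ending in .epub only needs a rename, and that any other extension is handed to Calibre's ebook-convert tool. The function name `finalize_command` is hypothetical.

```python
from pathlib import Path


def finalize_command(merged_epub, output):
    """Decide how to turn the merged EPUB into the requested output file.

    Returns ("rename", None) when the output is itself an EPUB.
    Otherwise returns ("convert", argv), where argv is a command line
    for Calibre's ebook-convert (an assumption; the script may perform
    the conversion some other way).
    """
    if Path(output).suffix.lower() == ".epub":
        return ("rename", None)
    return ("convert", ["ebook-convert", str(merged_epub), str(output)])
```

Returning the command instead of running it keeps the decision easy to test; a caller could pass the list to `subprocess.run` for the convert case, or use `Path.rename` for the rename case.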
