Ebook Edit
Latest revision as of 07:19, 12 November 2024
I have some old and poorly formatted ebooks. This is how I fixed them.
I used Calibre (https://calibre-ebook.com), which comes with a GUI and CLI tools.
I found Calibre to give better results than pandoc.
Automated fixes using Calibre GUI
Open the ebook in the Calibre editor, e.g. via the shortcut "T" in the Calibre GUI, then use:
- Tools
- Check book "F7"
- Remove unused CSS rules
- Upgrade book internals (converts EPUB 2 to EPUB 3)
Edit via Python etc.
Export source code
- open the ebook in the Calibre editor, e.g. via the shortcut "T" in the Calibre GUI
- merge all text files into one (by selecting them and using the right-click menu)
- copy & paste the resulting complete HTML code into any editor and save the file (this is better than exporting the epub as HTML, because the export modifies the markup)
Perform some magic via Python, Perl, etc. (do not modify the exported file; treat it as read-only and write the result to a new file)
- first do some manual fixes, like
cont = cont.replace("Some Typo", "Fixed Typo", 1)
- use asserts to ensure each problem stays fixed, even after later changes to the script
import re
was = r"<p>Part: ([^<]+)</p>"  # paragraphs that are really part headings
cnt_parts = len(re.findall(was, cont))
assert cnt_parts == 4, f"expected 4 parts, found {cnt_parts}"
cont = re.sub(was, r"\n\n<h1>Part: \1</h1>\n\n", cont)
assert "<p>Part" not in cont
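The fix-and-assert pattern above can be wrapped into a small script skeleton. The following is only a sketch: the helper names `sub_checked`, `fix_book` and `main` are hypothetical, the tags are assumed to appear without inner spaces (`<p>`, not `< p >`), and the source and target paths are whatever you saved the export as.

```python
import re
from pathlib import Path


def sub_checked(pattern: str, repl: str, text: str, expected: int) -> str:
    """Apply re.subn and fail loudly if the replacement count is unexpected."""
    new_text, count = re.subn(pattern, repl, text)
    assert count == expected, f"{pattern!r}: expected {expected} matches, found {count}"
    return new_text


def fix_book(cont: str, expected_parts: int = 4) -> str:
    """Apply all fixes to the exported HTML and return the result."""
    # Promote "Part: ..." paragraphs to top-level headings.
    cont = sub_checked(
        r"<p>Part: ([^<]+)</p>",
        r"\n\n<h1>Part: \1</h1>\n\n",
        cont,
        expected_parts,
    )
    assert "<p>Part" not in cont
    return cont


def main(src: Path, dst: Path) -> None:
    """Read the export (never written to) and save the fixed copy elsewhere."""
    cont = src.read_text(encoding="utf-8")
    dst.write_text(fix_book(cont), encoding="utf-8")


if __name__ == "__main__":
    # Tiny in-memory sample instead of a real book file.
    sample = "<p>Part: One</p><p>text</p><p>Part: Two</p>"
    print(fix_book(sample, expected_parts=2))
```

Because every run starts again from the untouched export, the script can be re-run after each change, and a failing assert points directly at the fix that stopped matching.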
- make sure the head block contains the title and author as well as the inline CSS
<title>The Title of the Book</title>
<meta name="author" content="Lastname, Firstname">
<style>
div.myclass{font-style: italic;}
</style>
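If the head block should be generated by the script rather than typed by hand, something like the following could work. This is a sketch under assumptions: `build_head` and `replace_head` are hypothetical helpers, and the document is assumed to contain exactly one `<head>...</head>` pair.

```python
import re
from html import escape


def build_head(title: str, author: str, css: str) -> str:
    """Return a head block with title, author meta tag and inline CSS."""
    return (
        "<head>\n"
        f"<title>{escape(title)}</title>\n"
        f'<meta name="author" content="{escape(author, quote=True)}">\n'
        f"<style>\n{css.strip()}\n</style>\n"
        "</head>"
    )


def replace_head(cont: str, head: str) -> str:
    """Swap the existing head block for the generated one."""
    # A function as replacement avoids backslash processing in `head`.
    new_cont, count = re.subn(
        r"<head\b[^>]*>.*?</head>",
        lambda _match: head,
        cont,
        count=1,
        flags=re.DOTALL | re.IGNORECASE,
    )
    assert count == 1, "no head block found"
    return new_cont


if __name__ == "__main__":
    head = build_head(
        "The Title of the Book",
        "Lastname, Firstname",
        "div.myclass{font-style: italic;}",
    )
    print(replace_head("<html><head></head><body></body></html>", head))
```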
- use h1, h2 and h3 headings to structure the ToC
- re-create the epub ebook via Calibre CLI tool:
ebook-convert "$FILE.html" "$FILE.epub" --level1-toc "//h:h1" --level2-toc "//h:h2" --level3-toc "//h:h3" --language de-DE --no-default-epub-cover --cover "cover.jpg"
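The same call can be made from the Python script, so fixing and converting happen in one run. A minimal sketch, assuming `ebook-convert` is on the PATH; `build_convert_cmd` and `convert` are hypothetical helper names, and the options are exactly those of the command above.

```python
import shutil
import subprocess


def build_convert_cmd(stem: str, language: str = "de-DE",
                      cover: str = "cover.jpg") -> list[str]:
    """Build the ebook-convert argument list for <stem>.html -> <stem>.epub."""
    cmd = ["ebook-convert", f"{stem}.html", f"{stem}.epub"]
    for level in (1, 2, 3):
        # Map h1/h2/h3 to ToC levels 1/2/3 via XPath expressions.
        cmd += [f"--level{level}-toc", f"//h:h{level}"]
    cmd += ["--language", language, "--no-default-epub-cover", "--cover", cover]
    return cmd


def convert(stem: str) -> None:
    """Run the conversion and raise if the tool is missing or fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    subprocess.run(build_convert_cmd(stem), check=True)


if __name__ == "__main__":
    print(" ".join(build_convert_cmd("book")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.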
- for completeness, here is the alternative pandoc command:
pandoc --standalone --from=html "$FILE.html" -o "$FILE.epub" --epub-cover-image="cover.jpg" --epub-chapter-level=2 -c "pandoc.css"
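Both commands build the ToC from the h1, h2 and h3 headings, so it can help to print that outline from the fixed HTML before converting. The following sketch uses only the standard library; `OutlineParser` and `outline` are hypothetical names, and nested headings are assumed not to occur.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every h1, h2 and h3 element."""

    LEVELS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self) -> None:
        super().__init__()
        self.outline: list[tuple[int, str]] = []
        self._level = None  # level of the heading currently open, if any
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._level = self.LEVELS[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and self.LEVELS.get(tag) == self._level:
            self.outline.append((self._level, "".join(self._parts).strip()))
            self._level = None


def outline(html_text: str) -> list[tuple[int, str]]:
    """Return the heading outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


if __name__ == "__main__":
    sample = "<h1>Part: One</h1><p>x</p><h2>Chapter <i>1</i></h2><h3>a</h3>"
    for level, text in outline(sample):
        print("  " * (level - 1) + text)
```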