Modifying PDFs
Required Software
2019: New powerful tool: Coherent PDF
Linux
sudo apt-get install pdftk
Windows
Join several files
pdftk in1.pdf in2.pdf cat output out.pdf
# or using handles
pdftk A=in1.pdf B=in2.pdf cat A B output out.pdf
# or using wildcards
pdftk *.pdf cat output out.pdf
#
# Remove 'page 13' from in1.pdf to create out1.pdf
pdftk in1.pdf cat 1-12 14-end output out1.pdf
# or
pdftk A=in1.pdf cat A1-12 A14-end output out1.pdf
#
# join parts of several files
pdftk A=file1.pdf B=file2.pdf C=file3.pdf cat A B2-end C1-23 output out.pdf
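To remove an arbitrary page instead of hard-coding 'page 13', the two page ranges can be computed. This is only a minimal sketch: the helper name ranges_without_page is made up for this example, and it assumes the page to remove is not the last page of the file.

```shell
# Hypothetical helper: print the pdftk page ranges that keep every page except N.
# Assumes N is not the last page (otherwise "N+1-end" would be out of range).
ranges_without_page() (
    n=$1
    if [ "$n" -le 1 ]; then
        echo "2-end"                          # drop the first page
    elif [ "$n" -eq 2 ]; then
        echo "1 3-end"                        # a single page before the gap
    else
        echo "1-$((n - 1)) $((n + 1))-end"    # pages before and after N
    fi
)
```

Possible use: pdftk in1.pdf cat $(ranges_without_page 13) output out1.pdf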
extract and convert page to image
# render page 1 as PNG with Ghostscript
gs -dSAFER -r600 -sDEVICE=pngalpha -dFirstPage=1 -dLastPage=1 -o tmp/title-en.png tmp/hpmor.pdf
# convert to JPG; the trailing \> shrinks only images larger than 1186x1186
convert -density 150 tmp/title-en.png -resize 1186x1186\> -quality 75 tmp/title-en.jpg
# Ghostscript is used here instead of ImageMagick, because ImageMagick throws this error:
# convert -density 150 tmp/hpmor.pdf[0] -quality 75 tmp/title-en.jpg
# attempt to perform an operation not allowed by the security policy
extract page range
via PDFtk (Windows)
@echo off
set pdftk=C:\Users\torben\Progs\PortableApps\PDFTKBuilderPortable\App\pdftkbuilder\pdftk.exe
set start=12
set end=40
for %%F in (input\*.pdf) do (
  echo "%%~nF"
  %pdftk% "%%F" cat %start%-%end% output "output\%%~nF-cat-pdftk.pdf"
)
via GhostScript (Windows) + image compression
@echo off
set gs=C:\Users\torben\Progs\PortableApps\CommonFiles\Ghostscript\bin\gswin32c.exe
REM -dPDFSETTINGS=/printer reduces the image resolution to 300 dpi;
REM use /ebook for 150 dpi or /screen for 72 dpi
set start=12
set end=40
for %%F in (input\*.pdf) do (
  echo "%%~nF"
  %gs% ^
    -o "output\%%~nF-cat-gs.pdf" ^
    -sDEVICE=pdfwrite ^
    -dFirstPage=%start% -dLastPage=%end% ^
    -dCompatibilityLevel=1.7 ^
    -dPDFSETTINGS=/printer ^
    -dAutoRotatePages=/None ^
    -dNOPAUSE -dBATCH ^
    -f "%%F"
)
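Both batch files above are Windows-only. A rough Linux counterpart of the PDFtk loop could look like the sketch below. The function name extract_range and the PDFTK override variable are assumptions made for this example (the override only exists so that the loop can be dry-run with PDFTK=echo); the directory names follow the batch files.

```shell
# Hypothetical helper: extract pages START-END from every PDF in INDIR into OUTDIR.
# PDFTK can be overridden (e.g. PDFTK=echo for a dry run); it defaults to pdftk.
extract_range() (
    start=$1; end=$2; indir=$3; outdir=$4
    mkdir -p "$outdir"
    for f in "$indir"/*.pdf; do
        [ -e "$f" ] || continue              # glob matched nothing
        name=$(basename "$f" .pdf)
        echo "$name"
        "${PDFTK:-pdftk}" "$f" cat "$start-$end" output "$outdir/$name-cat-pdftk.pdf"
    done
)
```

Possible use: extract_range 12 40 input output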
Crop white borders
(useful for reading a book on a small netbook/laptop screen)
Linux
quick and dirty:
pdfcrop file.pdf file-crop.pdf
better, since it preserves links and produces a smaller file: use the script pdfcrop.sh or the modified version pdfcrop2.sh below
# requires pdftk
pdfcrop2.sh file.pdf file-crop.pdf
Windows
use pdfcrop.bat
Render two or more pages onto one page
# requires pdftk
sudo apt-get install pdfjam # brings pdfnup
pdfnup AS_Chapter1.pdf
pdf shrinking and protection-removal using Ghostscript
Shrinks the file size and removes protection. Note that the output option -o comes first.
gswin64c.exe ^
  -o "file-shrink.pdf" ^
  -sDEVICE=pdfwrite ^
  -dCompatibilityLevel=1.7 ^
  -dPDFSETTINGS=/printer ^
  -dAutoRotatePages=/None ^
  -dNOPAUSE -dBATCH ^
  -f "file.pdf"
(Windows: gswin64c.exe with ^ as line continuation; Linux: gs with \ as line continuation)
Further parameters
Converting Bitmaps to 300dpi
-dPDFSETTINGS=/printer
Converting Bitmaps to 150dpi
-dPDFSETTINGS=/ebook
Converting Bitmaps to 72dpi
-dPDFSETTINGS=/screen
Page range
-dFirstPage=100 -dLastPage=123
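On Linux, the command and the parameters above could be combined in a small wrapper. This is a sketch under assumptions: the function name shrink_pdf, its argument order and the GS override variable (useful for a dry run with GS=echo) are invented for this example; the Ghostscript options are the ones listed in this section.

```shell
# Hypothetical wrapper around the Ghostscript call from this section (Linux form).
# Usage: shrink_pdf IN OUT [PRESET [FIRST LAST]]
#   PRESET is printer, ebook or screen (default: printer).
# GS can be overridden (e.g. GS=echo for a dry run); it defaults to gs.
shrink_pdf() (
    in=$1; out=$2; preset=${3:-printer}
    case $preset in
        printer|ebook|screen) ;;
        *) echo "unknown preset: $preset" >&2; exit 1 ;;
    esac
    # Build the argument list in the positional parameters; -o comes first.
    # The page-range options are only added when FIRST and LAST are given.
    set -- -o "$out" -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 \
        "-dPDFSETTINGS=/$preset" -dAutoRotatePages=/None -dNOPAUSE -dBATCH \
        ${4:+"-dFirstPage=$4"} ${5:+"-dLastPage=$5"} -f "$in"
    "${GS:-gs}" "$@"
)
```

Possible use: shrink_pdf file.pdf file-shrink.pdf ebook 100 123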
Vector text on Raster image
Do not use Inkscape when working with colorful images such as photos: when exporting to PDF, it stores all raster images internally as PNG, not as JPG. Better use:
- Gimp to optimize the file
  - crop borders
  - scale image
  - set dpi via Image -> Scale Image -> X+Y Resolution (300 px/inch is fine for printing)
  - set Image -> Mode -> Grayscale if suitable
  - export in the wanted quality (e.g. 85% for jpg, or Image -> Mode -> Indexed for png)
- LibreOffice
  - import the image, preferably as a link, since then you can still modify it afterwards using Gimp
  - place text or drawings
  - File -> Export as PDF
    - you might choose "Lossless compression" if compression was already done in Gimp
    - no need for "reduce image resolution" if scaling was already done in Gimp
Color to Greyscale
gs -sOutputFile=grayscale.pdf -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH c-color.pdf < /dev/null
Appendix Scripts
pdfcrop.bat
requires pdftk, GhostScript and sed (sed is included in UnixUtils for Windows)
@echo off
for %%F in (input\*.pdf) do (
  echo "%%~nF"
  echo uncompressing
  REM using pdftk to uncompress
  pdftk "%%F" output uncompressed.pdf uncompress
  echo adding Boxes
  REM sed from unix utils package
  REM calc via mm2pt.xlsx
  sed -e "s/\(\(Crop\|Media\)Box\).*/\1 [55.0 85.0 413.0 590.0]/g" uncompressed.pdf > uncompressed2.pdf
  del uncompressed.pdf
  REM using ghostscript to trim
  echo compressing
  e:\win\progs\gs9.21\bin\gswin64c.exe ^
    -o "output\%%~nF-crop.pdf" ^
    -sDEVICE=pdfwrite ^
    -dCompatibilityLevel=1.7 ^
    -dAutoRotatePages=/None ^
    -dNOPAUSE -dBATCH ^
    -f "uncompressed2.pdf"
  del uncompressed2.pdf
)
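The box values in pdfcrop.bat are given in PostScript points (1 inch = 72 pt = 25.4 mm). The spreadsheet mm2pt.xlsx is not shown here, so the following is only an assumption about the kind of calculation it performs; the helper names mm2pt and box_from_mm are made up for this sketch.

```shell
# Hypothetical helper: convert millimetres to PostScript points (72 pt = 25.4 mm).
mm2pt() {
    LC_ALL=C awk -v mm="$1" 'BEGIN { printf "%.1f\n", mm * 72 / 25.4 }'
}

# Hypothetical helper: print a PDF box "[llx lly urx ury]" from four values in mm
# (lower-left x, lower-left y, upper-right x, upper-right y).
box_from_mm() {
    printf '[%s %s %s %s]\n' "$(mm2pt "$1")" "$(mm2pt "$2")" "$(mm2pt "$3")" "$(mm2pt "$4")"
}
```

Possible use: box_from_mm 0 0 210 297 prints the box of a full A4 page, which can then be adapted and pasted into the sed expression.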
pdfcrop2.sh
#!/bin/bash
# from http://tex.stackexchange.com/questions/42236/pdfcrop-generates-larger-file

function usage () {
  echo "Usage: `basename $0` [Options] <input.pdf> [<output.pdf>]"
  echo
  echo " * Removes white margins from each page in the file. (Default operation)"
  echo " * Trims page edges by given amounts. (Alternative operation)"
  echo
  echo "If only <input.pdf> is given, the output is written to <input>-crop.pdf"
  echo "in the current directory."
  echo
  echo "Options:"
  echo
  echo " -m \"<left> [<top> [<right> <bottom>]]\""
  echo " adds extra margins in default operation mode. Unit is bp. A single number"
  echo " is used for all margins, two numbers \"<left> <top>\" are applied to the"
  echo " right and bottom margins alike."
  echo
  echo " -t \"<left> [<top> [<right> <bottom>]]\""
  echo " trims outer page edges by the given amounts. Unit is bp. A single number"
  echo " is used for all trims, two numbers \"<left> <top>\" are applied to the"
  echo " right and bottom trims alike."
  echo
  echo " -hires"
  echo " %%HiResBoundingBox is used in default operation mode."
  echo
  echo " -help"
  echo " prints this message."
}

c=0
mar=(0 0 0 0); tri=(0 0 0 0)
bbtype=BoundingBox

# -hires and -help are parsed as option -h with argument "ires" or "elp"
while getopts m:t:h: opt
do
  case $opt
  in
    m)
    eval mar=($OPTARG)
    [[ -z "${mar[1]}" ]] && mar[1]=${mar[0]}
    [[ -z "${mar[2]}" || -z "${mar[3]}" ]] && mar[2]=${mar[0]} && mar[3]=${mar[1]}
    c=0
    ;;
    t)
    eval tri=($OPTARG)
    [[ -z "${tri[1]}" ]] && tri[1]=${tri[0]}
    [[ -z "${tri[2]}" || -z "${tri[3]}" ]] && tri[2]=${tri[0]} && tri[3]=${tri[1]}
    c=1
    ;;
    h)
    if [[ "$OPTARG" == "ires" ]]
    then
      bbtype=HiResBoundingBox
    else
      usage 1>&2; exit 0
    fi
    ;;
    \?)
    usage 1>&2; exit 1
    ;;
  esac
done
shift $((OPTIND-1))

[[ -z "$1" ]] && echo "`basename $0`: missing filename" 1>&2 && usage 1>&2 && exit 1
input=$1;output=$1;shift;
[[ -n "$1" ]] && output=$1 && shift;

# by TM
if [ "$input" == "$output" ] ; then
  output="`basename "$output"`" # remove dirs
  output="${output%\.*}" # remove ext .pdf & .PDF
  output="$output-crop.pdf"
fi
echo "$input -> $output"

# the bounding boxes (default mode only) and the uncompressed PDF are piped to perl
(
  [[ "$c" -eq 0 ]] && gs -dNOPAUSE -q -dBATCH -sDEVICE=bbox "$input" 2>&1 | grep "%%$bbtype"
  pdftk "$input" output - uncompress
) | perl -w -n -s -e '
  BEGIN {@m=split /\s+/, $mar; @t=split /\s+/, $tri;}
  if (/BoundingBox:\s+([\d\.\s]+\d)/) { push @bbox, $1; next;}
  elsif (/\/MediaBox\s+\[([\d\.\s]+\d)\]/) { @mb=split /\s+/, $1; next; }
  elsif (/pdftk_PageNum\s+(\d+)/) {
    $p=$1-1;
    if($c){
      $mb[0]+=$t[0];$mb[1]+=$t[1];$mb[2]-=$t[2];$mb[3]-=$t[3];
      print "/MediaBox [", join(" ", @mb), "]\n";
    } else {
      @bb=split /\s+/, $bbox[$p];
      $bb[0]+=$mb[0];$bb[1]+=$mb[1];$bb[2]+=$mb[0];$bb[3]+=$mb[1];
      $bb[0]-=$m[0];$bb[1]-=$m[1];$bb[2]+=$m[2];$bb[3]+=$m[3];
      print "/MediaBox [", join(" ", @bb), "]\n";
    }
  }
  print;
' -- -mar="${mar[*]}" -tri="${tri[*]}" -c=$c | pdftk - output "$output" compress
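The way pdfcrop2.sh fills in missing values for -m and -t (one number for all sides, two numbers mirrored to the other two sides) can be tried out in isolation. A minimal sketch; the function name expand_margins is made up for this example, and it uses positional parameters instead of the bash arrays of the script.

```shell
# Hypothetical helper mirroring the defaulting rule of the -m and -t options:
# print four values "<left> <top> <right> <bottom>" from one, two or four numbers.
expand_margins() (
    set -- $1                      # split the quoted option value into words
    left=$1
    top=${2:-$left}                # one number: top defaults to left
    if [ -z "$3" ] || [ -z "$4" ]; then
        right=$left; bottom=$top   # fewer than four numbers: mirror left/top
    else
        right=$3; bottom=$4
    fi
    echo "$left $top $right $bottom"
)
```

Possible use: expand_margins "5 10" shows the four values that pdfcrop2.sh -m "5 10" file.pdf would work with. As in the script, a third number without a fourth is ignored.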