JAW Speak

Jonathan Andrew Wolter

Merging PDFs on Mac OS X from a non-duplex scanner

Goal: scan hundreds of double-sided documents on a non-duplex scanner and combine them into a single PDF in an automated way. Status: it was harder than it should have been, and it’s not fully automated, but this works.

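Scan in the papers as PDFs using your scanner’s paper feeder. Scan them right side up, then flip the stack over and scan the other sides. The two PDFs will contain pages 1, 3, 5… and 2, 4, 6…, with the second file coming out back to front because the flipped stack feeds its last sheet first.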
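To make the page order concrete, here is a small sketch that simulates the two scanning passes and the merge. It assumes the feeder takes sheets from the top of the flipped stack, and the names `scan_passes` and `merge_pages` are made up for illustration; nothing here touches a real PDF.

```ruby
# Page numbers produced by scanning a double-sided stack in two passes.
# total_pages is an illustrative parameter and is assumed to be even.
def scan_passes(total_pages)
  # First pass: front sides come out in reading order.
  odds = (1..total_pages).step(2).to_a
  # Second pass: the stack was flipped, so the last back side feeds first.
  evens_as_scanned = (2..total_pages).step(2).to_a.reverse
  { odds: odds, evens_as_scanned: evens_as_scanned }
end

# Reverse the second pass, then take one page at a time from each,
# starting with the odds.
def merge_pages(odds, evens_as_scanned)
  evens = evens_as_scanned.reverse
  odds.zip(evens).flatten
end

passes = scan_passes(6)
puts passes[:odds].inspect              # [1, 3, 5]
puts passes[:evens_as_scanned].inspect  # [6, 4, 2]
puts merge_pages(passes[:odds], passes[:evens_as_scanned]).inspect
# [1, 2, 3, 4, 5, 6]
```

The two scripts in this post do the same two steps on real files: one reverses the second PDF, the other interleaves the pair.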

Reverse the page order of the even-pages PDF.

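#!/usr/bin/ruby
require 'shellwords'
 
if __FILE__ == $0
  puts "Run this on Ubuntu or somewhere that pdftk is easy to install (which isn't OS X)."
 
  if ARGV.length != 1
    puts "Syntax: #{__FILE__} pdf_to_reverse.pdf"
    exit 1
  end
 
  pdf = ARGV[0]
  # Anchor the match so only the trailing extension is replaced.
  reversed_pdf = pdf.sub(/\.pdf\z/i, "_reversed.pdf")
 
  # pdfinfo prints a line like "Pages:   12"; take the first number as a
  # string rather than the array that String#scan returns.
  page_count = `pdfinfo #{pdf.shellescape} | grep Pages`[/\d+/]
  abort "Could not read the page count of #{pdf}" if page_count.nil?
 
  # A descending range (e.g. 12-1) makes pdftk write the pages in reverse order.
  `pdftk #{pdf.shellescape} cat #{page_count}-1 output #{reversed_pdf.shellescape}`
end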
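The two fiddly parts of the script above are reading the page count and building the pdftk page range, so here is a sketch of both as plain functions that can be tried without pdftk installed. The helper names `page_count_from` and `reverse_command` are hypothetical, and the sample text is only an assumption about what pdfinfo’s `Pages:` line looks like.

```ruby
require "shellwords"

# Pull the page count out of pdfinfo-style output (a line like "Pages:   12").
# Returns nil when no such line is present.
def page_count_from(pdfinfo_output)
  match = pdfinfo_output.match(/^Pages:\s+(\d+)/)
  match && match[1].to_i
end

# Build the shell command that asks pdftk for pages N down to 1.
# Shellwords escapes file names so spaces survive the shell.
def reverse_command(pdf, reversed_pdf, page_count)
  ["pdftk", pdf, "cat", "#{page_count}-1", "output", reversed_pdf].shelljoin
end

# Made-up sample output; real pdfinfo prints more fields than this.
sample = "Title:          scan\nPages:          12\nEncrypted:      no\n"
count = page_count_from(sample)
puts count
puts reverse_command("my evens.pdf", "my evens_reversed.pdf", count)
```

Keeping the command construction separate from the backticks makes it easy to print the command first and check the range before running it on a large scan.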

Lastly, combine the two PDFs, shuffling every other page and starting with the odds. Note that the reversing step depends on pdftk and pdfinfo (which are excruciatingly difficult to install on OS X), while the merging step depends on OS X itself.

#!/usr/bin/ruby
require 'shellwords'
 
if __FILE__ == $0
  puts "Run this on OS X to shuffle two PDFs where the even pages are " \
       "already reversed (reverse them with the other script)."
 
  if ARGV.length != 3
    puts "Syntax: #{__FILE__} odds.pdf reversed_evens.pdf output.pdf"
    exit 1
  end
 
  odds_pdf = ARGV[0]
  reversed_evens_pdf = ARGV[1]
  output_pdf = ARGV[2]
 
  # Obviously, this only works on OS X. I didn't see an easy way to combine PDFs
  # in pdftk or the other tools I searched for.
  join_script = '/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py'
  # shellescape keeps paths containing spaces or quotes intact inside the backticks.
  `python #{join_script.shellescape} --output #{output_pdf.shellescape} --shuffle #{odds_pdf.shellescape} #{reversed_evens_pdf.shellescape}`
end
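For readers without OS X, a possible alternative is pdftk’s own `shuffle` operation, which its man page describes as taking one page at a time from each input; a range such as `Bend-1` reads input B back to front, so the separate reversing step would not be needed. This is a sketch that assumes your pdftk is recent enough to include `shuffle`; the helper name `shuffle_args` is hypothetical, and the command is only built here, not run.

```ruby
# Build the argument list for a pdftk shuffle that interleaves the odd pages
# with the even pages, reading the even-pages file in reverse order.
# A and B are pdftk input handles; the file names are placeholders.
def shuffle_args(odds_pdf, evens_as_scanned_pdf, output_pdf)
  ["pdftk",
   "A=#{odds_pdf}",
   "B=#{evens_as_scanned_pdf}",
   "shuffle", "A", "Bend-1",
   "output", output_pdf]
end

args = shuffle_args("odds.pdf", "evens.pdf", "combined.pdf")
puts args.join(" ")
# To actually run it, pass the list to system so no shell quoting is needed:
#   system(*args)
```

Passing the arguments as a list to `system` sidesteps shell quoting entirely, which matters for scanner output with spaces in its file names.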

References:

  • pdftk – the PDF toolkit. I could have installed it with port install pdftk, but that has a very long build dependency on gcj.
  • Another technique, using Automator, which would work if you didn’t need to reverse pages. And one without Automator (calling the script directly, as I do).

Written by Jonathan

August 5th, 2009 at 8:41 am

Posted in automation