Clojure is a dynamic functional language that runs on the JVM. It is also a modern Lisp with elegant syntax and powerful data structures. What's more, Clojure is fully interoperable with Java, so all Java libraries are readily available.

This post shows an easy way to save a SlideShare presentation as a single PDF file with Clojure.

1. Find a PDF library with a Clojure API

The first step is to decide which PDF library to use. You can find many of them on GitHub or on Clojars.org, the community repository for Clojure libraries. After some searching I decided to use clj-pdf. It is a Clojure wrapper for the iText library, and it is intuitive to use.

Add the dependency to your project.clj:

 
[clj-pdf/clj-pdf "1.11.21"]
 

In this case we only deal with images. Here is how to add images to a PDF:

 
(use 'clj-pdf.core)

(pdf
  [{:header "Enter Matrix"
    :size   :crown-quarto}
   ;; first slide
   [:image {:align :center}
    "https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-1-1024.jpg?cb=1386422224"]
   [:pagebreak]
   ;; second slide, on its own page
   [:image {:align :center}
    "https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-2-1024.jpg?cb=1386422224"]]
  "doc.pdf")
 

This code adds the image at the first URL, inserts a page break, and then adds a second image on the second page of the PDF file.

2. Extract the image URLs from SlideShare

The next step is to extract all the image URLs of a SlideShare presentation and add them one by one to our PDF file. When you find an interesting SlideShare presentation on a web page, look at the page source; it will be something like this:

 
<div class="slide_container jsplBgColorBigfoot wide_img">
                  <div data-index="1" class="slide show" slidenumber="1" style="min-height: 0px;">
                    <img id="img_slide_image" class="slide_image" src="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-1-638.jpg?cb=1386422224" data-normal="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-1-638.jpg?cb=1386422224" data-full="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-1-1024.jpg?cb=1386422224" data-small="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/85/enter-the-matrix-1-320.jpg?cb=1386422224">
                  </div>
                  <div data-index="2" class="slide" slidenumber="2" style="min-height: 0px;">
                    <img class="slide_image" src="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-2-638.jpg?cb=1386422224" data-normal="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-2-638.jpg?cb=1386422224" data-full="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/95/enter-the-matrix-2-1024.jpg?cb=1386422224" data-small="https://image.slidesharecdn.com/2013-11-14-20enterthematrix-131207071455-phpapp02/85/enter-the-matrix-2-320.jpg?cb=1386422224">
                  </div>
                  <!--  more slides -->
          </div>
 

Copy this text out. What we need is the data-full attribute. There are plenty of ways to extract these URLs from this piece of text. I will use Vim's search and replace functionality here, but the whole thing could be automated with a scripting language.

Execute these substitute commands one by one in Vim:

 
%s/^ \{-}<div.\{-}$//g
%s/^ \{-}<\/div.\{-}$//g
%s/^ \{-}<img.\{-}data-full=//g
%s/data-small.\{-}$//g
 

You will get the URLs listed one per line. This is the data that will be fed to our Clojure code.
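
The same extraction could also be scripted in Clojure itself instead of Vim. The following is only a minimal sketch, assuming the copied HTML is available as a string; the function name extract-full-urls and the file name slides.html are hypothetical and not part of the original workflow.

```clojure
(defn extract-full-urls
  "Returns a vector with the value of every data-full attribute found in html."
  [html]
  ;; re-seq yields [whole-match captured-group] pairs; keep only the group
  (mapv second (re-seq #"data-full=\"([^\"]+)\"" html)))

;; Hypothetical usage, assuming the slide markup was saved to slides.html:
;; (def urllist (extract-full-urls (slurp "slides.html")))
```

A regular expression is enough here because only one attribute is needed; for anything more involved an HTML parser would be the safer choice.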

3. Clojure code to accept the URL list and write it to a PDF

With the data extracted in step 2, we can now write Clojure code to process it.

 
(pdf
  [{:header "Enter Matrix"
    :size   :crown-quarto}
   ;; turn every URL into an image element
   (let [imagelist (for [url urllist]
                     [:image {:align :center} url])]
     ;; append a page break after each image
     (let [merged (loop [v imagelist
                         s []]
                    (if (empty? v)
                      s
                      (recur (rest v) (conj s (first v) [:pagebreak]))))]
       (for [x merged] x)))]
  "doc.pdf")
 
 

We use :crown-quarto as the PDF page size, which is a better fit for the images. The urllist is a vector containing the image URLs:

 
(def urllist [
   ; copy text here
])
 

You can copy the text we extracted in step 2 directly into the vector.

Then we transform each URL into a vector that represents an image in the PDF. The first element is the keyword :image, followed by an options map for the image (for example, how to align it on the page), and then the URL string.

We want each page of the PDF to contain one image, so after each image we insert a :pagebreak element, which causes the next image to be written to the next page of the PDF file.

What the code in the second let does is a transformation like this:
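
To make the shape of a single image element concrete, here is a small sketch. The helper name image-element and the example.com URLs are made up for illustration; only the [:image options url] layout comes from the code above.

```clojure
(defn image-element
  "Builds the vector that describes one centered image for the given URL."
  [url]
  [:image {:align :center} url])

;; Placeholder URLs, only to show the resulting data
(def sample-urls ["https://example.com/slide-1.jpg"
                  "https://example.com/slide-2.jpg"])

(mapv image-element sample-urls)
;; => [[:image {:align :center} "https://example.com/slide-1.jpg"]
;;     [:image {:align :center} "https://example.com/slide-2.jpg"]]
```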

 
:image :image :image ...
 
->
 
:image :pagebreak :image :pagebreak :image :pagebreak ...
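
The same loop can be tried on its own at the REPL. In this sketch it is wrapped in a function with the hypothetical name add-pagebreaks, and the file names are placeholders:

```clojure
(defn add-pagebreaks
  "Returns a vector with a [:pagebreak] element appended after every image."
  [images]
  (loop [v images
         s []]
    (if (empty? v)
      s
      (recur (rest v) (conj s (first v) [:pagebreak])))))

(add-pagebreaks [[:image {} "a.jpg"] [:image {} "b.jpg"]])
;; => [[:image {} "a.jpg"] [:pagebreak] [:image {} "b.jpg"] [:pagebreak]]
```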
 

To make the process more general, I wrote another function to do the insertion.

 
(defn implode-lazy [v sep]
  (let [c (count v)]
    ;; even positions hold the elements, odd positions the separator
    (for [i (range 0 (* 2 c))]
      (if (even? i)
        (nth v (quot i 2))
        sep))))
 

This accepts a vector and a separator, inserts the separator after each element of the vector, and returns the resulting list.

Now we can write our main logic more elegantly:
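
A quick REPL check makes the behavior easier to see. The function is restated here so the snippet runs on its own; the sample values are arbitrary.

```clojure
(defn implode-lazy [v sep]
  (let [c (count v)]
    (for [i (range 0 (* 2 c))]
      (if (even? i)
        (nth v (quot i 2))   ; even index: the next element
        sep))))              ; odd index: the separator

(implode-lazy [1 2 3] :sep)
;; => (1 :sep 2 :sep 3 :sep)
```

Note that the separator also follows the last element, which is what we want here: every image is followed by a page break.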

 
(pdf
  [{:header "Enter Matrix"
    :size   :crown-quarto}
   (let [imagelist (for [url urllist]
                     [:image {:align :center} url])]
     (implode-lazy imagelist [:pagebreak]))]
  "doc.pdf")
 
 

Use the interpose function

Our implode-lazy function can take an even simpler form by using the built-in function interpose. This function works like PHP's implode function.

 
(defn implode [v sep]
  (conj (vec (interpose sep v)) sep))
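
To see what interpose contributes and what the extra conj adds, here is a short sketch with arbitrary sample values:

```clojure
(defn implode [v sep]
  (conj (vec (interpose sep v)) sep))

;; interpose only places the separator between elements
(interpose :sep [1 2 3])
;; => (1 :sep 2 :sep 3)

;; implode additionally appends a trailing separator
(implode [1 2 3] :sep)
;; => [1 :sep 2 :sep 3 :sep]
```

One difference from implode-lazy is worth keeping in mind: for an empty input this version returns a vector containing just the separator, while implode-lazy returns an empty list.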