Java InputStream read hangs forever, and how to fix it with a timeout and retries in Clojure

The low-level Java InputStream API can be dangerous when you use it to read a file from a remote server. The problem with this API is that it has no timeout or retry mechanism for exceptions that occur between the client and the server, and this kind of exception happens quite frequently.

Such failures are also completely normal: most of us have hit a web page that needs a refresh before it loads successfully. The same goes for file downloads; sometimes a download fails not because the server is down, but simply because it needs a retry.

By default, if you use an InputStream to download a file, it does not account for network failures at all. It keeps reading from the socket until all the data has been sent, and if for whatever reason the server stops responding before the file is fully downloaded, the read simply waits: it blocks the thread and never returns. There is no timeout, and the only thing you can do is restart the JVM.
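
To make the failure mode concrete, here is a minimal sketch (the URL is just a placeholder) of where the blocking read happens; the read timeout used later in this post is what turns an endless wait into an exception:

;; Without a read timeout, .read blocks until data arrives or the connection
;; is torn down; if the server silently stalls, the call never returns.
;; Uncommenting .setReadTimeout makes a stalled read throw
;; java.net.SocketTimeoutException after 5 seconds instead of hanging.
(let [con (.openConnection (java.net.URL. "http://example.com/big-file.bin"))]
  ;; (.setReadTimeout con 5000)
  (with-open [in (.getInputStream con)]
    (.read in (byte-array 2048))))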

The best practice is to never use the raw API directly. For example, the following Clojure code is not safe:

 
(defn saveurl-unsafe [uri file]
  (with-open [in (clojure.java.io/input-stream uri)
              out (clojure.java.io/output-stream file)]
    (clojure.java.io/copy in out)))
 

I could have written it in Java and called it from Clojure, but I think a Clojure version is better and more convenient.

I found this useful code on GitHub: file.clj

I modified it a little bit, removing all external dependencies, so you can simply copy the code snippet and use it immediately.

 
(defn pretty-file-size
  [n-bytes]
  (let [n-kb (int (/ n-bytes 1024))
        n-mb (int (/ n-kb 1024))]
    (cond
     (< n-bytes 1024) (str n-bytes " B")
     (< n-kb 1024)    (str n-kb " KB")
     :else            (str n-mb " MB"))))
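
A quick check at the REPL shows what this helper produces:

(pretty-file-size 300)     ;=> "300 B"
(pretty-file-size 2048)    ;=> "2 KB"
(pretty-file-size 5242880) ;=> "5 MB"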
 
 
;; Prints a progress line whenever the copied byte count crosses one of the
;; thresholds produced by percentage-slices (the file-size argument is
;; currently unused).
(defn print-file-copy-status
  [num-copied-bytes buf-size file-size slices]
  (let [lower num-copied-bytes
        upper (+ buf-size num-copied-bytes)]
    (when-let [slice (some (fn [slice]
                             (when (< lower (:val slice) upper)
                               slice))
                           slices)]
      (println (str (:perc slice) "% (" (pretty-file-size num-copied-bytes) ") completed")))))
 
(defn percentage-slices
  [size num-slices]
  (map (fn [slice]
         (let [perc (/ (inc slice) num-slices)]
           {:perc (* 100 perc)
            :val  (* size perc)}))
       (range num-slices)))
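
For a 1000-byte file split into 4 slices, this produces the thresholds at which progress is reported (for sizes that don't divide evenly, the :val entries are ratios, which still compare correctly):

(percentage-slices 1000 4)
;=> ({:perc 25, :val 250} {:perc 50, :val 500}
;    {:perc 75, :val 750} {:perc 100, :val 1000})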
 
 
;; Copies in-stream to out-stream in 2 KB chunks. InputStream.read blocks until
;; data arrives, returns -1 at end of stream (which ends the loop), or throws
;; java.net.SocketTimeoutException if a read timeout is set on the connection.
(defn remote-file-copy [in-stream out-stream]
  (let [buf-size 2048
        buffer   (make-array Byte/TYPE buf-size)]
    (loop [bytes-copied 0]
      (let [size (.read in-stream buffer)]
        (when (pos? size)
          (.write out-stream buffer 0 size)
          (recur (+ size bytes-copied)))))
    (println "--> Download successful")))
 
 
;; Downloads url to target-path. timeout is the read timeout in milliseconds:
;; if the server stops sending data for longer than that, the blocking read
;; throws java.net.SocketTimeoutException instead of hanging forever.
(defn download-file-with-timeout
  [url target-path timeout]
  (let [url (java.net.URL. url)
        con (.openConnection url)]
    (.setReadTimeout con timeout)
    (with-open [in  (.getInputStream con)
                out (clojure.java.io/output-stream target-path)]
      (remote-file-copy in out))
    target-path))
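
One thing the read timeout does not cover is the initial connection: if the server is unreachable, getInputStream itself can block for a long time. URLConnection also offers setConnectTimeout for that case; here is a variant sketch (the name download-file-with-timeouts is mine):

;; Like download-file-with-timeout, but also bounds the connection handshake.
;; Both timeouts are in milliseconds.
(defn download-file-with-timeouts
  [url target-path connect-timeout read-timeout]
  (let [con (.openConnection (java.net.URL. url))]
    (.setConnectTimeout con connect-timeout)
    (.setReadTimeout con read-timeout)
    (with-open [in  (.getInputStream con)
                out (clojure.java.io/output-stream target-path)]
      (remote-file-copy in out))
    target-path))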
 
 
 
 
;; Retrying wrapper: tries download-file-with-timeout up to n-retries times,
;; sleeping wait-t milliseconds between attempts.
(defn download-file*
  ([url path timeout]                  (download-file-with-timeout url path timeout))
  ([url path timeout n-retries]        (download-file* url path timeout n-retries 5000))
  ([url path timeout n-retries wait-t] (download-file* url path timeout n-retries wait-t 0))
  ([url path timeout n-retries wait-t attempts-made]
   (when (>= attempts-made n-retries)
     (throw (Exception. (str "Aborting! Download failed after "
                             n-retries
                             " attempts. URL attempted to download: "
                             url))))
   (try
     (download-file-with-timeout url path timeout)
     (catch Exception e
       (Thread/sleep wait-t)
       (println (str "Download timed out. Retry " (inc attempts-made) ": " url))
       (download-file* url path timeout n-retries wait-t (inc attempts-made))))))
 
 
;; Public entry point: same arities as download-file*.
(defn download-file
  ([url path timeout]
   (download-file* url path timeout))
  ([url path timeout n-retries]
   (download-file* url path timeout n-retries))
  ([url path timeout n-retries wait-t]
   (download-file* url path timeout n-retries wait-t)))
 
;(download-file-with-timeout "http://makble.com/images/yandex-search-highlight.jpg" "c:\\tmp\\aa.jpg" (* 5 60 1000)) ; set timeout to 5 minutes
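
The retrying wrapper is called the same way; the values below are just example settings: a 30-second read timeout, up to 3 attempts, and 5 seconds between attempts.

;(download-file "http://makble.com/images/yandex-search-highlight.jpg" "c:\\tmp\\aa.jpg" (* 30 1000) 3 5000)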