Find YouTube video URL with Clojure

Clojure can be the best scripting language. With its declarative programming style, the REPL, and the Java ecosystem, you can do almost anything with it, and most importantly, you can solve problems happily and elegantly. Other languages like Ruby, Python, or PHP don't give you that feeling, even though they can all solve your problem. Programming in them is just not as joyful. The main reason is that they are all imperative in nature, and dynamic typing helps only a little.

Another problem is the way you interact with them. What you get is a script file that you pass parameters to, which is one more level of indirection. On the command line you can only pass strings to your script, with a lot of limitations, so you usually need a layer of code to transform the parameters into the form your code really needs. That job is tedious and lame.

In Clojure the story is different. You go straight to the problem itself: write a small function, test it to make sure it does what it was designed to do, then write the next one. When all the building blocks are written and tested, you write an entry function that composes them. That function becomes a tool, a command: call it with any parameters you want to pass. It all happens in the REPL.

This task is about extracting a YouTube video's URL from the YouTube API. Clojure is good at tasks like this.

Each YouTube video has a unique ID associated with it. You can see it in your browser's address bar, something like this

 
watch?v=J6vIS8jb6Fs
 
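Pulling the ID out of such an address can be done with a one-line regular expression. The sketch below is only an illustration: video-id is a hypothetical helper that is not part of the original script, and it assumes the ID is always carried in the v query parameter.

```clojure
;; Hypothetical helper, not part of the original script.
(defn video-id
  "Returns the value of the v parameter in a watch URL, or nil if absent."
  [url]
  ;; re-find returns [whole-match group-1] on success and nil otherwise
  (second (re-find #"[?&]v=([^&#]+)" url)))

(video-id "https://www.youtube.com/watch?v=J6vIS8jb6Fs")
```

Because re-find returns nil when nothing matches, an address without a v parameter simply yields nil.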

You can also retrieve the video file information from the YouTube API

 
https://www.youtube.com/get_video_info?video_id=J6vIS8jb6Fs&el=detailpage&ps=default&eurl=&gl=US&hl=en
 

All parameters are optional except video_id. Note that el defaults to embedded, so if you forget to set it, a video whose owner has disabled embedding returns the error below

 
Watch this video on YouTube. Playback on other websites has been disabled by the video owner
 

If you request that URL in a browser, YouTube returns a large chunk of data. It contains a lot of information; what we need is the list of available video formats and their download addresses.

There is code and there are applications online that can parse the data and extract the download addresses. This is an implementation in Clojure.

 
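;; Parse a query string such as "a=1&b=2" into a map {:a "1", :b "2"}.
;; Values are left URL-encoded; a key with an empty value maps to nil.
(defn parse_str [s]
  (apply merge
    (map #(hash-map (keyword (first %)) (second %))
      (map #(.split % "=")
        (.split s "&")))))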
 
(require 'clojure.pprint)

;; Parse one stream entry and pretty-print it with its :url decoded.
(defn youtube-stream-handle [stream]
  (let [parsed-stream (parse_str stream)]
    (clojure.pprint/pprint
      (assoc parsed-stream
             :url (java.net.URLDecoder/decode (:url parsed-stream) "UTF-8")))))
 
;; Print every stream in both format lists of a parsed get_video_info response.
(defn get-youtube-video [parsed]
  (let [url_encoded_fmt_stream_map (:url_encoded_fmt_stream_map parsed)
        adaptive_fmts (:adaptive_fmts parsed)
        splited-stream (.split (java.net.URLDecoder/decode url_encoded_fmt_stream_map "UTF-8") ",")
        splited-adaptive_fmts (.split (java.net.URLDecoder/decode adaptive_fmts "UTF-8") ",")]
    ;; doseq rather than map: printing is a side effect, and map is lazy
    (doseq [stream splited-stream]
      (youtube-stream-handle stream))
    (println "===============adaptive_fmts starts")
    (doseq [stream splited-adaptive_fmts]
      (youtube-stream-handle stream))))
 
(defn construct-url-youtube [id]
  (str "http://youtube.com/get_video_info?video_id=" id "&el=detailpage&ps=default&eurl=&gl=US&hl=en"))
 
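; usage: video-info is assumed to hold the response body of the URL
; built by construct-url-youtube, fetched for example with slurp
(get-youtube-video (parse_str video-info))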
 

The parse_str function does roughly the same thing as PHP's built-in function of the same name, except that it leaves the values URL-encoded. So the real work is done by only two functions.
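
To make the two-level layout of the data concrete, here is a small sketch run against a made-up, heavily shortened response. The sample string and the names sample-info, top-level and streams are invented for illustration; a real response carries many more fields.

```clojure
;; Same logic as parse_str in the listing above, repeated so this
;; snippet runs on its own.
(defn parse_str [s]
  (apply merge
    (map #(hash-map (keyword (first %)) (second %))
      (map #(.split % "=") (.split s "&")))))

;; Made-up sample: one top-level field whose value is itself an encoded,
;; comma-separated list of query strings.
(def sample-info
  "status=ok&url_encoded_fmt_stream_map=itag%3D22%26quality%3Dhd720%2Citag%3D18%26quality%3Dmedium")

;; First level: the fields of the response.
(def top-level (parse_str sample-info))

;; Second level: decode the field, split it on commas, parse each entry.
(def streams
  (map parse_str
       (.split (java.net.URLDecoder/decode
                 (:url_encoded_fmt_stream_map top-level) "UTF-8")
               ",")))
```

Here streams holds one map per format, each with an :itag and a :quality key. Note that a key with an empty value, such as eurl= in the request URL, comes back as nil, because Java's split drops trailing empty strings.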

The output will look like this

 
{:type "video%2Fmp4%3B+codecs%3D%22avc1.64001F%2C+mp4a.40.2%22",
 :url
 "http://r1---sn-i3b7kn7k.googlevideo.com/videoplayback?itag=22&nh=IgpwcjA0LmhrZzAxKgkxMjcuMC4wLjE&id=o-AIv9I4CwT5q42utYabEg85sj-TswIVJ_V2aLI66BWqXp&ip=117.18.8.1&key=yt6&pl=24&lmt=1406296907991006&source=youtube&signature=A8ABF66FC9DADBC2F19F8074382472B0121A8B39.310AB51A96E34B12A82A5ED818C0A46475242FA3&dur=1922.449&mv=u&mt=1448610744&mm=31&ms=au&fexp=9408506%2C9408710%2C9409246%2C9413137%2C9413277%2C9416126%2C9417683%2C9419446%2C9420452%2C9420540%2C9420716%2C9421170%2C9421249%2C9422596%2C9422618%2C9422857%2C9423294%2C9423662%2C9423785%2C9423846%2C9424166%2C9424299%2C9424963&expire=1448632661&sver=3&ratebypass=yes&mime=video%2Fmp4&upn=3szgm3yMIYo&mn=sn-i3b7kn7k&sparams=dur%2Cid%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Cratebypass%2Csource%2Cupn%2Cexpire&ipbits=0",
 :itag "22",
 :quality "hd720",
 :fallback_host "tc.v10.cache8.googlevideo.com"}
{:type "video%2Fwebm%3B+codecs%3D%22vp8.0%2C+vorbis%22",
 :url
 "http://r1---sn-i3b7kn7k.googlevideo.com/videoplayback?itag=43&nh=IgpwcjA0LmhrZzAxKgkxMjcuMC4wLjE&id=o-AIv9I4CwT5q42utYabEg85sj-TswIVJ_V2aLI66BWqXp&ip=117.18.8.1&key=yt6&pl=24&lmt=1381323998094725&source=youtube&signature=27380B34A0800FE889711ADC026ACF10AADF06FB.4DF361E5E89665F3AE6EB4F1EE884B3BD037195B&dur=0.000&mv=u&mt=1448610744&mm=31&ms=au&fexp=9408506%2C9408710%2C9409246%2C9413137%2C9413277%2C9416126%2C9417683%2C9419446%2C9420452%2C9420540%2C9420716%2C9421170%2C9421249%2C9422596%2C9422618%2C9422857%2C9423294%2C9423662%2C9423785%2C9423846%2C9424166%2C9424299%2C9424963&expire=1448632661&sver=3&ratebypass=yes&mime=video%2Fwebm&upn=3szgm3yMIYo&mn=sn-i3b7kn7k&sparams=dur%2Cid%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Cratebypass%2Csource%2Cupn%2Cexpire&ipbits=0",
 :itag "43",
 :quality "medium",
 :fallback_host "tc.v16.cache2.googlevideo.com"}
{:type "video%2Fmp4%3B+codecs%3D%22avc1.42001E%2C+mp4a.40.2%22",
 :url
 "http://r1---sn-i3b7kn7k.googlevideo.com/videoplayback?itag=18&nh=IgpwcjA0LmhrZzAxKgkxMjcuMC4wLjE&id=o-AIv9I4CwT5q42utYabEg85sj-TswIVJ_V2aLI66BWqXp&ip=117.18.8.1&key=yt6&pl=24&lmt=1406297356495826&source=youtube&signature=51C2AF01B30AB5AB950C5CBA829C524566558964.8C4ED263E391C304C8BE12992E11B932F2122BD7&dur=1922.449&mv=u&mt=1448610744&mm=31&ms=au&fexp=9408506%2C9408710%2C9409246%2C9413137%2C9413277%2C9416126%2C9417683%2C9419446%2C9420452%2C9420540%2C9420716%2C9421170%2C9421249%2C9422596%2C9422618%2C9422857%2C9423294%2C9423662%2C9423785%2C9423846%2C9424166%2C9424299%2C9424963&expire=1448632661&sver=3&ratebypass=yes&mime=video%2Fmp4&upn=3szgm3yMIYo&mn=sn-i3b7kn7k&sparams=dur%2Cid%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cnh%2Cpl%2Cratebypass%2Csource%2Cupn%2Cexpire&ipbits=0",
 :itag "18",
 :quality "medium",
 :fallback_host "tc.v24.cache7.googlevideo.com"}
 

The code worked well for a long time, but one day it broke with an error reporting that the decode function had received a null argument. For some unknown reason the adaptive_fmts field was missing from the map; maybe the video itself doesn't have this field, or the Google API changed. The simplest fix is to add a null check: if a field is missing, print a warning and cancel the operations that follow. But I have learned a bit about monads, and I know that verbose null checks clutter the code logic, while the Maybe monad solves the problem elegantly.
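
For comparison, the explicit check might look roughly like the sketch below. decode-field is a hypothetical helper written only to show the style; it is not code from the original script.

```clojure
;; Hypothetical helper illustrating the explicit-check style.
(defn decode-field
  "Returns the comma-separated entries of field k in parsed as a vector,
  or nil (after printing a warning) when the field is missing."
  [parsed k]
  (if-let [raw (get parsed k)]
    (vec (.split (java.net.URLDecoder/decode raw "UTF-8") ","))
    ;; println returns nil, so the caller sees nil for a missing field
    (println "warning: field" (name k) "is missing, skipping")))

(decode-field {:adaptive_fmts "a%3D1%2Cb%3D2"} :adaptive_fmts)
(decode-field {} :adaptive_fmts)
```

One branch like this is harmless, but every further step that depends on the result needs its own check, and that is the clutter I want to avoid.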

The clojure.algo.monads library provides the facilities that let me use the maybe monad in Clojure. The code looks like this:

 
(use 'clojure.algo.monads)

(defn get-youtube-video [parsed]
  (let [url_encoded_fmt_stream_map (:url_encoded_fmt_stream_map parsed)
        splited-stream (.split (java.net.URLDecoder/decode url_encoded_fmt_stream_map "UTF-8") ",")]
    (doall (map #(youtube-stream-handle %) splited-stream))

    ;; Each binding runs only if the previous one was non-nil; a missing
    ;; :adaptive_fmts makes the whole domonad form return nil.
    (domonad maybe-m
      [adaptive_fmts (:adaptive_fmts parsed)
       splited-adaptive_fmts (.split (java.net.URLDecoder/decode adaptive_fmts "UTF-8") ",")]
      ;; doall forces the lazy seq so every stream is printed
      (doall (map #(youtube-stream-handle %) splited-adaptive_fmts)))))
 

Reading a property of a map that comes from a remote URL is an IO effect: we cannot guarantee we will get the desired value, and each subsequent value depends on the previous one.

If any step of the reading ends up with nil, the rest of the computation should be terminated. The maybe monad automatically injects the check at each step and short-circuits the whole process when it encounters nil, so we don't have to check for null at each and every step.
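
The injected check can be pictured with a hand-rolled bind function. This is a sketch of the idea in plain Clojure, not the library's source; maybe-bind, decode and adaptive-streams are names made up for the example.

```clojure
;; Hand-rolled for illustration; nil plays the role of "nothing".
(defn maybe-bind
  "Applies f to v unless v is nil, in which case the chain stops."
  [v f]
  (if (nil? v) nil (f v)))

(defn decode [s]
  (java.net.URLDecoder/decode s "UTF-8"))

(defn adaptive-streams
  "Decoded entries of :adaptive_fmts as a vector, or nil if the field is missing."
  [parsed]
  ;; Each step is wrapped in maybe-bind, so the nil check lives in one place.
  (maybe-bind (:adaptive_fmts parsed)
    (fn [raw]
      (maybe-bind (decode raw)
        (fn [decoded]
          (vec (.split decoded ",")))))))

(adaptive-streams {:adaptive_fmts "itag%3D137%2Citag%3D140"})
(adaptive-streams {})
```

With the field present the steps run in order; with an empty map the first maybe-bind returns nil and decode is never called. A monadic binding form lets you write such a nested chain as a flat list of steps.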

The whole point is to guarantee that code which doesn't expect nil never receives nil. Without a monad, someone has to do the checking, either the caller or the callee, and both clutter the code. This is one of the cases in which the if statement is considered harmful but can't be eliminated without a facility like a monad.