Using H2 in memory database in Clojure

Introduction

We want to accomplish a very simple task: collecting file names from different locations from the file system. And then add these data to an in memory SQL DB, and then do some query against this DB. There are different ways can do the same thing , for example, we can simply export all the file name information to a text file and using grep to search through it.

Or we can build a data structure to hold those data and using regular expression to search them. We can do it in any dynamic scripting language like PHP or Python.

Clojure is a pretty new functional language and have access to all JAVA libraries and powerful expressiveness. Here we will use this language to do the thing. You will find its a good choice for the job.

Prepare H2 database and jdbc drivers jar for Clojure

H2 database is a JAVA library implemented a SQL database engine, it can work as an in memory db. Why use it? First reason is we want a lightweight solution, we don't want involve a full fledged database server. Our task is simple enough. The second reason is we don't want write code to do data query, its a tedious job, let SQL do what it good at.

The combination used in here is H2 database + Clojure + Clojure JDBC driver. Go to the h2 database site and get the jar file and drop it to class path that your REPL can access.

To get Clojure jdbc driver, add this to project.clj

 
[org.clojure/java.jdbc "0.3.6"]
 

Connect, CRUD in H2 database in Clojure

First require clojure.java.jdbc.

 
(require '[clojure.java.jdbc :as j])
 

Next step is configure H2 database,

 
(def demo-settings
   {
    :classname   "org.h2.Driver"
    :subprotocol "h2:mem"
    :subname     "demo;DB_CLOSE_DELAY=-1"
    :user        "sa"
    :password    ""
   }
  )
 
 

This configuration sets the database in memory. Note we add DB_CLOSE_DELAY=-1 after subname. The reason is without this option the content of the database is lost whenever the last connection is closed. If you create table like this

 
(j/db-do-commands demo-settings
  (j/create-table-ddl :filetable
    [:name "varchar(3200)"]
    [:path "varchar(3200)"]
    [:origname "varchar(3200)"])
)
 
 

You will find the table is not created after this statement, if you insert record to table, the error "not find table" throw.

You can also using disk file as data storage, then the configuration will be

 
(def demo-settings
   {
    :classname   "org.h2.Driver"
    :subprotocol "h2:file"
    :subname     (str (System/getProperty "user.dir") "/" "demo")
    :user        "sa"
    :password    ""
   }
  )
 

Collecting data from file system

First let's define some locations from which we retrieve our file name data. Its simply a vector of paths.

 
(def file-store-location [
  "C:\\loc1"
  "C:\\loc2"
  "C:\\loc3"
  "C:\\loc4"
])
 

We only interest in certain types of file, for example pdf file, or epub file, these types will be a set.

 
(def file-ext-watch 
  #{"pdf" "djvu" "epub"}
)
 
(defn get-extension [file]
  (let [re #"\.[^.\\]*$"]
    (re-find re file)
  )
)
 
(def to-lower clojure.string/lower-case)
 
(defn filter-by-ext [file]
  (if (= nil  (get-extension file))
    nil
    (contains? file-ext-watch (clojure.string/replace (to-lower (get-extension file)) "." ""))
  )
)
 
 

Since we only interest files with these types, we will filter file names by their extension. We need a function to get file extension and a filter function that filter out other types of files.

To list and filter files :

 
(defn list-files [store]
  (filterv #(filter-by-ext %) (shell "ls.bat" store))
)
 

To list all files in a folder, we can call shell command, see Executing shell command in Clojure REPL.

Now we can add data to database

 
(defn insert-file [fileloc file origname]
  (j/insert! demo-settings :filetable {
    :name file
    :path fileloc
    :origname origname
 })
)    
 
(def files-in-location (mapv #(list-files %) file-store-location)) 
(defn add-to-db []
  (let [mapped (map #(vec (list %1 %2)) file-store-location files-in-location)]
    (for [ [fileloc filelist] mapped]
      (for [file filelist]
     (insert-file fileloc (to-lower file) file)     
      )
    )
  )
)    
 
(add-to-db)
 

We create a table with three fields, the file name lower cased, file path and file name original text. The SQL query like clause don't support case insensitive match, keep a lower case version is very helpful.

Query data with SQL statement

A simple query will look like

 
(j/query demo-settings ["select * from filetable limit 10"])
 

Suppose we want to query that the file name contains a substring, construct sql like this

 
(defn sql-like [s]
  (str "select * from filetable where name like '%" s  "%'")
)
 

Match a substring

 
 
(defn pretty-record [i rec]
  (println (str i ":"  (:path rec) "\\" (:name rec) ))
)
 
(defn open-record [i rec target]
  (if (= i target)
    (shell "explorer.exe" (str "\"" (:path rec) "\\" (:name rec) "\""))
  )
)
 
(defn pretty-query [sql record-handler & {:keys [target] :or {target -1}}]
  (let [result (j/query demo-settings [sql]) ]
    (loop [rest-result result i 0]
      (when (not (empty? rest-result))
        (if (= target -1)
          (record-handler i (first rest-result))
          (record-handler i (first rest-result) target)
        )
 
        (recur (rest rest-result) (inc i))
      )
    )
  )
)
 
(defn only-show [sql]
  (pretty-query sql pretty-record)
)
 
(defn only-open [sql target]
  (if (= target -1)
    (println "wrong target")
    (pretty-query sql open-record :target target)  
  )
 
)
 
(defn match-substr 
  ([s] (only-show (sql-like s)))
  ([s target] (only-open (sql-like s) target))
)
 

With these functions, we can submit a word, it will list all files contains the word.

 
(match-substr "big")
 

It will print all matching files , each has an index start from 0. Here we also defined another handler for each matching record. We can specify a target index, means we wish to open this file indicated by function open-record.

 
(match-substr "big" 0)
 

This script is helpful when you have a lot of files scattered all over the places and its hard to locate them quickly. For example, you only remember part of the file name, maybe just a word. And you are not sure which folder the file resides.