Adding a JSON document to a Solr index with Java

Posting documents to Solr with curl is common, but sometimes we need to do it programmatically, for example when the field values are generated by code.

This post shows how to put a String into a Solr document and index it with Java.

Solr URL for JSON updates

Solr exposes a REST-like API for operating on index data, which means its operations are URL-based. To add or update a JSON document, we use the following URL.

 
http://localhost:8983/solr/pagecollection/update/json?wt=json&commit=true
 

Here pagecollection is the Solr core we want to add the document to.

The wt=json parameter asks for the response in JSON format, and commit=true commits the change right away, so the added document is visible to searches immediately.
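Since the URL is just the core name plus a couple of parameters, it can be assembled in code when the core or the commit behaviour varies. The following is a minimal sketch; the class SolrUpdateUrl and its method forCore are hypothetical names made up for this example, not part of Solr or any client library.

```java
// Hypothetical helper for illustration only; not part of Solr or HttpClient.
final class SolrUpdateUrl {

    // baseUrl is assumed to look like "http://localhost:8983/solr" (no trailing slash).
    // core is the Solr core name, e.g. "pagecollection".
    static String forCore(String baseUrl, String core, boolean commit) {
        return baseUrl + "/" + core + "/update/json?wt=json&commit=" + commit;
    }

    public static void main(String[] args) {
        // Rebuilds the URL used throughout this post.
        System.out.println(forCore("http://localhost:8983/solr", "pagecollection", true));
    }
}
```

Passing false for commit leaves the commit to a later request or to Solr's own commit settings, which is usually preferable when posting many documents in a row.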

Gradle dependencies

To post to the Solr REST service from Java, we need the Apache HttpClient and Commons IO packages. Add these dependencies to the Gradle build.

 
    compile 'org.apache.httpcomponents:httpclient:4.0-alpha4'
    compile 'org.apache.commons:commons-io:1.3.2'
 

Document schema

Suppose we want to index each page of a PDF document.

Each document will contain the file name, the file path, the text content of the page, and the page number.

 
{
    "filename" : "",
    "filepath" : "",
    "contenttext" : "",
    "page" : 3
}
 

The Java code

 
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.UnsupportedEncodingException;
    import java.net.URISyntaxException;

    import org.apache.commons.io.IOUtils;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpException;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.StringEntity;
    import org.apache.http.impl.client.DefaultHttpClient;

    public void postSolr(String content, int page, String filename, String filepath) {
        try {
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpPost post = new HttpPost("http://localhost:8983/solr/pagecollection/update/json?wt=json&commit=true");

            // Wrap the document in Solr's "add" command; page is sent as a JSON number, as in the schema.
            StringEntity entity = new StringEntity("{\"add\": { \"doc\": {\"filename\":\"" + filename + "\", \"filepath\":\"" + filepath + "\", \"contenttext\":\"" + content + "\", \"page\" : " + page + "}}}", "UTF-8");
            entity.setContentType("application/json");
            post.setEntity(entity);

            HttpResponse response = httpClient.execute(post);
            HttpEntity httpEntity = response.getEntity();
            InputStream in = httpEntity.getContent();
            try {
                // Content-Encoding describes compression (e.g. gzip), not the charset,
                // so read the JSON response as UTF-8 directly.
                String responseText = IOUtils.toString(in, "UTF-8");
                System.out.println("response Text is " + responseText);
            } finally {
                IOUtils.closeQuietly(in);
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (URISyntaxException e) {
            e.printStackTrace();
        } catch (HttpException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
 

The response after a successful post:

 
 
response Text is {"responseHeader":{"status":0,"QTime":1352}}
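In the successful response above, status is 0. A caller that wants to react to failures can pull that number out of the response text. The sketch below does so with a regular expression to avoid adding a JSON library; SolrResponseStatus is a hypothetical helper written for this post, and a real JSON parser would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it does not fully parse JSON.
final class SolrResponseStatus {

    // Matches the first occurrence of "status": <digits> in the response text.
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*(\\d+)");

    // Returns the first status value found, or -1 if the text contains none.
    static int parse(String responseText) {
        Matcher m = STATUS.matcher(responseText);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Treats status 0 as success, matching the successful response shown in this post.
    static boolean isSuccess(String responseText) {
        return parse(responseText) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isSuccess("{\"responseHeader\":{\"status\":0,\"QTime\":1352}}"));
    }
}
```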
 

Now you can query the index with this URL:

 
http://localhost:8983/solr/pagecollection/select?q=*&wt=json
 

The response looks like this:

 
{"responseHeader":{"status":0,"QTime":1,"handler":"org.apache.solr.handler.component.SearchHandler","params":{"q":"*","wt":"json"}},"response":{"numFound":15,"start":0,"docs":[{"filename":"[doc.pdf","filepath":"F:\\doc","page":25,"id":"0da0f681-319b-40d5-a214-b6111c26a532","_version_":1459386498423980032}]}}
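The query above matches everything with q=*. To search one field instead, for example page:25, the query string has to be URL-encoded before it goes into the URL. The following is a small sketch of that step; SolrSelectUrl and forQuery are hypothetical names used only for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not part of Solr or HttpClient.
final class SolrSelectUrl {

    // baseUrl is assumed to look like "http://localhost:8983/solr" (no trailing slash).
    static String forQuery(String baseUrl, String core, String query) {
        try {
            // URL-encode the query so characters such as ':' and spaces are safe in the URL.
            return baseUrl + "/" + core + "/select?q=" + URLEncoder.encode(query, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forQuery("http://localhost:8983/solr", "pagecollection", "page:25"));
    }
}
```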
 

Don't forget to escape double quotes

The next problem I ran into was double quotes in the text. Suppose the text contains double quotes, as below:

 
Hello "World"
 

Then the JSON value becomes:

 
"Hello "World""
 

The parser stops at 'W' and expects a ',' or '}' because it thinks the string value ended at the second double quote.

You will get this error:

 
"error":{"msg":"Expected ',' or '}': char=W
 

The solution is to escape every double quote in the text:

 
text = text.replace("\"", "\\\"");
 

After the replacement, the JSON value looks like this:

 
"Hello \"World\""
 

What if the text already contains \"?

 
Hello \"World\"
 

It will be turned into

 
Hello \\"World\\"
 

Now the backslash is escaped, but the double quote is not.

The fix is to escape backslashes first, then double quotes:

 
text = text.replace("\\","\\\\");
text = text.replace("\"", "\\\"");
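The two replacements can be folded into one helper that is applied to every string field before the JSON is assembled. The sketch below walks the text once, character by character, so the ordering problem cannot occur. It also escapes newlines, tabs and other control characters, which this post does not cover but which JSON strings do not allow unescaped. SolrJsonEscape and its methods are hypothetical names for this example; a JSON library would do the same job with less code.

```java
// Hypothetical helper for illustration only; not part of Solr or HttpClient.
final class SolrJsonEscape {

    // Escapes a Java string so it can sit between double quotes in a JSON document.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;   // backslash -> \\
                case '"':  sb.append("\\\""); break;   // double quote -> \"
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \ u XXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds the "add" command for the page schema used in this post.
    static String addCommand(String filename, String filepath, String content, int page) {
        return "{\"add\": {\"doc\": {"
                + "\"filename\": \"" + escape(filename) + "\", "
                + "\"filepath\": \"" + escape(filepath) + "\", "
                + "\"contenttext\": \"" + escape(content) + "\", "
                + "\"page\": " + page + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(addCommand("doc.pdf", "F:\\doc", "Hello \"World\"", 3));
    }
}
```

The string returned by addCommand can be passed to StringEntity in place of the hand-concatenated JSON, so file paths with backslashes and page text with quotes are both handled.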