How to index mysql data in Solr

One of the biggest data sources that solr can index is from databases like MySQL, SQLServer, etc. It's a very common use case: index data that stored in relational database and Solr provides necessary facilitates to do it. This post shows how to index data from MySQL with DataImportHandler.

The first step is to down a release package.

The package contains some examples, including an example using hsqldb as data source. In this post we will use mysql. The table stored blog post looks like

+--------------+---------------+------+-----+---------+----------------+
| Field        | Type          | Null | Key | Default | Extra          |
+--------------+---------------+------+-----+---------+----------------+
| postid       | int(11)       | NO   | PRI | NULL    | auto_increment |
| title        | varchar(1000) | NO   |     |         |                |
| contents     | mediumtext    | NO   |     | NULL    |                |
| insertdate   | datetime      | YES  |     | NULL    |                |
| keywords     | varchar(1000) | NO   |     |         |                |
| description  | text          | NO   |     | NULL    |                |
| category     | varchar(100)  | NO   |     |         |                |
| publishstate | smallint(6)   | YES  |     | 1       |                |
| seotitle     | varchar(1000) | YES  | MUL |         |                |
+--------------+---------------+------+-----+---------+----------------+

Based on the hsql example, we just need a little modification to use mysql instead. Copy the folder F:\setup\jar\solr-4.2.1\example\example-DIH\solr and rename it to mysql. Then open F:\setup\jar\solr-4.2.1\example\example-DIH\mysql\db\conf\db-data-config.xml. Change it to

 
<dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/itstuff?useUnicode=true&characterEncoding=UTF-8" user="root" password="pass" />
    <document>
        <entity name="post" query="select * from post">
            <field column="postid" name="id" />
            <field column="title" name="name" />
            <field column="contents" name="contenttext" />
        </entity>
    </document>
</dataConfig>
 
 
 

If you are using Solr 6, the XML should escape the ampersand, I guess Solr 6 is stricter than Solr 4 on XML syntax. Otherwise you get an XML parsing error.

 
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/makble?useUnicode=true&#038;characterEncoding=UTF-8" user="root" password="" />
 

This configuration map the title and contents column to Solr fields, the "name" field is already defined. Add a line to F:\setup\jar\solr-4.2.1\example\example-DIH\mysql\db\conf\schema.xml

 
<field name="contenttext" type="text" indexed="true" stored="false"/>
 

In Solr 6 you should rename the

c:\App\solr-6.3.0\example\example-DIH\mysql\solr\conf\managed-schema to schema.xml.

Now you can start the Jetty server with this command:

 
java -Dsolr.solr.home="./example-DIH/mysql/" -jar start.jar
 

Or in Solr 6

 
solr start -p 8983 -s "C:\App\solr-6.3.0\example\example-DIH\mysql"
 

Now goto http://localhost:8983/solr/#/db/dataimport//dataimport.

solr mysql import data

Select your table name in entity selection menu and click execute button.

If your classpath doesn't include MySQL JDBC driver, you will get an error, see the solution Could not load driver: com.mysql.jdbc.Driver Processing Document.

One of the most useful features of Solr is generating related post, a simple query will generate a nice list of related posts:

 
name:Overview of Java persistent
 

Just search the title with the post title. The result is not bad.