I find my program gets slower and slower when the data grows. The program is a web page crawler, when it downloaded a page and save it to MySQL database, it saved the url of the page at the same time.

When the crawler visit the target site again, it may catches the already grabbed url. The program has to search in the database to determine whether the url is already existed. A simple SQL query can do the trick.

 
    public boolean IsExistDb(String table,String col,String val,Connection con){
        try {
 
            Statement stmt = con.createStatement();
 
            String query = "select count(*) as Count from " +table+ " where " +col+ " = '"+val+"'";
            ResultSet rs = stmt.executeQuery(query);        
 
            if(rs.next()) {            
                int Count = rs.getInt(1);
 
                return Count == 0 ? false : true;
            }
            } catch (Exception e) {
                e.printStackTrace();
                return false;
            }
 
            return true;
    }
 

When the data grows to 20 thousands records, the program slow down and runs like in a slow motion mode.

Its obvious that the reason is there is no index on the url column. The "show index from table" shows all indexes on a table.

 
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| post  |          0 | PRIMARY     |            1 | postid      | A         |       21045 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
 

Lets add an index to the url column

 
CREATE INDEX urlindex ON post ( url(300) ) ; 
 

The command create an index with name "urlindex" on the 'url' column of post table. My 'url' column has the text type, so the length is a must.

The length means use the string from start to 300th character as indexing key. If you don't know how to set it , run this command

 
SELECT LENGTH(url) AS LENGTH FROM post ORDER BY LENGTH DESC LIMIT 1;
 

It shows the length of the longest url in the table, choose a value slightly bigger than this value.

The index is very effective, now the program runs like a lighting.