
Add full-text search to your application with Hibernate Search



Full-text search has become a common requirement for modern enterprise applications, and there are several good implementations available, like Apache Lucene and Elasticsearch. They provide powerful indexing and search capabilities that allow you to easily add full-text search capabilities to your application.

But one important question remains when you decide to add Apache Lucene or Elasticsearch to your application: How do you keep the indexes in sync with your database?

You need to update the index every time you create, update or delete an indexed entity. Doing that programmatically is a tedious and error-prone task.
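To get a feeling for that bookkeeping, here is a minimal sketch in plain Java. It is not Hibernate Search or Lucene code: the class ManualIndexSync and its in-memory maps are hypothetical stand-ins for a database table and a hand-maintained word index. Every write operation has to touch both structures, and forgetting one call leaves the index stale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in: a "table" and a word index that must be kept in sync by hand. */
final class ManualIndexSync {
    private final Map<Long, String> table = new HashMap<>();       // id -> message
    private final Map<String, Set<Long>> index = new HashMap<>();  // word -> ids

    void create(long id, String message) {
        table.put(id, message);
        addToIndex(id, message);
    }

    void update(long id, String newMessage) {
        String oldMessage = table.put(id, newMessage);
        if (oldMessage != null) {
            removeFromIndex(id, oldMessage); // stale words must be removed first
        }
        addToIndex(id, newMessage);
    }

    void delete(long id) {
        String oldMessage = table.remove(id);
        if (oldMessage != null) {
            removeFromIndex(id, oldMessage);
        }
    }

    /** Returns the ids of all messages that contain the given word. */
    Set<Long> search(String word) {
        return new HashSet<>(index.getOrDefault(word.toLowerCase(Locale.ROOT), new HashSet<>()));
    }

    private void addToIndex(long id, String message) {
        for (String word : tokenize(message)) {
            index.computeIfAbsent(word, k -> new HashSet<>()).add(id);
        }
    }

    private void removeFromIndex(long id, String message) {
        for (String word : tokenize(message)) {
            Set<Long> ids = index.get(word);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    index.remove(word);
                }
            }
        }
    }

    /** Lower-cases the text and splits it on non-word characters, dropping empty tokens. */
    private static List<String> tokenize(String message) {
        List<String> words = new ArrayList<>();
        for (String word : message.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

Even in this toy version, the update method needs the old message to clean up the index. In a real application, the same care is required in every code path that writes an indexed entity.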

Hibernate Search provides an easier solution. It integrates with Hibernate ORM, updates the Lucene and Elasticsearch indexes transparently and provides a query DSL for full-text queries.

Let’s have a look at what you need to do to add Hibernate Search to your project and to perform your first full-text search. I will show you some of the more advanced features in future blog posts.

Project setup

Add Hibernate Search to your project

The first thing you need to do is add the required library, hibernate-search-orm.jar, to your project.

I’m using Hibernate Search 5.6.0.Final for this example, which requires Hibernate ORM 5.1.3.Final. If you want to use the latest Hibernate ORM release (5.2.7), you can do that with Hibernate Search 5.7.0.CR1.

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-orm</artifactId>
   <version>5.6.0.Final</version>
</dependency>

Configuration

You don’t need to provide any configuration when you start to use Hibernate Search. The default values provide a good starting point for most standard applications.

I recommend using the filesystem DirectoryProvider in the beginning, which is also used by default. It stores the Lucene indexes in the file system which allows you to easily inspect them and get a better understanding of your system. When you’re familiar with Hibernate Search and Lucene, you should also have a look at the other supported DirectoryProviders.

You can configure the filesystem DirectoryProvider with 2 configuration parameters, which you provide in the persistence.xml file. The hibernate.search.default.directory_provider parameter sets the default DirectoryProvider to filesystem, and the hibernate.search.default.indexBase parameter sets the base directory of the index.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.1" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
    <persistence-unit name="my-persistence-unit">
        ...
      
        <properties>
          ...
			
			    <property name="hibernate.search.default.directory_provider" value="filesystem"/>
			    <property name="hibernate.search.default.indexBase" value="./lucene/indexes"/>
        </properties>
    </persistence-unit>
</persistence>
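If the index directory differs between environments, you might prefer to set these 2 parameters in code. The following is only a sketch: the helper class SearchSettings is a hypothetical name, and it does nothing more than build a Map with the 2 property names shown above. The sketch itself has no dependencies and doesn't start Hibernate.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that collects the Hibernate Search settings for a filesystem index. */
final class SearchSettings {

    /** Builds the 2 properties from the persistence.xml example for the given base directory. */
    static Map<String, String> fileSystemIndex(String indexBase) {
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.search.default.directory_provider", "filesystem");
        props.put("hibernate.search.default.indexBase", indexBase);
        return props;
    }
}
```

You could then pass the result to JPA’s Persistence.createEntityManagerFactory("my-persistence-unit", SearchSettings.fileSystemIndex("./lucene/indexes")). Properties provided this way override the values in the persistence.xml file.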

Index entity attributes

Indexing one of your entities requires 2 things:

  1. You need to annotate the entity with @Indexed to tell Hibernate Search to index the entity.
  2. You need to annotate the fields you want to index with the @Field annotation. This annotation also allows you to define how the attributes will be indexed. I will get into more detail about that in one of the following blog posts.

Let’s start with a simple example. The following code snippet shows a simple Tweet entity. It persists the date, user, message and URL of a tweet, and I want to be able to search by userName and message. I therefore annotate both attributes with Hibernate Search’s @Field annotation. That tells Hibernate Search to index both attributes in Lucene and to use the primary key attribute id as the identifier.

@Indexed
@Entity
public class Tweet {

	@Id
	@GeneratedValue(strategy = GenerationType.AUTO)
	@Column(name = "id", updatable = false, nullable = false)
	private Long id;

	@Column
	private Date postedAt;

	@Column
	@Field
	private String userName;

	@Column
	@Field
	private String message;

	@Column
	private String url;

	@Version
	private Long version;

	…
}

That’s all you need to do to add an entity to the Lucene index. You can now use the userName and the message attributes in a full-text search query.

But before you can do that, you might need to create the initial index based on the data already stored in your database.

Create the initial index

Hibernate Search manages the Lucene index and keeps it in sync when you change indexed entities. That’s great when you start with an empty database. But most often, that’s not the case. If you’re working with an existing database, you need to add the existing records to your Lucene index.

You can do that with a few lines of code and Hibernate Search’s batch indexer.

EntityManager em = emf.createEntityManager();
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
fullTextEntityManager.createIndexer().startAndWait();

Perform a simple full-text search

The entity attributes you annotated with @Field are now indexed, and you can use them in a full-text search. I created a small test database with 3 of my recent tweets. Each of them promotes a different blog post.

Similar to a search on Google, you can now use Hibernate Search to do a full-text search on the messages of these tweets. The following code snippet shows a query that searches for the words “validate” and “Hibernate” in the messages of the tweets.

EntityManager em = emf.createEntityManager();
em.getTransaction().begin();

FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(em);
QueryBuilder tweetQb = fullTextEm.getSearchFactory().buildQueryBuilder().forEntity(Tweet.class).get();
// Query is org.apache.lucene.search.Query, not javax.persistence.Query
Query fullTextQuery = tweetQb.keyword().onField(Tweet_.message.getName()).matching("validate Hibernate").createQuery();
List results = fullTextEm.createFullTextQuery(fullTextQuery).getResultList();

em.getTransaction().commit();
em.close();

In the first step, you need to get a FullTextEntityManager. It extends the EntityManager interface with full-text search capabilities and allows you to create a QueryBuilder for the entity class you’re searching. In this example, I create a QueryBuilder for my Tweet entity. You then use the QueryBuilder to define your query. I want to do a keyword search on the message field. That searches the indexed message attribute for one or more words. In this case, I’m searching for the words “validate” and “Hibernate”. Then I create a query and provide it to the createFullTextQuery method. This method returns a FullTextQuery, which extends JPA’s Query interface. Finally, I call the getResultList method to execute the query and get a List of results.

This query returns the primary keys of 2 Tweets and Hibernate ORM uses them to select the Tweet entities from the database.

15:04:29,704 DEBUG SQL:92 – select this_.id as id1_0_0_, this_.message as message2_0_0_, this_.postedAt as postedAt3_0_0_, this_.url as url4_0_0_, this_.userName as userName5_0_0_, this_.version as version6_0_0_ from Tweet this_ where (this_.id in (?, ?))
15:04:29,707 INFO TestSearchTweets:55 – Tweet [id=3, postedAt=2017-02-02 00:00:00.0, userName=thjanssen123, message=How to automatically validate entities with Hibernate Validator BeanValidation, url=http://www.thoughts-on-java.org/automatically-validate-entities-with-hibernate-validator/, version=0]
15:04:29,707 INFO TestSearchTweets:55 – Tweet [id=2, postedAt=2017-01-24 00:00:00.0, userName=thjanssen123, message=5 tips to write efficient queries with JPA and Hibernate, url=http://www.thoughts-on-java.org/5-tips-write-efficient-queries-jpa-hibernate/, version=0]

You might be surprised that the query returned 2 Tweets because one of them doesn’t contain the word “validate”. Similar to a Google search, Lucene also returns documents that contain only one of the search terms. But as you can see in the log output, the Tweet with the message “How to automatically validate entities with Hibernate Validator BeanValidation” received the better ranking because it contained both search terms.
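The matching behavior can be illustrated with a few lines of plain Java. This is a deliberately simplified sketch and not Lucene’s scoring, which is far more sophisticated: the hypothetical class AnyTermMatch keeps every message that contains at least one search term and sorts the hits by the number of distinct terms they contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified illustration of "match any term, rank by matched terms". */
final class AnyTermMatch {

    /** Returns all messages that contain at least one query term, best match first. */
    static List<String> search(List<String> messages, String query) {
        Set<String> terms = tokens(query);
        List<String> hits = new ArrayList<>();
        for (String message : messages) {
            if (matchCount(message, terms) > 0) {
                hits.add(message);
            }
        }
        // More matched terms means a better rank; this is NOT Lucene's real scoring.
        hits.sort((a, b) -> Integer.compare(matchCount(b, terms), matchCount(a, terms)));
        return hits;
    }

    /** Counts how many distinct query terms occur in the message. */
    static int matchCount(String message, Set<String> terms) {
        Set<String> words = tokens(message);
        words.retainAll(terms);
        return words.size();
    }

    /** Lower-cases the text and splits it into a set of distinct words. */
    private static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        result.remove("");
        return result;
    }
}
```

For the search string “validate Hibernate”, the first message from the log output matches 2 terms and the second one matches only “Hibernate”. Both are returned, and the first one is ranked higher.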

This example showed just a very small part of the query capabilities of Lucene and Hibernate Search. I will dive deeper into this topic in a future blog post.

The last thing I want to talk about in this post is one of the big advantages of Hibernate Search: what you need to do to keep the full-text search index in sync with the database.

Keep the index in sync

There is nothing you need to do to keep the Lucene index in sync with your database as long as you use Hibernate ORM to perform the create, update or remove operations. The following code snippet shows an example that searches for and updates an entity in 3 independent transactions. Hibernate Search updates the Lucene index when the EntityManager updates the Tweet entity in the 2nd transaction, and the query in the 3rd transaction finds the changed entity.

// Transaction 1: Check that no tweet matches the search string
EntityManager em = emf.createEntityManager();
em.getTransaction().begin();

FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(em);
QueryBuilder tweetQb = fullTextEm.getSearchFactory().buildQueryBuilder().forEntity(Tweet.class).get();
Query fullTextQuery = tweetQb.keyword().onField(Tweet_.message.getName()).matching("Message updated").createQuery();
List results = fullTextEm.createFullTextQuery(fullTextQuery).getResultList();
Assert.assertEquals(0, results.size());

em.getTransaction().commit();
em.close();

// Transaction 2: Update a tweet
em = emf.createEntityManager();
em.getTransaction().begin();

Tweet tweet = em.find(Tweet.class, 1L);
tweet.setMessage("Message updated - " + tweet.getMessage());

em.getTransaction().commit();
em.close();

// Transaction 3: Check that 1 tweet matches the search string
em = emf.createEntityManager();
em.getTransaction().begin();

fullTextEm = Search.getFullTextEntityManager(em);
tweetQb = fullTextEm.getSearchFactory().buildQueryBuilder().forEntity(Tweet.class).get();
fullTextQuery = tweetQb.keyword().onField(Tweet_.message.getName()).matching("Message updated").createQuery();
results = fullTextEm.createFullTextQuery(fullTextQuery).getResultList();
Assert.assertEquals(1, results.size());

em.getTransaction().commit();
em.close();

Summary

Hibernate Search integrates the full-text search capabilities of Lucene and Elasticsearch with your Hibernate ORM entities. It transparently updates the indexes every time you create, update or delete an indexed entity and provides a powerful query DSL to define full-text queries.

You can do a lot more with Hibernate Search than I was able to show you in this blog post. I will show you how to perform more complex full-text queries in next week’s blog post.
