Reindexing messages

JForum uses the Lucene library, which provides an excellent platform for fast content search and indexing. Versions of JForum prior to 2.1.8 used a database-driven approach, with several tables for word indexing and search.

Lucene stores its index in the filesystem using a special document format. On a regular development machine, JForum indexes about 1,000 messages per second; more powerful machines can reach higher rates.
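
The rate above gives a rough way to plan a full reindex. The POSIX shell function below is a hypothetical helper, not part of JForum. It assumes a constant rate (1,000 messages per second unless you pass another value), so treat the result as an order-of-magnitude estimate only.

```shell
# Hypothetical helper (not shipped with JForum): rough reindex duration,
# assuming a constant indexing rate in messages per second.
estimate_reindex_time() {
  count=$1            # number of messages to index
  rate=${2:-1000}     # messages per second; defaults to 1000
  # Integer division rounded up, so a partial second counts as a full one.
  secs=$(( (count + rate - 1) / rate ))
  printf '%dm %ds\n' $((secs / 60)) $((secs % 60))
}
```

For the 367,234-message board used in the examples later on, `estimate_reindex_time 367234` prints `6m 8s`.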

How to reindex

There are two ways to reindex the messages: the command-line tool, or JForum's Admin Panel (web interface). Which one to use is up to you, but keep the following considerations in mind.

First, if you are migrating from JForum 2.1.7 or earlier to JForum 2.1.8 or newer, you will have to reindex the entire database, because the old search mechanism is no longer supported. This operation is fast and painless, and the command-line tool is the better choice for it.

Second, JForum's Admin Panel has a section that shows Lucene's index status and other general information, such as index size and last modification date, as well as a tool to reindex messages using the same parameters available to the command-line tool. You can reindex by a range of Post IDs or by message date.

Before you start: setting up index configuration

As part of configuring JForum, you may want to review the default indexing settings, which are suitable for most environments.

Open your SystemGlobals.properties file (in WEB-INF/config) and look for the "SEARCH" section. There you will find the properties described in the following table:

Property name | Default value | Description
search.indexing.enabled | true | Enables or disables search indexing. Set it to false to disable indexing.
lucene.index.write.path | ${resource.dir}/jforumLuceneIndex | The complete path of the directory where the index is written. The default value writes it to the directory WEB-INF/jforumLuceneIndex of your application. The directory must be writable by the user who runs the application server instance.
lucene.indexer.ram.numdocs | 10000 | Used only for reindexing. The number of documents kept in memory before they are flushed to disk. A higher number means higher memory usage; it also has a small effect on how long the entire process takes.
lucene.indexer.db.fetch.count | 50 | Also used only for reindexing. The number of records fetched from the database on each read.
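
To check what your installation currently uses, you can read a single property from the file. The function below is a minimal sketch with a hypothetical name (get_prop). It assumes simple "key = value" or "key=value" lines; it ignores line continuations and ":" separators, and prints placeholders such as ${resource.dir} unexpanded, because JForum resolves them at runtime.

```shell
# Hypothetical helper: print the value of one property from a .properties
# file such as WEB-INF/config/SystemGlobals.properties.
# Assumes "key = value" lines; continuations and ":" separators are not handled.
get_prop() {
  file=$1
  key=$2
  awk -v k="$key" '
    /^[ \t]*[#!]/ { next }                    # skip comment lines
    {
      i = index($0, "=")                      # split at the first "="
      if (i == 0) next
      name = substr($0, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)       # trim the key
      if (name == k) {
        val = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", val)      # trim the value
        found = val                           # a later entry overrides an earlier one
      }
    }
    END { print found }
  ' "$file"
}
```

For example, `get_prop /home/www/jforum/WEB-INF/config/SystemGlobals.properties lucene.indexer.ram.numdocs` prints the configured value, or an empty line if the key is absent.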

Index store directory permissions

The directory specified in the property lucene.index.write.path must be writable by the user who runs the application server. Otherwise, indexing any message will fail with errors.
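
Before reindexing, a quick check can save a failed run. The sketch below uses a hypothetical function name (check_index_dir) and assumes you pass the resolved index path, with ${resource.dir} already replaced by the real directory.

```shell
# Hypothetical helper: verify that the Lucene index directory exists and is
# writable by the user running this check.
check_index_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

Because `[ -w ]` tests the permissions of the current user, run the check as the same user that runs the application server. A one-off equivalent is `sudo -u <user> test -w <dir>`, where both values are placeholders for your own setup. If the check fails, adjust the directory's owner or permissions with `chown` or `chmod`.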

Using the command-line tool

JForum comes with a tool named "Lucene Indexer", located in the directory tools/luceneIndexer. It is a command-line interface that can reindex the entire database or just part of it. The tool's directory contains two files: LuceneCommandLineReindexer.sh for Unix-like systems and LuceneCommandLineReindexer.bat for Windows machines.

Invoking the tool without any arguments prints the list of available options, as shown below:

Usage: LuceneCommandLineReindexer
--path full_path_to_JForum_root_directory
--type {date|message}
--firstPostId a_id
--lastPostId a_id
--fromDate dd/MM/yyyy
--toDate dd/MM/yyyy
[--recreateIndex]
[--avoidDuplicatedRecords]

The following table describes each argument.

Argument name | Description
--path | The complete path to where JForum is installed. Lucene Indexer uses it as the base for reading the configuration files stored in WEB-INF/config.
--type | Type of indexing: date or message. date enables --fromDate and --toDate, while message enables --firstPostId and --lastPostId.
--firstPostId | The ID of the first message to index (the "Post ID"). Used only when --type=message.
--lastPostId | The ID of the last message to index (the "Post ID"). Used only when --type=message.
--fromDate | The start date of the range to index. Used only when --type=date; the date must be in the form dd/MM/yyyy, like '23/07/2007' (July 23, 2007).
--toDate | The end date of the range to index. Used only when --type=date; the date must be in the form dd/MM/yyyy, like '23/07/2007' (July 23, 2007).
--recreateIndex | If specified, the index is recreated from scratch. This is a flag and takes no value.
--avoidDuplicatedRecords | Useful when you want to add documents to an existing index (instead of recreating it) and need to make sure that no record is duplicated. This option makes the indexing process slower.
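
The rules above (--type decides which pair of range arguments applies, and dates use dd/MM/yyyy) can be checked before launching a long run. The following is a minimal POSIX shell sketch; reindex_range_args, is_uint and is_ddmmyyyy are hypothetical names, and the date check verifies only the format, not that the date actually exists.

```shell
# Hypothetical helpers: validate range arguments and print them in the form
# the Lucene Indexer examples use.
is_uint() { case $1 in ''|*[!0-9]*) return 1 ;; *) return 0 ;; esac; }
is_ddmmyyyy() {
  case $1 in
    [0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: reindex_range_args {message|date} FROM TO
reindex_range_args() {
  case $1 in
    message)
      is_uint "$2" && is_uint "$3" || { echo "post IDs must be whole numbers" >&2; return 1; }
      [ "$2" -le "$3" ] || { echo "firstPostId is greater than lastPostId" >&2; return 1; }
      printf '%s\n' "--type=message --firstPostId=$2 --lastPostId=$3"
      ;;
    date)
      is_ddmmyyyy "$2" && is_ddmmyyyy "$3" || { echo "dates must be dd/MM/yyyy" >&2; return 1; }
      printf '%s\n' "--type=date --fromDate=$2 --toDate=$3"
      ;;
    *)
      echo "type must be date or message" >&2
      return 1
      ;;
  esac
}
```

A possible invocation is `args=$(reindex_range_args date 01/03/2006 31/03/2006) && sh LuceneCommandLineReindexer.sh --path=/home/www/jforum $args`; the tool only runs if validation succeeds, and `$args` is left unquoted on purpose so it splits into separate arguments.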

Usage examples

The following examples all assume that JForum is installed at /home/www/jforum.

Reindexing the entire board

We want to reindex all messages into a brand-new index. There are a total of 367,234 messages, so we round the last Post ID up to 368,000. Rounding is not strictly necessary, but it doesn't hurt: what matters is that the range covers every Post ID.

sh LuceneCommandLineReindexer.sh --path=/home/www/jforum --recreateIndex --type=message --firstPostId=1 --lastPostId=368000

Reindexing a date range

We want to reindex only the messages from a given month (March 2006 in this example) and add them to an existing index. We won't worry about duplicated records here.

sh LuceneCommandLineReindexer.sh --path=/home/www/jforum --type=date --fromDate=01/03/2006 --toDate=31/03/2006
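
Working out the last day of a month by hand is error-prone, especially for February. Below is a minimal POSIX shell sketch of a hypothetical helper, month_date_args, that prints the --type, --fromDate and --toDate arguments covering one whole month. It assumes the Gregorian leap-year rule and only builds the arguments; it does not run the tool.

```shell
# Hypothetical helper (not part of JForum): print date-range arguments that
# cover one calendar month. Usage: month_date_args MONTH YEAR
month_date_args() {
  m=${1#0}   # strip one leading zero so "03" matches the patterns below
  y=$2
  case $m in
    1|3|5|7|8|10|12) last=31 ;;
    4|6|9|11) last=30 ;;
    2)
      # Leap year: divisible by 4, except centuries not divisible by 400
      if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }; then
        last=29
      else
        last=28
      fi
      ;;
    *) echo "invalid month: $1" >&2; return 1 ;;
  esac
  mm=$(printf '%02d' "$m")   # zero-pad the month for dd/MM/yyyy
  printf '%s\n' "--type=date --fromDate=01/$mm/$y --toDate=$last/$mm/$y"
}
```

`args=$(month_date_args 03 2006) && sh LuceneCommandLineReindexer.sh --path=/home/www/jforum $args` runs the same command as the example above.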

Avoiding duplicated records

Again we want to reindex a specific date range, but this time we must make sure that no duplicated records are created.

sh LuceneCommandLineReindexer.sh --path=/home/www/jforum --avoidDuplicatedRecords --type=date --fromDate=15/04/2007 --toDate=27/05/2007

Using the Web interface - Admin Panel

JForum also provides a web-based interface for reindexing, along with general statistics about the current usage of the index. To access it, go to Admin Panel -> Lucene Statistics. There you will find two boxes, one named "Search statistics" and the other "Re-Index".

Search statistics

General information about the current usage of Lucene's index: how many messages are in the database, the index storage directory, its last modification date and version, and whether it is locked (a locked index means that some process is currently writing to it). The last option, "Is Post Indexed?", lets you query the index to check whether a specific message is indexed.

Re-Index

This section works much like the command-line tool, except that it runs inside the Tomcat instance. All options are available; the only other difference is that reindexing by date also requires the start and end hour, while the command-line tool asks only for the date.

After you click "Start", the process runs in the background and the page refreshes automatically every 5 seconds. When the process finishes, the page returns to its previous state.