Introducing Solr4 in Alfresco: What does it do, how does it work, and what’s new?
What is Solr?
Solr is an open source enterprise search platform built on Apache Lucene. Solr is written in Java, runs as a standalone search server, and uses Lucene as its indexing and search engine. The typical process is the following:
- Alfresco sends XML input over HTTP to Solr and searches for content,
- Solr updates its cores (indexes) and returns the query results in XML or JSON format.
The Solr4 search server brings improvements and new features over Solr 1.4 in scalability, performance, and flexibility.
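As a rough illustration of the request side of this exchange, the sketch below builds a Solr select URL that asks for JSON results. The host, port, core name (alfresco), and the query TEXT:report are assumptions chosen for the example, not values taken from this article; q, rows, and wt are standard Solr request parameters. Depending on the installation, the Solr endpoint may also require client-certificate authentication, so the sketch only builds the URL and does not send it.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10, wt="json"):
    """Return a Solr /select URL for the given core and query string."""
    # q = the query, rows = maximum number of hits, wt = response format.
    params = {"q": query, "rows": rows, "wt": wt}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical base URL and core name; adjust to the actual installation.
url = build_select_url("https://localhost:8443/solr4", "alfresco", "TEXT:report")
print(url)
```

Passing wt="xml" instead would request the XML response format mentioned above.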
In particular, Solr4 offers:
- More compact disk formats: uses less memory for indexing
- Faster index rebuilding
- Simpler and faster wildcard querying
- Use of doc values for faceting and ordering: lower memory use, improved performance
- More accurate results and facet counts
- Integrated Solr date math for d:date and d:datetime types, for example date:[NOW-1DAY TO NOW+7DAYS]
- Use of primitive types: smaller index overhead
- Support for spell checking and suggestions
- Support for site shortnames using SITE in queries and faceting using TAG
- Special tag support in queries and faceting
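The date math range syntax in the list above can be generated programmatically. The following is a minimal sketch, assuming a hypothetical helper named date_math_range and a field called date, as in the example above. The validation covers only a subset of the units Solr date math accepts.

```python
import re

# Accepts offsets such as "-1DAY" or "+7DAYS" (a subset of Solr date math).
_OFFSET = re.compile(r"^[+-]\d+(YEAR|MONTH|DAY|HOUR|MINUTE|SECOND)S?$")


def date_math_range(field, start="", end=""):
    """Build a range clause relative to NOW, e.g. field:[NOW-1DAY TO NOW+7DAYS]."""
    for offset in (start, end):
        if offset and not _OFFSET.match(offset):
            raise ValueError(f"unsupported date math offset: {offset!r}")
    return f"{field}:[NOW{start} TO NOW{end}]"


clause = date_math_range("date", "-1DAY", "+7DAYS")
print(clause)
```

An empty offset leaves the corresponding bound at NOW, so date_math_range("date", "-1DAY") covers the last 24 hours.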
New Solr “Document Store”:
Solr documents are stored on disk as follows:
- All fields and text representation of content
- Metadata first, then content
- Binary and compressed
There are two cores or indexes in Solr version 4:
1. WorkspaceStore: used for searching all live content stored at alfresco/solr4 within the Solr 4 search server.
2. ArchiveStore: used for searching content that has been marked as deleted at alfresco/solr4 within the Solr 4 search server.
Solr4 can rebuild the index from disk without refetching unchanged nodes from Alfresco; that is, it avoids refetching from Alfresco for content-only or metadata-only index updates.
Alfresco One 5.0 introduces the concept of eventual consistency to overcome the scalability limitations of in-transaction indexing.
Alfresco One 5.0 with the Solr 4 subsystem does not include any transactional indexing. In other words, Alfresco removes the need to have the database and indexes in perfect sync at any given time and relies on an index that gets updated at configurable intervals (default: 15s) by Solr 4 itself.
The index tracker polls Alfresco for new transactions and then updates its index. In this sense, the indexes will eventually be consistent with the database.
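The polling behaviour described above can be pictured with a toy in-memory model. This is only a sketch of the idea of eventual consistency, not Alfresco's or Solr's actual tracker code; the class names ToyRepository and ToyTracker and their methods are invented for the illustration.

```python
class ToyRepository:
    """Stands in for the Alfresco database: an ordered log of transactions."""

    def __init__(self):
        self._transactions = []  # (txn_id, node_id, content) tuples

    def commit(self, node_id, content):
        txn_id = len(self._transactions) + 1
        self._transactions.append((txn_id, node_id, content))
        return txn_id

    def transactions_since(self, last_txn_id):
        return [t for t in self._transactions if t[0] > last_txn_id]


class ToyTracker:
    """Stands in for the index tracker: polls for new transactions."""

    def __init__(self, repository):
        self.repository = repository
        self.last_txn_id = 0
        self.index = {}  # node_id -> indexed content

    def poll(self):
        # Fetch only transactions newer than the last one already indexed.
        new = self.repository.transactions_since(self.last_txn_id)
        for txn_id, node_id, content in new:
            self.index[node_id] = content
            self.last_txn_id = txn_id
        return len(new)


repo = ToyRepository()
tracker = ToyTracker(repo)
repo.commit("node-1", "draft")
stale = "node-1" in tracker.index  # committed, but not yet searchable
tracker.poll()                     # in practice this would run on a timer
fresh = "node-1" in tracker.index  # searchable after the next poll
print(stale, fresh)
```

Between the commit and the next poll, the node exists in the repository but is missing from the index; that window is what the configurable polling interval controls.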
Alfresco 5.0 Model Changes
- A new <facetable> model entry is added to the content model
- It is not required, but reviewing custom models before upgrading is recommended
- Fallback rules apply where <facetable> is not specified.
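One way to review a custom model before upgrading is to list the properties that do not declare <facetable> and would therefore rely on the fallback rules. The sketch below assumes that <facetable> appears somewhere inside a <property> definition (here placed under <index>, which is an assumption about the model layout). The acme model fragment is entirely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom model fragment used only for this illustration.
SAMPLE_MODEL = """
<model name="acme:model" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  <types>
    <type name="acme:doc">
      <properties>
        <property name="acme:category">
          <type>d:text</type>
          <index enabled="true">
            <facetable>true</facetable>
          </index>
        </property>
        <property name="acme:notes">
          <type>d:text</type>
          <index enabled="true"/>
        </property>
      </properties>
    </type>
  </types>
</model>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def properties_without_facetable(model_xml):
    """Return the names of <property> elements with no <facetable> descendant."""
    root = ET.fromstring(model_xml)
    missing = []
    for element in root.iter():
        if local_name(element.tag) != "property":
            continue
        if not any(local_name(child.tag) == "facetable" for child in element.iter()):
            missing.append(element.get("name"))
    return missing


missing = properties_without_facetable(SAMPLE_MODEL)
print(missing)
```

For a real review, the same function could be applied to the text of each custom model file in turn.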
Further reading: Apache Solr Reference Guide