Apache Solr is a powerful open source search engine that can index a wide variety of document types, including XML, JSON, CSV, Word and PDF. Its features include full-text search, faceted search, result highlighting, dynamic clustering and database integration.
One of the main advantages of Solr is that searches can be performed using simple REST requests, so that we can perform a query through a URL such as:
http://localhost:8983/solr/gettingstarted/select?q=surnames%3AHorla&sort=surnames+desc&start=40&rows=20&wt=xml&indent=true&defType=dismax
The results can be returned as a structured document in XML, JSON or CSV, or in other formats such as a PHP array.
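As a rough illustration, a request URL of this kind can be assembled in Python with the standard library, which takes care of the percent-encoding. The host, port, core name (gettingstarted) and field name (surnames) are assumptions about a local test setup, not fixed values.

```python
from urllib.parse import urlencode

# Assumed local Solr instance and core name; adjust to your installation.
BASE_URL = "http://localhost:8983/solr/gettingstarted/select"


def build_query_url(params):
    """Return a select URL with the parameters percent-encoded."""
    return BASE_URL + "?" + urlencode(params)


params = {
    "q": "surnames:Horla",   # field:value query (hypothetical field and value)
    "sort": "surnames desc",  # the space is encoded as "+"
    "start": 40,
    "rows": 20,
    "wt": "xml",
    "indent": "true",
    "defType": "dismax",
}
url = build_query_url(params)
print(url)
```

Building the URL from a dictionary avoids encoding the colon and the spaces by hand, which is where most mistakes in hand-written query URLs come from.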
Users of Apache Solr include large companies such as Apple, eBay and Cisco.
The most popular content management systems, such as Drupal, WordPress and Liferay, already have modules or plugins for Solr integration.
Since version 5 (available since February 20, 2015), Solr is no longer distributed as a WAR file that can be deployed on any application server. From this version on, Solr ships with an embedded Jetty server and a new administration interface. We can download the latest version from
http://lucene.apache.org/solr/mirrors-solr-latest-redir.html, which redirects to the download page of the latest version (5.4.1 at the time of writing, released on 23 January).
To run Solr 5.4.1 you need Java 7 or higher, avoiding GA build 147 and updates u40, u45 and u51 of Oracle Java or OpenJDK. For more information on this issue, see the page "Java Bugs in various JVMs affecting Lucene / Solr". Java 8 is recommended because it provides better performance.
Once the download is done, unzip the file to a folder of your choice.
To start Solr we use the solr script in the bin folder. There is one version of this script for Linux / Unix / OS X and another for Windows. In either case it is enough to run
$ bin/solr start
Once the server is up and running we can access Solr’s graphical administration interface at
http://localhost:8983/solr/
To see all the options of the script we can execute:
$ bin/solr -help
And for the start option:
$ bin/solr start -help
We cannot finish talking about starting Solr without mentioning SolrCloud. SolrCloud is the name given to a set of features, added from version 4 onwards, that make it easier to administer a cluster of Solr servers for scalability, fault tolerance and high availability. To use these features we must start the Solr server in SolrCloud mode:
$ bin/solr start -cloud
The detailed description of SolrCloud is beyond the scope of this article.
Once the Solr server is running, the first thing we have to do is create a core, the structure where our index will be stored. To do this we run:
$ bin/solr create -c <core_name>
As with the start option, we can get help on the options available for core creation by running:
$ bin/solr create -help
If at any point we want to delete a core, we can do so with the delete command:
$ bin/solr delete -c <core_name>
The folder for our core is created under <solr_home>/server/solr and contains:
- core.properties: defines the properties of the core, such as its name, the location of the schema.xml file and other parameters
- conf/: contains the configuration files. The most important are:
  - solrconfig.xml: contains parameters that define the high-level behavior of the core, such as an alternative location for the data folder
  - schema.xml: defines the structure of the documents to be indexed. In this file a document is defined as a series of fields, each described by several parameters, including its type. A newly created core does not contain this file but one called managed-schema, which can serve as a starting point for creating our own schema.xml. It is also possible to work in “schemaless” mode, in which we do not edit the schema manually; instead it is built as documents are indexed. For our core to work this way, the solrconfig.xml file must be configured accordingly.
- data/: this folder contains the low-level files produced by the indexing process
Once the core has been created, we have to add the documents that we want to index.
Although there are several ways to do this, including SolrJ, a Java API, here we will use post, a simple command-line tool available only for Unix shells. However, post does its work through a Java program, SimplePostTool, which can also be invoked on Windows systems. This utility is packaged in the post.jar file, so we can run it from the Solr installation directory. The post.jar file is in the <solr_home>/example/exampledocs folder. We can move it, for example, to the bin folder and run, from <solr_home>:
$ java -jar bin/post.jar -h
If we work on Linux / Unix / OS X, we can do the same with
$ bin/post -h
For example, to index all the documents with the .pdf extension in the catalog folder into the core named products, we run:
$ bin/post -c products catalog/*.pdf
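Documents can also be sent to Solr directly over HTTP, which is one of the other indexing methods alluded to above. The following is a minimal sketch, not a replacement for post: it builds a JSON update request for the products core using only the Python standard library. It covers plain JSON documents rather than PDF files, and the local URL, the commit=true parameter and the field names id and name are assumptions for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumed local instance and the "products" core used in the example above.
UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"


def build_update_request(docs):
    """Build (but do not send) a POST request that adds the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return Request(UPDATE_URL, data=body,
                   headers={"Content-Type": "application/json"})


def send(request):
    """Send the request; this needs a running Solr server."""
    with urlopen(request) as response:
        return response.status


# Hypothetical documents: the fields must exist in the schema,
# or the core must be running in schemaless mode.
docs = [{"id": "1", "name": "iPod"}, {"id": "2", "name": "iPad"}]
request = build_update_request(docs)
print(request.get_method(), request.full_url)
# send(request)  # uncomment once the server and the core are available
```

Keeping the request construction separate from the network call makes it possible to inspect what would be sent before touching the index.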
When we make a request to a Solr server, it is first processed by a request handler. For search requests, the request handler passes the request to a query parser, which is what finally interprets the terms and parameters of the search. Each query parser has its own syntax, although a number of parameters are common to all of them. Three of these parsers are the “standard” parser, DisMax and Extended DisMax (eDisMax). The standard parser already allows precise searches, DisMax adds great tolerance to syntax errors, and eDisMax allows the use of the full syntax of Lucene (the search library that runs under Solr).
Once the search is done, a response writer is in charge of the final format of the results. Among the most widely used are the XML Response Writer and the JSON Response Writer.
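To give an idea of what the JSON Response Writer produces, here is a small sketch that parses a hand-written sample in the general shape of a Solr JSON response. The documents and field names are made up for illustration; a real response depends on the schema and on the query.

```python
import json

# Hand-written sample in the shape of a Solr JSON response;
# the documents and their fields are invented for this example.
sample = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "name": "iPod"},
      {"id": "2", "name": "iPod mini"}
    ]
  }
}
"""


def extract_docs(raw):
    """Return (total number of hits, list of documents) from a JSON response."""
    data = json.loads(raw)
    response = data["response"]
    return response["numFound"], response["docs"]


total, docs = extract_docs(sample)
print(total, [d["name"] for d in docs])
```

Note that numFound is the total number of matches, while docs holds only the page of results that was actually returned.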
The most commonly used parameters, valid for all three query parsers, are:
- defType: defines the query parser to use, for example defType=dismax. If we omit this parameter, Solr uses the standard parser (defType=lucene)
- sort: sorts the search results in ascending or descending order by score or by any other indicated field. For example, to sort the results by the price field in descending order: sort=price+desc
- start: indicates the offset of the first result to display (0 by default). We can use this parameter together with the rows parameter for pagination.
- rows: number of results to display. The default value is 10
- fq: this parameter filters the search results. It can be useful for speeding up complex searches, because filter queries are cached independently of the main query. We can specify several fq parameters in the same query. For example, to show products priced between 10 and 20 that are in stock:
fq=price:[10 TO 20]&fq=stock:1
We can also build complex filters using the Boolean operators + and -. The previous query would then be:
fq=+price:[10 TO 20] +stock:1
Or, assuming that the stock field can only have the values 0 and 1:
fq=+price:[10 TO 20] -stock:0
Keep in mind that the first example is stored in the cache as two separate filters, while each of the following ones is stored as a single filter. If price and stock are often used together in searches, it is better to query our index with a single fq parameter in the format of the second example.
- debug: a parameter that we can use during development. The possible values are:
  - query: information about the query only
  - timing: information about the time the query takes to be processed
  - results: information about the score of the results
  - all: all available information about the query
- wt: defines the response writer used to format the results. Some of the possible values are xml, csv, json and php. The last one returns the results as a PHP array. For example, we could use the following code:
$code = file_get_contents('http://localhost:8983/solr/products/select?q=name:iPod&wt=php');
eval('$result = ' . $code . ';');
print_r($result);
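The pagination with start and rows and the two ways of writing filters described above can be sketched in Python as follows. This is only an illustration under assumptions: the helper names page_params and build_params are hypothetical, and the query name:iPod and the price and stock fields are reused from the examples in this section.

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's start and rows."""
    return {"start": (page - 1) * rows, "rows": rows}


def build_params(q, page=1, rows=10, filters=()):
    """Return (name, value) pairs; each filter becomes its own fq parameter."""
    pairs = [("q", q)]
    pairs += [("fq", f) for f in filters]
    pairs += list(page_params(page, rows).items())
    return pairs


# Two separate filters: cached as two independent entries.
separate = urlencode(build_params("name:iPod", page=3, rows=20,
                                  filters=["price:[10 TO 20]", "stock:1"]))
# One combined filter: cached as a single entry.
combined = urlencode(build_params("name:iPod", page=3, rows=20,
                                  filters=["+price:[10 TO 20] +stock:1"]))
print(separate)
print(combined)
```

When the combined form is sent in a URL, the + operators have to be encoded as %2B, because a raw + in a query string is decoded as a space; urlencode does this automatically.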
6. Server Status and Shutdown
We can get basic information (in JSON format) about our Solr server with the status command:
$ bin/solr status
To stop the Solr server, we run the stop command, indicating the port with the -p option:
$ bin/solr stop -p <port>
We can also stop all running instances with the -all option:
$ bin/solr stop -all