Pre-sorting

On terabyte-scale datasets, sorting query results requires LindormSearch to visit every matching document in each segment, which increases response time and resource usage. Pre-sorting reorders documents within each segment at merge time. When a query's sort order matches the merge-time sort, LindormSearch stops collecting results from each segment as soon as it has enough. This technique is called early termination. It reduces query response time, and the only change a query needs is one extra parameter.

How it works

By default, LindormSearch must visit every matching document in a segment to identify the top-N results. When you enable pre-sorting, documents within each segment are reordered during merge operations. If the sort order in your query matches the merge-time sort, the first N matching documents in a segment are already that segment's top N, so LindormSearch can stop collecting from the segment at that point instead of scanning all matching documents.
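The effect of early termination can be illustrated with a small, self-contained simulation. This is only a sketch of the idea, not LindormSearch's implementation: segments are modeled as plain Python lists of timestamps, and the function names and sample data are hypothetical.

```python
import heapq


def top_n_full_scan(segments, n):
    """Baseline: visit every matching document in every segment."""
    visited = 0
    candidates = []
    for segment in segments:
        for timestamp in segment:
            visited += 1
            candidates.append(timestamp)
    return heapq.nlargest(n, candidates), visited


def top_n_early_termination(presorted_segments, n):
    """Each segment is sorted by timestamp desc, so only its first n
    documents can appear in the global top n. Stop after those."""
    visited = 0
    candidates = []
    for segment in presorted_segments:
        for timestamp in segment[:n]:
            visited += 1
            candidates.append(timestamp)
    return heapq.nlargest(n, candidates), visited


# Hypothetical data: three segments of five timestamps each.
raw_segments = [
    [5, 42, 17, 8, 23],
    [31, 2, 19, 40, 11],
    [7, 36, 29, 1, 14],
]

# Pre-sorting: reorder each segment by timestamp desc.
# In the real system this reordering happens at merge time.
presorted_segments = [sorted(seg, reverse=True) for seg in raw_segments]

full_result, full_visited = top_n_full_scan(raw_segments, 3)
early_result, early_visited = top_n_early_termination(presorted_segments, 3)

print(full_result, full_visited)
print(early_result, early_visited)
```

Both functions return the same top three timestamps, but the early-terminating version looks at no more than N documents per segment, which is where the savings come from on large segments.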

The two-step setup is:

  1. Configure SortingMergePolicyFactory in solrconfig.xml to define the sort order at merge time.

  2. Set segmentTerminateEarly=true in your queries to activate early termination.

Both steps are required. If either is missing, LindormSearch sorts all matching documents as normal.

Configure pre-sorting

Step 1: Set MergePolicy in solrconfig.xml

In solrconfig.xml, replace the default merge policy with SortingMergePolicyFactory and set the sort field. The following example sorts by timestamp in descending order:

<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
</mergePolicyFactory>

For details on MergePolicy configuration options, see Customizing merge policies.
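Because the query's sort must later match this configuration exactly, it can help to read the configured value programmatically rather than copy it by hand. The following is a minimal sketch using Python's standard library; it parses the fragment shown above from a string, and the `configured_sort` helper is a hypothetical name, not part of any LindormSearch tooling.

```python
import xml.etree.ElementTree as ET

# The merge policy fragment from solrconfig.xml, embedded here for illustration.
MERGE_POLICY_XML = """
<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">timestamp desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">10</int>
  <int name="inner.segmentsPerTier">10</int>
</mergePolicyFactory>
"""


def configured_sort(xml_text):
    """Return the merge-time sort string, or None if no sort is configured."""
    root = ET.fromstring(xml_text)
    node = root.find("./str[@name='sort']")
    if node is None or not node.text:
        return None
    return node.text.strip()


merge_sort = configured_sort(MERGE_POLICY_XML)
print(merge_sort)
```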

Step 2: Enable early termination in queries

Set segmentTerminateEarly=true in your query. The sort parameter must exactly match the sort value configured in SortingMergePolicyFactory. If the values differ, early termination does not activate and LindormSearch sorts all matching documents.

curl "http://localhost:8983/solr/testcollection/query?q=*:*&sort=timestamp+desc&rows=10&segmentTerminateEarly=true"
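A client can guard against an accidental sort mismatch before sending the request. The sketch below builds the same query URL as the curl example and refuses to build it when the sort differs from the merge-time value. `MERGE_SORT` and `build_query_url` are hypothetical names, and the comparison is a plain string equality check, which is an assumption about what "exactly match" means here.

```python
from urllib.parse import urlencode

# Assumed to mirror the sort value configured in SortingMergePolicyFactory.
MERGE_SORT = "timestamp desc"


def build_query_url(base_url, sort, rows=10):
    """Build a query URL with early termination enabled.

    Raises ValueError if the sort differs from the merge-time sort,
    because early termination would not activate in that case.
    """
    if sort != MERGE_SORT:
        raise ValueError(
            f"sort {sort!r} does not match merge-time sort {MERGE_SORT!r}"
        )
    params = {
        "q": "*:*",
        "sort": sort,
        "rows": rows,
        "segmentTerminateEarly": "true",
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{base_url}?{urlencode(params)}"


url = build_query_url(
    "http://localhost:8983/solr/testcollection/query", "timestamp desc"
)
print(url)
```

The resulting URL is equivalent to the curl example, with `*:*` percent-encoded.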

Limitations

Sort parameters must match: The sort value in the query must exactly match the sort value in SortingMergePolicyFactory. A mismatch disables early termination.

segmentTerminateEarly is required: Without this parameter, LindormSearch sorts all matching documents regardless of MergePolicy configuration.

numFound is approximate: When pre-sorting with early termination is used, the numFound value in the response is approximate, because collection stops before every matching document has been counted.