java - Why does Solr reindex data on highlighting?

- February 15, 2012

I wrote a custom tokenizer for Solr. When I first add records to Solr, they go through the tokenizer and the other filters, and while going through the tokenizer a web service is called to add the needed attributes. After that I can search without sending requests to the web service. But when I search with highlighting, the data goes through the tokenizer again. How can I keep it from going through the tokenizer again?

When the highlighter is run on the text to highlight, the analyzer and tokenizer for the field are re-run on that text to score the different tokens against the submitted query and to determine which fragment is the best match for it. You can see this in the code around line #62 of Highlighter.java in Lucene.

There are a few options that might negate the need to re-parse the document text, given the options listed on the community wiki page for highlighting.

For the standard highlighter:

It does not require any special data structures such as termVectors, although it will use them if they are present. If they are not, this highlighter will re-analyze the document on-the-fly to highlight it. This highlighter is a good choice for a wide variety of search use-cases.

There are two other highlighter implementations you might want to look at, since each one uses other support structures that might avoid re-tokenizing and re-analyzing the field (I think testing this will be a lot quicker for you than for me right now).

FastVector Highlighter: The FastVector Highlighter requires the term vector options (termVectors, termPositions, and termOffsets) on the field.

Postings Highlighter: The Postings Highlighter requires storeOffsetsWithPositions to be configured on the field. This is a much more compact and efficient structure than term vectors, but it is not appropriate for huge numbers of query terms.
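As a rough illustration of the two field requirements above, a schema.xml fragment might look like the following. The field names (content_fvh, content_postings) and the field type name (text_custom, standing in for whatever type uses your custom tokenizer) are hypothetical; only the attribute names come from the options described above. Changing these attributes only takes effect for documents indexed afterwards, so existing documents would need to be reindexed once.

```xml
<!-- Hypothetical field for the FastVector Highlighter:
     term vectors with positions and offsets are stored at index time -->
<field name="content_fvh" type="text_custom" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- Hypothetical field for the Postings Highlighter:
     offsets are stored alongside positions in the postings list -->
<field name="content_postings" type="text_custom" indexed="true" stored="true"
       storeOffsetsWithPositions="true"/>
```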

You can switch the highlighting implementation by using hl.useFastVectorHighlighter=true or by adding <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/> to your searchComponent definition.
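To make the request-parameter route concrete, here is a minimal sketch in plain Java that builds a select URL with highlighting turned on and, optionally, hl.useFastVectorHighlighter=true. The host, core name (collection1), and field name (content) are assumptions for illustration, and the helper class is hypothetical rather than part of Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or SolrJ.
class HighlightQueryUrl {

    // Builds a Solr /select URL with highlighting enabled on one field.
    static String build(String coreUrl, String query, String field, boolean useFastVector) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", query);
        params.put("hl", "true");
        params.put("hl.fl", field);
        if (useFastVector) {
            // Asks Solr to use the FastVector Highlighter for this request.
            params.put("hl.useFastVectorHighlighter", "true");
        }
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            joined.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return coreUrl + "/select?" + joined;
    }

    // URL-encodes a single parameter name or value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance and core name.
        System.out.println(build("http://localhost:8983/solr/collection1",
                "content:solr", "content", true));
    }
}
```

Sending the flag per request makes it easy to compare the standard and FastVector highlighters on the same index, and to check whether the custom tokenizer (and its web service call) is still invoked at query time.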
