java - Why does Solr reindex data on highlighting?


I wrote a custom tokenizer for Solr. When I first add records to Solr, they go through the tokenizer and the other filters, and while going through the tokenizer it calls a web service and adds the needed attributes. After that I can search without sending requests to the web service. But when I search with highlighting, the data goes through the tokenizer again. What should I do so that it does not go through the tokenizer again?

When the highlighter is run on the text to highlight, the analyzer and tokenizer for the field are re-run on that text to score the different tokens against the submitted text, in order to determine which fragment is the best match for the query. You can see this in the code around line #62 of Highlighter.java in Lucene.
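The difference between re-analyzing at highlight time and reusing offsets recorded at index time can be illustrated with a small self-contained sketch. This is not Lucene or Solr code: `CountingTokenizer`, `Token`, and `highlight` are hypothetical names invented for the illustration, and the tokenizer simply splits on non-alphanumeric characters and counts how often it runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ReanalysisSketch {
    /** A lowercased term plus its character offsets in the source text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for an expensive custom tokenizer; it counts its runs. */
    static final class CountingTokenizer {
        int runs = 0;

        List<Token> tokenize(String text) {
            runs++; // in the question's setup, the web service call would happen here
            List<Token> tokens = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // skip separators, then consume one run of letters/digits
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                if (i > start) {
                    tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
                }
            }
            return tokens;
        }
    }

    /** Wraps every token equal to queryTerm in <em> tags, using the supplied offsets. */
    static String highlight(String text, List<Token> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokens) {
            if (t.term.equals(queryTerm)) {
                out.append(text, last, t.start);
                out.append("<em>").append(text, t.start, t.end).append("</em>");
                last = t.end;
            }
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        CountingTokenizer tokenizer = new CountingTokenizer();
        String text = "Solr highlights Solr text";

        // "Index time": analyze once and keep the offsets (analogous to stored term vectors).
        List<Token> stored = tokenizer.tokenize(text);

        // Re-analysis at highlight time: the tokenizer runs a second time.
        System.out.println(highlight(text, tokenizer.tokenize(text), "solr"));
        System.out.println("tokenizer runs after re-analysis: " + tokenizer.runs);

        // Reusing the stored offsets: no further tokenizer run is needed.
        System.out.println(highlight(text, stored, "solr"));
        System.out.println("tokenizer runs after reusing offsets: " + tokenizer.runs);
    }
}
```

The second highlighting call produces the same output without touching the tokenizer, which is the effect one hopes to get from a highlighter that reads offsets stored in the index.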

There are, however, a few options that might remove the need to re-parse the document text, all given in the options on the community wiki for Highlighting:

For the standard highlighter:

It does not require any special data structures such as termVectors, although it will use them if they are present. If they are not, this highlighter will re-analyze the document on-the-fly to highlight it. This highlighter is a good choice for a wide variety of search use-cases.

There are also two other highlighter implementations you might want to look at, as either one uses other support structures that might avoid the re-tokenizing / analysis of the field (I think testing that will be a lot quicker for you than for me right now).

FastVector Highlighter: The FastVector Highlighter requires the term vector options (termVectors, termPositions, and termOffsets) on the field.

Postings Highlighter: The Postings Highlighter requires storeOffsetsWithPositions to be configured on the field. This is a much more compact and efficient structure than term vectors, but is not appropriate for huge numbers of query terms.

You can switch the highlighting implementation by using hl.useFastVectorHighlighter=true or by adding <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/> to your searchComponent definition.
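As a rough sketch of the request-parameter route, the snippet below builds a select URL that turns on highlighting and asks for the FastVector Highlighter. The host, core name (`mycore`), field name (`content`), and the helper `buildUrl` are assumptions made up for the example; only the parameter names `q`, `hl`, `hl.fl`, and `hl.useFastVectorHighlighter` are Solr's own. The field would still need the term vector options described above for this to take effect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class HighlightRequestSketch {
    /** Builds a Solr select URL with highlighting parameters (hypothetical helper). */
    static String buildUrl(String coreUrl, String query, String field, boolean useFastVector) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");      // enable highlighting
        params.put("hl.fl", field);    // field to highlight
        if (useFastVector) {
            params.put("hl.useFastVectorHighlighter", "true");
        }

        StringBuilder url = new StringBuilder(coreUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) url.append('&');
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    /** URL-encodes a value as UTF-8. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance and core name, for illustration only.
        System.out.println(buildUrl("http://localhost:8983/solr/mycore", "content:solr", "content", true));
    }
}
```

Comparing response times with the flag on and off, while logging calls inside the custom tokenizer, is one way to check whether the re-analysis actually goes away.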

