[siren-user] Exact cell query?

Renaud Delbru renaud.delbru at deri.org
Mon Nov 22 09:17:25 GMT 2010


  Hi Jeroen,

On 21/11/10 21:49, Jeroen Steggink wrote:
> Hi Mike and Renaud,
>
> I experienced a problem similar to yours, Mike.
>
> When searching using the Solr implementation several questions have 
> arisen.
>
> Let's say I have the following 3 documents.
>
> doc1:
> url1 label "a"
> url1 label "b"
> url1 label "c"
>
> doc2:
> url2 label "a b"
> url2 label "a c"
> url2 label "a a c b"
>
> doc3:
> url3 label "a c"
>
> When searching for the term "a", doc2 will get a higher score than 
> doc1 and doc3 will have the same score as doc1.
>
> Firstly, when searching for a term, documents with multiple
> occurrences of that term get a higher score than documents
> that have only one occurrence and one term in total. Normally I would
> prefer this. However, in this case I'm not interested in matches
> over the whole document, but in the match within one triple.
>
> Secondly, it is not possible to score an exact match higher than a
> non-exact match. I would like doc1 to have a higher score than doc2 and
> doc3, since doc1 has an exact match in the first triple.
Yes, since exact match is not implemented yet. As I explained previously,
it would be overkill to add to the index the information needed to
answer such queries (in detail, I would need to store one integer per
term occurrence).
In the future it will be possible, once I move from Lucene's Payload
interface (not very efficient or compact) to our own index data
structure.
For the moment, you could try the trick of adding hidden tokens at the
beginning and end of the cell. That way you can emulate an exact cell
query, and in addition, if you query:
tuple(label, exact_cell(a)) OR tuple(label, a)
then the scoring mechanism will rank doc 1 higher than doc 3. However, I
am not sure whether doc 1 will be ranked higher than doc 2. If doc 2 is
ranked higher, you could add a negative boost to the clause tuple(label, a).
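
To make the hidden-token trick concrete, here is a rough sketch in plain
Python. It does not use the SIREn or Lucene APIs: the sentinel strings
and all function names are made up for illustration, and whitespace
tokenization is assumed. It only shows why a phrase query that includes
both boundary tokens behaves like an exact cell query on the three
example documents.

```python
# Hypothetical sentinel tokens; any strings that never occur in the
# real data would do.
CELL_START = "__cell_start__"
CELL_END = "__cell_end__"


def index_cell(text):
    """Tokenize a cell on whitespace and wrap it in the hidden tokens."""
    return [CELL_START] + text.split() + [CELL_END]


def contains_phrase(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def exact_cell_match(cell_tokens, query):
    """Emulate an exact cell query: a phrase that includes both sentinels."""
    return contains_phrase(cell_tokens, [CELL_START] + query.split() + [CELL_END])


def term_match(cell_tokens, term):
    """Plain term query: the term appears anywhere in the cell."""
    return term in cell_tokens


# The "label" cells of the three example documents from the question.
docs = {
    "doc1": ["a", "b", "c"],
    "doc2": ["a b", "a c", "a a c b"],
    "doc3": ["a c"],
}


def matches(query):
    """For each document, report which of the two OR clauses it satisfies."""
    result = {}
    for name, cells in docs.items():
        indexed = [index_cell(cell) for cell in cells]
        result[name] = {
            "exact": any(exact_cell_match(c, query) for c in indexed),
            "term": any(term_match(c, query) for c in indexed),
        }
    return result


if __name__ == "__main__":
    for name, hit in matches("a").items():
        print(name, hit)
```

Only doc 1 satisfies both clauses for the query "a", so it collects a
score contribution from each; doc 2 and doc 3 match the plain term
clause only. How the final scores compare still depends on the term
frequency and boosts used by the real scorer.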

hope this helps,
cheers
-- 
Renaud Delbru

