[siren-user] NumericRangeQuery in Siren...?

Renaud Delbru renaud.delbru at deri.org
Tue Dec 14 16:44:57 GMT 2010


Hi Mike, Jeroen,

On 09/12/10 14:46, Mike Hugo wrote:
> I was also looking at extending SIREn to support prefix and wildcard 
> queries but probably won't be able to get to it for a while.  I 
> noticed that classes like SirenPhraseQuery and SirenTupleQuery were 
> basically taken from Lucene and adapted for the Siren use case -- is 
> there a reason these classes cannot just extend from the Lucene Query 
> classes, rather than wholesale copying and adapting?
No, this is not possible because of the way SIREn works.
SIREn uses a different index data structure than Lucene. It stores 
additional information, such as the tuple and cell ids, which is used 
during query processing. The Lucene query classes are not aware of 
this additional information and therefore cannot be used to construct 
Cell or Tuple queries.
How does query processing work in Lucene and SIREn? Everything starts 
with a basic query class, such as TermQuery (or SirenTermQuery) or 
PhraseQuery (or SirenPhraseQuery); PrefixTermQuery, FuzzyTermQuery, 
RangeQuery, and others are also what I call basic query classes. These 
classes are the building blocks for more complex queries. 
During query processing, Lucene and SIREn first process these basic 
queries to obtain, upfront, the information needed to answer the more 
complex query: doc ids and term positions in Lucene; doc ids, term 
positions, tuple and cell ids in SIREn.
If you use a Lucene TermQuery, the tuple and cell information will not 
be available upfront, and SIREn will therefore not be able to compute 
the correct results for a SirenTupleQuery or SirenCellQuery.
This is why, in SIREn, we have to reimplement all the basic query 
classes. Most of the time this is not much work: it is just a matter 
of making the tuple and cell ids available for upfront query 
processing.
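
To illustrate the point above, here is a minimal sketch of why a 
cell-level match needs more than a Lucene-style posting provides. It is 
not SIREn code: PlainPosting, CellPosting and CellMatchSketch are 
hypothetical names, and the real index structures are certainly more 
involved. The sketch only assumes that each term occurrence carries a 
tuple id and a cell id in addition to the doc id and position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model. None of these types are Lucene or SIREn classes.

/** What a Lucene-style term posting exposes: doc id and term position. */
class PlainPosting {
    final int doc;
    final int position;

    PlainPosting(int doc, int position) {
        this.doc = doc;
        this.position = position;
    }
}

/** What a SIREn-style term posting is assumed to expose in addition: tuple and cell ids. */
class CellPosting extends PlainPosting {
    final int tuple;
    final int cell;

    CellPosting(int doc, int tuple, int cell, int position) {
        super(doc, position);
        this.tuple = tuple;
        this.cell = cell;
    }
}

class CellMatchSketch {
    /**
     * Returns the ids of the documents in which the term occurs in the given
     * cell of any tuple. This cannot be written against PlainPosting, which
     * has no cell field to test.
     */
    static List<Integer> docsWithTermInCell(List<CellPosting> postings, int cell) {
        List<Integer> docs = new ArrayList<>();
        for (CellPosting p : postings) {
            if (p.cell == cell && !docs.contains(p.doc)) {
                docs.add(p.doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<CellPosting> postings = new ArrayList<>();
        postings.add(new CellPosting(1, 0, 0, 0)); // doc 1: term in cell 0 of tuple 0
        postings.add(new CellPosting(2, 0, 2, 5)); // doc 2: term in cell 2 of tuple 0
        postings.add(new CellPosting(3, 1, 2, 9)); // doc 3: term in cell 2 of tuple 1
        System.out.println(docsWithTermInCell(postings, 2)); // prints [2, 3]
    }
}
```

A query that only sees PlainPosting could still tell that the term 
occurs in docs 1, 2 and 3, but it could not restrict the match to a 
given cell.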
> Most of the code looks relatively the same with the exception of the 
> scorer - is that the essential difference?  In the current form it's 
> difficult to see what needs to be changed to support the SIREn use case.
Yes, most of the time only the Scorer needs to be reimplemented. In 
Lucene, the *Query classes are just an end-user interface. All the 
query processing is in fact done in the associated *Scorer classes 
(which are hidden from the end user).
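
The Query/Scorer split described above can be sketched as follows. 
This is a simplified illustration, not the real Lucene or SIREn API: 
SketchQuery, SketchScorer and the two query classes are hypothetical, 
and postings are assumed to be plain {doc, tuple, cell} rows. The two 
query classes look almost identical; only the scorer they return 
differs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types. These are not the Lucene or SIREn signatures.

/** Iterates over matching documents; the matching logic lives here. */
interface SketchScorer {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next matching doc id, or returns NO_MORE_DOCS. */
    int nextDoc();
}

/** Thin user-facing object: it describes the query and hands out a scorer. */
interface SketchQuery {
    /** Each row of postings is assumed to be {doc, tuple, cell}. */
    SketchScorer scorer(int[][] postings);
}

/** Lucene-style term query: its scorer only looks at the doc id. */
class SketchTermQuery implements SketchQuery {
    public SketchScorer scorer(final int[][] postings) {
        return new SketchScorer() {
            private int i = -1;

            public int nextDoc() {
                i++;
                return i < postings.length ? postings[i][0] : NO_MORE_DOCS;
            }
        };
    }
}

/** Cell-aware term query: same shape, but its scorer also checks the cell id. */
class SketchCellTermQuery implements SketchQuery {
    private final int cell;

    SketchCellTermQuery(int cell) {
        this.cell = cell;
    }

    public SketchScorer scorer(final int[][] postings) {
        return new SketchScorer() {
            private int i = -1;

            public int nextDoc() {
                while (++i < postings.length) {
                    if (postings[i][2] == cell) {
                        return postings[i][0];
                    }
                }
                return NO_MORE_DOCS;
            }
        };
    }
}

class QueryScorerSketch {
    /** Drains the scorer of a query into a list of doc ids. */
    static List<Integer> collect(SketchQuery query, int[][] postings) {
        List<Integer> docs = new ArrayList<>();
        SketchScorer scorer = query.scorer(postings);
        for (int doc = scorer.nextDoc(); doc != SketchScorer.NO_MORE_DOCS; doc = scorer.nextDoc()) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 0, 0}, {2, 0, 2}, {3, 1, 2} };
        System.out.println(collect(new SketchTermQuery(), postings));      // prints [1, 2, 3]
        System.out.println(collect(new SketchCellTermQuery(2), postings)); // prints [2, 3]
    }
}
```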
> SIREn has been a huge help to us in searching RDF triples, I'm hoping 
> that we'll be able to contribute back at some point.  I know there 
> isn't a public source repository (yet) - what do you think of creating 
> a repository at github.com (git), http://bitbucket.org/ (mercurial) 
> or code.google.com / assembla.com (subversion)?

The main problem is that the SIREn repository is part of the Sindice 
project, and is therefore linked to other components of that project.
We are not able to make the svn repository public at the moment for 
security reasons.
One solution would be to periodically synchronise our private svn 
repository to a public one (e.g., github), but I don't know whether 
this is easy to do. If you have experience with this kind of problem, 
comments and advice are welcome.

Kind Regards,
-- 
Renaud Delbru

