[siren-user] Parsing User-Input Query Text
Adam McLellan
akmclell at lakeheadu.ca
Wed Jun 30 05:04:53 IST 2010
Hi Renaud,
Thanks for your reply. It's true that for the existing data I'm indexing,
regular Lucene would probably be a better choice. However, another part of
the project is to provide a semantic-based extension to the existing Google
Gadget metadata specs, so that gadget authors can provide all sorts of data
not allowed by the existing spec. Gadget authors would not be
restricted to any particular set of predicates. So even though I'm
currently focused on base functionality that Lucene could accommodate, I
expect to outgrow its capabilities before too long. When I stumbled across
SIREn I was very pleased, since it's exactly what I was looking for. My
original plan was to run queries through SPARQL, from which it would have
been much more difficult to get reasonable results.
Thanks,
Adam
On Tue, Jun 29, 2010 at 9:50 AM, Renaud Delbru <renaud.delbru at deri.org> wrote:
> Hi Adam,
> Sorry for the late reply; see my comments below.
>
>
> On 23/06/10 17:11, Adam McLellan wrote:
>
>>
>> This project will almost certainly be released as open source once
>> complete. I was looking through the available objects in SIREn and didn't
>> see any extension of Lucene's QueryParser, but I thought I should ask since
>> the SirenPhraseQuery's docs made mention of automatic use via the
>> QueryParser.
>>
> Indeed, that is a small problem in the docs. Thanks for reporting it.
>
>> What I am primarily hoping to support are queries such as "(tim AND
>> berners AND lee) OR timbl OR http://www.w3.org/People/Berners-Lee/card"
>> <http://sindice.com/search?q=%28tim+AND+berners+AND+lee%29+OR+timbl+OR+http%3A%2F%2Fwww.w3.org%2FPeople%2FBerners-Lee%2Fcard&qt=term>,
>> which you have as an example for Sindice. The exact search terms would
>> differ when trying to locate a Google Gadget, but the idea is the same.
>>
> OK. The first question I will ask you is: do you really need SIREn features
> to do this?
> I am asking this because the query you are providing is something that
> Lucene supports. If your queries will be keyword-based only or if your data
> schema is relatively small, then Lucene could be a better choice.
> SIREn adds features to Lucene for managing large amounts of
> heterogeneous data in an efficient way. However, if your data is not that
> heterogeneous, or is relatively small, then Lucene will do the job. In
> addition, you'll be able to use the original QueryParser of Lucene out of
> the box.
> Also, SIREn will be a better choice if you are expecting a group of
> keywords to match a specific value. For example, if you want to restrict
> (tim AND berners AND lee) to match only one value (or object in an RDF
> triple), then SIREn could be a better choice.
>
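As a rough illustration of the distinction described above, here is a minimal sketch in plain Python. It does not use the Lucene or SIREn APIs; the function names, the (predicate, value) representation, and the sample data are all assumptions made up for this example. It contrasts a keyword AND that may be satisfied across different values of an entity with one that must be satisfied inside a single value.

```python
# Illustration only: plain Python, not the Lucene or SIREn API.
# An entity is modelled as a list of (predicate, value) pairs, loosely
# like the triples describing one RDF subject. All data is hypothetical.

def tokens(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return set(text.lower().split())

def matches_anywhere(entity, keywords):
    """Document-level AND: each keyword may come from a different value."""
    all_tokens = set()
    for _predicate, value in entity:
        all_tokens |= tokens(value)
    return all(k in all_tokens for k in keywords)

def matches_single_value(entity, keywords):
    """Value-level AND: every keyword must occur in the same value."""
    return any(
        all(k in tokens(value) for k in keywords)
        for _predicate, value in entity
    )

person = [("name", "Tim Berners Lee"), ("nick", "timbl")]
scattered = [("author", "Tim Smith"), ("editor", "Lee Berners")]
query = ["tim", "berners", "lee"]

print(matches_anywhere(person, query), matches_single_value(person, query))
# prints: True True
print(matches_anywhere(scattered, query), matches_single_value(scattered, query))
# prints: True False
```

The second entity matches only when the keywords are allowed to come from different values, which is the kind of false positive that restricting the group to one value is meant to avoid.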
> If you need some pointers on how to index RDF data using Lucene, just
> ask. I can help. We can also discuss your use cases or
> scenario to see whether Lucene or SIREn fits better.
>
> Regards
>
> --
> Renaud Delbru