Lucene
I understand how that recommendation could potentially cover fields with
undesired terms mixed in with the desired terms. I fail to see that it
covers the case where the undesired term(s) are last i...

A typical solution to problems in this "space" is to index marker terms to
denote boundaries in the term sequence... in combination with things
like SpanNear and SpanNot, this can be used to make...

Oops, too quick to reply... coord() won't quite do it, since it compares
the terms matched in the doc against the terms in the query.
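A minimal plain-Java sketch of the two ratios being contrasted here (no Lucene classes are used; `FieldMatchSketch`, its methods and the `_start_`/`_end_` marker tokens are hypothetical). `coordFactor` follows the Similarity.coord() description quoted in this thread (the fraction of the query terms that the document contains), whereas `fieldCoverage` measures the fraction of the field's terms that appear in the query, which is what "every term in this field is represented in the query" needs. `withMarkers` and `isLastTerm` illustrate the marker-term idea: once an end marker is indexed, "is this term last?" becomes an adjacency test, which in Lucene would be expressed with span queries such as SpanNear rather than with a loop like this.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; no Lucene classes. Marker names are hypothetical.
final class FieldMatchSketch {
    static final String START = "_start_"; // hypothetical boundary marker
    static final String END = "_end_";     // hypothetical boundary marker

    // coord()-style factor: query terms found in the field / all query terms.
    static double coordFactor(List<String> queryTerms, List<String> fieldTerms) {
        if (queryTerms.isEmpty()) return 0.0;
        Set<String> field = new HashSet<>(fieldTerms);
        long overlap = queryTerms.stream().filter(field::contains).count();
        return (double) overlap / queryTerms.size();
    }

    // Field coverage: field terms found in the query / all field terms.
    static double fieldCoverage(List<String> queryTerms, List<String> fieldTerms) {
        if (fieldTerms.isEmpty()) return 0.0;
        Set<String> query = new HashSet<>(queryTerms);
        long covered = fieldTerms.stream().filter(query::contains).count();
        return (double) covered / fieldTerms.size();
    }

    // Index-time step: surround the field's tokens with boundary markers.
    static List<String> withMarkers(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add(START);
        out.addAll(tokens);
        out.add(END);
        return out;
    }

    // Query-time check, analogous to an in-order span match with slop 0:
    // does `term` occur immediately before the END marker?
    static boolean isLastTerm(List<String> markedTokens, String term) {
        for (int i = 0; i + 1 < markedTokens.size(); i++) {
            if (markedTokens.get(i).equals(term) && markedTokens.get(i + 1).equals(END)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> field = List.of("lucene", "field", "extra");
        List<String> query = List.of("lucene", "field");
        System.out.println(coordFactor(query, field));   // 1.0
        System.out.println(fieldCoverage(query, field)); // 0.6666666666666666
        System.out.println(isLastTerm(withMarkers(field), "extra")); // true
    }
}
```

With the field `lucene field extra` and the query `lucene field`, a coord-style factor is already 1.0 even though `extra` is unmatched, while field coverage is 2/3; only the latter distinguishes an exact field match.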
On Oct 3, 2007, at 2:20 PM, Kyle Maxwell wrote:
> I'm indexing a dataset with lot...

See the Similarity.coord() method.
/** Computes a score factor based on the fraction of all query terms
 * that a document contains. This value is multiplied into scores.
 *
 * <p>The pre...

I'm indexing a dataset with lots of short fields. I have determined that it
would be useful to highly boost matches where every term in this field is
represented in the query, i.e.
Query lucene fie...

https://issues.apache.org/jira/browse/LUCENE-1017
On Oct 2, 2007, at 8:25 PM, Mike Klaas wrote:
> On 2-Oct-07, at 3:44 PM, Peter Keegan wrote:
>
> > I have been experimenting with payloads and Boost...

On 2-Oct-07, at 3:44 PM, Peter Keegan wrote:
> I have been experimenting with payloads and BoostingTermQuery
> which I think
> are excellent additions to Lucene core. Currently
> BoostingTermQu...

Hi Peter,
This sounds interesting. Can you put this in JIRA as a patch,
please? I am slowly but surely working on Span query stuff, so
hopefully I can get to it soon.
Thanks
Grant
On Oct 2, 2...

On 3 Oct 2007, at 00:44, Peter Keegan wrote:
>
> TermQuery 200 qps
> BoostingTermQuery (extends SpanQuery) 97 qps
> BoostingTermQuery (extends TermQuery) 130 qps
>
> Here is a version of BoostingT...

I have been experimenting with payloads and BoostingTermQuery, which I think
are excellent additions to Lucene core. Currently BoostingTermQuery extends
SpanQuery. I would suggest changing this class...

http://www.gossamer-threads.com/lists/lucene/java-dev/53351 might be
of interest.
On Oct 1, 2007, at 10:25 PM, Johnny R. Ruiz III wrote:
> Hi
>
> I can't seem to find a way to delete duplicate in...

Here's a couple of fragments, alter to suit....
public void doRemove(Directory dir) throws Exception
{
    this.reader = IndexReader.open(dir);
    TermEnum theTerms = this.reader.term...

Hi Daniel,
Thanks, but forgive my ignorance... can you give me some sample code to do it :) I have never used termDocs() before.
Thanks,
Johnny
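Since the question above asks for sample code around termDocs(), here is a minimal plain-Java simulation of the usual duplicate-removal loop; it is not Lucene code. A sorted map stands in for walking the unique-key field's terms with a TermEnum, each term's list of doc ids stands in for what its TermDocs would enumerate, and the returned set holds the ids you would then delete one at a time (in Lucene 2.x, via IndexReader.deleteDocument(int); check the TermEnum/TermDocs calls against the javadocs for your version). Keeping the highest doc id per key is an assumption (that later-added documents are the ones to keep), and `DedupSketch`, `postings` and `duplicatesToDelete` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of the TermEnum/TermDocs duplicate-removal loop.
final class DedupSketch {

    // Build key -> ascending doc ids (the "postings" of the unique-key field).
    // keyByDocId.get(doc) is the unique-key value stored in document `doc`.
    static TreeMap<String, List<Integer>> postings(List<String> keyByDocId) {
        TreeMap<String, List<Integer>> p = new TreeMap<>();
        for (int doc = 0; doc < keyByDocId.size(); doc++) {
            p.computeIfAbsent(keyByDocId.get(doc), k -> new ArrayList<>()).add(doc);
        }
        return p;
    }

    // For every key that has more than one document, keep the last (highest)
    // doc id and collect all earlier ones for deletion.
    static SortedSet<Integer> duplicatesToDelete(TreeMap<String, List<Integer>> postings) {
        SortedSet<Integer> toDelete = new TreeSet<>();
        for (Map.Entry<String, List<Integer>> term : postings.entrySet()) {
            List<Integer> docs = term.getValue(); // ascending, like TermDocs
            for (int i = 0; i < docs.size() - 1; i++) {
                toDelete.add(docs.get(i));
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(duplicatesToDelete(postings(keys))); // [0, 1, 2]
    }
}
```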
----- Original Message ----
From: Daniel Noll <daniel@(protected)...

On Tuesday 02 October 2007 12:25:47, Johnny R. Ruiz III wrote:
> Hi
>
> I can't seem to find a way to delete duplicates in a Lucene index. I have a
> unique key, so it seems to be straightforward. But...

Hi,
I can't seem to find a way to delete duplicates in a Lucene index. I have a unique key, so it seems to be straightforward. But I can't find a simple way to do it except for putting each record i...

The whole question of multilingual indexing has been discussed
at length; you might find some ideas if you search the archive...
Erick
On 10/1/07, Dino Korah <dckorah@(protected)> wrote:
>
> Thanks Er...

You might be able to create an analyzer that breaks your
stream up (from the example) into tokens
"foo" and " " and then (using the same analyzer)
search on phrases with a slop of 0. That seems like
i...

I've been getting the following compiler error when building the javadocs
from the trunk sources:
Ant build error:
[javac] D:\lucene-2.2.0\contrib\gdata-server\src\gom\src\java\org\apache\lucen...

Thanks, Erick.
The PerFieldAnalyzerWrapper could fit in, but in the current
multilingual-everywhere world (even in programming languages... %$?%#@), almost any
field in an email (addresses, subject, b...

As for suggestions on how to do this, I have none other than
to make sure that you can create the queries necessary to obtain
the required output.
Regards
Paul Elschot
On Sunday 30 September 2007 09...

Of course it depends on the kind of query you are doing, but (I did
find the query parser in the meantime):
MultiFieldQueryParser mfqp = new MultiFieldQueryParser(useFields,
        analyzer, boosts);
where...

Well, the size wouldn't be a problem; we could afford the extra field.
But it would seem to complicate the search quite a lot. I'd have to run
the search terms through both analyzers. It would be mu...

Hi,
I don't know the size of your dataset, but couldn't you index into 2
fields with PerFieldAnalyzerWrapper, tokenizing with StandardAnalyzer for one field
and WhitespaceAnalyzer for the other?
Then use a multiple-field query...

WhitespaceAnalyzer does preserve those symbols, but not as tokens. It
simply leaves them attached to the original term.
As an example of what I'm talking about, consider a document that
contains (...
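A minimal plain-Java sketch of the tokenization difference discussed in this thread; none of it is a Lucene class, and `SymbolTokenSketch` and its methods are hypothetical. `whitespaceTokens` behaves the way WhitespaceAnalyzer is described here (symbols stay attached, so "100%" is one token), `symbolTokens` is a hypothetical tokenizer that emits each symbol as its own token, and `phraseMatches` stands in for a phrase query with a slop of 0 by requiring the phrase's tokens to appear consecutively. A real solution would be a custom Lucene Analyzer/Tokenizer used at both index and query time, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration; not Lucene code.
final class SymbolTokenSketch {

    // Whitespace-style tokenization: symbols stay attached to the term.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Hypothetical symbol-preserving tokenizer: runs of letters/digits form one
    // token; every other non-whitespace character becomes its own token.
    static List<String> symbolTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                out.add(word.toString());
                word.setLength(0);
            }
            if (!Character.isWhitespace(c)) out.add(String.valueOf(c));
        }
        if (word.length() > 0) out.add(word.toString());
        return out;
    }

    // Exact phrase match (slop 0): the phrase tokens occur consecutively.
    static boolean phraseMatches(List<String> docTokens, List<String> phraseTokens) {
        return Collections.indexOfSubList(docTokens, phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        String text = "Save $100 or 100% today";
        System.out.println(whitespaceTokens(text)); // [Save, $100, or, 100%, today]
        List<String> doc = symbolTokens(text);
        System.out.println(doc); // [Save, $, 100, or, 100, %, today]
        System.out.println(phraseMatches(doc, symbolTokens("100%"))); // true
        System.out.println(phraseMatches(symbolTokens("costs 100 dollars"), symbolTokens("100%"))); // false
    }
}
```

Note the trade-off: with symbol tokens a plain search for "100" still matches "100%", while the slop-0 phrase "100%" matches only when the "%" directly follows.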
On 1 Oct 2007, at 15:33, John Byrne wrote:
> Has anyone written an analyzer that preserves punctuation and
> symbols ("?", "$", "%", etc.) as tokens?
WhitespaceAnalyzer?
You could also extend the lexic...
On 1 Oct 2007, at 14:41, sandeep chawla wrote:
> 2- Is there a way I can get the term.docFreq() for a particular set of
> documents...
Using TermDocs or the TermFreqVector.
--
karl
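The TermDocs suggestion amounts to counting only those postings of the term that fall inside your document set. A minimal plain-Java sketch, assuming the set is given as a java.util.BitSet of doc ids and the term's postings are the doc ids its TermDocs would enumerate; `SubsetDocFreqSketch.docFreqInSubset` is a hypothetical helper, not a Lucene method.

```java
import java.util.BitSet;

// Plain-Java illustration of a "docFreq restricted to a document subset".
final class SubsetDocFreqSketch {

    // Count the term's documents (its postings) that are members of `subset`.
    static int docFreqInSubset(int[] termPostings, BitSet subset) {
        int count = 0;
        for (int doc : termPostings) {
            if (subset.get(doc)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] postings = {1, 4, 7, 9}; // docs containing the term
        BitSet subset = new BitSet();
        subset.set(0, 5);              // docs 0..4 belong to the subset
        subset.set(9);                 // and so does doc 9
        System.out.println(docFreqInSubset(postings, subset)); // 3 (docs 1, 4, 9)
    }
}
```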
Hi,
Has anyone written an analyzer that preserves punctuation and symbols
("?", "$", "%", etc.) as tokens?
That way we could distinguish between searching for "100" and "100%" or
"$100".
Does anyo...

Sure, but there's a time/space tradeoff. Isn't there always <G>....
PerFieldAnalyzerWrapper is your friend. It would require that your
index be built on a per-language basis. Say, indexing
text from F...

Hi,
I am working on a Lucene email indexing system which can potentially receive
documents in various languages. Currently I am using StandardAnalyzer, which
works for English but not for many of the...
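Both the two-field suggestion and the PerFieldAnalyzerWrapper advice come down to choosing the tokenization per field. A minimal plain-Java sketch of that dispatch, assuming a default tokenizer plus per-field overrides; in Lucene 2.x this role is played by PerFieldAnalyzerWrapper, which this sketch only mimics. `PerFieldSketch`, its tokenizers and the field names ("from", "subject", "body") are hypothetical, and language detection for a per-language layout is out of scope here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch: a default tokenizer with per-field overrides.
final class PerFieldSketch {
    private final Function<String, List<String>> defaultTokenizer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultTokenizer) {
        this.defaultTokenizer = defaultTokenizer;
    }

    void addTokenizer(String field, Function<String, List<String>> tokenizer) {
        perField.put(field, tokenizer);
    }

    // Use the field's own tokenizer if one was registered, else the default.
    List<String> tokenize(String field, String text) {
        return perField.getOrDefault(field, defaultTokenizer).apply(text);
    }

    // Lowercase, then split on anything that is not a Unicode letter or digit,
    // so accented words stay whole.
    static List<String> wordTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Lowercase, then split only on whitespace, commas and semicolons,
    // so email addresses stay whole.
    static List<String> addressTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[\\s,;]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        PerFieldSketch analyzer = new PerFieldSketch(PerFieldSketch::wordTokens);
        analyzer.addTokenizer("from", PerFieldSketch::addressTokens);
        String addresses = "Alice@Example.com, bob@example.org";
        System.out.println(analyzer.tokenize("from", addresses)); // [alice@example.com, bob@example.org]
        System.out.println(analyzer.tokenize("body", addresses)); // [alice, example, com, bob, example, org]
        System.out.println(analyzer.tokenize("subject", "Re: Gr\u00fc\u00dfe aus K\u00f6ln!"));
    }
}
```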