Create and populate a field when indexing 2007-11-09 - By KR
Grant Ingersoll-6 (See http://oll-6.ora-code.com) wrote:
>
> When you are indexing the file and adding the Document, you will need
> to parse out your filename per your regular expression, and then
> create the appropriate field:
>
> Document doc = new Document()
> String cat = getCategoryFromFileName(inputFileName)
> doc.add(new Field("category", cat, ...)
> //do the rest of your adds
>
> Just locate where in the demo the Document add is taking place (I
> forget the exact spot) and then add in the appropriate stuff from
> above. Obviously, you need to implement the method I stubbed called
> getCategoryFromFileName.
>
> HTH,
> Grant
>
Thanks, Grant. That was just the hint I needed.
I found that the fields are populated in HTMLDocument.
I added:
doc.add(new Field("category", "test", Field.Store.YES, Field.Index.TOKENIZED));
and then used Luke to verify that this field had been added. It had.
Now I am trying to get a quick-and-dirty way of setting the field based on the filename, but I'm running into problems that I don't really understand well enough to fix quickly.
I have only very limited experience of Java programming, so I might be using the wrong terms, but I think the problem is variable scope. I get a compilation error:
HTMLDocument.java:86: cannot find symbol
symbol  : variable url
location: class org.apache.lucene.demo.HTMLDocument
if (url.indexOf("-ov-") != -1) {
I thought I'd be able to use a simple mechanism based on indexOf() to check for a short sequence of characters within the filename, for example "-sys-". I know that this sequence, if it exists anywhere in the full path, must be in the filename.
So I put in a series of if statements like this:
if (url.indexOf("-sys-") != -1) { string category = "system"; }
then right at the end: doc.add(new Field("category", category, Field.Store.YES, Field.Index.TOKENIZED));
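A side note on the snippet above, independent of the url error: Java's string type is spelled String (capital S), and a variable declared inside an if block is no longer visible after the closing brace, so category would need to be declared before the if statements. A minimal sketch follows; the class name CategoryScope, the method categoryFor, the "unknown" default, and the meaning given to "-ov-" are all assumptions for illustration, not part of the Lucene demo.

```java
class CategoryScope {
    // Hypothetical helper: picks a category name from marker strings in a path.
    static String categoryFor(String path) {
        // Declared before the if statements so it is still in scope at the end.
        String category = "unknown"; // assumed default when no marker matches
        if (path.indexOf("-sys-") != -1) {
            category = "system";
        }
        if (path.indexOf("-ov-") != -1) {
            category = "overview"; // assumed meaning of "-ov-"
        }
        return category;
    }

    public static void main(String[] args) {
        System.out.println(categoryFor("docs/foo-sys-01.html"));
    }
}
```

With this shape, the value returned by categoryFor can be passed as the second argument of the Field constructor shown above.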
Am I right in thinking that the variable url is undefined at this point in the code? It certainly seems to be defined earlier on in the file:
public static String uid2url(String uid) {
  String url = uid.replace('\u0000', '/'); // replace nulls with slashes
  return url.substring(0, url.lastIndexOf('/')); // remove date from end
}
Is there some way for me to perhaps chop down to the filename here, and make that available later in the code?
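On the scope question: url in uid2url is a local variable, so it exists only inside that method and cannot be seen from the method that builds the Document. One possible approach, sketched here under assumptions, is to take the file name from the java.io.File being indexed using File.getName() and test that instead. The class FileNameCategory, the method categoryFromFile, and the "unknown" default are hypothetical names, and the sketch assumes the code that builds the Document has the file available as a File object.

```java
import java.io.File;

class FileNameCategory {
    // Hypothetical helper: derives a category from the file name only,
    // ignoring any directories in the path.
    static String categoryFromFile(File f) {
        String name = f.getName(); // last segment of the path, i.e. the file name
        if (name.indexOf("-sys-") != -1) {
            return "system";
        }
        return "unknown"; // assumed default when no marker matches
    }

    public static void main(String[] args) {
        File f = new File("docs/manual/intro-sys-01.html");
        System.out.println(categoryFromFile(f));
    }
}
```

The result could then be used where the field is added, for example doc.add(new Field("category", categoryFromFile(f), Field.Store.YES, Field.Index.TOKENIZED)); where f stands for whatever File variable is in scope at that point (an assumption about the surrounding code).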
K.
--
View this message in context: http://www.nabble.com/Create-and-populate-a-field-when-indexing-tf4713018.html#a13667927
Sent from the Lucene - Java Users mailing list archive at Nabble.com.