RE: Ensuring SpellChecker returns corrections which satisfy fq params for default OR query

2012-12-19 Thread Dyer, James
would be helpful, but you're not sure because they might all apply to items that do not match fq=item:in_stock.* Yup, exactly. Do you think the workaround I suggested would work (and not have terrible perf)? Or any other ideas? Thanks, Nalini On Wed, Dec 19, 2012 at 1:09 PM, Dyer, James james.d

RE: dataimport.properties not created/updated with solrcloud

2012-12-19 Thread Dyer, James
Someone with more zookeeper knowledge than I have can better answer this, but there is code in place specifically for using DIH with SolrCloud to save the dataimport.properties file in an appropriate place. The default path is: /configs/{collection}/dataimport.properties I'm not sure which

RE: order question on solr multi value field

2012-12-18 Thread Dyer, James
I would say such a guarantee is implied by the javadoc to Analyzer#getPositionIncrementGap . It says this value is an increment to be added to the next token emitted from tokenStream.

RE: order question on solr multi value field

2012-12-18 Thread Dyer, James
for the INDEXED data, not the STORED data! I believe the concern is the stored values that are returned from a query. Although the question does also apply to how a span query would work for a multi-valued field. -- Jack Krupansky -Original Message- From: Dyer, James Sent: Tuesday, December

RE: order question on solr multi value field

2012-12-18 Thread Dyer, James
that the contract is semi-sort-of-implied, but the point is that we should make it explicit. If I have time later today I'll file the Jira. -- Jack Krupansky -Original Message- From: Dyer, James Sent: Tuesday, December 18, 2012 12:08 PM To: solr-user@lucene.apache.org Subject: RE: order question

RE: Spell Check is not working properly

2012-12-17 Thread Dyer, James
The spellcheckers (IndexBasedSpellChecker and DirectSolrSpellChecker) both have tuning parameters that control how similar a potential correction needs to be to the original query term in order to be considered. For IndexBasedSpellChecker, there is spellcheck.accuracy, which should be a

RE: Differentiate between correctly spelled term and mis-spelled term with no corrections

2012-12-14 Thread Dyer, James
then. Thanks for the detailed explanation! - Nalini On Fri, Dec 7, 2012 at 1:38 PM, Dyer, James james.d...@ingramcontent.comwrote: The response from the shards is different from the final spellcheck response in that it does include the term even if there are no suggestions for it. So to get

RE: Need help with delta import

2012-12-14 Thread Dyer, James
Try ${dih.delta.ID} instead of ${dataimporter.delta.id}. Also use ${dih.last_index_time} instead of ${dataimporter.last_index_time} . I noticed when updating the test cases that the wiki incorrectly used the longer name, but with all the versions I tested this on, only the short name works.
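A minimal data-config.xml sketch of the short names in context (the ITEM table and its columns are hypothetical):

    <!-- dih.delta.ID and dih.last_index_time are the short-form variable names -->
    <entity name="item" pk="ID"
            deltaQuery="SELECT ID FROM ITEM WHERE UPDATED_AT &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT ID, NAME FROM ITEM WHERE ID='${dih.delta.ID}'"/>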

RE: Need help with delta import

2012-12-14 Thread Dyer, James
Group (615) 213-4311 -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, December 14, 2012 1:41 PM To: solr-user@lucene.apache.org Subject: Re: Need help with delta import On 12/14/2012 11:39 AM, Dyer, James wrote: Try ${dih.delta.ID} instead

RE: dataimport.properties not created/updated with solrcloud

2012-12-12 Thread Dyer, James
When using SolrCloud, the dataimport.properties file goes to a different location. See https://issues.apache.org/jira/browse/SOLR-3165 for more information. Also, while this feature works in 4.0.0, it is currently broken in (not-released) 4.1 (branch_4x) and the development Trunk. This

RE: Need help with delta import

2012-12-10 Thread Dyer, James
It's surprising that your full import is working for you. Both your query and your deltaImportQuery have: SELECT ID FROM... ...So both your full-import (query attr) and your delta-import (deltaImportQuery attr) are only getting the ID field from your db. Shouldn't you at least be getting

RE: Spelling output solr 4

2012-12-07 Thread Dyer, James
I'm not sure what you mean. Can you paste in an example spellcheck response and explain how it differs between the older IndexBasedSpellChecker on 3.1 and the DirectSolrSpellChecker on 4.0 ? James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From:

RE: Differentiate between correctly spelled term and mis-spelled term with no corrections

2012-12-07 Thread Dyer, James
You might want to open a jira issue for this to request that the feature be added. If you haven't used it before, you need to create an account. https://issues.apache.org/jira/browse/SOLR In the meantime, if you need to get the document frequency of the query terms, see

RE: Differentiate between correctly spelled term and mis-spelled term with no corrections

2012-12-07 Thread Dyer, James
the final spellcheck response we get from Solr in some way? Thanks, Nalini On Fri, Dec 7, 2012 at 10:26 AM, Dyer, James james.d...@ingramcontent.comwrote: You might want to open a jira issue for this to request that the feature be added. If you haven't used it before, you need to create an account

RE: DIH nested entities don't work

2012-12-05 Thread Dyer, James
Maarten, Glad to hear that your DIH experiment worked well for you. To implement something like Endeca's guided navigation, see http://wiki.apache.org/solr/SolrFacetingOverview . If you need to implement multi-level faceting, see http://wiki.apache.org/solr/HierarchicalFaceting (but

RE: DIH nested entities don't work

2012-11-27 Thread Dyer, James
The line numbers aren't matching up, but best I can tell, it looks like it's probably getting an NPE on this line: DIHCacheTypes type = types[pkColumnIndex]; If this is the correct line, it seems the types array is NULL. The types array is populated from a properties file that gets written when

RE: Spellchecker for multiple sites (and languages?)

2012-11-26 Thread Dyer, James
Also see this recent mail list thread for an explanation how you can set up a master dictionary with everything in it but only get valid spell suggestions returned:

RE: Spellchecker for multiple sites (and languages?)

2012-11-26 Thread Dyer, James
The Lucene spellcheckers just look at each word in isolation, which is what the extended results are reporting on. So when using maxCollationTries, etc, this information becomes less useful. It's when Solr tries to put these words together into a meaningful collation that you get a good query

RE: How do I best detect when my DIH load is done?

2012-11-19 Thread Dyer, James
Andy, I use an approach similar to yours. There may be something better, however. You might be able to write an onImportEnd listener to tell you when it ends. See http://wiki.apache.org/solr/DataImportHandler#EventListeners for a little documentation. See also
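A sketch of how such a listener is wired up in data-config.xml; the class name here is hypothetical, and it would implement org.apache.solr.handler.dataimport.EventListener:

    <dataConfig>
      <!-- onEvent(Context) fires on this (hypothetical) class when the import ends -->
      <document onImportEnd="com.example.dih.ImportEndListener">
        ...
      </document>
    </dataConfig>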

RE: Search using the result returned from the spell checking component

2012-11-19 Thread Dyer, James
What you want isn't supported. You always will need to issue that second request. This would be a nice feature to add though. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Roni [mailto:r...@socialarray.com] Sent: Monday, November 19, 2012

RE: How do I best detect when my DIH load is done?

2012-11-19 Thread Dyer, James
I'm not sure. But there are at least a few jira issues open with differing ideas on how to improve this. For instance, SOLR-1554, SOLR-2728, SOLR-2729. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent:

RE: DIH nested entities don't work

2012-11-16 Thread Dyer, James
Maarten, Here is a sample set-up that lets you build your caches in parallel and then index off the caches in a subsequent step. See below for the solrconfig.xml snippet and the text of the 4 data-config.xml files. In this example it builds a cache for the parent also, but this is not

RE: DIH nested entities don't work

2012-11-15 Thread Dyer, James
Depending on how much data you're pulling back, 2 hours might be a reasonable amount of time. Of course, if you had it a lot faster with Endeca Forge, I can understand your questioning this. Keep in mind that the way you're setting it up, it will build each cache one at a time. I'm pretty sure

RE: DIH nested entities don't work

2012-11-12 Thread Dyer, James
Here's what I'd do next: - double check you're only caching the child entity, not the parent. - Replace the SELECT * queries with a list of actual fields you want. - specify the persistCacheFieldNames and persistCacheFieldTypes parameters (see the doc-comment for DIHCachePersistProperties) - Try

RE: DIH nested entities don't work

2012-11-09 Thread Dyer, James
Here are things I would try: - You need to package the patch from SOLR-2943 in your jar as well as SOLR-2613 (to get the class DIHCachePersistCacheProperties) - You need to specify cacheImpl, not persistCacheImpl - You are correct using persistCacheName and persistCacheBaseDir, contra the test

RE: Solr SpellCheck on Query Field

2012-11-09 Thread Dyer, James
What I'm saying is if you specify spellcheck.maxCollationTries, it will run the suggested query against the index for you and only return valid re-written queries. That is, a misspelled firstname will be replaced with a valid firstname; a misspelled lastname will be replaced with a valid

RE: Solr SpellCheck on Query Field

2012-11-08 Thread Dyer, James
This would be an awesome feature to have, wouldn't it? For now, the best you can do is to create a master dictionary that contains all of the FirstNames and LastNames and use that as your dictionary's spellcheck field. This is the copyField technique that you refer to in the linked post.

RE: Problem with ping handler, SolrJ 4.1-SNAPSHOT, Solr 3.5.0

2012-11-08 Thread Dyer, James
Shawn, Could this be a side-effect from SOLR-4019, in branch_4.0 this was commit r1405894 ? Prior to this commit, PingRequestHandler would throw a SolrException for 503/Bad Request. The change is that the exception isn't actually thrown but rather sent in place of the response. This

RE: [SOLR-2549] DIH LineEntityProcessor support for delimited fixed-width files

2012-11-07 Thread Dyer, James
Zakaria, You might want to post your data-config.xml, or at least the part that uses SOLR-2549. If it's throwing an NPE, it certainly has a bug (if you're doing something wrong, it would at least give you a sensible error message). Also, unless you need to use DIH for some other reason, you

RE: [SOLR-2549] DIH LineEntityProcessor support for delimited fixed-width files

2012-11-07 Thread Dyer, James
BENZIDALMAL mobile: 06 31 40 04 33 2012/11/7 Dyer, James james.d...@ingramcontent.com Zakaria, You might want to post your data-config.xml, or at least the part that uses SOLR-2549. If it's throwing an NPE, it certainly has a bug (if you're doing something wrong, it would at least give you

RE: Access DIH from inside application (via Solrj)?

2012-11-06 Thread Dyer, James
DIH and SolrJ don't really support what you want to do. But you can make it work with code like this, which reloads the DIH configuration and checks for the response. Just note this is quite brittle: whenever the response changes in future versions of DIH, it'll break your code. Map<String,

RE: DIH nested entities don't work

2012-10-29 Thread Dyer, James
If your subentities are large, the default DIH cache probably isn't going to work because it stores all the data in memory. (This is CachedSqlEntityProcessor for Solr 3.5 or earlier; cacheImpl=SortedMapBackedCache for 3.6 or later.) DIH for Solr 3.6 and later supports pluggable caches (see
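A sketch of a cached child entity in the 3.6-and-later style (table, columns, and key names are hypothetical):

    <!-- for large child data, substitute a disk-backed cacheImpl -->
    <entity name="child" processor="SqlEntityProcessor"
            query="SELECT PARENT_ID, FEATURE FROM FEATURES"
            cacheKey="PARENT_ID" cacheLookup="parent.ID"
            cacheImpl="SortedMapBackedCache"/>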

RE: WordBreak spell correction makes split terms optional?

2012-10-02 Thread Dyer, James
The parentheses are being added by the spellchecker. I tried to envision a number of different scenarios when designing how this would work and at the time it seemed best to add parentheses around terms that originally were together but now are split up. From your example, I see this is a

RE: DIH problem

2012-09-22 Thread Dyer, James
Gian, Even if you can't write a failing unit test (if you did it would be awesome), please open a JIRA issue on this and attach your patch. Also, you may want to try 4.0 as opposed to 3.6 as some of the 3.6 issues with DIH are resolved in 4.0.

RE: deletedPkQuery not work in solr 3.3

2012-09-06 Thread Dyer, James
You have deletedPKQuery, but the correct spelling is deletedPkQuery (lowercase k). Try that and see if it fixes your problem. Also, you can probably simplify this if you do this as command=full-import&clean=false, then use something like this for your query: select product_id as
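A sketch with the corrected attribute name (table and columns are hypothetical):

    <!-- note the lowercase 'k' in deletedPkQuery -->
    <entity name="product" pk="product_id"
            query="SELECT product_id, name FROM products"
            deletedPkQuery="SELECT product_id FROM products WHERE deleted = 1"/>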

RE: DIH jdbc4.MySQLSyntaxErrorException

2012-08-31 Thread Dyer, James
I have very long SQL statements that span multiple lines and it works for me. You might want to paste your SQL into a tool like Squirrel and see if it executes outside of DIH. One guess I have is you've got something like this... where blah='${my.property}' ...but the variable is not

RE: LineEntityProcessor process only one file

2012-08-31 Thread Dyer, James
No, it should process all of the files that get listed. I'm taking a look at the issue you opened, SOLR-3779. This is also similar to SOLR-3307, although that was reported as a bug with threads in 3.6, which is no longer a feature in 4.0. James Dyer E-Commerce Systems Ingram Content Group

RE: Dataimport Handler in solr 3.6.1

2012-08-30 Thread Dyer, James
There were 2 major changes to DIH Cache functionality in Solr 3.6, only 1 of which was carried to Solr 4.0: - Solr 3.6 had 2 MAJOR changes: 1. We support pluggable caches so that you can write your own cache implementations and cache however you want. The goal here is to allow you to cache to

RE: Static template column in DIH

2012-08-30 Thread Dyer, James
You might just be missing <entity ... transformer="TemplateTransformer" />. See http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Kiran Jayakumar [mailto:kiranjuni...@gmail.com]
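A sketch of a static template column (entity, query, and field names are hypothetical):

    <entity name="item" transformer="TemplateTransformer"
            query="SELECT id FROM item">
      <!-- every document gets the constant value 'web' in its source field -->
      <field column="source" template="web"/>
    </entity>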

RE: Dataimport Handler in solr 3.6.1

2012-08-14 Thread Dyer, James
One thing I notice in your configuration...the child entity has this: cacheLookup=ent1.uid but your parent entity doesn't have a uid field. Also, you have these 3 transformers: RegexTransformer,DateFormatTransformer,TemplateTransformer but none of your columns seem to make use of these.

RE: SpellCheckComponent Collation query

2012-08-09 Thread Dyer, James
No, your client has to re-issue the query. I have looked into doing this automatically but it would be complicated to implement. SpellCheckComponent would have to somehow get the entire component stack (faceting, highlighting, etc) to re-start from the beginning and return the new request to

RE: How To apply transformation in DIH for multivalued numeric field?

2012-07-18 Thread Dyer, James
Don't you want to specify splitBy for the integer field too? Actually though, you shouldn't need to use GROUP_CONCAT and RegexTransformer at all. DIH is designed to handle one-to-many relations between parent and child entities by populating all the child fields as multi-valued automatically. I
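A sketch of the parent/child approach (tables and columns hypothetical); each child row becomes one value of the multi-valued field, with no splitBy or GROUP_CONCAT needed:

    <entity name="parent" query="SELECT ID, NAME FROM PARENT">
      <!-- one value of the multi-valued 'feature' field per child row -->
      <entity name="features"
              query="SELECT FEATURE AS feature FROM FEATURES WHERE PARENT_ID='${parent.ID}'"/>
    </entity>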

RE: maxNumberOfBackups does not cleanup - jira 3361

2012-07-10 Thread Dyer, James
I'm also certain that it would apply to both oncommit and onoptimize. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Tuesday, July 10, 2012 8:48 AM To: solr-user@lucene.apache.org Subject:

RE: Better (and valid) Spellcheck in combination with other parameters with at least one occurrence

2012-07-09 Thread Dyer, James
Yes, maxCollationTries tests the new (collation) queries with all the same parameters as the original query. Most notably, it uses the same fq parameters so it will take in account any filters you were using. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original

RE: Better (and valid) Spellcheck in combination with other parameters with at least one occurrence

2012-07-06 Thread Dyer, James
If you're using Solr 3.1 or higher, you can do this. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate. Here's a summary: - specify spellcheck.collate=true to get a re-written query made from the individual word suggestions. - specify spellcheck.maxCollationTries to
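The request parameters might look like this (handler name, query, and counts are illustrative):

    /select?q=delll+ultrasharp&spellcheck=true
      &spellcheck.collate=true
      &spellcheck.maxCollationTries=5
      &spellcheck.maxCollations=3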

RE: DIH - unable to ADD individual new documents

2012-07-03 Thread Dyer, James
A DIH request handler can only process one run at a time. So if DIH is still in process and you kick off a new DIH full-import it will silently ignore the new command. To have more than one DIH run going at a time it is necessary to configure more than one handler instance in solrconfig.xml.

RE: LineEntityProcessor Usage

2012-06-28 Thread Dyer, James
LineEntityProcessor outputs the entire line in a field called rawLine. You then need to write a transformer that will parse out the data. But see https://issues.apache.org/jira/browse/SOLR-2549 for enhancements that will parse the data without needing a transformer, if the data is in

RE: WordBreakSolrSpellChecker ignores MinBreakWordLength?

2012-06-28 Thread Dyer, James
Carrie, Try taking the wordbreak parameters out of the request handler configuration and instead put them in the spellchecker configuration. You also need to remove the spellcheck. prefix. Also, the correct spelling for this parameter is minBreakLength. Here's an example. <lst

RE: WordBreak and default dictionary crash Solr

2012-06-15 Thread Dyer, James
Carrie, Thank you for trying out new features! I'm pretty sure you've found a bug here. Could you tell me whether you're using a build from Trunk or Solr_4x ? Also, do you know the svn revision or the Jenkins build # (or timestamp) you're working from? Could you try instead to use

RE: DIH idle in transaction forever

2012-06-14 Thread Dyer, James
Try readOnly=true in the dataSource configuration. This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource). Also, try a batch size of 0 to let your jdbc driver pick
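A dataSource sketch combining both suggestions (driver, URL, and credentials are illustrative):

    <dataSource type="JdbcDataSource"
                driver="org.postgresql.Driver"
                url="jdbc:postgresql://localhost:5432/mydb"
                user="solr" password="secret"
                readOnly="true" batchSize="0"/>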

RE: PageRanking with DIH

2012-06-12 Thread Dyer, James
To boost a document with DIH, see this section about $docBoost in the wiki here: http://wiki.apache.org/solr/DataImportHandler#Special_Commands. If you're using an RDBMS for source data, your query would have something like this in it: select PAGE_RANK as '$docBoost', ... from ... etc. If you
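For example (the PAGES table and its columns are hypothetical):

    <!-- each document's boost is taken from its stored page rank -->
    <entity name="page"
            query="SELECT URL, TITLE, PAGE_RANK AS '$docBoost' FROM PAGES"/>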

RE: Writing custom data import handler for Solr.

2012-06-11 Thread Dyer, James
More specifically, the 3.6 Data Import Handler code (DIH) can be seen here: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/ The main wiki page is here: http://wiki.apache.org/solr/DataImportHandler

RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Dyer, James
Markus, With maxCollationTries=0, it is not going out and querying the collations to see how many hits they each produce. So it doesn't know the # of hits. That is why if you also specify collateExtendedResults=true, all the hit counts are zero. It would probably be better in this case if

RE: Can't index sub-entitties in DIH

2012-06-05 Thread Dyer, James
I successfully use Oracle with DIH although none of my imports have sub-entities. (slight difference, I'm on ojdbc5.jar w/10g...). It may be you have a driver that doesn't play well with DIH in some cases. You might want to try these possible workarounds: - rename the columns in SELECT with

RE: why DIH works in normal mode,error in debug mode

2012-06-01 Thread Dyer, James
I see this in your stacktrace: java.sql.SQLException: Illegal value for setFetchSize(). It must be that your JDBC driver doesn't like the default value (500) that is used. In your dataSource tag, try adding a batchSize attribute of either 0 or -1 (if using -1, DIH automatically changes it to

RE: Data Import Handler fields with different values in column and name

2012-06-01 Thread Dyer, James
Are you leaving both mappings in there, like this... <entity name="documento" query="SELECT iddocumento,nrodocumento,asunto FROM documento"> <field column="iddocumento" name="iddocumento" /> <field column="nrodocumento" name="nrodocumento" /> <field column="asunto" name="asunto" /> <field column="asunto"

RE: why DIH works in normal mode,error in debug mode

2012-06-01 Thread Dyer, James
type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/MYSOLR?useUnicode=true&amp;characterEncoding=UTF-8" user="root" password="qwertyuiop" batchSize="500" /> I have done it, set batchSize=500. On Fri, Jun 1, 2012 at 10:38 PM, Dyer, James james.d

RE: Data Import Handler fields with different values in column and name

2012-06-01 Thread Dyer, James
-Original Message- From: Dyer, James Sent: Friday, June 01, 2012 10:50 AM To: solr-user@lucene.apache.org Subject: RE: Data Import Handler fields with different values in column and name Are you leaving both mappings in there, like this... entity name=documento query=SELECT iddocumento

RE: spellcheck collate with fq parameters SOLR-2010

2012-05-31 Thread Dyer, James
Markus, When you set spellcheck.maxCollationTries to a value greater than zero, the spellchecker will query each collation candidate to determine how many hits it would return. If the collation will not yield any hits, it throws it away then tries some more (up to whatever value you set).

RE: possible status codes from solr during a (DIH) data import process

2012-05-31 Thread Dyer, James
You've got it right. Here's a summary: - status = busy means it's in-process. - status = idle means it's finished (success or failure). - You can drill down further by looking at sub-elements under statusMessages: if there is <str name="Aborted" />, it means the last import was cancelled with

RE: problem on running fullimport

2012-05-24 Thread Dyer, James
On your <dataSource /> tag, specify batchSize with a value that your db/driver allows. The hardcoded default is 500. If you set it to -1 it converts it to Integer.MIN_VALUE. See http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource, which recommends using this -1 value in

RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
See http://wiki.apache.org/solr/DataImportHandler#Special_Commands and the $docBoost pseudo-field name. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com] Sent: Tuesday, May 22, 2012 2:12 PM To:

RE: index-time boosting using DIH

2012-05-22 Thread Dyer, James
You need to add the $docBoost pseudo-field to the document somehow. A transformer is one way to do it. You could just add it to a SELECT statement, which is especially convenient if the boost value somehow is derived from the data: SELECT case when SELL_MORE_FLAG='Y' then 999 ELSE null

RE: Issue in Applying patch file

2012-05-17 Thread Dyer, James
Recently Lucene/Solr went to a new build process using Ivy. Simply put, dependent .jar files are no longer checked in with Lucene/Solr sources. Instead, while building, Ivy now downloads them from 'repo1.maven.org'. From the error you sent it seems like you do not have access to the Maven

RE: Use DIH with more than one entity at the same time

2012-05-17 Thread Dyer, James
The wiki here indicates that you can specify entity more than once on the request and it will run multiple entities at the same time, in the same handler: http://wiki.apache.org/solr/DataImportHandler#Commands But I can't say for sure that this actually works! Having been in the DIH code, I

RE: Exception in DataImportHandler (stack overflow)

2012-05-17 Thread Dyer, James
Shawn, Do you think this behavior is because, while the indexing is paused, you reach some type of timeout so either your db or the JDBC driver cuts the connection? Or, are you thinking something in the DIH/JDBCDataSource code is causing the connection to drop under these circumstances? James Dyer

RE: Issue in Applying patch file

2012-05-15 Thread Dyer, James
SOLR-3430 is already applied to the latest 3.6 and 4.x (trunk) source code. Be sure you have sources from May 7, 2012 or later (for 3.6 this is SVN r1335205 + ; for trunk it is SVN r1335196 + ) No patches are needed. About the modern compiler error, make sure you're running a 1.6 or 1.7 JDK

RE: Exception in DataImportHandler (stack overflow)

2012-05-15 Thread Dyer, James
Shot in the dark here, but try adding readOnly=true to your dataSource tag. <dataSource readOnly="true" type="JdbcDataSource" ... /> This sets autocommit to true and sets the Holdability to ResultSet.CLOSE_CURSORS_AT_COMMIT. DIH does not explicitly close resultsets and maybe if your JDBC driver

RE: Indexing data from pdf

2012-05-11 Thread Dyer, James
It looks like maybe you do not have apache-solr-dataimporthandler-extras.jar in your classpath. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: anarchos78 [mailto:rigasathanasio...@hotmail.com] Sent: Friday, May 11, 2012 11:00 AM To:

RE: Indexing data from pdf

2012-05-11 Thread Dyer, James
The document you tried to index has an id but not a fake_id. Because fake_id is your index uniqueKey, you have to include it in every document you index. Your most likely fix for this is to use a Transformer to generate a fake_id. You might get away with changing this: <field column="fake_id"

RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-08 Thread Dyer, James
Kellen, I appreciate your trying this out. Is there any way you can provide your data-config.xml file? I'd really like to get to the bottom of this. Thanks. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: not interesting

RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-07 Thread Dyer, James
Dear Kellen, Brent, and Keith, There now are fixes available for 2 cache-related bugs that unfortunately made their way into the 3.6.0 release. These were addressed on these 2 JIRA issues, which have been committed to the 3.6 branch (as of today): - https://issues.apache.org/jira/browse/SOLR-3430

RE: JDBC import yields no data

2012-04-24 Thread Dyer, James
You might also want to show us your dataimport handler configuration from solrconfig.xml and also the url you're using to start the data import. When it's complete, browsing to http://192.168.1.6:8995/solr/db/dataimport (or whatever the DIH handler name is in your config) should say indexing

RE: JDBC import yields no data

2012-04-24 Thread Dyer, James
-4311 -Original Message- From: Hasan Diwan [mailto:hasan.di...@gmail.com] Sent: Tuesday, April 24, 2012 11:52 AM To: solr-user@lucene.apache.org Subject: Re: JDBC import yields no data On 24 April 2012 07:49, Dyer, James james.d...@ingrambook.com wrote: You might also want to show us

RE: Performance problem with DIH in solr 3.3

2012-04-23 Thread Dyer, James
See this page for an alternate way to use DIH for Delta updates that does not generate n+1 Selects: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Pravin Agrawal

RE: Maximum Open Cursors using JdbcDataSource and cacheImpl

2012-04-18 Thread Dyer, James
Keith, Can you supply your data-config.xml ? James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Keith Naas [mailto:keithn...@dswinc.com] Sent: Wednesday, April 18, 2012 11:43 AM To: solr-user@lucene.apache.org Subject: Maximum Open Cursors using

RE: dataImportHandler: delta query fetching data, not just ids?

2012-03-29 Thread Dyer, James
You can also use $deleteDocById. If you also use $skipDoc, you can sometimes get the deletes on the same entity with a command=full-import&clean=false delta. This may or may not be more convenient than what you're doing already. See

RE: dataImportHandler: delta query fetching data, not just ids?

2012-03-28 Thread Dyer, James
Janne, You're correct on how the delta import works. You specify 3 queries: - deletedPkQuery = query should return all ids (only) of items that were deleted since the last run. - deltaQuery = query should return all ids (only) of items that were added/updated since the last run. -
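A skeleton showing the three queries together (table and columns are hypothetical):

    <entity name="item" pk="ID"
            query="SELECT ID, NAME FROM ITEM"
            deletedPkQuery="SELECT ID FROM ITEM WHERE DELETED = 1"
            deltaQuery="SELECT ID FROM ITEM WHERE UPDATED_AT &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT ID, NAME FROM ITEM WHERE ID='${dih.delta.ID}'"/>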

RE: DataImportHandler: backups prior to full-import

2012-03-28 Thread Dyer, James
I don't know of any effort out there to have DIH trigger a backup automatically. However, you can set the replication handler to automatically backup after each commit. This might solve your problem if you aren't committing frequently. James Dyer E-Commerce Systems Ingram Content Group (615)

RE: DataImportHandler: backups prior to full-import

2012-03-28 Thread Dyer, James
. Is there a way to differentiate in the replication handler? On Wed, Mar 28, 2012 at 11:54 AM, Dyer, James james.d...@ingrambook.comwrote: I don't know of any effort out there to have DIH trigger a backup automatically. However, you can set the replication handler to automatically backup after each commit

RE: possible spellcheck bug in 3.5 causing erroneous suggestions

2012-03-27 Thread Dyer, James
It might be easier to know what's going on if you provide some snippets from solrconfig.xml and schema.xml. But my guess is that in your solrconfig.xml, under the spellcheck searchComponent either the queryAnalyzerFieldType or the fieldType (one level down) is set to a field that is removing

RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread Dyer, James
If the list of words isn't very long, you can add a StopFilter to the analysis for itemDescSpell and put the words you don't want in the stop list. If you want to prevent low-occurring words from being used as corrections, use the thresholdTokenFrequency in your spellcheck configuration. James
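Two sketches of those options; the field type name, stopword file, and threshold value are hypothetical. First, a StopFilter in the spellcheck field's analysis chain; second, thresholdTokenFrequency in the spellchecker definition:

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- words in this (hypothetical) file never enter the dictionary -->
        <filter class="solr.StopFilterFactory" words="spellcheck_stopwords.txt"/>
      </analyzer>
    </fieldType>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">itemDescSpell</str>
      <!-- ignore terms appearing in less than 1% of documents -->
      <float name="thresholdTokenFrequency">.01</float>
    </lst>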

RE: preventing words from being indexed in spellcheck dictionary?

2012-03-27 Thread Dyer, James
Assuming you're just using this field for spellcheck and not for queries, then it doesn't matter. But the correct way to do it is to have it in both places. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: geeky2 [mailto:gee...@hotmail.com]

RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Dyer, James
Taking a quick look at the code, it seems this exception could have been thrown for four reasons: (see org.apache.solr.handler.dataimport.ScriptTransformer#initEngine) 1. Your JRE doesn't have class javax.script.ScriptEngineManager (pre 1.6, loaded here via reflection) 2. Your JRE doesn't

RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Dyer, James
- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Tuesday, March 20, 2012 9:46 AM To: solr-user@lucene.apache.org Subject: RE: SOLR 3.3 DIH and Java 1.6 Taking a quick look at the code, it seems this exception could have been thrown for four reasons: (see

RE: index size with replication

2012-03-14 Thread Dyer, James
SOLR-3033 is related to ReplicationHandler's ability to do backups. It allows you to specify how many backups you want to keep. You don't seem to have any backups configured here so it is not an applicable parameter (note that SOLR-3033 was committed to trunk recently but the config param was

RE: Using multiple DirectSolrSpellcheckers for a query

2012-03-13 Thread Dyer, James
Nalini, You're correct that spellcheck.q does not run through the SpellingQueryConverter, so the workaround I suggest might be half-baked. What if when using maxCollationTries to have it check the collations against the index, you also had the ability to override both mm and qf? Then you

RE: Solr DIH and $deleteDocById

2012-03-09 Thread Dyer, James
This (almost) sounds like https://issues.apache.org/jira/browse/SOLR-2492 which was fixed in Solr 3.4. Are you on an earlier version? But maybe not, because you're seeing the # deleted documents increment, and prior to this bug fix (I think) the deleted counter wasn't getting incremented

RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread Dyer, James
Did you try setting baseDir to the root directory and recursive to true? (see http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor for more information). James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 From: mike.rawl...@gxs.com
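A sketch of that setup (base directory and file pattern are illustrative):

    <!-- recursive=true walks every subdirectory under baseDir -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/feeds" fileName=".*\.xml"
            recursive="true" rootEntity="false"/>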

RE: Using multiple DirectSolrSpellcheckers for a query

2012-03-07 Thread Dyer, James
. But then, this would require an extra copyfield (we could have multiple unstemmed fields as a source for this secondary dictionary) and bloat the index even more so I'm not sure if it's feasible. Thanks, Nalini On Thu, Jan 26, 2012 at 10:23 AM, Dyer, James james.d...@ingrambook.comwrote: Nalini, Right now

RE: How to stop processing of DataImportHandler in EventListener

2012-03-07 Thread Dyer, James
Wenca, I have an app with requirements similar to yours. We have maybe 40 caches that need to be built, then when they're done (and if they all succeed), the main indexing runs. For this I wrote some quick-n-squirrelly code that executes a configurable # of cache-building handlers at a time.

RE: DIH Delta index takes much time

2012-03-07 Thread Dyer, James
As an insanity check, you might want to take the query that it is executing for delta updates and run it manually through a SQL tool, or do an explain plan or something. It almost sounds like there could be a silly error in the query you're using and it's doing a Cartesian join or something

RE: Spelling Corrector Algorithm

2012-03-01 Thread Dyer, James
Yavar, When you listed what the spell checker returns you put them in this order: Marine (Freq: 120), Market (Freq: 900) and others Was Marine listed first, and then did you pick Market because you thought higher frequency is better? If so, you probably have the right settings already but

RE: Need tokenization that finds part of stringvalue

2012-03-01 Thread Dyer, James
Speaking of which, there is a spellchecker in jira that will detect word-break errors like this. See WordBreakSpellChecker at https://issues.apache.org/jira/browse/LUCENE-3523 . To use it with Solr, you'd also need to apply SOLR-2993 (https://issues.apache.org/jira/browse/SOLR-2993). This

RE: is it possible to run deltaimport command with out delta query?

2012-02-16 Thread Dyer, James
There is a good example of how to do a delta update using command=full-import&clean=false on the wiki, here: http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta This can be advantageous if you are updating a ton of data at once and do not want it executing as many queries to the

RE: spellcheck configuration not providing suggestions or corrections

2012-02-13 Thread Dyer, James
The one thing that jumps out is you have spellcheck.count set to 1. Try 10 and see if you get results. The spellchecker uses a 2-pass algorithm and if the count is too small, all the good suggestions can get eliminated in the first pass. So you often need a count of maybe 10 even if you only
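For example (handler name and query are illustrative):

    /select?q=itemDesc:telvision&spellcheck=true&spellcheck.count=10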

RE: spellcheck configuration not providing suggestions or corrections

2012-02-13 Thread Dyer, James
That would be it, I think. Your request is to /select, but you've put spellchecking into /search. Try /search instead. Also, I doubt it's the problem, but try removing the trailing CRLFs from your query. Also, typically you'd still query against the main field (itemDesc in your case) and

RE: spell checking and filtering in the same query

2012-02-09 Thread Dyer, James
Mark, I'm not as familiar with the Suggester, but with normal spellcheck if you set spellcheck.maxCollationTries to something greater than 0 it will check the collations with the index. This checking includes any fq params you had. So in this sense the SpellCheckComponent does work with fq.

RE: regular expression in solrcore.config to be passed to dataConfig via DataImportHandler

2012-02-09 Thread Dyer, James
I wouldn't feel too bad about this. This is a pretty common gotcha and going forward it would be nice if we can make it easier to parameterize data-config.xml... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Zajkowski, Radoslaw

RE: $deleteDocByQuery and $deleteDocByID

2012-02-01 Thread Dyer, James
Here is an example DIH entity that will delete from Solr anything in the database that is not flagged as 'active'. <entity name="Deletes" dataSource="ds" query=" SELECT a.id AS '$deleteDocById' FROM products a INNER JOIN
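A minimal self-contained variant of that idea (the products table and its active flag are hypothetical):

    <!-- every id this query returns is removed from the index rather than indexed -->
    <entity name="Deletes" dataSource="ds"
            query="SELECT id AS '$deleteDocById' FROM products WHERE active = 'N'"/>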
