Re: Faceting Question

2012-11-15 Thread Alexey Serba
Seems like pivot faceting is what you are looking for ( http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting ). Note: it currently does not work in distributed mode - see https://issues.apache.org/jira/browse/SOLR-2894 On Thu, Nov 15, 2012 at 7:46 AM, Jamie Johnson
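A minimal sketch of what a pivot facet request could look like, for readers who have not used the feature. The `facet.pivot` parameter is the one documented on the wiki page linked above; the base URL and the `user` and `date` field names are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def pivot_facet_url(base_url, fields, query="*:*"):
    """Build a select URL that asks Solr for pivot (decision tree) facets."""
    params = [
        ("q", query),
        ("rows", 0),                        # only the facet counts are wanted
        ("facet", "true"),
        ("facet.pivot", ",".join(fields)),  # outer field first, nested fields after
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host and field names, for illustration only.
url = pivot_facet_url("http://localhost:8983/solr/select", ["user", "date"])
print(url)
```

The order of the field list decides the nesting, so `user,date` would give counts per date inside each user.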

Re: Faceting Facets

2012-09-03 Thread Alexey Serba
http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting On Mon, Sep 3, 2012 at 6:38 PM, Dotan Cohen wrote: > Is there any way to nest facet searches in Solr? Specifically, I have > a User field and a DateTime field. I need to know how many Documents > match each Us

Re: Java class "[B" has no public instance field or method named "split".

2012-08-31 Thread Alexey Serba
http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 On Sat, Sep 1, 2012 at 2:17 AM, Cirelli, Stephen J. wrote: > Anyone know why I'm getting this exception? I'm following the example > here < http://wiki.apache.

Re: Query Time problem on Big Index Solr 3.5

2012-08-31 Thread Alexey Serba
1. Use filter queries > Here is an example query. Is there anything incorrect, or anything that I can > change? > http://xxx:8893/solr/candidate/select/?q=+(IdCandidateStatus:2)+(IdCobranded:3)+(IdLocation1:12))+(LastLoginDate:[2011-08-26T00:00:00Z > TO 2012-08-28T00:00:00Z]) What is the logic here? Are

Re: Injest pauses

2012-08-29 Thread Alexey Serba
Could you take jstack dump when it's happening and post it here? > Interestingly it is not pausing during every commit so at least a portion of > the time the async commit code is working. Trying to track down the case > where a wait would still be issued. > > -Original Message- > From

Re: LateBinding

2012-08-29 Thread Alexey Serba
http://searchhub.org/dev/2012/02/22/custom-security-filtering-in-solr/ See section about PostFilter. On Wed, Aug 29, 2012 at 4:43 PM, wrote: > Hello, > > Has anyone ever implemented the security feature called late-binding? > > I am trying this but I am very new to solr and I would be very glad

Re: Injest pauses

2012-08-29 Thread Alexey Serba
Hey Brad, > This leads me to believe that a single merge thread is blocking indexing from > occurring. > When this happens our producers, which distribute their updates amongst all > the shards, pile up on this shard and wait. Which version of Solr are you using? Have you tried 4.0 beta? * http

Re: Sharing and performance testing question.

2012-08-29 Thread Alexey Serba
> Any tips on load testing solr? Ideally we would like caching to not affect > the result as much as possible. 1. Siege tool This is probably the simplest option. You can generate urls.txt file and pass it to the tool. You should also capture server performance (CPU, memory, qps, etc) using tools

Re: Indexing and querying BLOBS stored in Mysql

2012-08-24 Thread Alexey Serba
I would recommend creating a simple data import handler to test tika parsing for large BLOBs, i.e. remove unrelated entities, remove all the configuration for delta imports and keep just the entity that retrieves blobs and the entity that parses binary content (fieldReader/TikaEntityProcessor). Some co

Re: Custom Geocoder with Solr and Autosuggest

2012-08-16 Thread Alexey Serba
> My first decision was to divide SOLR into two cores, since I am already > using SOLR as my search server. One core would be for the main search of the > site and one for the geocoding. Correct. And you can even use that location index/collection for locations extraction for a non structural docum

Re: MySQL Exception: Communications link failure WITH DataImportHandler

2012-08-16 Thread Alexey Serba
My memory is vague, but I think I've seen something similar with older versions of Solr. Is it possible that you have significant database import and there's a big segments merge happening in the middle causing blocking in dih indexing process (and reading records from database as well), since lon

Re: Solr Index linear growth - Performance degradation.

2012-08-14 Thread Alexey Serba
>10K queries How do you generate these queries? I.e. is this a single or multi threaded application? Can you provide full queries you send to Solr servers and solrconfig request handler configuration? Do you use function queries, grouping, faceting, etc? On Tue, Aug 14, 2012 at 10:31 AM, feroz_k

Re: Running out of memory

2012-08-12 Thread Alexey Serba
> It would be vastly preferable if Solr could just exit when it gets a memory > error, because we have it running under daemontools, and that would cause > an automatic restart. -XX:OnOutOfMemoryError="<cmd args>; <cmd args>" Run user-defined commands when an OutOfMemoryError is first thrown. > Does Solr require the

Re: Is this too much time for full Data Import?

2012-08-08 Thread Alexey Serba
9m*15 - that's a lot of queries (>400 QPS). I would try to reduce the number of queries: 1. Rewrite your main (root) query to select all possible data * use SQL joins instead of DIH nested entities * select data from 1-N related tables (tags, authors, etc) in the main query using GROUP_CONCAT (that'

Re: Large RDBMS dataset

2011-12-29 Thread Alexey Serba
> The problem is that for each record in "fd", Solr makes three distinct SELECT > on the other three tables. Of course, this is absolutely inefficient. You can also try to use GROUP_CONCAT (it's MySQL function, but maybe there's something similar in MS SQL) to select all the nested 1-N entities i

Re: Decimal Mapping problem

2011-12-29 Thread Alexey Serba
Try to cast MySQL decimal data type to string, i.e. CAST( IF(drt.discount IS NULL,'0',(drt.discount/100)) AS CHAR) as discount (or CAST AS TEXT) On Mon, Dec 19, 2011 at 1:24 PM, Niels Stevens wrote: > Hey everybody, > > I'm having an issue importing Decimal numbers from my Mysql DB to Solr. > Is

Re: a question on jmx solr exposure

2011-12-29 Thread Alexey Serba
Which Solr version do you use? Maybe it has something to do with default collection? I do see separate jmx domain for every collection, i.e. solr/collection1 solr/collection2 solr/collection3 ... On Wed, Dec 21, 2011 at 1:56 PM, Dmitry Kan wrote: > Hello list, > > This might be not the right pl

Re: Solr 3.3: DIH configuration for Oracle

2011-08-17 Thread Alexey Serba
Why do you need to collect both primary keys T1_ID_RECORD and T2_ID_RECORD in your delta query. Isn't T2_ID_RECORD primary key value enough to get all data from both tables? (you have table1-table2 relation as 1-N, right?) On Thu, Aug 11, 2011 at 12:52 AM, Eugeny Balakhonov wrote: > Hello, all! >

Re: Weird issue with solr and jconsole/jmx

2011-06-24 Thread Alexey Serba
I just encountered the same bug - JMX registered beans don't survive Solr core reloads. I believe the reason is that when you do core reload * when the new core is created - it overwrites/over-register beans in registry (in mbeanserver) * when the new core is ready in the core register phase CoreC

Re: Solr and Tag Cloud

2011-06-19 Thread Alexey Serba
Consider you have multivalued field _tag_ related to every document in your corpus. Then you can build tag cloud relevant for all data set or specific query by retrieving facets for field _tag_ for "*:*" or any other query. You'll get a list of popular _tag_ values relevant to this query with occur
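A rough sketch of the tag cloud approach described above, assuming a multivalued field named `tag`. The facet parameters are standard Solr ones; the limit, the font-size range, and the sample counts are made-up illustration values, and the linear scaling is just one possible way to turn counts into display sizes.

```python
from urllib.parse import urlencode


def tag_cloud_params(query="*:*", tag_field="tag", limit=50):
    """Query string asking only for facet counts on the tag field."""
    return urlencode([
        ("q", query),            # "*:*" for the whole data set, or any user query
        ("rows", 0),             # documents themselves are not needed
        ("facet", "true"),
        ("facet.field", tag_field),
        ("facet.limit", limit),
        ("facet.mincount", 1),   # skip tags that do not occur for this query
    ])


def font_sizes(tag_counts, min_px=10, max_px=32):
    """Scale occurrence counts linearly into font sizes for rendering."""
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = (hi - lo) or 1        # avoid division by zero when all counts are equal
    return {
        tag: min_px + (count - lo) * (max_px - min_px) // span
        for tag, count in tag_counts.items()
    }


# Hypothetical facet counts as they might come back from Solr.
sizes = font_sizes({"solr": 100, "dih": 10, "jmx": 55})
```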

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

2011-06-17 Thread Alexey Serba
> Do you mean that we  have current Index as it is and have a separate core > which  has only the user-id ,product-id relation and at while querying ,do a > join between the two cores based on the user-id. Exactly. You can index user-id, product-id relation either to the same core or to different c

Re: Updating only one indexed field for all documents quickly.

2011-06-16 Thread Alexey Serba
>> with the integer field. If you just want to influence the >> score, then just plain external field fields should work for >> you. > > Is this an appropriate solution, give our use case? > Yes, check out ExternalFileField * http://search.lucidimagination.com/search/document/CDRG_ch04_4.4.4 * ht

Re: Strange behavior

2011-06-16 Thread Alexey Serba
Have you stopped Solr before manually copying the data? This way you can be sure that index is the same and you didn't have any new docs on the fly. 2011/6/14 Denis Kuzmenok : > What  should  i provide, OS is the same, environment is the same, solr > is  completely  copied,  searches  work,  excep

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

2011-06-16 Thread Alexey Serba
> So a search for a product once the user logs in and searches for only the > products that he has access to Will translate to something like this . ,the > product ids are obtained form the db  for a particular user and can run > into  n  number. > > &fq=product_id(100 10001  ..n number) > > b

Re: Complex situation

2011-06-16 Thread Alexey Serba
Am I right that you are only interested in results / facets for current season? If it's so then you can index start/end dates as a separate number fields and build your search filters like this "fq=+start_date_month:[* TO 6] +start_date_day:[* TO 17] +end_date_month:[* TO 6] +end_date_day:[16 TO *]

Re: URGENT HELP: Improving Solr indexing time

2011-06-13 Thread Alexey Serba
16276 ... > so I am doing a delta import of around 500,000 rows at a > time. http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
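A small sketch of the technique from the linked wiki page, as far as I understand it: run a full-import with `clean=false` and let the main query itself restrict rows to those changed since the last run. The handler URL, the `item` table, and the `last_modified` column are assumptions for illustration and need to be adapted to the real schema.

```python
from urllib.parse import urlencode

# Entity query that would go into data-config.xml (hypothetical table/column names).
# With clean=false only rows modified since the last import are selected;
# any other value of "clean" selects everything.
ENTITY_QUERY = (
    "SELECT * FROM item "
    "WHERE '${dataimporter.request.clean}' != 'false' "
    "OR last_modified > '${dataimporter.last_index_time}'"
)


def delta_via_full_import_url(handler_url):
    """URL that triggers a full-import acting as a delta import."""
    params = [("command", "full-import"), ("clean", "false")]
    return handler_url + "?" + urlencode(params)


delta_url = delta_via_full_import_url("http://localhost:8983/solr/dataimport")
```

This avoids the per-id deltaImportQuery calls, since a single query fetches all changed rows.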

Re: Need query help

2011-06-06 Thread Alexey Serba
See "Tagging and excluding Filters" section * http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters 2011/6/6 Denis Kuzmenok : > For now i have a collection with: > id (int) > price (double) multivalue > brand_id (int) > filters (string) multivalue > > I  need  to  get a

Re: Solr memory consumption

2011-06-02 Thread Alexey Serba
> Commits are divided into 2 groups: > - often but small (last changed > info) 1) Make sure that it's not too often and you don't have commit overlapping problem. http://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F 2) You may

Re: Documents update

2011-06-01 Thread Alexey Serba
> Will it be slow if there are 3-5 million key/value rows? AFAIK it shouldn't affect search time significantly as Solr caches it in memory after you reload the Solr core or issue a commit. But obviously you need more memory, and commit/reload will take more time.

Re: Better Spellcheck

2011-06-01 Thread Alexey Serba
> I've tried to use a spellcheck dictionary built from my own content, but my > content ends up having a lot of misspelled words so the spellcheck ends up > being less than effective. You can try to use sp.dictionary.threshold parameter to solve this problem * http://wiki.apache.org/solr/SpellCheck

Re: DIH render html entities

2011-06-01 Thread Alexey Serba
Maybe HTMLStripTransformer is what you are looking for. * http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer On Tue, May 31, 2011 at 5:35 PM, Erick Erickson wrote: > Convert them to what? Individual fields in your docs? Text? > > If the former, you might get some joy from the Xpa

Re: Solr memory consumption

2011-06-01 Thread Alexey Serba
Hey Denis, * How big is your index in terms of number of documents and index size? * Is it production system where you have many search requests? * Is there any pattern for OOM errors? I.e. right after you start your Solr app, after some search activity or specific Solr queries, etc? * What are 1)

Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Alexey Serba
{quote} ... Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost. at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)

Re: Solr performance issue

2011-03-22 Thread Alexey Serba
> Btw, I am monitoring output via jconsole with 8gb of ram and it still goes > to 8gb every 20 seconds or so, > gc runs, falls down to 1gb. Hmm, jvm is eating 8Gb for 20 seconds - sounds a lot. Do you return all results (ids) for your queries? Any tricky faceting/sorting/function queries?

Re: Dataimport performance

2010-12-19 Thread Alexey Serba
> With subquery and with left join:   320k in 6 Min 30 It's 820 records per second. It's _really_ impressive considering the fact that DIH performs separate sql query for every record in your case. >> So there's one track entity with an artist sub-entity. My (admittedly >> rather limited) experien

Re: Custom scoring for searhing geographic objects

2010-12-19 Thread Alexey Serba
Hi Pavel, I had the similar problem several years ago - I had to find geographical locations in textual descriptions, geocode these objects to lat/long during indexing process and allow users to filter/sort search results to specific geographical areas. The important issue was that there were seve

Re: Newbie: Indexing unrelated MySQL tables

2010-12-14 Thread Alexey Serba
> I figured I would create three entities and relevant > schema.xml entries in this way: > > dataimport.xml: > > > That's correct. You can list several entities under the document element. You can index them separately using the entity parameter (i.e. add entity=Users to your full import HTTP request). D

Re: my index has 500 million docs ,how to improve solr search performance?

2010-12-14 Thread Alexey Serba
How much memory do you allocate for JVMs? Considering you have 10 JVMs per server (10*N) you might have not enough memory for OS file system cache ( you need to keep some memory free for that ) > all indexs size is about 100G is this per server or whole size? On Mon, Nov 15, 2010 at 8:35 AM, lu.

Re: Syncing 'delta-import' with 'select' query

2010-12-14 Thread Alexey Serba
>> Thanks for all the help! It is really appreciated. >> >> For now, I can afford the parallel requests problem, but when I put >> synchronous=true in the delta import, the call still returns with >> outdated items. >> Examining the log, it seems that the commit opera

Re: Syncing 'delta-import' with 'select' query

2010-12-06 Thread Alexey Serba
> When you say "two parallel requests from two users to single DIH > request handler", what do you mean by "request handler"? I mean DIH. > Are you > refering to the HTTP request? Would that mean that if I make the > request from different HTTP sessions it would work? No. It means that when you h

Re: DIH - rdbms to index confusion

2010-12-06 Thread Alexey Serba
> I have a table that contains the data values I'm wanting to return when > someone makes a search.  This table has, in addition to the data values, 3 > id's (FKs) pointing to the data/info that I'm wanting the users to be able > to search on (while also returning the data values). > > The general

Re: Syncing 'delta-import' with 'select' query

2010-12-06 Thread Alexey Serba
Hey Juan, It seems that DataImportHandler is not the right tool for your scenario and you'd better use Solr XML update protocol. * http://wiki.apache.org/solr/UpdateXmlMessages You can still work around your outdated GUI view problem by calling DIH synchronously, by adding synchronous=true to you

Re: dataimports response returns before done?

2010-12-06 Thread Alexey Serba
> After issuing a dataimport, I've noticed solr returns a response prior to > finishing the import. Is this correct?   Is there any way i can make solr not > return until it finishes? Yes, you can add synchronous=true to your request. But be aware that it could take a long time and you can see ht
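A minimal sketch of a blocking import call using the `synchronous=true` parameter mentioned above. The handler URL and the one-hour timeout are assumptions; the point is only that the client has to tolerate a response that arrives when the import ends.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def sync_import_url(handler_url, command="full-import"):
    """URL for a DIH command that should not return before the import is done."""
    query = urlencode([("command", command), ("synchronous", "true")])
    return handler_url + "?" + query


def run_import(handler_url, timeout_s=3600):
    """Fire the import and block until Solr answers (assumed generous timeout)."""
    with urlopen(sync_import_url(handler_url), timeout=timeout_s) as resp:
        return resp.status


# Hypothetical handler location; run_import() is not called here on purpose.
sync_url = sync_import_url("http://localhost:8983/solr/dataimport")
```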

Re: Query performance very slow even after autowarming

2010-12-06 Thread Alexey Serba
* Do you use EdgeNGramFilter in index analyzer only? Or you also use it on query side as well? * What if you create additional field first_letter (string) and put first character/characters (multivalued?) there in your external processing code. And then during search you can filter all documents t

Re: DIH delta, deltaQuery

2010-11-26 Thread Alexey Serba
Are you sure that it's deltaQuery that's taking a minute? It only retrieves ids of updated records and then deltaImportQuery is executed N times for each id record. You might want to try the following technique - http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport On Wed, Nov 24,

Re: Basic Solr Configurations and best practice

2010-11-26 Thread Alexey Serba
> 1-      How to combine data from DIH and content extracted from file system > document into one document in the index? http://wiki.apache.org/solr/TikaEntityProcessor You can have one sql entity that retrieves metadata from database and another nested entity that parses binary file into additiona

Re: using DIH with mets/alto file sets

2010-11-26 Thread Alexey Serba
> The idea is to create a full text index of the alto content, accompanied by > the author/title info from the mets file for purposes of results display. - Then you need to list only alto files in your landscapes entity (fileName="^ID.{3}-ALTO\d{3}.xml$" or something like that), because you don't

Re: Searching with wrong keyboard layout or using translit

2010-10-31 Thread Alexey Serba
Another approach for this problem is to use another Solr core for storing users queries for auto complete functionality ( see http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ ) and index not only user_query field, but also transliterated and diff_l

Re: problem on running fullimport

2010-10-24 Thread Alexey Serba
" Caused by: java.sql.SQLException: Illegal value for setFetchSize(). " Try to add batchSize="-1" to your data source declaration http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Wh

Re: DataImportHandler dynamic fields clarification

2010-10-13 Thread Alexey Serba
Harry, could you please file a jira for this and I'll address this in a patch. I fixed related issue (SOLR-2102) and I think it's pretty similar. > Interesting, I was under the impression that case does not matter. > > From http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config : > "I

Re: Help need in setting up delta imports

2010-09-24 Thread Alexey Serba
Your example doesn't mention deleting Employee. Is this a valid use case? If not then you can simplify things: query="SELECT name, address from employee where endtimestamp is null" deltaQuery= "SELECT DISTINCT name FROM employee WHERE eventtimestamp > '${dataimporter.last_index_time}' " d

Re: Delta Import with something other than Date

2010-09-10 Thread Alexey Serba
> Can you provide a sample of passing the parameter via URL? And how using it > would look in the data-config.xml http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
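A short sketch of the idea behind the linked section: extra HTTP parameters on the import request can be referenced in data-config.xml as `${dataimporter.request.<name>}`. The parameter name `last_id`, the handler URL, and the `item` table in the example query are assumptions for illustration.

```python
from urllib.parse import urlencode


def request_placeholder(name):
    """Placeholder to use inside data-config.xml for a request parameter."""
    return "${dataimporter.request.%s}" % name


def import_url(handler_url, command="full-import", **request_params):
    """Build a DIH URL that carries custom request parameters."""
    params = [("command", command)] + list(request_params.items())
    return handler_url + "?" + urlencode(params)


# Hypothetical usage: import only rows with an id above a given value.
example_query = "SELECT * FROM item WHERE id > " + request_placeholder("last_id")
param_url = import_url("http://localhost:8983/solr/dataimport", clean="false", last_id=1000)
```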

Re: Solr is indexing jdbc properties

2010-09-06 Thread Alexey Serba
http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 Try to add convertType attribute to dataSource declaration, i.e. HTH, Alex On Mon, Sep 6, 2010 at 5:49 PM, savvas.andreas wrote: > > Hello, > > I am trying

Re: Data Import Handler Query

2010-08-12 Thread Alexey Serba
Try to define image solr fields <-> db columns mapping explicitly in "image" entity, i.e. See http://www.lucidimagination.com/search/document/c8f2ed065ee75651/dih_and_multivariable_fields_problems On Thu, Aug 12, 2010 at 2:30 AM, Manali Joshi wrote: > I tried making the schema

Re: DIH and multivariable fields problems

2010-08-10 Thread Alexey Serba
> Have others successfully imported dynamic multivalued fields in a > child entity using the DataImportHandler via the child entity returning > multiple records through a RDBMS? Yes, it's working ok with static fields. I didn't even know that it's possible to use variables in field names ( "dynami

Re: Implementing lookups while importing data

2010-08-10 Thread Alexey Serba
> We are currently doing this via a JOIN on the numeric > field, between the main data table and the lookup table, but this > dramatically slows down indexing. I believe SQL JOIN is the fastest and easiest way in your case (in comparison with nested entity even using CachedSqlEntity). You probably

Re: DIH: Rows fetch OK, Total Documents Failed??

2010-08-10 Thread Alexey Serba
Do you have any required fields or uniqueKey in your schema.xml? Do you provide values for all these fields? AFAIU you don't need commonField attribute for id and title fields. I don't think that's your problem but anyway... On Sat, Jul 31, 2010 at 11:29 AM, wrote: > >  Hi, > > I'm a bit lost

Re: Performance issues when querying on large documents

2010-07-23 Thread Alexey Serba
Do you use highlighting? ( http://wiki.apache.org/solr/HighlightingParameters ) Try to disable it and compare performance. On Fri, Jul 23, 2010 at 10:52 PM, ahammad wrote: > > Hello, > > I have an index with lots of different types of documents. One of those > types basically contains extracts o

Re: 2 solr dataImport requests on a single core at the same time

2010-07-23 Thread Alexey Serba
> having multiple Request Handlers will not degrade the performance IMO you shouldn't worry unless you have hundreds of them

Re: commit is taking very very long time

2010-07-23 Thread Alexey Serba
> I am not sure why some commits take very long time. Hmm... Because it merges index segments... How large is your index? > Also is there a way to reduce the time it takes? You can disable commit in DIH call and use autoCommit instead. It's kind of hack because you postpone commit operation and ma

Re: 2 solr dataImport requests on a single core at the same time

2010-07-22 Thread Alexey Serba
DataImportHandler does not support parallel execution of several requests. You should either send your requests sequentially or register several DIH handlers in solrconfig and use them in parallel. On Thu, Jul 22, 2010 at 11:20 AM, kishan wrote: > > please help me > -- > View this message in con

Re: Adding new elements to index

2010-07-07 Thread Alexey Serba
1) Shouldn't you put your "entity" elements under "document" tag, i.e. ... ... 2) What happens if you try to run full-import with explicitly specified "entity" GET parameter? command=full-import&entity=carrers command=full-import&entity=hidrants On Wed, Jul 7, 2010 at 11:1

Re: solr data config questions

2010-06-29 Thread Alexey Serba
5225","[...@1b308c1","[...@103f345"], > > I use the same query on mysql database, it returns right results. > > Can someone answer me this ? > > Many Thanks > > Vivian > > -Original Message- > From: Alexey Serba [mailto:ase...@gma

Re: DIH and denormalizing

2010-06-28 Thread Alexey Serba
> It seems that ${ncdat.feature} is not being set. Try ${dataTable.feature} instead. On Tue, Jun 29, 2010 at 1:22 AM, Shawn Heisey wrote: > I am trying to do some denormalizing with DIH from a MySQL source.  Here's > part of my data-config.xml: > >      query="SELECT *,FROM_UNIXTIME(post_date)

Re: solr data config questions

2010-06-28 Thread Alexey Serba
Hi, You can add additional commentreplyjoin entity to story entity, i.e. ... Thus, you will have multivalued field commentreply that contains list of related "comment_id, reply_id" ("comment_id," if you don't have any related replies for this entry

Re: Data Import Handler Rich Format Documents

2010-06-28 Thread Alexey Serba
> Ok, I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using > Solr Version: 1.4.0 and getting the following error: > > java.lang.ClassNotFoundException: Unable to load BinURLDataSource or > org.apache.solr.handler.dataimport.BinURLDataSource It seems that DIH-Tika integration is

Re: dataimport.properties is not updated on delta-import

2010-06-25 Thread Alexey Serba
Please note that Oracle ( or Oracle jdbc driver ) converts column names to upper case even though you state them in lower case. If this is the case then try to rewrite your query in the following form select id as "id", name as "name" from table On Thursday, June 24, 2010, warb wrote: > > Hello ag

Re: Data Import Handler Rich Format Documents

2010-06-21 Thread Alexey Serba
You are right. It seems TikaEntityProcessor is exactly the tool you need in this case. Alex On Sat, Jun 19, 2010 at 2:59 AM, Chris Hostetter wrote: > : I think you can use existing ExtractingRequestHandler to do the job, > : i.e. add child entity to your DIH metadata > > why would you do this in

Re: Data Import Handler Rich Format Documents

2010-06-18 Thread Alexey Serba
I think you can use existing ExtractingRequestHandler to do the job, i.e. add child entity to your DIH metadata http://localhost:8983/solr/update/extract?extractOnly=true&wt=xml&indent=on&stream.url=${metadata.url}"; dataSource="solr"> That's not working example, just basic

Re: Solr DataConfig / DIH Question

2010-06-16 Thread Alexey Serba
> There is a 1-[0,1] relationship between Person and Address with address_id > being the nullable foreign key. I think you should be good with single query/entity then (no need for nested entities) On Sunday, June 13, 2010, Holmes, Charles V. wrote: > I'm putting together an entity.  A simpli

Re: multiValued using

2010-06-07 Thread Alexey Serba
Hi Alberto, You can add child entity which returns multiple records, i.e. HTH, Alex 2010/6/7 Alberto García Sola : > Hello, this is my first message to this list. > > I was wondering if it is possible to use multiValued when using MySQL (or > any SQL-database engine) through DataImp

Re: Importing large datasets

2010-06-07 Thread Alexey Serba
What's the relation between items and item_descriptions table? I.e. is there only one item_descriptions record for every id? If 1-1 then you can merge all your data into single database and use the following query HTH, Alex On Thu, Jun 3, 2010 at 6:34 AM, Blargy wrote: > > > Erik Hatcher-4

Re: indexer threading?

2010-04-27 Thread Alexey Serba
Hi Brian, I was testing indexing performance on a high cpu box recently and came to the same issue. I tried different indexing methods ( xml, CSVRequestHandler and Solrj + BinaryRequestWriter with multiple threads ). The last method is the fastest indeed. I believe that multiple threads approach g

Re: Short Question: Fills this entity multiValued Fields (DIH)?

2010-04-08 Thread Alexey Serba
> Have a look at these two lines: > > >                 > > > If there is more than one description per item_ID, does the features-field > gets multiple values if it is defined as multiValued=true? Correct.

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-24 Thread Alexey Serba
You should add this component (suggest or spellcheck, depending on what you named it) to the request handler, i.e. add suggest. And then you can hit the following URL and get your suggestions: http://localhost:8983/solr/suggest/?spellcheck=true&spellcheck.dictionary=suggest
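A sketch of how a client might assemble that request. The `spellcheck` and `spellcheck.dictionary` parameters come from the URL above; passing the typed prefix in `q` and limiting results with `spellcheck.count` are assumptions about how the handler is configured.

```python
from urllib.parse import urlencode


def suggest_url(prefix, base_url="http://localhost:8983/solr/suggest/", count=5):
    """Build a request for suggestions matching what the user has typed so far."""
    params = [
        ("q", prefix),                        # assumed: prefix goes into q
        ("spellcheck", "true"),
        ("spellcheck.dictionary", "suggest"),
        ("spellcheck.count", count),          # assumed cap on suggestions
    ]
    return base_url + "?" + urlencode(params)


suggest_request = suggest_url("sol")
```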

Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-23 Thread Alexey Serba
> Error loading class 'org.apache.solr.spelling.suggest.Suggester' Are you sure you applied the patch correctly? See http://wiki.apache.org/solr/HowToContribute#Working_With_Patches Checkout Solr trunk source code ( http://svn.apache.org/repos/asf/lucene/solr/trunk ), apply patch, verify that ever

Re: Term Highlighting without store text in index

2010-03-18 Thread Alexey Serba
Hey Dominique, See http://www.lucidimagination.com/search/document/5ea8054ed8348e6f/highlight_arbitrary_text#3799814845ebf002 Although it might be not good solution for huge texts, wildcard/phrase queries. http://issues.apache.org/jira/browse/SOLR-1397 On Mon, Mar 15, 2010 at 4:09 PM, dbejean

Re: implementing profanity detector

2010-02-11 Thread Alexey Serba
> - A TokenFilter would allow me to tap into the existing analysis pipeline so > I get the tokens for free but I can't access the document. https://issues.apache.org/jira/browse/SOLR-1536 On Fri, Jan 29, 2010 at 12:46 AM, Mike Perham wrote: > We'd like to implement a profanity detector for docume

DataImportHandler - case sensitivity of column names

2010-02-08 Thread Alexey Serba
I encountered the problem with Oracle converting column names to upper case. As a result SolrInputDocument is created with field names in upper case and "Document [null] missing required field: id" exception is thrown ( although ID field is defined ). I do not specify "field" elements explicitly.

Re: Indexing an oracle warehouse table

2010-02-03 Thread Alexey Serba
> What would be the right way to point out which field contains the term > searched for. I would use highlighting for all of these fields and then post process Solr response in order to check highlighting tags. But I don't have so many fields usually and don't know if it's possible to configure So

Re: Indexing a oracle warehouse table

2010-02-02 Thread Alexey Serba
> Dont define any so that column in > SOLR will be same as in the database table. Correct You can define dynamic field ( see http://wiki.apache.org/solr/SchemaXml#Dynamic_fields ) > 1)How do I define unique field in this scenario? You can create primary key into database or generate it directly

DataImportHandler - convertType attribute

2010-02-02 Thread Alexey Serba
Hello, I encountered blob indexing problem and found convertType solution in FAQ I was wondering why it is not enabled by default and found the following comm

Re: DataImportHandler - synchronous execution

2010-01-13 Thread Alexey Serba
Hi, I created Jira issue SOLR-1721 and attached simple patch ( no documentation ) for this. HTH, Alex 2010/1/13 Noble Paul നോബിള്‍ नोब्ळ् : > it can be added > > On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba wrote: >> Hi, >> >> I found that there's no explicit

DataImportHandler - synchronous execution

2010-01-12 Thread Alexey Serba
Hi, I found that there's no explicit option to run DataImportHandler in a synchronous mode. I need that option to run DIH from SolrJ ( EmbeddedSolrServer ) in the same thread. Currently I pass dummy stream to DIH as a workaround for this, but I think it makes sense to add specific option for that.

Re: Adaptive search?

2009-12-18 Thread Alexey Serba
You can add click counts to your index as additional field and boost results based on that value. http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29 You can keep some kind of buffer for clicks and

Re: preserve relational strucutre in solr?

2009-12-14 Thread Alexey Serba
http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example See full import example, it has 1-n and n-n relationships On Mon, Dec 14, 2009 at 4:34 PM, Faire Mii wrote: > >  was able to import data through solr DIH. > > in my db i have 3 tables: > > threads: id tags: id thread_tag_map: thre

Re: sanizing/filtering query string for security

2009-11-09 Thread Alexey Serba
> BTW, I have not used DisMax handler yet, but does it handle *:* properly? See q.alt DisMax parameter http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt You can specify q.alt=*:* and q as empty string to get all results. > do you care if users issue this query I allow users to issue an empty
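A small sketch of the q.alt approach: always send `q.alt=*:*` and only add `q` when the user actually typed something, so an empty search box returns all results. Selecting DisMax via `defType=dismax` is an assumption about the setup; a handler preconfigured for DisMax would not need it.

```python
from urllib.parse import urlencode


def dismax_params(user_query):
    """Query string for DisMax that falls back to match-all on empty input."""
    params = [
        ("defType", "dismax"),   # assumed way of selecting the DisMax parser
        ("q.alt", "*:*"),        # used by DisMax when q is missing or empty
    ]
    text = user_query.strip()
    if text:
        params.append(("q", text))
    return urlencode(params)


empty_search = dismax_params("   ")
normal_search = dismax_params("ipod")
```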

Re: sanizing/filtering query string for security

2009-11-09 Thread Alexey Serba
I added some kind of pre and post processing of Solr results for this, i.e. If I find fieldname specified in query string in form of "fieldname:term" then I pass this query string to standard request handler, otherwise use DisMaxRequestHandler ( DisMaxRequestHandler doesn't break the query, at lea

Re: Similar documents from multiple cores with different schemas

2009-11-09 Thread Alexey Serba
> Or maybe it's > possible to tweak MoreLikeThis just to return the fields and terms that > could be used for a search on the other core? Exactly See parameter mlt.interestingTerms in MoreLikeThisHandler http://wiki.apache.org/solr/MoreLikeThisHandler You can get interesting terms and build query

Re: MoreLikeThis and filtering/restricting on "target" fields

2009-11-06 Thread Alexey Serba
Hi Cody, > I have tried using MLT as a search component so that it has access to > filter queries (via fq) but I cannot seem to get it to give me any > data other than more of the same, that is, I can get a ton of Articles > back but not other "content types". Filter query ( fq ) should work, for

Re: Dismax and Standard Queries together

2009-11-03 Thread Alexey Serba
Hi Ram, You can add another field total ( catchall field ) and copy all other fields into this field ( using copyField directive ) http://wiki.apache.org/solr/SchemaXml#Copy_Fields and use this field in DisMax qf parameter, for example qf=business_name^2.0 category_name^1.0 sub_category_name^1.0

Re: Solr Cell on web-based files?

2009-11-02 Thread Alexey Serba
> e.g (doesn't work) > curl http://localhost:8983/solr/update/extract?extractOnly=true > --data-binary @http://myweb.com/mylocalfile.htm -H "Content-type:text/html" > You might try remote streaming with Solr (see > http://wiki.apache.org/solr/SolrConfigXml). Yes, curl example curl 'http://local

Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-02 Thread Alexey Serba
Hi Eugene, > - ability to iterate over all documents, returned in search, as Lucene does >  provide within a HitCollector instance. We would need to extract and >  aggregate various fields, stored in index, to group results and aggregate > them >  in some way. > > Also I did not find any way

Re: Keepwords Schema

2009-10-05 Thread Alexey Serba
Probably you want to use
- multivalued field 'authors' login.php alex brian ...
- return facets for this field
- you can filter unwanted authors either during the indexing process or by post-processing the returned search results
On Fri, Oct 2, 2009 at 4:35 PM, Shalin Shekhar Mangar < shalin

Re: yellow pages navigation kind menu. howto take every 100th row from resultset

2009-10-05 Thread Alexey Serba
It seems that you need Faceted Search On Fri, Oct 2, 2009 at 3:35 PM, Julian Davchev wrote: > Hi, > > Long story short: how can I take every 100th row from solr resultset. > What would syntax for this

Re: do NOT want to stem plurals for a particular field, or words

2009-09-16 Thread Alexey Serba
>  You can enable/disable stemming per field type in the schema.xml, by > removing the stemming filters from the type definition. > > Basically, copy your prefered type, rename it to something like > 'text_nostem', remove the stemming filter from the type and use your > 'text_nostem' type for your

Re: Disabling tf (term frequency) during indexing and/or scoring

2009-09-16 Thread Alexey Serba
Hi Aaron, You can override the default Lucene Similarity and disable the tf and lengthNorm factors in the scoring formula ( see http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html and http://lucene.apache.org/java/2_4_1/api/index.html ) You need to 1) compile the following clas

Re: query too long / has-many relation

2009-09-09 Thread Alexey Serba
> But apart from that everything works fine now (10,000 OR clauses takes 10
> seconds).
Not fast. I would recommend denormalizing your data, putting everything into the Solr index, and using Solr faceting http://wiki.apache.org/solr/SolrFacetingOverview to get relevant persons ( see my previous message )

Re: query too long / has-many relation

2009-09-09 Thread Alexey Serba
>> Is there a way to configure Solr to accept POST queries (instead of GET >> only?). >> Or: is there some other way to make Solr accept queries longer than 2,000 >> characters? (Up to 10,000 would be nice) > Solr accepts POST queries by default. I switched to POST for exactly > the same reason. I

Re: query too long / has-many relation

2009-09-09 Thread Alexey Serba
> Is there a way to configure Solr to accept POST queries (instead of GET > only?). > Or: is there some other way to make Solr accept queries longer than 2,000 > characters? (Up to 10,000 would be nice) Solr accepts POST queries by default. I switched to POST for exactly the same reason. I use Solr
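As a hedged illustration of sending a long query by POST, the sketch below builds (but does not send) a form-encoded request with the Python standard library. The URL is an assumed local Solr address, and the `id:` clauses are placeholders standing in for a long OR query.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_select_request(select_url, params):
    """Return a POST request carrying the query parameters in the body."""
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    # Supplying `data` makes urllib issue a POST instead of a GET.
    return Request(select_url, data=body, headers=headers)


# Placeholder query with 1000 OR clauses, far longer than 2,000 characters.
long_query = " OR ".join(f"id:{i}" for i in range(1000))
req = build_select_request("http://localhost:8983/solr/select", {"q": long_query})
```

Passing `req` to `urllib.request.urlopen` would send it. Note that the servlet container may impose its own limit on POST body size, and Solr still enforces its maximum boolean clause count regardless of the HTTP method.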
