Re: facet.method enum vs fc

2013-04-19 Thread Mingfeng Yang
the url field a Disk based DocValue and shift the > memory from Solr to the file system cache. But to run efficiently this is > still going to take a lot of memory in the OS file cache. > > > > > On Thu, Apr 18, 2013 at 12:00 PM, Mingfeng Yang >wrote: > > > 20G is allo

Re: Updating clusterstate from the zookeeper

2013-04-19 Thread Mingfeng Yang
Right. I am wondering if/how we can download a specific file from ZooKeeper, modify it, and then upload it to overwrite the original. Anyone? Thanks, Ming On Fri, Apr 19, 2013 at 10:53 AM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > I would like to know the answer to this as well. >

iterate through each document in Solr

2013-05-05 Thread Mingfeng Yang
Dear Solr Users, Does anyone know the best way to iterate through each document in a Solr index with a billion entries? I tried using select?q=*:*&start=xx&rows=500 to get 500 docs at a time and then changing the start value, but it became very slow after getting through about 10 million docs. T

Re: iterate through each document in Solr

2013-05-06 Thread Mingfeng Yang
at 3:33 AM, Dmitry Kan wrote: > Are you doing it once? Is your index sharded? If so, can you ask each shard > individually? > Another way would be to do it on Lucene level, i.e. read from the binary > indices (API exists). > > Dmitry > > > On Mon, May 6, 2013 at

Re: iterate through each document in Solr

2013-05-06 Thread Mingfeng Yang
Andre, Thanks for the info! Unfortunately, my Solr is version 3.6, and it looks like those options are not available. :( Ming- On Mon, May 6, 2013 at 5:32 AM, Andre Bois-Crettez wrote: > On 05/06/2013 06:03 AM, Michael Sokolov wrote: > >> On 5/5/13 7:48 PM, Mingfeng Yang wrote:
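Since Solr 3.6 has no cursor-style deep paging, one commonly used workaround is to stop increasing `start` and page by the unique key instead: sort on it and ask only for keys at or after the last one already seen. The Java sketch below just builds the request URLs. The field name `id`, the base URL, and the assumption that ids need no query-syntax escaping are illustrative, not from the thread. The range goes in `q` rather than `fq` so that each distinct bound does not get its own filterCache entry.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: keyset ("seek") paging for Solr 3.x, where deep start offsets get slow.
// Assumptions: the unique key field is "id", ids contain no characters that need
// query-syntax escaping, and the base URL is a placeholder.
public class KeysetPaging {

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // lastId == null requests the first page. The range is inclusive, so the
    // first hit of every later page repeats lastId and should be skipped.
    static String nextPageUrl(String baseUrl, String lastId, int rows) {
        String q = (lastId == null) ? "*:*" : "id:[" + lastId + " TO *]";
        return baseUrl + "/select?q=" + enc(q)
            + "&sort=" + enc("id asc")
            + "&rows=" + rows
            + "&start=0"; // never grows: the range query does the paging
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(nextPageUrl(base, null, 500));
        System.out.println(nextPageUrl(base, "doc0500", 500));
    }
}
```

The calling loop would fetch a page, drop the first document if its id equals `lastId`, remember the last id of the page, and stop once a page brings back no new documents.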

Solr 3.6 uses only one CPU

2013-05-30 Thread Mingfeng Yang
We have a Solr instance running on a 4-CPU box. Sometimes we send a query to our Solr server and it takes up 100% of one CPU and > 60% of memory. I assumed that if we sent another query request, Solr would be able to use another idle CPU. However, that is not the case. Using top, I only see on

SolrEntityProcessor gets slower and slower

2013-06-10 Thread Mingfeng Yang
I am trying to migrate 100M documents from a Solr index (v3.6) to a SolrCloud index (v4.1, 4 shards) by using SolrEntityProcessor. My data-config.xml is like http://10.64.35.117:8995/solr/" query="*:*" rows="2000" fl="author_class,authorlink,author_location_text,author_text,author,category,date,

shard splitting

2013-06-10 Thread Mingfeng Yang
From the Solr wiki, I saw this command (http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=&shard=shardId), which splits one index into 2 shards. However, is there some way to split into more shards? Thanks, Ming-

Re: shard splitting

2013-06-10 Thread Mingfeng Yang
l.com> wrote: > No, it is hard coded to split into two shards only. You can call it > recursively on a sub shard to split into more pieces. Please note that some > serious bugs were found in that command which will be fixed in the next > (4.3.1) release of Solr. > > > On Tu
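Because SPLITSHARD always produces exactly two sub-shards, getting 2^n pieces means splitting level by level, as the reply suggests. The sketch below lists the calls in order. It assumes the sub-shards of `shard1` are named `shard1_0` and `shard1_1`; the base URL and collection name are placeholders. Each split should be confirmed complete before its sub-shards are split in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: order of SPLITSHARD calls needed to turn one shard into 2^levels pieces.
// Assumption: a split of shard S creates sub-shards named S_0 and S_1.
public class RecursiveSplit {

    static List<String> splitOrder(String shard, int levels) {
        List<String> order = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(shard);
        for (int level = 0; level < levels; level++) {
            List<String> next = new ArrayList<>();
            for (String s : current) {
                order.add(s);        // split this shard at this level
                next.add(s + "_0");  // assumed sub-shard names
                next.add(s + "_1");
            }
            current = next;
        }
        return order;
    }

    static String splitUrl(String base, String collection, String shard) {
        return base + "/admin/collections?action=SPLITSHARD&collection=" + collection
            + "&shard=" + shard;
    }

    public static void main(String[] args) {
        // Two levels: shard1 -> shard1_0, shard1_1 -> four sub-shards in total.
        for (String s : splitOrder("shard1", 2)) {
            System.out.println(splitUrl("http://localhost:8983/solr", "collection1", s));
        }
    }
}
```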

retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
I have an index, first built with Solr 1.4 and later upgraded to Solr 3.6, which has 150 million documents, and all docs have a date field that is not blank (verified by a Solr query). I am using the following code snippet to retrieve import org.apache.lucene.index.IndexReader; import org.apache.luce

Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
On Fri, Jun 14, 2013 at 4:15 PM, Mingfeng Yang wrote: > > > I have an index firs

Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
see if that > > changes what you see. > > > > > > Michael Della Bitta

Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
> On Jun 14, 2013 6:05 PM, "Mingfeng Yang" wrote: > > > Michael, > > > > That's what I thought as well. I would assume an optimization of the index would rewrite all documents in the newer format then? > > > > Ming-

Re: retrieve datefield value from document

2013-06-14 Thread Mingfeng Yang
(num, DateTools.Resolution.SECOND)); Then you get dt as a string in the right format. Ming- On Fri, Jun 14, 2013 at 4:20 PM, Michael Della Bitta < michael.della.bi...@appinions.com> wrote: > Use EmbeddedSolrServer rather than Lucene directly. > On Jun 14, 2013 6:47 PM, "Ming
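The fix above passes a numeric stored value to Lucene's `DateTools` at `SECOND` resolution (the method name is cut off in the snippet; `DateTools.timeToString(long, Resolution)` is the overload that takes a long), which yields a `yyyyMMddHHmmss` string in UTC. As a dependency-free cross-check, the sketch below produces the same layout with `java.time`, plus Solr's ISO-8601 form. It assumes `num` holds milliseconds since the epoch, which is what `timeToString` expects.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch: format an epoch-millisecond value the way Lucene's DateTools does at
// SECOND resolution (yyyyMMddHHmmss, UTC), and in Solr's ISO-8601 form.
// Assumption: the stored number is milliseconds since 1970-01-01T00:00:00Z.
public class StoredDateFormat {
    private static final DateTimeFormatter DATETOOLS_SECOND =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter SOLR_ISO =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    static String toDateToolsSecond(long epochMillis) {
        return DATETOOLS_SECOND.format(Instant.ofEpochMilli(epochMillis));
    }

    static String toSolrIso(long epochMillis) {
        return SOLR_ISO.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long num = 90_061_000L; // 1 day + 1 h + 1 min + 1 s after the epoch
        System.out.println(toDateToolsSecond(num)); // 19700102010101
        System.out.println(toSolrIso(num));         // 1970-01-02T01:01:01Z
    }
}
```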

dynamic field

2013-06-17 Thread Mingfeng Yang
How are dynamic fields implemented in Solr? Does a dynamic field get saved into the same Document as the other regular fields in the Lucene index? Ming-

preserve special characters

2013-06-18 Thread Mingfeng Yang
We need to index and search lots of tweets, which can look like "@solr: solr is great" or "@solr_lucene, good combination". We want to search with "@solr" or "@solr_lucene". How can we preserve "@" and "_" in the index? If we use WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene

Re: preserve special characters

2013-06-18 Thread Mingfeng Yang
> > types="at-under-alpha.txt"/> > > > > The file +at-under-alpha.txt+ would contain: > > @ => ALPHA > _ => ALPHA > > The analysis results: > > Source: Hello @World_bar, r@end. > Tokens: 1: Hello 2

plugin init failure for ShingleFilterFactory

2013-07-26 Thread Mingfeng Yang
I am trying to upgrade Solr to version 4.4, and it looks like Solr can't load the ShingleFilterFactory class. 417 [coreLoadExecutor-4-thread-1] ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1 org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fi

maxScore field in SolrCloud response

2013-01-25 Thread Mingfeng Yang
We are migrating our Solr index from a single index to multiple shards with SolrCloud. I noticed that when I query SolrCloud (all shards or just one of the shards), the response has a maxScore field, but a query against the single index does not include this field. In both cases, we are using Solr 4.0.

How to migrate SolrCloud shards to different servers?

2013-01-25 Thread Mingfeng Yang
Right now I have an index with four shards on a single EC2 server, each running on a different port. Now I'd like to migrate three shards to independent servers. What should I do to accomplish this safely? Can I just 1. shut down all four Solr instances, 2. copy three shards (indexes) to d

Re: How to migrate SolrCloud shards to different servers?

2013-01-25 Thread Mingfeng Yang
chines as replicas of the > cores you want to move - then once they are active, unload the cores on the > old machine, stop the Solr instances and remove the stuff left on the > filesystem. > > - Mark > > On Jan 25, 2013, at 7:42 PM, Mingfeng Yang wrote: > > > Right now
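The replica-based move described in the reply can be scripted against the CoreAdmin API: create a core on the new server that joins the same collection and shard, wait until it shows as active, then unload the old core. The sketch only builds the URLs. All host names, ports, core names, and the collection name are hypothetical, and other CREATE parameters (such as `instanceDir`) may be needed depending on your setup.

```java
// Sketch of the "add a replica, then unload the old core" move via CoreAdmin.
// All hosts, ports, core names and the collection name are hypothetical.
public class MoveShardPlan {

    // Step 1: on the new server, create a core that joins an existing shard as a replica.
    static String createReplicaUrl(String newNode, String coreName, String collection, String shard) {
        return newNode + "/admin/cores?action=CREATE&name=" + coreName
            + "&collection=" + collection + "&shard=" + shard;
    }

    // Step 2: only after the new replica is active, unload the core on the old server.
    static String unloadUrl(String oldNode, String coreName) {
        return oldNode + "/admin/cores?action=UNLOAD&core=" + coreName;
    }

    public static void main(String[] args) {
        System.out.println(createReplicaUrl("http://newhost2:8983/solr", "collection1_shard2",
                                            "collection1", "shard2"));
        System.out.println(unloadUrl("http://oldhost:8973/solr", "collection1"));
    }
}
```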

secure Solr server

2013-01-27 Thread Mingfeng Yang
Before Solr 4.0, I secured Solr by enabling password protection in Jetty. However, password protection stops SolrCloud from working. We use EC2 now, and we need the Solr web admin interface to be accessible (with a password) from anywhere. How do you protect your Solr server from unauthorized acces

Re: Distributed search

2013-01-28 Thread Mingfeng Yang
In your case, since there are no concurrent queries, adding replicas won't help much in improving response speed. However, breaking your index into a few shards does help increase query performance. I recently broke an index with 30 million documents (30G) into 4 shards, and the boost is pretty impr

Traditional replication behind SolrCloud

2013-01-29 Thread Mingfeng Yang
Our use of Solr is somewhat atypical. We constantly feed Solr lots of documents grabbed from the internet, and NRT searching is not required. A typical search returns millions of results, and query responses need to be as fast as possible. Since in a SolrCloud environment, indexing re

Re: How to migrate SolrCloud shards to different servers?

2013-01-29 Thread Mingfeng Yang
odes > > Good luck! > > Regards, Per Steffensen > > > > On 1/26/13 6:56 AM, Mingfeng Yang wrote: > >> Hi Mark, >> >> When I did testing with SolrCloud, I found the following. >> >> 1. I started 4 shards on the same host on port 8983, 8973, 8963, an

fastest way to rebuild Solr index

2013-02-14 Thread Mingfeng Yang
I have a few Solr indexes, each with 20-200 million documents, which were indexed by querying multiple PostgreSQL databases. If I rebuild the index the same way, it will take a few months, because the PostgreSQL queries are slow. Now I need to make the following changes to all indexes: 1. de

Re: fastest way to rebuild Solr index

2013-02-14 Thread Mingfeng Yang
Shawn, Awesome. Exactly what I was looking for. Thanks! Ming On Thu, Feb 14, 2013 at 12:00 PM, Shawn Heisey wrote: > On 2/14/2013 12:46 PM, Mingfeng Yang wrote: > >> I have a few Solr indexes, each with 20-200 millions documents, which were >> indexed by querying m

RequestHandler init failure

2013-02-18 Thread Mingfeng Yang
When trying to use SolrEntityProcessor to import data from another Solr index (Solr 4.1), I added the following in solrconfig.xml: data-config.xml and created a new file data-config.xml with http://wolf:1Xnbdoq@myserver:8995/solr/" query="*:*" fl="id,md5_text,title,text

Re: RequestHandler init failure

2013-02-18 Thread Mingfeng Yang
Found it myself. It's here http://mirrors.ibiblio.org/maven2/org/apache/solr/solr-dataimporthandler/4.1.0/ Downloading the jar file and moving it to the solr-webapp/webapp/WEB-INF/lib directory made all the errors go away. Ming On Mon, Feb 18, 2013 at 11:52 AM, Mingfeng Yang wrote: > When t

Re: RequestHandler init failure

2013-02-20 Thread Mingfeng Yang
Chris, My config file did include the section that loads the related plugin. Ming On Tue, Feb 19, 2013 at 10:42 AM, Chris Hostetter wrote: > > : Found it by myself. It's here > : > http://mirrors.ibiblio.org/maven2/org/apache/solr/solr-dataimporthandler/4.1.0/ > : > : Download and move the jar fi

Re: How to change the index dir in Solr 4.1

2013-02-21 Thread Mingfeng Yang
How about passing -Dsolr.data.dir=/ur/data/dir on the java command line when you start the Solr service? On Thu, Feb 21, 2013 at 9:05 AM, chamara wrote: > Yes that is what i am doing now? I taught this solution is not elegant for > a > deployment? Is there any other way to do this from the Solr

Re: can i install new SOLR 4.1 as slaver(3.3 Master)

2013-02-21 Thread Mingfeng Yang
I cannot give a definitive answer, but I think it could cause problems, as the index formats of 3.3 and 4.1 are slightly different. Why don't you upgrade to 4.1? All you need to do is 1. install Solr 4.1, 2.1 copy all related config files from 3.3, 2.2 back up the in

solrcloud data directory structure

2013-02-22 Thread Mingfeng Yang
I see the items under my SolrCloud data directory of the "replica node" as:
    drwxr-xr-x 2 solr solr    42 Feb 22 18:19 index
    drwxr-xr-x 2 solr solr 12288 Feb 23 01:00 index.20130222181947835
    -rw-r--r-- 1 solr solr    78 Feb 22 18:25 index.properties
    -rw-r--r-- 1 solr solr   209 Feb 22 18:25 replication

pivot facet with solrcloud (solr 4.1)

2013-03-04 Thread Mingfeng Yang
It looks like pivot faceting with SolrCloud does not work (I am using Solr 4.1). The query below returns no pivot results unless I add "&shards=shard1". http://localhost:8995/solr/collection1/select?q=*%3A*&facet=true&facet.mincount=1&facet.pivot=source_domain,author&rows=1&wt=json&facet.limit=5

update some fields vs replace the whole document

2013-03-08 Thread Mingfeng Yang
Generally speaking, which has better performance for Solr: 1. updating some fields or adding new fields to a document, or 2. replacing the whole document? As I understand it, updating fields requires searching for the corresponding doc first and then replacing field values, while replacing the whole doc

Re: update some fields vs replace the whole document

2013-03-08 Thread Mingfeng Yang
gh I'd be very curious to see someone actually test that. > > Upayavira > > On Fri, Mar 8, 2013, at 09:51 PM, Mingfeng Yang wrote: > > Generally speaking, which has better performance for Solr? > > 1. updating some fields or adding new fields into a document. > >
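On the question above: in Solr 4.x an atomic update is not an in-place edit. Solr reads the existing document's stored fields, applies the operations, and re-indexes the complete document, which is also why atomic updates need the other fields to be stored. So an update costs roughly a lookup plus a full replace. The sketch below models only that merge step, with the `set` and `inc` operations; the maps and field names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of what a Solr 4.x atomic update amounts to:
// read the stored document, apply the ops, re-index the whole result.
// Only "set" and "inc" are modelled; field names are illustrative.
public class AtomicUpdateSketch {

    static Map<String, Object> applyUpdate(Map<String, Object> storedDoc,
                                           Map<String, Object> setOps,
                                           Map<String, Long> incOps) {
        // Start from every stored field of the existing document (a copy).
        Map<String, Object> newDoc = new LinkedHashMap<>(storedDoc);
        // "set": replace the field value.
        newDoc.putAll(setOps);
        // "inc": add to a numeric value (a missing field is treated as 0 in this sketch).
        for (Map.Entry<String, Long> e : incOps.entrySet()) {
            Object old = newDoc.get(e.getKey());
            long base = (old instanceof Number) ? ((Number) old).longValue() : 0L;
            newDoc.put(e.getKey(), base + e.getValue());
        }
        // This complete document replaces the old one in the index.
        return newDoc;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "1");
        stored.put("title", "old title");
        stored.put("views", 5L);
        Map<String, Object> set = new LinkedHashMap<>();
        set.put("title", "new title");
        Map<String, Long> inc = new LinkedHashMap<>();
        inc.put("views", 2L);
        System.out.println(applyUpdate(stored, set, inc)); // {id=1, title=new title, views=7}
    }
}
```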

tokenizer of solr

2013-04-11 Thread Mingfeng Yang
Dear Solr users and developers, I am trying to index some documents, some of which are Twitter messages, and we have a problem when indexing retweets. Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets his message; now @jpc_108 becomes part of the tweet text body. It seems like

Re: tokenizer of solr

2013-04-11 Thread Mingfeng Yang
It looks like it's due to the word delimiter filter. Does anyone know whether the "protected" file supports regular expressions? Ming On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky wrote: > Try the whitespace tokenizer. > > -- Jack Krupansky > > -----Original Message

Re: tokenizer of solr

2013-04-12 Thread Mingfeng Yang
olr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory>. > > > -- Jack Krupansky > > -----Original Message----- From: Mingfeng Yang > Sent: Thursday, April 1

facet.method enum vs fc

2013-04-17 Thread Mingfeng Yang
I am faceting on an index of 120M documents, on the url field, using the following two queries. Note that the only difference between the two queries is that the first one uses the default facet.method and the second one uses facet.method=enum. (Each document in the index contains a review we extra

Re: facet.method enum vs fc

2013-04-17 Thread Mingfeng Yang
Apr 17, 2013 at 12:06 PM, Mingfeng Yang >wrote: > > > I am doing faceting on an index of 120M documents, on the field of url, > > using the following two queries. Note that the only difference of the > two > > queries is that first one uses default facet.method, and t

Re: facet.method enum vs fc

2013-04-18 Thread Mingfeng Yang
20G is allocated to Solr already. Ming On Wed, Apr 17, 2013 at 11:56 PM, Toke Eskildsen wrote: > On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote: > > I am doing faceting on an index of 120M documents, > > on the field of url[...] > > I would guess that you woul
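To see why facet.method matters so much on a high-cardinality field like url: roughly, `enum` walks every term and intersects that term's documents with the matching documents, so its cost grows with the number of unique terms, while `fc` un-inverts the field into a per-document ordinal array once and then counts with a single pass over the matching documents. That array is where the memory goes. The sketch below is a simplified in-memory model of the fc counting step, not Solr's code (Solr uses more compact structures). In this naive layout, 120M documents × 4 bytes per ordinal is already about 480 MB before any term text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Simplified model of facet.method=fc on a single-valued field.
// terms:    the field's unique values, sorted (index = ordinal)
// docOrd:   docOrd[doc] = ordinal of that doc's value, or -1 if it has none
// matching: ids of the documents that match q and fq
public class FcFacetSketch {

    static LinkedHashMap<String, Integer> topTerms(String[] terms, int[] docOrd,
                                                   int[] matching, int limit) {
        // One pass over matching docs: bump the counter for each doc's ordinal.
        int[] counts = new int[terms.length];
        for (int doc : matching) {
            int ord = docOrd[doc];
            if (ord >= 0) counts[ord]++;
        }
        // Keep ordinals with a non-zero count, sorted by count desc, then term order.
        List<Integer> ords = new ArrayList<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > 0) ords.add(ord);
        }
        ords.sort((a, b) -> counts[a] != counts[b]
            ? Integer.compare(counts[b], counts[a])
            : Integer.compare(a, b));
        // Emit the top `limit` terms (like facet.limit).
        LinkedHashMap<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < Math.min(limit, ords.size()); i++) {
            int ord = ords.get(i);
            result.put(terms[ord], counts[ord]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"a.com", "b.com", "c.com"};
        int[] docOrd = {0, 1, 1, 2, 1, -1, 0};
        int[] matching = {0, 1, 2, 3, 4, 5};
        System.out.println(topTerms(terms, docOrd, matching, 2)); // {b.com=3, a.com=1}
    }
}
```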

list docs with geo location info

2013-08-15 Thread Mingfeng Yang
I have a schema with a geolocation field named "author_geo" defined as How can I list docs whose author_geo fields are not empty? It seems the filter query "fq=author_geo:*" does not work as it does for other fields of string, text, or float type. curl 'localhost/solr/select?q=*:*&rows=10&wt=json&inde

Re: list docs with geo location info

2013-08-15 Thread Mingfeng Yang
Figured it out: using author_geo:[* TO *] does the trick. On Thu, Aug 15, 2013 at 1:26 PM, Mingfeng Yang wrote: > I have a schema with a geolocation field named "author_geo" defined as > > stored="true" /> > > How can I list docs whose author_geo fields

spatial search, geofilt does not work

2013-08-19 Thread Mingfeng Yang
My Solr index has a field called "author_geo" which contains the author's location. I am trying to get all docs whose authors are within 10 km of 35.0,35.0 using the following query: curl 'http://localhost/solr/select?q=*:*&fq={!geofilt%20sfield=author_geo}&pt=35.0,35.0&d=10&wt=json&inden

Re: spatial search, geofilt does not work

2013-08-19 Thread Mingfeng Yang
BTW: my schema.xml contains the following related lines. On Mon, Aug 19, 2013 at 2:02 PM, Mingfeng Yang wrote: > My solr index has a field called "author_geo" which contains the author's > location, and when I am trying to get all docs whose author are within 10 > k
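For reference, `{!geofilt}` keeps documents whose point lies within `d` kilometres of `pt` by great-circle distance. The sketch below is a plain haversine check of that condition, assuming a spherical Earth with a mean radius of 6371 km. It is meant for spot-checking a few returned (or missing) documents by hand, not as a reimplementation of Solr's spatial code.

```java
// Sketch: the condition a geofilt filter expresses, checked with the haversine formula.
// Assumption: spherical Earth with mean radius 6371 km.
public class GeofiltCheck {
    static final double EARTH_RADIUS_KM = 6371.0;

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True if the document's point is within dKm of the query point.
    static boolean withinDistance(double ptLat, double ptLon,
                                  double docLat, double docLon, double dKm) {
        return haversineKm(ptLat, ptLon, docLat, docLon) <= dKm;
    }

    public static void main(String[] args) {
        // pt=35.0,35.0 and d=10, as in the query in this thread
        System.out.println(withinDistance(35.0, 35.0, 35.05, 35.0, 10)); // about 5.6 km away
        System.out.println(withinDistance(35.0, 35.0, 35.10, 35.0, 10)); // about 11.1 km away
    }
}
```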

Re: spatial search, geofilt does not work

2013-08-20 Thread Mingfeng Yang
t;!geofilt sfield=author_geo" > Clearly wrong. Try escaping the braces with URL percent escapes, etc. > > ~ David > > > Mingfeng Yang wrote > > My solr index has a field called "author_geo" which contains the author's > > location, and when I am trying t
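Following that advice, the safest route is to percent-encode every parameter value instead of typing `{`, `}`, `[` and `]` into the URL by hand (curl also treats `{}` and `[]` as URL globbing patterns unless run with `-g`/`--globoff`). The Java sketch below encodes the parameters of the query in this thread, plus the `author_geo:[* TO *]` existence filter from the earlier thread. The parameter values come from the thread; the class and helper names are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build a Solr query string with every name and value percent-encoded,
// so local-params braces and range brackets survive the trip to the server.
public class EncodedSolrQuery {

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // form encoding: space becomes '+'
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String queryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    static String geofiltQuery(String sfield, String pt, String dKm) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("fq", "{!geofilt sfield=" + sfield + "}");
        p.put("pt", pt);
        p.put("d", dKm);
        p.put("wt", "json");
        return queryString(p);
    }

    public static void main(String[] args) {
        System.out.println("http://localhost/solr/select?"
            + geofiltQuery("author_geo", "35.0,35.0", "10"));
        System.out.println("fq=" + enc("author_geo:[* TO *]"));
    }
}
```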

Problem of facet on 170M documents

2013-11-01 Thread Mingfeng Yang
I have an index with 170M documents, and two of the fields in each doc are "source" and "url". I want to know the top 500 most frequent urls from the Video source, so I ran a facet with "fq=source:Video&facet=true&facet.field=url&facet.limit=500", and about 9 million documents match.

Re: Problem of facet on 170M documents

2013-11-04 Thread Mingfeng Yang
above) try the new parameter facet.threads with a > > reasonable value (4 to 8 gave me a massive performance speedup when > > working with large facets, i.e. nTerms >> 10^7). > > > > -Sascha > > > > > > Mingfeng Yang wrote: > > > I h

can't overwrite and can't delete by id

2013-11-22 Thread Mingfeng Yang
Recently I found out that I can't delete a doc by id or overwrite a doc in my Solr index, which is based on Solr 4.4.0. Say I have a doc http://pastebin.com/GqPP4Uw4 (I use pastebin here to make it easier to view), and I tried to add a dynamic field "rank_ti" to it, wanting to make

Re: can't overwrite and can't delete by id

2013-11-22 Thread Mingfeng Yang
BTW: it's a 4-shard SolrCloud cluster using ZooKeeper 3.3.5. On Fri, Nov 22, 2013 at 11:07 AM, Mingfeng Yang wrote: > Recently, I found out that I can't delete doc by id or overwrite a doc > from/in my SOLR index which is based on SOLR 4.4.0 version. > > S