Re: retrieve ids of all indexed docs efficiently

2017-01-18 Thread Erick Erickson
Added a tip on the CursorMark CWiki page, thanks for the suggestion! On Wed, Jan 18, 2017 at 5:21 PM, Pushkar Raste wrote: > I think we should add the suggestion about docValues to the cursormark wiki > (documentation), we too ran into the same problem. > > On Jan 18, 2017

Re: retrieve ids of all indexed docs efficiently

2017-01-18 Thread Pushkar Raste
I think we should add the suggestion about docValues to the cursormark wiki (documentation), we too ran into the same problem. On Jan 18, 2017 5:52 PM, "Erick Erickson" wrote: > Is your ID field docValues? Making it a docValues field should reduce > the amount of JVM heap

Re: retrieve ids of all indexed docs efficiently

2017-01-18 Thread Erick Erickson
Is your ID field docValues? Making it a docValues field should reduce the amount of JVM heap you need. But the export is _much_ preferred, it'll be lots faster as well. Of course to export you need the values you're returning to be docValues... Erick On Wed, Jan 18, 2017 at 1:12 PM, Slomin,
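
As a rough illustration of the export route mentioned above, here is a minimal Python sketch. It assumes the /export handler is enabled on the core, that the id field has docValues, and that the response is JSON with the documents under response.docs. The base URL and the function names are hypothetical and not taken from the thread.

```python
import json
import urllib.parse
import urllib.request


def build_export_url(base_url, field="id"):
    """Build an /export URL; base_url (e.g. a core URL) is a hypothetical placeholder."""
    # /export needs an explicit sort and fl, and both must use docValues fields.
    params = {"q": "*:*", "sort": field + " asc", "fl": field}
    return base_url + "/export?" + urllib.parse.urlencode(params)


def ids_from_export(payload, field="id"):
    """Pull the field values out of a parsed export response (assumed shape)."""
    return [doc[field] for doc in payload["response"]["docs"]]


def export_all_ids(base_url, field="id"):
    """Fetch the full sorted result set in one request and return the ids."""
    with urllib.request.urlopen(build_export_url(base_url, field)) as resp:
        return ids_from_export(json.load(resp), field)
```

Calling export_all_ids("http://localhost:8983/solr/mycore") would issue a single request. For a very large index, a streaming JSON parser would likely be a better fit than json.load, which holds the whole response in client memory.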

Re: retrieve ids of all indexed docs efficiently

2017-01-18 Thread Slomin, David
The export feature sounds promising, although I'll have to talk with our deployment folks here about enabling it. The query I'm issuing is: http://:8983/solr/_shard1_replica1/select?q=*:*=id+asc=1000==id=true=false=json Thanks, Div. On 1/18/17, 3:54 PM, "Jan Høydahl"

Re: retrieve ids of all indexed docs efficiently

2017-01-18 Thread Jan Høydahl
Don't know why you have mem problems. Can you paste in examples of full query strings during cursor mark querying? Sounds like you may be using it wrong. Or try exporting https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets -- Jan Høydahl > Den 18. jan. 2017 kl. 21.44 skrev

retrieve ids of all indexed docs efficiently

2017-01-18 Thread Slomin, David
Hi -- I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index. In my query, I've set rows=1000, fl=id, and am using the cursorMark mechanism to split the overall traversal into multiple requests. Not because I care about the order, but because the documentation implies that it's
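
For readers following along, the cursorMark traversal described in this post can be sketched roughly as below. This is only an illustration under stated assumptions: the base URL and function names are hypothetical, the sort is assumed to be on the uniqueKey field id, and the loop ends when Solr returns the same cursor value that was sent.

```python
import json
import urllib.parse
import urllib.request


def build_cursor_url(base_url, cursor_mark, rows=1000):
    """Build a /select URL for one page; base_url is a hypothetical core URL."""
    params = {
        "q": "*:*",
        "sort": "id asc",  # cursorMark requires a sort that includes the uniqueKey
        "rows": rows,
        "fl": "id",
        "wt": "json",
        "cursorMark": cursor_mark,
    }
    return base_url + "/select?" + urllib.parse.urlencode(params)


def http_fetch(url):
    """Fetch a URL and parse the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_all_ids(base_url, fetch=http_fetch, rows=1000):
    """Yield every document id, one page of `rows` ids per request."""
    cursor = "*"  # "*" starts a new cursor traversal
    while True:
        page = fetch(build_cursor_url(base_url, cursor, rows))
        for doc in page["response"]["docs"]:
            yield doc["id"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
```

Because iter_all_ids is a generator, the client holds only one page of ids at a time. The fetch parameter exists so the loop can be exercised without a running Solr instance.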