Yes, you are right: generally, autocommit is the better way. If you are doing a one-off indexing run, then a manual commit at the end may well be the best option, but for ongoing indexing, autocommit is the better approach.
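For reference, autocommit is configured in solrconfig.xml. The values below are only illustrative, not recommended settings; tune maxTime to your own latency and durability needs:

```xml
<!-- solrconfig.xml (illustrative values only) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes to stable storage, here at most every 60s -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes new documents searchable, here within ~5s -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```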
Upayavira

On Mon, Aug 3, 2015, at 11:15 PM, Konstantin Gribov wrote:
> Upayavira, manual commit isn't good advice, especially with small bulks
> or single documents, is it? I mostly see recommendations to use
> autoCommit + autoSoftCommit instead of manual commits.
>
> Tue, 4 Aug 2015 at 1:00, Upayavira <u...@odoko.co.uk>:
> >
> > SolrJ is just a "SolrClient". In pseudocode, you say:
> >
> >     SolrClient client =
> >         new SolrClient("http://localhost:8983/solr/whatever");
> >
> >     List<SolrInputDocument> docs = new ArrayList<>();
> >     SolrInputDocument doc = new SolrInputDocument();
> >     doc.addField("id", "abc123");
> >     doc.addField("some-text-field", "I like it when the sun shines");
> >     docs.add(doc);
> >     client.add(docs);
> >     client.commit();
> >
> > (warning: the above is typed from memory)
> >
> > So the question is simply how many documents you add to docs before
> > you call client.add(docs), and how often (if at all) you call
> > client.commit().
> >
> > So when you are told "use SolrJ", really you are being told to write
> > some Java code that happens to use the SolrJ client library for Solr.
> >
> > Upayavira
> >
> > On Mon, Aug 3, 2015, at 10:01 PM, Alexandre Rafalovitch wrote:
> > > Well,
> > >
> > > If it is just file names, I'd probably use the SolrJ client, maybe
> > > with Java 8. Read the file names, split each name into parts with
> > > regular expressions, put the parts into different fields, and send
> > > them to Solr. Java 8 has FileSystem walkers, etc. to make this
> > > easier.
> > >
> > > You could do it with DIH, but it would require nested entities, and
> > > the inner entity would probably try to parse the file. So, a lot of
> > > wasted effort if you only care about the file names.
> > >
> > > Or I would just do a directory listing in the operating system and
> > > use regular expressions to split it into a CSV file, which I would
> > > then import into Solr directly.
> > > In all of these cases, the question is which field is the ID of the
> > > record, to ensure there are no duplicates.
> > >
> > > Regards,
> > >    Alex.
> > > ----
> > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > http://www.solr-start.com/
> > >
> > > On 3 August 2015 at 15:34, Mugeesh Husain <muge...@gmail.com> wrote:
> > > > @Alexandre No, I don't need the content of the files. To repeat my
> > > > requirement:
> > > >
> > > > I have 40 million files stored in a file system, with filenames
> > > > such as ARIA_SSN10_0007_LOCATION_0000129.pdf.
> > > >
> > > > I just split the values out of each filename; these values are
> > > > what I have to index.
> > > >
> > > > I am interested in indexing the values in Solr, not the file
> > > > contents.
> > > >
> > > > I have tested DIH from a file system and it works fine, but I
> > > > don't know how to plug my code into DIH: if my code extracts some
> > > > values, how can I index them using DIH?
> > > >
> > > > If I use DIH, how would I do the split operation and get the
> > > > values from it?
> > > >
> > > > --
> > > > View this message in context:
> > > > http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p4220552.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Best regards,
> Konstantin Gribov
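Pulling the thread's suggestions together, here is a minimal, self-contained sketch of the approach Alexandre and Upayavira describe: split each filename into field values with a regular expression, use the filename itself as the unique ID (it is unique per file, which answers the deduplication question), and send documents in batches. The field names (source, ssn, seq, type, number) and the batch size are assumptions, not something the thread specifies, and the `IndexSink` interface is a hypothetical stand-in for a real SolrJ client (in real code you would use SolrJ's `HttpSolrClient` and `client.add(docs)`), so the parsing and batching logic can be shown without a running Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilenameIndexer {

    /** Stand-in for SolrClient.add(docs); a real implementation would call SolrJ. */
    public interface IndexSink {
        void add(List<Map<String, String>> docs);
    }

    // Matches names like ARIA_SSN10_0007_LOCATION_0000129.pdf
    private static final Pattern NAME =
            Pattern.compile("([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^.]+)\\.pdf");

    /** Split one filename into field values; the field names here are invented. */
    public static Map<String, String> parse(String filename) {
        Matcher m = NAME.matcher(filename);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected filename: " + filename);
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", filename);   // the filename is unique, so it makes a natural ID
        doc.put("source", m.group(1));
        doc.put("ssn", m.group(2));
        doc.put("seq", m.group(3));
        doc.put("type", m.group(4));
        doc.put("number", m.group(5));
        return doc;
    }

    /** Parse all names and hand them to the sink in batches of batchSize. */
    public static void indexAll(Iterable<String> filenames, int batchSize, IndexSink sink) {
        List<Map<String, String>> batch = new ArrayList<>();
        for (String name : filenames) {
            batch.add(parse(name));
            if (batch.size() >= batchSize) {
                sink.add(new ArrayList<>(batch));  // hand off a copy of the full batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.add(batch);                       // flush the final partial batch
        }
    }
}
```

With 40 million files you would feed `indexAll` from `Files.walk(...)` (the Java 8 file-system walker Alexandre mentions) and, per the advice at the top of the thread, let Solr's autocommit handle commits rather than calling `client.commit()` after every batch.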