If you’re talking about a generic web crawl, you could use something like Nutch [1]. Keep in mind that this is a full web crawler, and it does a pretty good job. I’ve been using it for more than 2 years now and I’m very happy with it, although I don’t crawl just a couple of sites but a much wider spectrum (think country-scale web). With Nutch you just have to configure a couple of options in an XML file and it will crawl the web and index the content into Solr.
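If a full crawler turns out to be more than you need for a single WordPress blog, the "write a few more scripts" route mentioned in the thread could look roughly like the sketch below. It is only a sketch under some assumptions: that the blog exposes the standard WordPress RSS 2.0 feed, and that your Solr core accepts JSON documents on its /update handler. The URLs, the core name, and the field names (id, title, content, source) are hypothetical and would have to match your own setup and flat schema.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoints: replace with your own blog feed and Solr core.
FEED_URL = "http://example.com/feed/"
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"

# Namespace WordPress feeds use for the full post body (<content:encoded>).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def feed_to_docs(rss_xml):
    """Turn RSS 2.0 XML (str or bytes) into a list of flat dicts for Solr."""
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        if not link:
            continue  # skip items without a link: it is used as the unique id
        # Prefer the full body; fall back to the short description.
        body = item.findtext(CONTENT_NS + "encoded")
        if body is None:
            body = item.findtext("description", default="")
        docs.append({
            "id": link,  # assumed uniqueKey field in the schema
            "title": item.findtext("title", default="").strip(),
            "content": body.strip(),
            "source": "wordpress",  # hypothetical field to tell sources apart
        })
    return docs


def post_to_solr(docs, update_url=SOLR_UPDATE_URL):
    """POST the documents as a JSON array to Solr's update handler."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def main():
    """Fetch the feed once and index whatever posts it currently lists."""
    with urllib.request.urlopen(FEED_URL) as response:
        docs = feed_to_docs(response.read())
    if docs:
        post_to_solr(docs)
```

Calling main() from a scheduled job would re-index whatever the feed currently lists. Since a feed only carries the most recent posts, older posts would need another route, such as a database import or a crawl.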
Regards,

[1] http://nutch.apache.org

On Oct 7, 2014, at 4:53 PM, Vishal Sharma <vish...@grazitti.com> wrote:

> Makes sense.
>
> I'll just dive in now. Thanks so much.
>
> Vishal Sharma
> TL, Grazitti Interactive
>
> On Tue, Oct 7, 2014 at 1:44 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
>> I am pretty sure Swift is not Solr. That's why I was asking whether
>> you were starting from scratch.
>>
>> As to the other items, please re-read my original response. Solr has
>> an example reading in RSS feeds, you could probably use that. Or a
>> generic XML using DataImportHandler's mapping. Or directly from the
>> database, again with DIH.
>>
>> Basically, it sounds totally doable. So, it's hard to advise anything
>> specific beyond "go, do it" and wait for you to come back with a much
>> more specific issue once you get going. Most of the issues will be
>> related to your schema and your WordPress configuration, so no
>> abstract advice is available.
>>
>> Regards,
>> Alex.
>>
>> On 7 October 2014 16:36, Vishal Sharma <vish...@grazitti.com> wrote:
>>> Hey Alex,
>>>
>>> Thanks for the prompt response.
>>>
>>> Here is what I am trying to solve: I am showing search results from
>>> content coming from 3 different places on a single site. And I have
>>> done that by pumping all this content to a Solr server running on a
>>> single flat schema, using the different APIs of these platforms. Now
>>> I need to index blog posts written in WordPress as well. I was
>>> wondering if there is any solution already available which can help
>>> me crawl and pump these posts to my running Solr instance. Otherwise
>>> I might have to write a few more scripts to do that.
>>>
>>> BTW, is Swift using Solr on the backend? Because I thought it's a
>>> paid enterprise solution.