1, Download and install all three (Nutch 1.7, Solr 4.5, and Tomcat 7) on Windows or Linux.
- Install JDK 1.6+ if it is not already installed. Make sure all three work independently.
2, Install solr.war on Tomcat by following the instructions here.
- Copy Solr's /lib/ext/*.jar to Tomcat's /lib (Solr 4 uses log4j for logging).
- Copy Solr's resources/log4j.properties to webapps/solr/WEB-INF/classes/ in Tomcat.
- Edit webapps/solr/WEB-INF/web.xml so that it points to the Solr home directory.
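The two copy steps above can be scripted. The following is a minimal Python sketch, not part of the original instructions: the function name copy_solr_logging_files and both directory arguments are hypothetical, and solr_dir is assumed to be whichever directory of your Solr download contains lib/ext and resources.

```python
import glob
import os
import shutil


def copy_solr_logging_files(solr_dir, tomcat_dir):
    """Copy Solr's logging jars and log4j.properties into Tomcat.

    Both arguments are hypothetical paths supplied by the caller.
    Returns the list of destination paths that were written.
    """
    copied = []

    # lib/ext/*.jar -> <tomcat>/lib
    tomcat_lib = os.path.join(tomcat_dir, "lib")
    os.makedirs(tomcat_lib, exist_ok=True)
    for jar in sorted(glob.glob(os.path.join(solr_dir, "lib", "ext", "*.jar"))):
        copied.append(shutil.copy(jar, tomcat_lib))

    # resources/log4j.properties -> <tomcat>/webapps/solr/WEB-INF/classes
    classes_dir = os.path.join(tomcat_dir, "webapps", "solr", "WEB-INF", "classes")
    os.makedirs(classes_dir, exist_ok=True)
    props = os.path.join(solr_dir, "resources", "log4j.properties")
    copied.append(shutil.copy(props, classes_dir))

    return copied
```

Call it with the two directories for your own installation and restart Tomcat afterwards. The web.xml change is still a manual edit.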
3, Follow the Nutch+Solr tutorial here.
- Merge Nutch's conf/schema-solr4.xml into collection1/conf/schema.xml (non-trivial).
- Be careful with the "url" field name/type. Duplicate definitions are not allowed in schema.xml.
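Since the merge is done by hand, a quick check for duplicated field names can save a failed Solr startup. Here is a small Python sketch using only the standard library. The helper name duplicate_field_names is hypothetical, and it only looks at <field> elements, so duplicated field types or dynamic fields would need a similar check.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_field_names(schema_xml):
    """Return the sorted names declared by more than one <field> element."""
    root = ET.fromstring(schema_xml)
    # iter("field") matches <field> at any depth, but not <dynamicField>
    names = [field.get("name") for field in root.iter("field")]
    counts = Counter(names)
    return sorted(n for n, c in counts.items() if n is not None and c > 1)
```

Pass it the text of the merged collection1/conf/schema.xml. An empty list means no field name is declared twice.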
Solr is now ready for access at:
http://localhost:8080/solr/#/collection1
http://localhost:8080/solr/collection1/browse
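Besides opening these pages in a browser, you can ask the core how many documents it holds, which is handy once Nutch has pushed a crawl into Solr. A rough Python sketch follows, assuming the core's standard /select handler is enabled and returns JSON with wt=json. The function names are hypothetical and the base URL is taken from the addresses above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the core, taken from the addresses listed above
SOLR_CORE_URL = "http://localhost:8080/solr/collection1"


def select_url(core_url, query="*:*"):
    """Build a /select URL that matches `query` and returns no rows."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return core_url.rstrip("/") + "/select?" + params


def num_found(response_text):
    """Extract the hit count from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(core_url=SOLR_CORE_URL):
    """Query a running Solr core and return its document count."""
    with urlopen(select_url(core_url), timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling count_documents() needs Tomcat to be running. A count of 0 before crawling and a positive number afterwards suggests that indexing worked.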
Nutch: crawling
Solr: Indexing