Monday, November 25, 2013

Apache Nutch 1.7 and Solr 4.5 on Tomcat 7

1, Download and install all three (Nutch 1.7, Solr 4.5, and Tomcat 7) on Windows or Linux.
    - Install JDK 1.6+ if it is not already installed. Make sure all three work independently.

2, Install solr.war on Tomcat by following the instructions here.
    - Copy Solr's /lib/ext/*.jar to Tomcat's /lib (Solr 4 logs through log4j, and these jars are not bundled in the war).
    - Copy Solr's resources/log4j.properties to webapps/solr/WEB-INF/classes/ in Tomcat.
    - Edit webapps/solr/WEB-INF/web.xml so that the solr/home env-entry points to your Solr home directory.
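
The two copy steps above can be scripted. The following is a minimal sketch, not part of the official instructions: the function name install_solr_logging and both directory arguments are hypothetical, and solr_dir is assumed to be whichever directory of your Solr download contains lib/ext and resources (in the 4.x distribution this is probably the example/ directory).

```python
import glob
import os
import shutil


def install_solr_logging(solr_dir, tomcat_dir):
    """Copy Solr's logging jars and log4j.properties into Tomcat.

    solr_dir and tomcat_dir are hypothetical roots; the relative paths
    below mirror the ones named in step 2.
    """
    copied = []

    # Solr's lib/ext/*.jar -> Tomcat's lib/
    tomcat_lib = os.path.join(tomcat_dir, "lib")
    os.makedirs(tomcat_lib, exist_ok=True)
    for jar in sorted(glob.glob(os.path.join(solr_dir, "lib", "ext", "*.jar"))):
        copied.append(shutil.copy(jar, tomcat_lib))

    # Solr's resources/log4j.properties -> webapps/solr/WEB-INF/classes/
    classes_dir = os.path.join(tomcat_dir, "webapps", "solr", "WEB-INF", "classes")
    os.makedirs(classes_dir, exist_ok=True)
    props = os.path.join(solr_dir, "resources", "log4j.properties")
    copied.append(shutil.copy(props, classes_dir))

    # Paths of the files as they now exist under Tomcat
    return copied
```

Run it after Tomcat has unpacked solr.war, so that webapps/solr/ already exists, and restart Tomcat afterwards. The web.xml change still has to be made by hand.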

3, Follow the Nutch+Solr Tutorial here.
    - Merge Nutch's conf/schema-solr4.xml into collection1/conf/schema.xml (non-trivial).
    - Be careful with the field name/type "url". Duplicate definitions are not allowed in schema.xml.
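
Since the merge is done by hand, a quick duplicate check on the merged file can catch the "url" problem before Solr refuses to load the core. This is only a sketch using Python's standard XML parser; duplicate_names is a hypothetical helper, and it assumes definitions are written as <field name="..."> and <fieldType name="..."> elements.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_names(source, tag):
    """Return the names declared more than once by <tag> elements.

    source is a path or an open file containing the merged schema.xml;
    tag is "field" or "fieldType".
    """
    root = ET.parse(source).getroot()
    # iter(tag) matches the exact tag only, so copyField and
    # dynamicField elements are not counted as fields.
    counts = Counter(
        el.get("name") for el in root.iter(tag) if el.get("name")
    )
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, duplicate_names("collection1/conf/schema.xml", "field") and the same call with "fieldType" should both return an empty list once the merge is clean. A field and a fieldType may share the name "url", which is why the two tags are checked separately.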

Solr is now ready for access at:

http://localhost:8080/solr/#/collection1
http://localhost:8080/solr/collection1/browse
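
To confirm both pages respond without opening a browser, something like the following may help. It is a rough sketch: solr_urls and is_up are hypothetical helper names, and the host, port, and core name are simply the defaults shown above.

```python
import urllib.error
import urllib.request


def solr_urls(base="http://localhost:8080", core="collection1"):
    """Build the admin and browse URLs for one Solr core."""
    return [
        "{0}/solr/#/{1}".format(base, core),
        "{0}/solr/{1}/browse".format(base, core),
    ]


def is_up(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode() == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP error statuses
        return False
```

Calling is_up(url) for each entry of solr_urls() gives a quick yes/no per page. The "#/collection1" part of the admin URL is a browser-side fragment, so that check only shows that the Solr admin page itself is being served.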

Nutch: crawling
Solr: Indexing