September 6, 2010

Installing Apache Mahout on Ubuntu 10.04

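Apache Mahout is an Apache project that provides scalable machine learning libraries (clustering, classification, collaborative filtering) for building intelligent applications.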

Pre-Requisites

1. JDK 1.6 or higher
2. Ant 1.7 or higher
3. Maven 2.0.9 or 2.0.10 (needed only if you want to build Mahout from source)
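
Before starting, it can help to confirm that the three tools are actually on the PATH. The following is a minimal sketch; the helper name check_tool is made up for illustration and is not part of Mahout or Ubuntu.

```shell
# Report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Check the three prerequisites listed above.
for tool in java ant mvn; do
    check_tool "$tool"
done
```

For any tool reported as found, java -version, ant -version and mvn -version print the installed version, which you can compare against the list above.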

Installation

1. Download the sample code (j-mahout.zip).

2. Unzip the downloaded file.

varsha@varsha-laptop:~$ unzip j-mahout.zip

3. Go into the extracted directory.

varsha@varsha-laptop:~$ cd apache-mahout-examples

4. Install.

varsha@varsha-laptop:~/apache-mahout-examples$ ant install


Possible errors you may encounter

1. Error: JAVA_HOME is not defined correctly. We cannot execute java Bootstrap FAILED

Solution:

varsha@varsha-laptop:~$ export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20

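Look in /usr/lib/jvm/ to see which JDK is installed on your system, and use that directory name in place of java-6-sun-1.6.0.20. The export only lasts for the current terminal session.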
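
To find a suitable value for JAVA_HOME, you can scan /usr/lib/jvm/ for directories that contain an executable bin/java. This is only a sketch: is_valid_java_home is a hypothetical helper name, and the check assumes the usual JDK layout.

```shell
# A JAVA_HOME is usable only if it contains an executable bin/java.
is_valid_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Print every directory under /usr/lib/jvm that passes the check.
for dir in /usr/lib/jvm/*; do
    if is_valid_java_home "$dir"; then
        echo "usable: $dir"
    fi
done

# Then export the one you want, for example:
#   export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20
```

Adding the export line to ~/.bashrc keeps the setting for new terminal sessions.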

2. BUILD FAILED
/home/varsha/apache-mahout-examples/build.xml:92: The following error occurred while executing this line:
/home/varsha/apache-mahout-examples/build.xml:85: java.net.ConnectException: Connection timed out


Solution:

i) Go to line 81 of build.xml and comment out:

<target name="get-enwiki" depends="check-files" unless="enwiki.exists">
  <echo>Downloading Wikipedia Data - (~2.5GB)</echo>
  <get src="http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages-articles.xml.bz2"
       dest="${wiki.dir}/enwiki-20070527-pages-articles.xml.bz2"/>
</target>



ii) About five lines below that, comment out:


<antcall target="get-enwiki"/>

This target downloads about 2.5 GB of compressed Wikipedia pages.

iii) Manually download enwiki-20070527-pages-articles.xml.bz2 and save it in apache-mahout-examples/wikipedia/.

iv) Go to line 141 and comment out the following:

<echo>Downloading Clustering data (9.2M)</echo>
<get src="http://people.apache.org/~gsingers/wikipedia/n2.tar.gz"
     dest="${wiki.dir}/n2.tar.gz"/>


v) Manually download n2.tar.gz and save it in apache-mahout-examples/wikipedia/.
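
The two manual downloads in steps iii) and v) can also be done from the terminal with wget. The sketch below assumes the sample code was extracted into your home directory and reuses the URLs from build.xml, which may no longer be reachable; fetch_if_missing and WIKI_DIR are illustrative names, not part of the sample code.

```shell
# Base URL copied from the <get> tasks in build.xml.
BASE_URL="http://people.apache.org/~gsingers/wikipedia"
# Assumed location of the extracted sample code; adjust as needed.
WIKI_DIR="${WIKI_DIR:-$HOME/apache-mahout-examples/wikipedia}"

# Download one file into WIKI_DIR unless a copy is already there.
fetch_if_missing() {
    if [ -e "$WIKI_DIR/$1" ]; then
        echo "skip: $1"
    else
        mkdir -p "$WIKI_DIR"
        # -P tells wget which directory to save into
        wget -P "$WIKI_DIR" "$BASE_URL/$1"
    fi
}
```

Calling fetch_if_missing n2.tar.gz and then fetch_if_missing enwiki-20070527-pages-articles.xml.bz2 fetches the small file first. An interrupted download leaves a partial file that the function will skip, so delete it before retrying.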