Installing Apache Mahout In Ubuntu 10.04

September 06, 2010

Apache Mahout is a machine-learning library that helps in building intelligent applications, with algorithms for recommendation, clustering and classification.

Pre-Requisites

1. JDK 1.6 or higher
2. Ant 1.7 or higher
3. Maven 2.0.9 or 2.0.10 (needed only if you want to build Mahout from source)
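Before starting, you can confirm that the tools above are on your PATH. The following is a minimal shell sketch; `check_tools` is a hypothetical helper name, not part of Mahout or Ant. It only checks that each command exists. To compare against the minimum versions, run `java -version`, `ant -version` and `mvn -version` yourself.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named command is on the PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s\n' "$tool"
    else
      printf 'missing: %s\n' "$tool"
    fi
  done
}

# javac is checked separately: a JRE provides java but not javac.
# mvn is only needed if you build Mahout from source.
check_tools java javac ant mvn
```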

Installation

1. Download the sample code archive (j-mahout.zip).

2. Unzip the downloaded file.

varsha@varsha-laptop:~$ unzip j-mahout.zip

3. Go into the extracted directory.

varsha@varsha-laptop:~$ cd apache-mahout-examples

4. Install.

varsha@varsha-laptop:~/apache-mahout-examples$ ant install

Errors you may encounter

1. Error: JAVA_HOME is not defined correctly. We cannot execute java
Bootstrap FAILED

Solution:

varsha@varsha-laptop:~$ export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20

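Replace the path with the JDK actually installed on your system; list /usr/lib/jvm/ to see which JDKs are there. The export lasts only for the current terminal session, so add the same line to ~/.bashrc to make it permanent.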
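If you are not sure which directory to use, it can usually be derived from the java binary on your PATH. This is a sketch, assuming GNU `readlink -f` (standard on Ubuntu) and the usual layout where the binary lives at `<jdk>/bin/java` or `<jdk>/jre/bin/java`. `java_home_from_bin` and the file name `set-java-home.sh` are hypothetical. Source the file (`. ./set-java-home.sh`) rather than executing it; otherwise the export does not reach your current shell.

```shell
#!/bin/sh
# Hypothetical helper: given the real path of a java binary, print the
# JDK root directory that JAVA_HOME should point at.
java_home_from_bin() {
  home=$(dirname "$(dirname "$1")")    # strip the trailing /bin/java
  case "$home" in
    */jre) home=$(dirname "$home") ;;  # JDK layout: <jdk>/jre/bin/java
  esac
  printf '%s\n' "$home"
}

# Resolve the java on PATH through any /etc/alternatives symlinks.
if command -v java >/dev/null 2>&1; then
  JAVA_HOME=$(java_home_from_bin "$(readlink -f "$(command -v java)")")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
else
  echo "java not found on PATH; list /usr/lib/jvm/ and set JAVA_HOME by hand" >&2
fi
```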

2. BUILD FAILED
/home/varsha/apache-mahout-examples/build.xml:92: The following error occurred while executing this line:
/home/varsha/apache-mahout-examples/build.xml:85: java.net.ConnectException: Connection timed out

Solution:

i) Open build.xml, go to line 81 and comment out the get-enwiki target:


<target name="get-enwiki" depends="check-files" unless="enwiki.exists">
  <echo>Downloading Wikipedia Data - (~2.5GB)</echo>
  <get src="http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages-articles.xml.bz2"
       dest="${wiki.dir}/enwiki-20070527-pages-articles.xml.bz2"/>
</target>

ii) About five lines below that, comment out the call to this target:


<antcall target="get-enwiki"/>

That target downloads about 2.5 GB of compressed Wikipedia pages, which is why the connection can time out.

iii) Download enwiki-20070527-pages-articles.xml.bz2 manually and save it in apache-mahout-examples/wikipedia/.

iv) Go down to line 141 and comment out the following:

<echo>Downloading Clustering data (9.2M)</echo>
<get src="http://people.apache.org/~gsingers/wikipedia/n2.tar.gz"
     dest="${wiki.dir}/n2.tar.gz"/>

v) Download n2.tar.gz manually and save it in apache-mahout-examples/wikipedia/.
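Steps iii) and v) can be scripted. This sketch assumes `wget` is installed, that you run it from the directory containing `apache-mahout-examples/`, and that the URLs copied from build.xml are still reachable (they may not be). `data_url` and the script name `fetch-wiki-data.sh` are hypothetical. `wget -c` continues a partially downloaded file, so an interrupted 2.5 GB download can be resumed by rerunning the same command.

```shell
#!/bin/sh
# Sketch: fetch the data files that the commented-out Ant targets used to
# download, into the directory build.xml expects.
# Assumptions: run from the directory containing apache-mahout-examples/,
# and the URLs taken from build.xml are still reachable.
DEST='apache-mahout-examples/wikipedia'
BASE='http://people.apache.org/~gsingers/wikipedia'

# Hypothetical helper: print the source URL for a data file name.
data_url() {
  printf '%s/%s\n' "$BASE" "$1"
}

# Download each file named on the command line.
# -c resumes a partial file instead of starting over; -P sets the target directory.
for name in "$@"; do
  mkdir -p "$DEST"
  wget -c -P "$DEST" "$(data_url "$name")" || exit 1
done
```

For example: `sh fetch-wiki-data.sh enwiki-20070527-pages-articles.xml.bz2 n2.tar.gz`. If you are behind an HTTP proxy, set the `http_proxy` environment variable first; wget honours it.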

Comments

Ajay, October 2, 2010 at 11:16 PM
Is there any other way, instead of downloading 2.5 GB of data?
Varsha Jaikumar, October 3, 2010 at 12:05 AM
@Ajay: My connection here is too slow for it as well. I haven't yet figured out another way. Using a download accelerator should be good enough.
Unknown, March 15, 2011 at 9:56 PM
I downloaded the 2.5 GB file. Is it enough to unzip it, or do I have to modify the path and so on?
Varsha Jaikumar, March 15, 2011 at 11:04 PM
@Prem: Which path? If the data was downloaded by the script rather than manually, I guess the script will extract it automatically. If it doesn't, you'll have to do it yourself.
shravankumar, March 18, 2011 at 4:06 AM
Do you know Hadoop?
Varsha Jaikumar, March 18, 2011 at 4:45 AM
@Shravan Kumar: I started learning it for a project.
Unknown, March 26, 2011 at 3:01 AM
For Mahout to be installed, is it necessary that Hadoop be installed?
Varsha Jaikumar, March 26, 2011 at 3:22 AM
@Prem: Hadoop is a library Mahout can use, but it is not always necessary, and the installation doesn't depend on having Hadoop installed.
Anonymous, April 24, 2011 at 7:33 PM
Hi, I downloaded it. Can you tell me the next step to run the examples?
Regards,
Ravi.
Varsha Jaikumar, April 25, 2011 at 5:50 AM
@Ravi: Please check the Mahout website, which has a tutorial.
Anonymous, June 27, 2011 at 5:39 AM
Hi madam, we got distributed mode working, but my problem is how to import Mahout into Eclipse.

Here is what I did:
I just downloaded and extracted it and tried to import it, but the files are not importing. We installed Ant, but how do we install Mahout without Maven?

Regards,
crswamy929@gmail.com
Varsha Jaikumar, June 27, 2011 at 7:58 AM
As a note, Ant gives you command-line build and deployment tools; you can choose to use Maven instead.

For Eclipse, see the documentation that comes with it, or have a look at the Mahout page. Checking the website or the official documentation will be more useful.

The following links may be useful:

1. https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart

2. http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.user%2FgettingStarted%2Fqs-81_basics.htm

3. http://ant.apache.org/manual/index.html
Ranga Swamy, July 5, 2011 at 11:44 PM
Hi Varsha,
I have done whatever you suggested. If you have got Mahout working with Eclipse, please post it.

Thank you, Varsha.
Regards,
Ranga Swamy
crswamy929@gmail.com
08904524975
Varsha Jaikumar, July 6, 2011 at 6:34 AM
@Ranga Swamy: Please see the documentation; I have provided the links above.
Unknown, August 25, 2011 at 12:18 PM
Hello, I want to test a simple algorithm in Mahout on Ubuntu Server 11.04. Any suggestions?
Varsha Jaikumar, August 25, 2011 at 10:46 PM
@Sagar Soni: Please refer to the documentation on the Mahout page. I am sure it will answer everything.
Unknown, August 27, 2011 at 3:40 AM
Hello Varsha, I only want a simple recommendation algorithm for Mahout. I read all your documents but did not understand it.
vignesh, September 9, 2011 at 1:24 PM
Varsha, thanks for this interesting post.

I have run these examples successfully. How can I run them on Hadoop?
Anonymous, September 28, 2011 at 11:00 AM
Thanks very much for posting this. It saved me, as I work through an HTTP proxy.
ശബ്ദശക്തി വ്യാഖ്യാകാരഃ, October 4, 2011 at 1:30 AM
You can avoid downloading the test data by running the following command:
mvn clean install -DskipTests=true
Anonymous, December 13, 2011 at 9:31 PM
Great job posting this. Saved me a lot of work.

Rahul
Anonymous, January 3, 2012 at 11:38 PM
I want to use Apache Mahout to build a prototype.

The prototype should calculate the TF (term frequency) and the IDF (inverse document frequency) of the input files (.txt/.pdf).

Can you please tell me how to go about it?

Regards
Junaid
junaidsurve@gmail.com
Varsha Jaikumar, January 4, 2012 at 8:17 PM
Hi Junaid, I am new to Mahout myself. I'll see if I can help you.
Wilfred, February 17, 2012 at 12:16 AM
Thanks a lot, Varsha. :)
Unknown, April 12, 2012 at 3:26 AM
Hi Varsha,

Is it necessary to download the 2.5 GB wiki data, or is there a way I can bypass it?

Can I use some other data instead, say the data I want to mine using Mahout?
Unknown, October 31, 2012 at 2:21 PM
Hi Varsha,
It is indeed a very nice tutorial. But as I read on the Apache Mahout wiki, Mahout runs on a Hadoop cluster. So do I need to install Hadoop on my machine?

I am very new to these things, so thanks for helping me out.
Varsha Jaikumar, November 1, 2012 at 10:52 AM
Hi,

I had just started learning Mahout when I posted this. I would like to help, but I'd suggest you look at the Mahout web page.


Varsha Jaikumar
