Apache Mahout is an Apache project that provides machine learning libraries for building intelligent applications.
Pre-Requisites
1. JDK 1.6 or higher
2. Ant 1.7 or higher
3. Maven 2.0.9 or 2.0.10 (needed only if you want to build Mahout from source)
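Before installing, it can help to confirm that the three tools are on your PATH. The following is a minimal sketch, not part of the sample code; the helper name have is hypothetical, and it only checks for presence, not for the version numbers listed above.

```shell
# Hypothetical helper: succeeds if the named command is on the PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Report which of the prerequisite tools can be found.
for tool in java ant mvn; do
  if have "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done

# To compare versions against the list above, run these by hand:
#   java -version
#   ant -version
#   mvn -version
```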
Installation
1. Download the sample code.
2. Unzip the downloaded file.
varsha@varsha-laptop:~$ unzip j-mahout.zip
3. Go into the extracted directory.
varsha@varsha-laptop:~$ cd apache-mahout-examples
4. Install.
varsha@varsha-laptop:~/apache-mahout-examples$ ant install
Possible errors you may encounter
1. Error: JAVA_HOME is not defined correctly. We cannot execute java Bootstrap FAILED ubuntu
Solution:
varsha@varsha-laptop:~$ export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20
Look in /usr/lib/jvm/ to see which JDK is installed on your system, and adjust the path accordingly.
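If you would rather not type the JDK path by hand, it can usually be derived from the java binary itself. This is only a sketch under assumptions: the function name java_home_from_bin is hypothetical, readlink -f is the GNU version found on Ubuntu, and the JDK is assumed to use the usual bin/java or jre/bin/java layout.

```shell
# Hypothetical helper: turn the full path of a java binary into a JDK home
# by stripping the trailing /bin/java and, if present, /jre.
java_home_from_bin() {
  home=${1%/bin/java}
  home=${home%/jre}
  printf '%s\n' "$home"
}

# Resolve the symlinks behind the java on the PATH and export the result.
if command -v java >/dev/null 2>&1; then
  JAVA_HOME=$(java_home_from_bin "$(readlink -f "$(command -v java)")")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
fi
```

The export only lasts for the current shell session; add it to your shell startup file if you want it to persist.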
2. BUILD FAILED
/home/varsha/apache-mahout-examples/build.xml:92: The following error occurred while executing this line:
/home/varsha/apache-mahout-examples/build.xml:85: java.net.ConnectException: Connection timed out
Solution:
i) Go to line 81 of build.xml and comment out:
<target name="get-enwiki" depends="check-files" unless="enwiki.exists">
<echo>Downloading Wikipedia Data - (~2.5GB)</echo>
<get src="http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages-articles.xml.bz2"
dest="${wiki.dir}/enwiki-20070527-pages-articles.xml.bz2"/>
</target>
ii) About five lines below that, comment out:
<antcall target="get-enwiki"/>
This target downloads about 2.5 GB of compressed wiki pages.
iii) Manually download enwiki-20070527-pages-articles.xml.bz2 and save it in apache-mahout-examples/wikipedia/.
iv) Go down to line 141 and comment out the following:
<echo>Downloading Clustering data (9.2M)</echo>
<get src="http://people.apache.org/~gsingers/wikipedia/n2.tar.gz"
dest="${wiki.dir}/n2.tar.gz"/>
v) Manually download n2.tar.gz and save it in apache-mahout-examples/wikipedia/.
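The manual steps above can be sketched from the command line. This assumes the URLs from the build.xml excerpt are still reachable and that wget is installed. The function names are hypothetical, and the line numbers in your copy of build.xml may differ from 81 and 141, which is why the first helper searches for them.

```shell
# Assumed values, copied from the build.xml excerpt in this post.
WIKI_BASE_URL="http://people.apache.org/~gsingers/wikipedia"
WIKI_DIR="apache-mahout-examples/wikipedia"

# Hypothetical helper: print the numbered lines of a build file that
# mention the two downloads, so you know what to comment out.
find_download_lines() {
  grep -n -e 'get-enwiki' -e 'n2.tar.gz' "$1"
}

# Hypothetical helper: build the full URL for one of the data files.
wiki_url() {
  printf '%s/%s\n' "$WIKI_BASE_URL" "$1"
}

# Hypothetical helper: download one file into WIKI_DIR.
# wget -c resumes a partial download, which helps on a slow connection.
fetch_wiki_file() {
  mkdir -p "$WIKI_DIR" &&
  wget -c -P "$WIKI_DIR" "$(wiki_url "$1")"
}

# Usage, run from the directory that contains apache-mahout-examples:
#   find_download_lines apache-mahout-examples/build.xml
#   fetch_wiki_file enwiki-20070527-pages-articles.xml.bz2
#   fetch_wiki_file n2.tar.gz
```

If a download is interrupted, running the same fetch_wiki_file command again continues from where it stopped instead of starting over.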
Is there any other way instead of downloading 2.5 GB of data?
@Ajay: the net speed here doesn't permit it. I haven't yet figured out any other way. Using a download accelerator should be good enough.
I downloaded the 2.5 GB. Is it enough to unzip it, or do I have to modify the path and all?
@Prem: which path? If the data got downloaded without having to do it manually, I guess it will be zipped automatically according to the script. If it doesn't, you'll have to do it.
Do you know Hadoop?
@Shravan Kumar: I started learning it for a project.
For Mahout to be installed, is it necessary that Hadoop be installed?
@Prem: Hadoop is a library, and is not always necessary. And installation doesn't depend on Hadoop.
Hi, I downloaded, can you tell me the next step to run the examples?
Regards,
Ravi.
@Ravi: Please check the website. A tutorial has been given.
ReplyDeletehai madam we done distributed mode worked out but my problem how to import mahout from eclipse.
ReplyDeletesee i done this :
just download and extracted and going to import but files not importing. we installed ant but with out maven how to install mahout
my id
regards
crswamy929@gmail.com
As a note, Ant gives you command-line building and deploying tools. You can choose to use Maven.
For using Eclipse, see the documentation that comes with it, or have a look at the Mahout page. Checking the website or the official documentation is more useful.
The following links may be useful:
1. https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart
2. http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.user%2FgettingStarted%2Fqs-81_basics.htm
3. http://ant.apache.org/manual/index.html
Hi Varsha,
I did whatever you described. If you have got Mahout working with Eclipse, please post it.
Thank you, Varsha
Regards
Ranga Swamy
crswamy929@gmail.com
08904524975
@Ranga Swamy: please see the documentation. I have provided the links above.
ReplyDeletehello i want test any simple algorithm in mahout ubuntu server 11.04..so any suggestion for it..
ReplyDelete@Sagar Soni: please refer to the documentation on the Mahout page. I am sure you'll get everything answered.
ReplyDeletehello varsha ...i want only simple recommendation
ReplyDeletealgorithm for mahout..and i read all ur documents but
not got about it...
varsha,thanks for this interesting post..
ReplyDeleteI have run this examples successfully.then how can i run this examples on hadoop..?
Thanks very much for posting this. It saved me, as I operate through an HTTP proxy.
You can avoid downloading test data by executing the following command:
mvn clean install -DskipTests=true
Great job posting this. Saved me a lot of work.
Rahul
I want to use Apache Mahout to build a prototype.
The prototype should calculate the TF (Term Frequency) and the IDF (Inverse Document Frequency) of the input files (.txt/pdf).
Can you please tell me how to go about it?
Regards
Junaid
junaidsurve@gmail.com
Hi Junaid, I am new to Mahout myself. I'll see if I can help you.
Thanks a lot, Varsha. :)
Hi Varsha
Is it necessary to download the 2.5 GB of wiki data, or is there a way I can bypass it?
Can I use some other data instead...say the data on which I want to do mining using mahout?
Hi Varsha
It is indeed a very nice tutorial. But as I read on the Apache Mahout wiki, Mahout runs on a Hadoop cluster. So do I need to install Hadoop on my machine?
I am very new to these things, so thanks for helping me out.
Hi,
I had just started learning Mahout when I posted this. Would like to be of help, but I'd suggest you look up the Mahout webpage for help.