September 6, 2010

Installing Apache Mahout In Ubuntu 10.04

Apache Mahout is a machine-learning library that helps in building intelligent applications, for example recommenders, clustering, and classification.

Pre-Requisites

1. JDK 1.6 or higher
2. Ant 1.7 or higher
3. Maven 2.0.9 or 2.0.10 (needed if you want to build Mahout source)
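
Before starting, it can help to confirm that the tools listed above are actually on your PATH. The following is only a sketch: check_tool is a made-up helper name, not something shipped with Mahout or Ant. The -version flag is the standard one accepted by java, ant and mvn.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# For each prerequisite that is present, print its version so it can be
# compared with the minimum versions listed above.
for tool in java ant mvn; do
    if check_tool "$tool"; then
        "$tool" -version || echo "$tool is installed but failed to report a version"
    fi
done
```

Maven is only needed if you want to build the Mahout source, so a missing mvn is not a problem for the steps below.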

Installation

1. Download the sample code (j-mahout.zip).

2. Unzip the downloaded file.

varsha@varsha-laptop:~$ unzip j-mahout.zip

3. Go into the extracted directory.

varsha@varsha-laptop:~$ cd apache-mahout-examples

4. Install.

varsha@varsha-laptop:~/apache-mahout-examples$ ant install


Possible errors you may encounter

1. Error: JAVA_HOME is not defined correctly. We cannot execute java. Bootstrap FAILED

Solution:

varsha@varsha-laptop:~$ export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20

Look in /usr/lib/jvm/ to see which JDK is installed on your system, and use that directory name in the command above.
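
The same fix can be written as a small check, sketched below. valid_java_home is a made-up helper name, and the candidate path is simply the one from this post; yours will probably differ.

```shell
#!/bin/sh
# valid_java_home is a hypothetical helper: a directory is a usable JAVA_HOME
# only if it contains an executable bin/java.
valid_java_home() {
    [ -x "$1/bin/java" ]
}

# Show the installed JDKs; the directory may not exist on other systems.
ls /usr/lib/jvm/ 2>/dev/null || echo "no /usr/lib/jvm/ directory found"

# Assumed path, taken from this post; replace it with a name listed above.
candidate=/usr/lib/jvm/java-6-sun-1.6.0.20

if valid_java_home "$candidate"; then
    export JAVA_HOME="$candidate"
    echo "JAVA_HOME set to $JAVA_HOME"
else
    echo "$candidate has no bin/java; pick another directory from /usr/lib/jvm/"
fi
```

Note that export only affects the current shell. To keep the setting, add the export line to ~/.bashrc, or source the script instead of running it.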

2. BUILD FAILED
/home/varsha/apache-mahout-examples/build.xml:92: The following error occurred while executing this line:
/home/varsha/apache-mahout-examples/build.xml:85: java.net.ConnectException: Connection timed out


Solution:

i) Open build.xml, go to line 81, and comment out:


<target name="get-enwiki" depends="check-files" unless="enwiki.exists">
  <echo>Downloading Wikipedia Data - (~2.5GB)</echo>
  <get src="http://people.apache.org/~gsingers/wikipedia/enwiki-20070527-pages-articles.xml.bz2"
       dest="${wiki.dir}/enwiki-20070527-pages-articles.xml.bz2"/>
</target>



ii) About five lines below that, comment out:


<antcall target="get-enwiki"/>

This target downloads 2.5 GB of compressed Wikipedia pages.

iii) Manually download enwiki-20070527-pages-articles.xml.bz2 and save it in apache-mahout-examples/wikipedia/.

iv) Go down to line 141 and comment out the following:


<echo>Downloading Clustering data (9.2M)</echo>
<get src="http://people.apache.org/~gsingers/wikipedia/n2.tar.gz"
     dest="${wiki.dir}/n2.tar.gz"/>


v) Manually download n2.tar.gz and save it in apache-mahout-examples/wikipedia/.
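
The two manual downloads can be sketched with wget as follows. The URLs are the ones from build.xml above and may no longer be reachable. fetch and the DOWNLOAD variable are made-up names for this sketch. Because the first file is about 2.5 GB, the script only prints the commands unless DOWNLOAD=1 is set.

```shell
#!/bin/sh
# fetch is a hypothetical helper. By default it only prints the wget command;
# it downloads for real when DOWNLOAD=1 is set in the environment.
fetch() {
    url=$1
    dir=$2
    if [ "${DOWNLOAD:-0}" = "1" ]; then
        mkdir -p "$dir"
        # -c resumes a partial download; -P sets the target directory.
        wget -c -P "$dir" "$url"
    else
        echo "wget -c -P $dir $url"
    fi
}

# URLs copied from build.xml; destination relative to the home directory.
base="http://people.apache.org/~gsingers/wikipedia"
dest=apache-mahout-examples/wikipedia

fetch "$base/enwiki-20070527-pages-articles.xml.bz2" "$dest"
fetch "$base/n2.tar.gz" "$dest"
```

With -c, an interrupted download can be continued by running the same command again, which helps on a slow connection.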



27 comments:

  1. Is there any other way instead of downloading 2.5 GB of data?

  2. @Ajay: the net speed here doesn't permit. I haven't yet figured out any other way. Using a download accelerator should be good enough.

  3. I downloaded the 2.5 GB file. Is it enough to unzip it, or do I have to modify the path as well?

  4. @Prem: which path? If the data got downloaded without having to do it manually, I guess it will be zipped automatically according to the script. If it doesn't, you'll have to do it.

  5. @Shravan Kumar: I started learning for a project.

  6. For Mahout to be installed, is it necessary that Hadoop be installed?

  7. @Prem: Hadoop is a library, and is not always necessary. And installation doesn't depend on Hadoop.

  8. Hi, I downloaded, can you tell me the next step to run the examples?
    Regards,
    Ravi.

  9. @Ravi: Please check the website. A tutorial has been given.

  10. Hi madam, we got distributed mode working, but my problem is how to import Mahout into Eclipse.

    This is what I did:
    I just downloaded and extracted it and tried to import it, but the files are not importing. We installed Ant, but without Maven, how do we install Mahout?
    my id

    regards
    crswamy929@gmail.com

  11. As a note, Ant gives you command-line building and deploying tools. You can choose to use Maven.

    For using Eclipse, see the documentation that comes with it, or have a look at the Mahout page. Checking the website or the official documentation is more useful.

    The following links may be useful:

    1. https://cwiki.apache.org/confluence/display/MAHOUT/Quickstart

    2. http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.user%2FgettingStarted%2Fqs-81_basics.htm

    3. http://ant.apache.org/manual/index.html

  12. Hi Versha,
    I did whatever you described. If you have got Mahout working with Eclipse, please post it.

    Thank you versha
    Regards
    Ranga Swamy
    crswamy929@gmail.com
    08904524975

  13. @Ranga Swamy: please see the documentation. I have provided the links above.

  14. Hello, I want to test a simple algorithm in Mahout on Ubuntu Server 11.04. Any suggestions?

  15. @Sagar Soni: please refer to the documentation on the Mahout page. I am sure you'll get everything answered.

  16. Hello Varsha, I only want a simple recommendation algorithm for Mahout. I read all your documents but did not find anything about it.

  17. Varsha, thanks for this interesting post.

    I have run these examples successfully. How can I run them on Hadoop?

  18. thanks very much for posting this - it saved me as I operate through an HTTP proxy.

  19. You can avoid downloading test data by executing the following command
    mvn clean install -DskipTests=true

  20. Great job posting this. Saved me a lot of work.

    Rahul

  21. I want to use Apache Mahout to build a prototype.

    The prototype should calculate the TF (Term Frequency) and the IDF (Inverse Document Frequency) of the input files (.txt/pdf).

    Can you please tell me how to go about it?

    Regards
    Junaid
    junaidsurve@gmail.com

  22. Hi Junaid, I am new to Mahout myself. I'll see if I can help you.

  23. Hi Varsha

    Is it necessary to download the 2.5 GB of wiki data, or is there a way I can bypass it?

    Can I use some other data instead...say the data on which I want to do mining using mahout?

  24. Hi Varsha
    It is indeed a very nice tutorial. But as I read on the Apache Mahout wiki, Mahout runs on a Hadoop cluster. So do I need to install Hadoop on my machine?

    I am very new to these things, so thanks for helping me out.

  25. Hi,

    I had just started learning Mahout when I posted this. Would like to be of help, but I'd suggest you look up the Mahout webpage for help.
