Thursday, May 23, 2013

Hadoop Setup on a Multinode Cluster (Linux)


Step 1 : Create a new user for Hadoop, e.g. hduser, on Ubuntu/Red Hat:

useradd hduser
passwd hduser
# type the password

Step 2 : Create a new group hadoop and add hduser to it:
addgroup hadoop
adduser --ingroup hadoop hduser
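
The addgroup/adduser commands above are the Debian/Ubuntu style; as a rough sketch, the Red Hat equivalents (assuming the hduser account already exists from Step 1) would be:

groupadd hadoop
usermod -a -G hadoop hduser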

Do all of the following steps from /home/hduser, otherwise permission-denied problems will occur at some steps.
Step 3 : Download the Hadoop tar file.

Step 4 : Extract it in /home/hduser/.
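
For example, a sketch of Steps 3 and 4 assuming the Hadoop 1.0.4 release from the Apache archive (substitute the version and mirror you actually use; the final rename just makes the paths match /home/hduser/hadoop used later in this post):

cd /home/hduser
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
tar -xzf hadoop-1.0.4.tar.gz
mv hadoop-1.0.4 hadoop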

Step 5 : Disable IPv6:
open /etc/sysctl.conf and add these lines to it:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
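
These settings take effect after a reboot; to apply them immediately, reload sysctl as root:

sysctl -p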

Step 6 : Check whether IPv6 is disabled on your machine with the following command (a value of 1 means IPv6 is disabled, 0 means it is still enabled):
cat /proc/sys/net/ipv6/conf/all/disable_ipv6



Step 7 : Add an entry for every machine in /etc/hosts. If you are not able to edit this file as your user, change its permissions as root so that everyone can access it
(chmod 777 /etc/hosts)

For example:

152.144.198.245 tarunrhels1
152.144.198.246 tarunrhels2
152.144.198.247 tarunrhels3


Here tarunrhels1, tarunrhels2 and tarunrhels3 are the machine names we are using for the Hadoop cluster. The list includes both the Namenode and the Datanodes.



 Steps for the Namenode

Do all of the above 7 steps on each machine that will be used for Hadoop.
For the Hadoop setup, we have to create one Namenode and make the others Datanodes.


Step 8 : If ssh is not running on the machine, first install ssh.
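
A rough sketch of installing and starting the SSH server, as root (package and service names are the usual ones on these distributions; adjust for your system):

apt-get install openssh-server    # Ubuntu
yum install openssh-server        # Red Hat
service sshd start                # the service is called ssh on Ubuntu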

 Generate an ssh key for hduser as:
ssh-keygen -t rsa -P ""

Step 9 : Enable SSH access to the local machine with this newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
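
It is worth checking that key-based login works before continuing (the chmod is only needed if ssh still prompts for a password because of file permissions):

chmod 600 $HOME/.ssh/authorized_keys
ssh localhost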

Step 10 : Hadoop creates temporary directories, both for the local file system and for HDFS, where it generates its data files.
For the local system, create the directory as:
mkdir -p /home/hduser/hdfs

Step 11 : Change the JAVA_HOME path in the conf/hadoop-env.sh file according to the Java installation on the Linux machine.
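
For example, the line in conf/hadoop-env.sh could look like this (the JDK path is an assumption; it matches the /usr/java/jdk1.7.0_15 path used in the Tips section below, so use whatever matches your installation):

export JAVA_HOME=/usr/java/jdk1.7.0_15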

Step 12 : Change conf/core-site.xml as:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://152.144.198.245</value>
    <description>The name of the default file system.  Either the
      literal string "local" or a host:port for NDFS.
    </description>
    <final>true</final>
  </property>

</configuration>



Step 13 : Change conf/mapred-site.xml as:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>152.144.198.245:50300</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.system.dir</name>
    <value>/home/marvin1/mapred/system</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/home/marvin1/cache/mapred/local</value>
    <final>true</final>
  </property>

</configuration>


Step 14 : Change conf/hdfs-site.xml as:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>dfs.name.dir</name>
    <value>/home/hduser/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name
      node should store the name table.  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy.
    </description>
    <final>true</final>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/home/hduser/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
      should store its blocks.  If this is a comma-delimited list of directories,
      then data will be stored in all named directories, typically on different
      devices. Directories that do not exist are ignored.
    </description>
    <final>true</final>
  </property>

</configuration>



Step 15 : Add a new file conf/masters.
In the masters file, add the entry of the Namenode; the entry will be the machine's IP address or localhost. (In Hadoop 1.x this file is actually read by the start scripts to decide where the SecondaryNameNode runs.)
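
For example, with the addresses used above, the masters file here would contain just:

152.144.198.245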

Step 16 : Add a new file conf/slaves.
In this slaves file, add the entries of the slaves/Datanodes, one IP address (or hostname) per line. If we want to treat the master node as a Datanode as well, then add the entry of the master node in slaves too.
The slaves file looks like:

152.144.198.245
152.144.198.246
152.144.198.247

Step 17 : To format the filesystem for HDFS, run the command:
/home/hduser/hadoop/bin/hadoop namenode -format

 ###############################################################################
Do the following steps at each Datanode.

 Log in to the slave/Datanode (152.144.198.246) and do these steps:

 Step 1 to Step 7 as mentioned above.

Step A : Copy conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml from the Namenode/Master machine to the same conf/ paths on the Datanode/Slave machine using the scp command.
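
As a sketch, run on the master and assuming Hadoop is extracted to /home/hduser/hadoop on both machines:

scp /home/hduser/hadoop/conf/core-site.xml /home/hduser/hadoop/conf/mapred-site.xml /home/hduser/hadoop/conf/hdfs-site.xml hduser@152.144.198.246:/home/hduser/hadoop/conf/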

Step B : Copy id_rsa.pub from the master node to the slave node through scp as:

scp hduser@152.144.198.245:/home/hduser/.ssh/id_rsa.pub /home/hduser/.ssh/

Step C : Append the copied key to the slave's authorized_keys:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
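
Afterwards, from the master node you can verify that passwordless login to this slave works (using the slave IP from above):

ssh hduser@152.144.198.246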


Log in to the master node again and do these steps:

Step X_1 : Start the Hadoop processes from the master node as:
/home/hduser/hadoop/bin/start-all.sh

Step X_2 : Check the processes with the jps command; it should list processes like:


 50682 TaskTracker
50471 JobTracker
49554 Jps
50084 DataNode
50281 SecondaryNameNode
49881 NameNode


Step X_3 : Run the same jps command at each slave/Datanode machine. It should display results like:

62216 Jps
8122 TaskTracker


Tips :

T1) If the jps command is not working, add your Java installation to the PATH variable as:
export PATH=$PATH:/usr/java/jdk1.7.0_15/bin

jps helps to check the status of hadoop processes.

T2) Check with netstat whether Hadoop is listening on the configured ports on the master machine:
netstat -plten | grep java

If the above command shows that the port you added in conf/mapred-site.xml is in a conflicting state, change the port number in conf/mapred-site.xml and start the Hadoop processes again. Then check the port status again with the same command.
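
For example, to check just the JobTracker port configured in conf/mapred-site.xml above (50300 here; substitute whatever port you used):

netstat -plten | grep 50300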

T3) Sometimes Hadoop is not able to write to HDFS and gives a permission-denied error or a java.io.IOException; in that case, do these steps on the master machine.
Step a) Stop the Hadoop processes:
/home/hduser/hadoop/bin/stop-all.sh
Step b) Delete the data and name folders from the /home/hduser/hdfs folder. After deleting them you may also need to re-format the Namenode (/home/hduser/hadoop/bin/hadoop namenode -format, as in Step 17).
Step c) Start Hadoop again through /home/hduser/hadoop/bin/start-all.sh
Hopefully it will start working after this.

T4) Sometimes you face the problem of safe mode. If safe mode is on, Hadoop starts giving errors. Turn safe mode off with:
/home/hduser/hadoop/bin/hadoop dfsadmin -safemode leave

