Thursday, May 23, 2013

Hadoop Setup on a Multinode Cluster (Linux)


Step 1 : Create a new user for Hadoop, e.g. hduser, on Ubuntu/Red Hat:

useradd hduser
passwd hduser
# type the password

Step 2 : Create a new group hadoop and add hduser to it:
addgroup hadoop
adduser --ingroup hadoop hduser
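
The addgroup/adduser commands above are the Debian/Ubuntu style; as a rough sketch, the Red Hat equivalents (assuming the hduser account already exists from Step 1) would be:

groupadd hadoop
usermod -a -G hadoop hduser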

Do all of the following steps from /home/hduser, otherwise permission-denied problems will occur at some steps.
Step 3 : Download the Hadoop tar file.

Step 4 : Extract it in /home/hduser/.
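
For example, a sketch of Steps 3 and 4 assuming the Hadoop 1.0.4 release from the Apache archive (substitute the version and mirror you actually use; the final rename just makes the paths match /home/hduser/hadoop used later in this post):

cd /home/hduser
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
tar -xzf hadoop-1.0.4.tar.gz
mv hadoop-1.0.4 hadoop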

Step 5 : Disable IPv6:
open /etc/sysctl.conf and add these lines to it:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
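
These settings take effect after a reboot; to apply them immediately, reload sysctl as root:

sysctl -p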

Step 6 : Check whether IPv6 is disabled on your machine with the following command (a value of 1 means IPv6 is disabled, 0 means it is still enabled):
cat /proc/sys/net/ipv6/conf/all/disable_ipv6



Step 7 : Add an entry for every machine in /etc/hosts. If you are not able to edit this file as your user, change its permissions as root so that everyone can access it
(chmod 777 /etc/hosts)

For example:

152.144.198.245 tarunrhels1
152.144.198.246 tarunrhels2
152.144.198.247 tarunrhels3


Here tarunrhels1, tarunrhels2 and tarunrhels3 are the machine names we are using for the Hadoop cluster. The list includes both the Namenode and the Datanodes.



 Steps for the Namenode

Do all of the above 7 steps on each machine that will be used for Hadoop.
For the Hadoop setup, we have to create one Namenode and make the others Datanodes.


Step 8 : If ssh is not running on the machine, first install ssh.
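
A rough sketch of installing and starting the SSH server, as root (package and service names are the usual ones on these distributions; adjust for your system):

apt-get install openssh-server    # Ubuntu
yum install openssh-server        # Red Hat
service sshd start                # the service is called ssh on Ubuntu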

 Generate an ssh key for hduser as:
ssh-keygen -t rsa -P ""

Step 9 : Enable SSH access to the local machine with this newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
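
It is worth checking that key-based login works before continuing (the chmod is only needed if ssh still prompts for a password because of file permissions):

chmod 600 $HOME/.ssh/authorized_keys
ssh localhost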

Step 10 : Hadoop creates temporary directories, both for the local file system and for HDFS, where it generates its data files.
For the local system, create the directory as:
mkdir -p /home/hduser/hdfs

Step 11 : Change the JAVA_HOME path in the conf/hadoop-env.sh file according to the Java installation on the Linux machine.
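
For example, the line in conf/hadoop-env.sh could look like this (the JDK path is an assumption; it matches the /usr/java/jdk1.7.0_15 path used in the Tips section below, so use whatever matches your installation):

export JAVA_HOME=/usr/java/jdk1.7.0_15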

Step 12 : Change conf/core-site.xml as:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://152.144.198.245</value>
    <description>The name of the default file system.  Either the
      literal string "local" or a host:port for NDFS.
    </description>
    <final>true</final>
  </property>

</configuration>



Step 13 : Change conf/mapred-site.xml as:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>152.144.198.245:50300</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.system.dir</name>
    <value>/home/marvin1/mapred/system</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/home/marvin1/cache/mapred/local</value>
    <final>true</final>
  </property>

</configuration>


Step 14 : Change conf/hdfs-site.xml as:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>dfs.name.dir</name>
    <value>/home/hduser/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name
      node should store the name table.  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy.
    </description>
    <final>true</final>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/home/hduser/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data node
      should store its blocks.  If this is a comma-delimited list of directories,
      then data will be stored in all named directories, typically on different
      devices. Directories that do not exist are ignored.
    </description>
    <final>true</final>
  </property>

</configuration>



Step 15 : Add a new file conf/masters.
In the masters file, add the entry of the Namenode; the entry will be the machine's IP address or localhost. (In Hadoop 1.x this file is actually read by the start scripts to decide where the SecondaryNameNode runs.)
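
For example, with the addresses used above, the masters file here would contain just:

152.144.198.245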

Step 16 : Add a new file conf/slaves.
In this slaves file, add the entries of the slaves/Datanodes, one IP address (or hostname) per line. If we want to treat the master node as a Datanode as well, then add the entry of the master node in slaves too.
The slaves file looks like:

152.144.198.245
152.144.198.246
152.144.198.247

Step 17 : To format the filesystem for HDFS, run the command:
/home/hduser/hadoop/bin/hadoop namenode -format

 ###############################################################################
Do the following steps at each Datanode.

 Log in to the slave/Datanode (152.144.198.246) and do these steps:

 Step 1 to Step 7 as mentioned above.

Step A : Copy conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml from the Namenode/Master machine to the same conf/ paths on the Datanode/Slave machine using the scp command.
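
As a sketch, run on the master and assuming Hadoop is extracted to /home/hduser/hadoop on both machines:

scp /home/hduser/hadoop/conf/core-site.xml /home/hduser/hadoop/conf/mapred-site.xml /home/hduser/hadoop/conf/hdfs-site.xml hduser@152.144.198.246:/home/hduser/hadoop/conf/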

Step B : Copy id_rsa.pub from the master node to the slave node through scp as:

scp hduser@152.144.198.245:/home/hduser/.ssh/id_rsa.pub /home/hduser/.ssh/

Step C : Append the copied key to the slave's authorized_keys:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
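
Afterwards, from the master node you can verify that passwordless login to this slave works (using the slave IP from above):

ssh hduser@152.144.198.246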


Log in to the master node again and do these steps:

Step X_1 : Start the Hadoop processes from the master node as:
/home/hduser/hadoop/bin/start-all.sh

Step X_2 : Check the processes with the jps command; it should list processes like:


 50682 TaskTracker
50471 JobTracker
49554 Jps
50084 DataNode
50281 SecondaryNameNode
49881 NameNode


Step X_3 : Run the same jps command at each slave/Datanode machine. It should display results like:

62216 Jps
8122 TaskTracker


Tips :

T1) If the jps command is not working, add your Java installation to the PATH variable as:
export PATH=$PATH:/usr/java/jdk1.7.0_15/bin

jps helps to check the status of hadoop processes.

T2) Check with netstat whether Hadoop is listening on the configured ports on the master machine:
netstat -plten | grep java

If the above command shows that the port you added in conf/mapred-site.xml is in a conflicting state, change the port number in conf/mapred-site.xml and start the Hadoop processes again. Then check the port status again with the same command.
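
For example, to check just the JobTracker port configured in conf/mapred-site.xml above (50300 here; substitute whatever port you used):

netstat -plten | grep 50300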

T3) Sometimes Hadoop is not able to write to HDFS and gives a permission-denied error or a java.io.IOException; in that case, do these steps on the master machine.
Step a) Stop the Hadoop processes:
/home/hduser/hadoop/bin/stop-all.sh
Step b) Delete the data and name folders from the /home/hduser/hdfs folder. After deleting them you may also need to re-format the Namenode (/home/hduser/hadoop/bin/hadoop namenode -format, as in Step 17).
Step c) Start Hadoop again through /home/hduser/hadoop/bin/start-all.sh
Hopefully it will start working after this.

T4) Sometimes you face the problem of safe mode. If safe mode is on, Hadoop starts giving errors. Turn safe mode off with:
/home/hduser/hadoop/bin/hadoop dfsadmin -safemode leave

