Build Hadoop environment in Linux

Hadoop standalone installation:

1. Install JDK

Install the JDK with the following command:

sudo apt-get install sun-java6-jdk

Configure the Java environment: open /etc/profile and add the following lines (the shell does not allow spaces around "="):

export JAVA_HOME=<Java installation directory>
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"

Verify the Java installation:

Type java -version. If it prints the Java version information, Java is installed successfully.
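If java -version fails, the usual suspect is JAVA_HOME. The sketch below is a hypothetical helper (the name check_java_home is not part of any tool) that only tests whether JAVA_HOME is set and whether an executable exists at $JAVA_HOME/bin/java, assuming the standard JDK layout:

```shell
# Hypothetical helper: report whether JAVA_HOME points at a usable JDK.
# Assumes the standard JDK layout with the launcher at $JAVA_HOME/bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable found at $JAVA_HOME/bin/java"
        return 1
    fi
    echo "JAVA_HOME looks valid: $JAVA_HOME"
}
```

After reloading the profile with source /etc/profile, calling check_java_home should print the "looks valid" line.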

2. Install SSH

Install SSH with the following command:

sudo apt-get install ssh

Configure SSH so that you can log in to the local machine without a password:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Two files are created in ~/.ssh/: id_rsa (the private key) and id_rsa.pub (the public key). They always come as a pair, much like a key and its lock.

Then append id_rsa.pub to the authorized keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Verify the SSH installation:

Type ssh localhost. If you are logged in without being asked for a password, SSH is set up correctly.
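If ssh localhost still asks for a password, one common cause is file permissions: with its default StrictModes setting, sshd ignores authorized_keys when the file or the ~/.ssh directory is writable by other users. The following is a minimal sketch of a fix; the function name fix_ssh_permissions is hypothetical, and the optional directory argument exists only so the helper can be tried on a scratch directory first:

```shell
# Hypothetical helper: tighten permissions on an SSH directory.
# Defaults to ~/.ssh; pass another directory to operate on that instead.
fix_ssh_permissions() {
    ssh_dir="${1:-$HOME/.ssh}"
    # Only the owner may read, write, or enter the directory.
    chmod 700 "$ssh_dir" || return 1
    # Only the owner may read or write the list of authorized keys.
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}
```

Run fix_ssh_permissions with no arguments, then try ssh localhost again.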

3. Switch off the firewall

sudo ufw disable

Note: This step is important. If the firewall is left on, you may run into the "cannot find datanode" problem.

4. Install Hadoop (taking version 0.20.2 as an example)

Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core/

Install and configure Hadoop

Standalone (single-node) mode:

Standalone mode needs no configuration. In this mode, Hadoop runs as a single Java process.

Pseudo-distributed mode:

Pseudo-distributed mode is a cluster with only one node. The local machine acts as both master and slave: it is the namenode as well as the datanode, and the jobtracker as well as the tasktracker.

Configuration:

Modify the following files in the conf directory:

In hadoop-env.sh:

Add export JAVA_HOME=<Java installation directory>

In core-site.xml, set the following properties:

<configuration>
	<!-- global properties -->
	<property>
	    <name>hadoop.tmp.dir</name>
	    <value>/home/zhongping/tmp</value>
	</property>

	<!-- file system properties -->
	<property>
	   <name>fs.default.name</name>
	   <value>hdfs://localhost:9000</value>
	</property>
</configuration>

In hdfs-site.xml, set the following property:

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>

In mapred-site.xml, set the following property:

<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>localhost:9001</value>
	</property>
</configuration>

Format the Hadoop file system:

bin/hadoop namenode -format

Start Hadoop:

bin/start-all.sh
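Before opening a browser, it can help to confirm that the daemons actually started. The JDK's jps tool lists running Java processes as "<pid> <name>". The sketch below assumes that start-all.sh in this version launches five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker); the function name missing_daemons is hypothetical:

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon that is not listed. Returns 1 if any daemon is missing, else 0.
# The list of five daemons is an assumption about a pseudo-distributed setup.
missing_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the name field exactly, so that "SecondaryNameNode"
        # is not mistaken for "NameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call is jps | missing_daemons. No output means all five names were found; for any daemon reported missing, the files in Hadoop's logs directory are the first place to look.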

Verify the Hadoop installation by opening the following URLs in a browser. If both pages load normally, Hadoop is installed successfully.

http://localhost:50030 (MapReduce)

http://localhost:50070 (HDFS)

5. Run an example

Create two files locally:

echo "Hello World Bye World" > file01
echo "Hello Hadoop Goodbye Hadoop" > file02

Create an input directory in HDFS:

hadoop fs -mkdir input

Copy file01 and file02 to HDFS:

hadoop fs -copyFromLocal /home/zhongping/file0* input

Run wordcount:

hadoop jar hadoop-0.20.2-examples.jar wordcount input output

Check the result:

hadoop fs -cat output/part-r-00000
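To sanity-check the job output, the same count can be reproduced locally with standard Unix tools. This is only a rough local equivalent, not how Hadoop computes the result; the function name local_wordcount is hypothetical:

```shell
# Hypothetical helper: count whitespace-separated words in the given files
# and print "word<TAB>count" lines, sorted by word in byte order.
local_wordcount() {
    cat "$@" |
        tr -s ' \t' '\n\n' |      # one word per line
        grep -v '^$' |            # drop empty lines
        LC_ALL=C sort |           # byte-order sort, so uppercase sorts first
        uniq -c |                 # prefix each distinct word with its count
        awk '{ print $2 "\t" $1 }'
}
```

For the two sample files, local_wordcount file01 file02 prints Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2 (one tab-separated pair per line), which is what output/part-r-00000 should contain.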

Source : http://jingshengsun888.blog.51cto.com/1767811/1261385

1 COMMENT

tao [Reply]	@ 2013-08-05 00:25:16
what's hadoop?