Build Hadoop environment in Linux
Hadoop standalone installation:
1. Install JDK
Install JDK with below command:
sudo apt-get install sun-java6-jdk
Configure Java environment, open /etc/profile, add below contents:
export JAVA_HOME = （Java installation directory） export CLASSPATH =".:$JAVA_HOME/lib:$CLASSPATH" export PATH = "$JAVA_HOME/:PATH"
Verify installation of Java
Type java --version, if it outputs Java version information, then Java is successfully installed.
2. Install SSH
Install SSH with below command:
sudo apt-get install ssh
Configure SSH to login to local PC without password:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Press "Enter", two files will be created in ~/.ssh/ : id_rsa and id_rsa.pub . These two files appear in pair, similar to the key and lock.
Then add the id_rsa.pub to the authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify installation of SSH:
type ssh localhost, if it shows login success, then SSH is successfully installed.
3. Switch off firewall
sudo ufw disable
Note: This step is very important, if the firewall is not switched off, then you may encounter cannot find datanode issue.
4. Install Hadoop(Take version 0.20.2 as an example)
Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core/
Install and configure Hadoop
Single node configuration:
There is no configuration needed for single node Hadoop. In this mode, Hadoop will be considered as a single Java process.
Pseudo-Distributed Mode is a cluster with only one node. In this cluster, the local machine is the master as well as the slave, it's the namenode as well as the datanode and it's the jobtracker as well as the tasktracker.
Modify below files in conf directory:
Add exportJAVA_HOME = （JAVA installation directory）
In core-site.xml, modify below contents:
<configuration> <!-- global properties --> <property> <name>hadoop.tmp.dir</name> <value>/home/zhongping/tmp</value> </property> <!-- file system properties --> <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> </configuration>
In hdfs-site.xml, modify below contents:
<configuration> <property> <name>fs.replication</name> <value>1</value> </property> </configuration>
In mapred-site.xml, modify below contents:
<configuration> <property> <name>mapred.job.tracker</name> <value>localhost:9001</value> </property> </configuration>
Format Hadoop file system:
Verify installation of Hadoop. Type below URL in browser. If they can be opened normally, then Hadoop is successfully installed.
5, Run instance
Create two files locally:
echo "Hello World Bye World" > file01 echo "Hello Hadoop Goodbye Hadoop" > file02
Create an input directory in hufs:
hadoop fs -mkdir input
Copy file01 and file02 to hufs:
hadoop fs -copyFromLocal /home/zhongping/file0* input
hadoop jar hadoop-0.20.2-examples.jarwordcount input output
hadoop fs -cat output/part-r-00000
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it... From Web