1. Oozie job failed
Error message: ERROR is considered as FAILED for SLA
Cause 1: The hadoop namenode (master) or jobtracker machine cannot be found.
Suppose you are running Oozie, the hadoop master and the jobtracker on one machine, and the datanode and tasktracker on another machine.
Your job.properties file uses localhost for the namenode and jobtracker addresses, for example localhost:9000.
In that case, an FS action will work fine because no map-reduce operation is performed in an FS action. But if you run a map-reduce action, the tasktracker will look for the hadoop master on its own localhost, because localhost:9000 is used in the job.properties file.
Solution: Use the IP address of the hadoop namenode and jobtracker machines in the job.properties file instead of localhost.
Cause 2: Oozie is not able to find the MySQL server.
Suppose I am using MySQL as the metastore for Hive.
Hive's hive-default.xml file has the following lines:
<description>JDBC connect string for a JDBC metastore</description>
</property>
Solution: Use the IP address of the MySQL machine in the JDBC connect string instead of localhost.
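Both causes come down to a localhost address that is only correct on the machine where the service itself runs. Here is a minimal sketch of a check for that, assuming the configuration is available as plain text. The helper name find_localhost_refs, the keys nameNode and jobTracker, the IP address and the port 9001 are illustrative assumptions, not values taken from the original files.

```python
import re

# Matches localhost or the IPv4 loopback address as a whole host name.
LOOPBACK = re.compile(r"(?<![\w.-])(localhost|127\.0\.0\.1)(?![\w.-])")


def find_localhost_refs(text):
    """Return (line_number, line) pairs for lines that mention a loopback host."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and properties-style comments.
        if not stripped or stripped.startswith("#"):
            continue
        if LOOPBACK.search(stripped):
            hits.append((number, stripped))
    return hits


# Hypothetical job.properties content, for illustration only.
sample = """\
# cluster addresses
nameNode=hdfs://localhost:9000
jobTracker=192.168.1.10:9001
"""

for number, line in find_localhost_refs(sample):
    print(f"line {number}: {line}")
```

The same function can be run over the text of the Hive configuration to spot a JDBC connect string that still points at localhost.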
2. Zookeeper server not running
Error message: Could not find my address: zk-serevr1 in list of ZooKeeper quorum servers
Cause: HBase tries to start a ZK server on some machine, but that machine isn't able to find itself in the hbase.zookeeper.quorum configuration. This is a name lookup problem.
Solution: Use the hostname presented in the error message instead of the value you used (zk-server1). If you have a DNS server, you can set hbase.zookeeper.dns.interface and hbase.zookeeper.dns.nameserver in hbase-site.xml to make sure it resolves to the correct FQDN.
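The lookup problem can be illustrated with a small sketch: take the comma-separated hbase.zookeeper.quorum value and check whether any name the machine knows itself by appears in it. This only mimics the idea and is not HBase's actual lookup code. The function names and the sample quorum value are hypothetical.

```python
import socket


def quorum_hosts(quorum_value):
    """Split a comma-separated quorum value into bare host names (ports dropped)."""
    hosts = []
    for entry in quorum_value.split(","):
        entry = entry.strip()
        if entry:
            hosts.append(entry.split(":")[0].lower())
    return hosts


def finds_itself(quorum_value, local_names):
    """True if any of the given machine names appears in the quorum list."""
    listed = set(quorum_hosts(quorum_value))
    return any(name.lower() in listed for name in local_names)


def local_names():
    """Names this machine reports for itself; the result depends on local DNS setup."""
    return {socket.gethostname(), socket.getfqdn()}


# Hypothetical quorum value; the machine name is the one from the error message.
quorum = "zk-server1,zk-server2,zk-server3"
print(finds_itself(quorum, ["zk-serevr1"]))  # the name the machine reported
print(finds_itself(quorum, ["zk-server1"]))  # the name that was configured
```

On a real node you would pass local_names() instead of a literal list, which shows whether the name the machine reports for itself matches what is configured.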
3. Hadoop-datanode job failed or datanode not running
Error message: java.io.IOException: File ../mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
Cause 1: No datanode is running. Make sure at least one datanode is running.
Cause 2: The namespaceID of the master and slave machines is not the same.
If you see the error java.io.IOException: Incompatible namespaceIDs in the logs of a datanode, chances are you are affected by bug HADOOP-1212 (well, I’ve been affected by it at least).
Solution: If the namespaceID of the master and slave machines is not the same, replace the namespaceID on the slave machines with the master's namespaceID.
- dfs/name/current/VERSION file contains the namespaceID of the master machine
- dfs/data/current/VERSION file contains the namespaceID of the slave machine
Cause 3: The datanode instance is running out of space.
Solution: Free some space.
Cause 4: You may also get this message due to permissions. Maybe the JobTracker cannot create jobtracker.info on startup.
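For the namespaceID mismatch in Cause 2, the comparison of the two VERSION files can be sketched as follows. This assumes the files are key=value text containing a namespaceID line. The function names are hypothetical and the ID values below are made up for illustration.

```python
def read_namespace_id(version_text):
    """Extract the namespaceID value from the text of a VERSION file."""
    for line in version_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "namespaceID":
            return value.strip()
    return None


def namespace_ids_match(master_text, slave_text):
    """True only if both texts carry a namespaceID and the two values are equal."""
    master_id = read_namespace_id(master_text)
    slave_id = read_namespace_id(slave_text)
    return master_id is not None and master_id == slave_id


# Made-up contents standing in for dfs/name/current/VERSION (master)
# and dfs/data/current/VERSION (slave).
master_version = "#example master VERSION\nnamespaceID=1234567\n"
slave_version = "#example slave VERSION\nnamespaceID=7654321\n"

if not namespace_ids_match(master_version, slave_version):
    print("namespaceID differs: master",
          read_namespace_id(master_version),
          "slave", read_namespace_id(slave_version))
```

On a real cluster the two texts would be read from the VERSION files on the respective machines before comparing.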
4. Sqoop export command failed
Error message:
attempt_201101151840_1006_m_000001_0, Status : FAILED
at impressions_by_zip.parse(impressions_by_zip.java:108)
Cause: The given field separator is not valid.
Solution: Specify the correct field delimiter in the sqoop export command.
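Before re-running the export, it can help to confirm which delimiter the input files really use. Below is a rough sketch that tries a few candidate delimiters against sample lines and returns the first one that yields the expected number of columns on every line. The function name, the candidate list and the sample rows are assumptions for illustration. The delimiter it finds is what you would then pass to the export, normally through Sqoop's --input-fields-terminated-by option.

```python
def guess_delimiter(lines, expected_columns, candidates=(",", "\t", "|", "\x01")):
    """Return the first candidate that splits every line into expected_columns fields.

    "\x01" (Ctrl-A) is included because Hive-written text files often use it.
    Returns None when there are no lines or no candidate fits.
    """
    if not lines:
        return None
    for delim in candidates:
        if all(len(line.split(delim)) == expected_columns for line in lines):
            return delim
    return None


# Made-up sample rows with three tab-separated fields.
sample_lines = [
    "94103\t1200\t2011-01-15",
    "10001\t845\t2011-01-15",
]

print(repr(guess_delimiter(sample_lines, 3)))
```

A result of None means none of the candidates splits the rows consistently, so the file likely uses another separator or has malformed lines.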
5. HBase regionserver not running
Error message:
2012-01-02 13:48:49,973 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop-datanode2,60020,1325492317440 has been rejected; Reported time is too far out of sync with master. Time difference of 206141ms > max allowed of 30000ms
Cause: The clocks of the regionservers are not in sync with the master machine.
Solution: Synchronize the clocks of the HBase master and regionserver machines.
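To see how far off a regionserver is, the two numbers can be pulled out of the exception text. This is a small sketch that assumes the message keeps the wording shown in the log above. The function name parse_clock_skew is hypothetical.

```python
import re

# Pattern based on the wording of the ClockOutOfSyncException message above.
SKEW = re.compile(r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_clock_skew(log_line):
    """Return (difference_ms, allowed_ms) from the message, or None if absent."""
    match = SKEW.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("Reported time is too far out of sync with master. "
        "Time difference of 206141ms > max allowed of 30000ms")

diff_ms, allowed_ms = parse_clock_skew(line)
print(f"skew {diff_ms / 1000:.1f}s, allowed {allowed_ms / 1000:.1f}s")
```

In the log above, the regionserver is about 206 seconds away from the master, while only 30 seconds are allowed.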