Thursday, February 9, 2012

Finding "Number of Under-Replicated Blocks" in Hadoop

This one bugged me for a long time. Even with the cluster idle, the Name Node summary would report a number of Under-Replicated Blocks in the system.
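If you'd rather check the count from a script than from the Name Node summary page, here is a minimal sketch that pulls the number out of saved `hadoop fsck /` output. It assumes the fsck summary contains a line of the form `Under-replicated blocks: N (x.x %)`; the exact spacing is an assumption, so the regex allows any whitespace after the colon. The function name is hypothetical.

```python
import re
from typing import Optional

# Matches a summary line such as " Under-replicated blocks:\t12 (0.5 %)".
# The line format is an assumption about `hadoop fsck /` output.
_UNDER_REPLICATED = re.compile(r"Under-replicated blocks:\s+(\d+)")


def under_replicated_count(fsck_output: str) -> Optional[int]:
    """Return the under-replicated block count from fsck text, or None if absent."""
    match = _UNDER_REPLICATED.search(fsck_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical sample text, not real output from our cluster.
    sample = "Status: HEALTHY\n Under-replicated blocks:\t12 (0.5 %)\n"
    print(under_replicated_count(sample))
```

Run it periodically while cleaning up to confirm the number is going down.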

It turns out that the Name Node problems we'd been having were leaving 'temporary' files in HDFS, and for whatever reason restarting the Name Node didn't clean them up.

I found them under: /log/hadoop/tmp/mapred/staging/<>/.staging/job_*

After confirming that the users weren't running any active jobs, I removed these directories from the command line. The number of under-replicated blocks in the report dropped, and eventually they all cleared.
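The selection step (only touch staging job directories that don't belong to a running job) can be sketched in Python. This is a minimal sketch under some assumptions: you have already collected the directory paths (for example from an HDFS listing) and the IDs of currently running jobs (for example from `hadoop job -list`). The user segment of the path is elided in the post, so the pattern matches any single path component there. The function name and job IDs below are made up for illustration. The script only prints `hadoop fs -rmr` commands (the recursive delete in Hadoop 0.20 / CDH3) so you can review them before running anything.

```python
import re
from typing import Iterable, List, Set

# Hypothetical pattern for the staging layout described above:
#   /log/hadoop/tmp/mapred/staging/<user>/.staging/job_*
# [^/]+ stands in for the elided user segment; group 1 captures the job directory name.
STAGING_RE = re.compile(
    r"^/log/hadoop/tmp/mapred/staging/[^/]+/\.staging/(job_[^/]+)/?$"
)


def stale_staging_dirs(paths: Iterable[str], active_job_ids: Set[str]) -> List[str]:
    """Return staging job directories whose job ID is not in active_job_ids."""
    stale = []
    for path in paths:
        match = STAGING_RE.match(path)
        # Skip anything that isn't a job_* staging directory, and any live job.
        if match and match.group(1) not in active_job_ids:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Made-up paths and job IDs for illustration only.
    listing = [
        "/log/hadoop/tmp/mapred/staging/alice/.staging/job_201202010000_0001",
        "/log/hadoop/tmp/mapred/staging/bob/.staging/job_201202010000_0002",
        "/log/hadoop/tmp/mapred/staging/bob/.staging/notes.txt",
    ]
    active = {"job_201202010000_0002"}
    for path in stale_staging_dirs(listing, active):
        # Print the removal command rather than executing it.
        print("hadoop fs -rmr " + path)
```

Printing instead of deleting is deliberate: a wrong path under HDFS is hard to undo, so it's worth eyeballing the list first.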

FYI: our Name Node problems APPEAR to have been resolved in Cloudera CDH3 u3. The Name Node has been up for 3 days now; previously we were lucky if it lasted 48 hours.