There doesn't appear to be any method within Hadoop (0.20.x) that achieves this, and Google turns up nothing beyond manually moving the blocks onto the new disk. That's a lot of effort. If I can't find an existing tool, I'll write a shell script that automatically spreads the blocks out equally amongst all available disks.
The plan: stop the datanode process, rebalance the blocks across its disks, restart the datanode process, and let it rescan for block locations.
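In command terms the cycle would look something like this (the hadoop-daemon.sh path is an assumption for a 0.20.x tarball install; adjust for your layout):

    $HADOOP_HOME/bin/hadoop-daemon.sh stop datanode
    # ... shuffle blocks between the dfs.data.dir disks here ...
    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode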
Any major flaws in this plan, or is there something around already that does this?
Thanks
A.M
**********
Edit
**********
I wrote a small shell script to do this. It works out how much data is on the node and what the average utilisation percentage would be if that data were spread evenly across the disks. It then finds and moves the required number of blocks (reading the block size from hdfs-site.xml) onto the least-used disk, into a randomly chosen subdirXX containing fewer than 60 blocks.
Seems to work fine....
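For anyone curious, here's a minimal sketch of the approach, not the exact script. The disk paths are assumptions and must match the dfs.data.dir entries in hdfs-site.xml, and it must only run while the datanode is stopped:

    #!/bin/bash
    # Sketch of intra-datanode block balancing (run with the datanode stopped).
    # DISKS is an assumption: it must match dfs.data.dir in hdfs-site.xml.
    DISKS=(/data1/dfs/data /data2/dfs/data /data3/dfs/data)

    # Work out bytes used on each disk and the average across all of them.
    TOTAL=0
    for i in "${!DISKS[@]}"; do
        USED[$i]=$(du -sb "${DISKS[$i]}/current" | awk '{print $1}')
        TOTAL=$((TOTAL + USED[$i]))
    done
    AVG=$((TOTAL / ${#DISKS[@]}))

    # Pick the least-used disk as the destination.
    DEST=0
    for i in "${!DISKS[@]}"; do
        [ "${USED[$i]}" -lt "${USED[$DEST]}" ] && DEST=$i
    done

    # Drain every over-average disk down to the average, moving each block
    # file together with its blk_*_*.meta checksum file. For simplicity this
    # drops blocks into the destination's top-level current/ dir; the real
    # script picked a subdirXX holding fewer than 60 blocks.
    for i in "${!DISKS[@]}"; do
        [ "$i" -eq "$DEST" ] && continue
        while [ "${USED[$i]}" -gt "$AVG" ]; do
            BLK=$(find "${DISKS[$i]}/current" -type f -name 'blk_*' \
                  ! -name '*.meta' | head -n 1)
            [ -z "$BLK" ] && break
            SIZE=$(stat -c %s "$BLK")
            mv "$BLK" "$BLK"_*.meta "${DISKS[$DEST]}/current/"
            USED[$i]=$((USED[$i] - SIZE))
        done
    done

One detail worth calling out: a block must move together with its .meta checksum file, otherwise the datanode will report it as corrupt when it rescans on startup.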
**********
Second Edit
**********
Decommission the node via the namenode so that all of its blocks are copied elsewhere, then just format all the data disks. It's the cleanest way of ensuring a single node is balanced.
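Roughly, assuming dfs.hosts.exclude is already configured in the namenode's hdfs-site.xml (the excludes-file path and hostname below are made up):

    # On the namenode: add the node to the excludes file and tell the
    # namenode to re-read it; decommissioning then starts automatically.
    echo "datanode01.example.com" >> /etc/hadoop/conf/dfs.exclude
    hadoop dfsadmin -refreshNodes

    # Wait for the node to show as "Decommissioned", then format its data
    # disks, remove it from the excludes file, refresh again and restart
    # the datanode.
    hadoop dfsadmin -report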
Could you open-source your script?