Saturday 25 February 2012

Success at last...hadoop-1.0.0 passes all tests


Re-ran the tests last night and only one (new) test failed:

[junit] Test org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl FAILED

So I assume the GangliaMetrics test was also fixed by the /etc/hosts change.

I found MAPREDUCE-3894, which seems to explain the failure and suggests it's intermittent. I tested this by re-running just that test and, lo and behold, it passed:

# ant test -Dtestcase=TestMetricsSystemImpl

BUILD SUCCESSFUL
Total time: 24 seconds

Finally, a clean test run (even with my patch applied). Now I just need someone to look at / approve my patch:

https://issues.apache.org/jira/browse/MAPREDUCE-3807

Any clue as to how I draw someone's attention to this?

Friday 24 February 2012

The quest continues

The next run finished with two failures. Getting closer:


HADOOP-7949 [junit] Test org.apache.hadoop.ipc.TestSaslRPC FAILED
[junit] Test org.apache.hadoop.metrics2.impl.TestGangliaMetrics FAILED

On further inspection, the TestSaslRPC failure was tracked down to HADOOP-7949 and the need for localhost to appear before localhost.localdomain in the /etc/hosts file.
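For reference, the ordering that keeps TestSaslRPC happy looks like this (the exact IP and any extra aliases will vary from box to box):

127.0.0.1   localhost localhost.localdomain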

Wow, just wow.

It shouldn't be this difficult to set up a build environment. I've resorted to going to the hadoop dev mailing lists to see if anyone can help with the last one.

Fingers crossed!!!


Thursday 23 February 2012

And so the real testing begins.....

First run-through of the ant tests with ant 1.7.1, and the following tests failed (along with the JIRAs that help describe each failure):


HADOOP-7949 [junit] Test org.apache.hadoop.ipc.TestSaslRPC FAILED
MAPREDUCE-3357 [junit] Test org.apache.hadoop.filecache.TestMRWithDistributedCache FAILED
MAPREDUCE-2073 [junit] Test org.apache.hadoop.filecache.TestTrackerDistributedCacheManager FAILED
HBASE-3285 [junit] Test org.apache.hadoop.hdfs.TestFileAppend4 FAILED
MAPREDUCE-3594 [junit] Test org.apache.hadoop.streaming.TestUlimit FAILED
Too many open files [junit] Test org.apache.hadoop.mapred.TestCapacityScheduler FAILED

Increased the number of open files to fix the last test, chmod'd my home dir to resolve MAPREDUCE-2073 and ran again.....
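For the record, the two fixes amounted to something like this (the limit and mode are values I picked, not numbers mandated by the JIRAs):

# ulimit -n 8192
# chmod 755 ~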

Wednesday 22 February 2012

Abject failure continued 2

After trawling mailing lists again, I finally stumbled across this:

http://grokbase.com/p/hadoop/common-dev/1188pq2p5h/vote-release-0-20-204-0-rc0

The useful part:

I hit this one too. If you look at that test case, you'll see it has an @Ignore on it. For some unknown reason, when you use ant 1.8.2, junit does the wrong thing. Use ant 1.7.1 and the test cases will be properly ignored.
-- Owen

To quote Bobby Boucher....."that information would have been useful yesterday!!!!!"

Finally stumbled my way through installing ant 1.7.1 and re-ran ant test.
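For anyone following along, the install itself went roughly like this (the paths are my choice, and the tarball comes from the Apache archive):

# cd /opt
# wget http://archive.apache.org/dist/ant/binaries/apache-ant-1.7.1-bin.tar.gz
# tar xzf apache-ant-1.7.1-bin.tar.gz
# export ANT_HOME=/opt/apache-ant-1.7.1
# export PATH=$ANT_HOME/bin:$PATH
# ant -version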

A few more hours to go....

Thursday 16 February 2012

Abject failure continued......

Built my CentOS VM, checked out the source, ensured all the dependencies were met, ran ant test, and it failed with the same tests as before. Time to go to the mailing lists to see what I'm doing wrong....

Monday 13 February 2012

Abject failure.....

Well, that didn't go to plan. Failure once again. A few tests failed, and I can read the log output within:

build/test/TEST-<test-name>

But I don't see any kind of report that might allude to which of the numerous tests failed. Scrolling back through hundreds of lines of text doesn't seem productive.
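In hindsight, grepping the summary lines in the per-test logs is probably the least painful way to list the culprits (this assumes the default plain JUnit formatter and its file naming):

# grep -lE 'Failures: [1-9]|Errors: [1-9]' build/test/TEST-*.txt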

Think I need to do some more research (and build a dedicated hadoop build machine in a VM on my laptop). A plain RHEL 5u4 image built from scratch. Let's document the shit outta this process and see where I'm going wrong....

Interrupted slightly tonight by purchasing a new NAS drive (a Synology DS411). Something to play with whilst pulling my hair out over my inability to submit a patch that took me about 5 minutes to actually create.

So far it's taken about 10 hours of effort to get down to a handful of failing unit tests.

To be continued.....

Contributing to apache-hadoop (hadoop-1.0.0)

I decided it was time to contribute to the Apache Hadoop project. I haven't touched Java in a while, so thought I'd start with something easy. I found an unassigned JIRA, fixed the issue (using vim, as it just worked) and then started to look at how I actually test / submit my patch. There's some information on how to contribute at the following links:

http://wiki.apache.org/hadoop/HowToContribute
http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk

I had a Fedora 16 box and installed the necessary software to develop my Hadoop patch. I checked out the source using svn:


# svn checkout http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1 hadoop

Then checked my environment by running ant:

# cd hadoop
# ant

This took a while, but built cleanly. Cool. I thought my environment must be good. Think again....

Following the instructions on the HowToContribute page, I created a patch. I then needed to test the patch against the existing tests. Easy....
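For reference, creating the patch itself is just an svn diff from the top of the checkout; naming it after the JIRA (MAPREDUCE-3807 in my case) is convention rather than a requirement:

# svn diff > MAPREDUCE-3807.patch

And then the test run: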

# ant test

I left it running overnight as it can take up to 3 hours to complete (as suggested on the mailing lists). I checked the next morning and about 25% of the tests had failed. I read through some of the logs and it looked like permission errors. I'm quite competent in this area, so I dug a little: the JUnit tests expect specific file permissions. After some trial and error, a umask of 0022 seemed to work just fine:

# umask 0022

Re-ran the tests and waited a few hours.

It completed again with a lot of errors. Fewer than before, but not zero, so something else must be wrong. I looked through these logs and noticed the regular expressions expect the hostname to be localhost. That's clearly not the case on my box. I *corrected* this and am currently re-running the tests. We shall see how it goes in the morning......
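(For reference, the quick and dirty version of that "fix" on a Red Hat style box is just forcing the hostname for the session; whether that's the right long-term answer is another matter:)

# hostname localhost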

If and when the tests complete OK, I then need to read up on any prerequisites to submitting patches. As this is only a small fix (and a similar one didn't require a new test), I'm not planning to (learn how to) write a JUnit test. Other than that, I'll just submit the patch and hope the community doesn't bite.