Hi guys, I need to restart the discussion around
http://www.nabble.com/Mapper-Out-of-Memory-td14200563.html. I am seeing the same OOM error in the map phase of my map-reduce job.
1. I tried changing mapred.child.java.opts (bumped to 600M)
2. io.sort.mb was kept at 100MB.
I still see the same errors.
While debugging, I checked the size of "keyValBuffer" in collect(); it is always less than io.sort.mb and is spilled to disk properly.
I tried raising the number of map tasks to a very high value so that the input is split into smaller chunks. That helped for a while, as the map phase got further (56% instead of 5%), but I still see the problem.
I tried bumping mapred.child.java.opts to 1000M and still got the same error.
I also tried adding -verbose:gc -Xloggc:/tmp/@taskid@.gc to the opts value to get a GC log, but no log file was produced.
I tried using 'jmap -histo pid' to look at the heap, but it didn't show me any meaningful or obvious problem point.
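Since neither the GC log nor jmap gave anything useful, another option might be to log heap usage from inside the task itself. Below is a minimal sketch using only standard JDK calls (java.lang.management). The class name HeapProbe and the logEvery parameter are hypothetical, and the sketch assumes that whatever a task writes to stderr ends up in the task logs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: call tick() once per input record from map().
final class HeapProbe {
    private final long logEvery; // log every N records (assumed tuning knob)
    private long records = 0;

    HeapProbe(long logEvery) {
        if (logEvery <= 0) {
            throw new IllegalArgumentException("logEvery must be positive");
        }
        this.logEvery = logEvery;
    }

    // One-line summary of the current heap, in megabytes.
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long usedMb = heap.getUsed() >> 20;
        // getMax() returns -1 when the maximum is undefined.
        long maxMb = heap.getMax() < 0 ? -1 : heap.getMax() >> 20;
        return "heap used=" + usedMb + "MB max=" + maxMb + "MB";
    }

    // Counts one record; returns true when a log line was written.
    boolean tick() {
        records++;
        if (records % logEvery == 0) {
            System.err.println("records=" + records + " " + snapshot());
            return true;
        }
        return false;
    }
}
```

If used heap climbs steadily with the record count, that would point at something in the map code or its output path holding on to objects. If it stays flat until a sudden jump, a single oversized record or buffer would be the more likely suspect.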
What are the other possible memory hogs during the mapper phase? Is the input file chunk kept fully in memory?
Application:
My map-reduce job runs on about 2G of input. In the mapper phase I read each line and output between 5 and 500 (key, value) pairs per line, so the intermediate data should be really blown up. Will that be a problem?
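To put a rough number on the blow-up, here is a back-of-the-envelope sketch. The average line length (100 bytes) and average serialized pair size (50 bytes) are pure assumptions for illustration and are not measured from my data; only the 2G input size and the 5 to 500 pairs per line come from the job.

```java
// Hypothetical estimator for the size of intermediate map output.
final class BlowUpEstimate {

    // inputBytes / avgLineBytes gives the line count; each line emits
    // pairsPerLine pairs of avgPairBytes each.
    static long estimateIntermediateBytes(long inputBytes, long avgLineBytes,
                                          long pairsPerLine, long avgPairBytes) {
        long lines = inputBytes / avgLineBytes;
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(lines, pairsPerLine), avgPairBytes);
    }

    public static void main(String[] args) {
        long input = 2L * 1024 * 1024 * 1024; // about 2G of input
        long avgLine = 100;                   // ASSUMPTION: bytes per input line
        long avgPair = 50;                    // ASSUMPTION: bytes per (key, value) pair

        long low = estimateIntermediateBytes(input, avgLine, 5, avgPair);
        long high = estimateIntermediateBytes(input, avgLine, 500, avgPair);
        System.out.println("intermediate bytes, 5 pairs/line:   " + low);
        System.out.println("intermediate bytes, 500 pairs/line: " + high);
    }
}
```

Under these assumed sizes the intermediate output is several times to a few hundred times larger than the input. Since the collect() buffer appears to spill correctly, I would expect that volume to cost disk and sort time rather than heap, but I am not sure that is the whole story.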
The Error file is attached
error.txt