Amazon S3 - Spark execution occasionally gets stuck at mapPartitions at Exchange.scala:44


I am running a Spark job on a 2-node standalone cluster (v1.0.1). Spark execution gets stuck at the task mapPartitions at Exchange.scala:44. This happens at the final stage of my job, in a call to saveAsTextFile (as I expect, given Spark's lazy execution). It is hard to diagnose the problem because (1) I never experience it in local mode with local IO paths, and (2) the job on the cluster sometimes does complete as expected with the correct output (the same output as in local mode). It seems possibly related to reading from S3 (a ~170 MB file) just prior, because I see the following logging in the console:

DEBUG NativeS3FileSystem - getFileStatus returning 'file' for key '[path_removed].avro'

INFO FileInputFormat - Total input paths to process : 1

DEBUG FileInputFormat - Total # of splits: 3

...

INFO DAGScheduler - Submitting 3 missing tasks from Stage 32 (MapPartitionsRDD[96] at mapPartitions at Exchange.scala:44)

DEBUG DAGScheduler - New pending tasks: Set(ShuffleMapTask(32, 0), ShuffleMapTask(32, 1), ShuffleMapTask(32, 2))

The last logging I see before the task apparently hangs/gets stuck is: INFO NativeS3FileSystem: INFO NativeS3FileSystem: Opening key '[path_removed].avro' for reading at position '67108864'
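For context on that position value: 67108864 bytes is exactly 64 MiB, which would be the start of the second of the 3 input splits if the splits are cut at 64 MiB boundaries. The rough sketch below shows that arithmetic. It assumes a 64 MiB block size (I have not checked the configured value on this cluster), a hypothetical file size of exactly 170,000,000 bytes standing in for "~170 MB", and a simplified version of Hadoop's split logic with its 1.1 slop factor. The function name `split_offsets` is made up for illustration.

```python
# Simplified approximation of how Hadoop's FileInputFormat cuts a file into splits.
# Assumption: the split size equals the block size (64 MiB here).

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size


def split_offsets(file_size, split_size):
    """Return a list of (start_offset, length) pairs, one per input split."""
    splits = []
    remaining = file_size
    # Emit full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    # Whatever is left becomes the final (possibly short) split.
    if remaining > 0:
        splits.append((file_size - remaining, remaining))
    return splits


BLOCK_SIZE = 64 * 1024 * 1024  # 67108864 bytes, assumed block size
FILE_SIZE = 170 * 1000 * 1000  # hypothetical stand-in for the "~170 MB" file

for index, (start, length) in enumerate(split_offsets(FILE_SIZE, BLOCK_SIZE)):
    print(index, start, length)
```

Under those assumptions this yields 3 splits starting at offsets 0, 67108864 and 134217728, which is consistent with "Total # of splits: 3" and with the hang appearing in the task that opens the key at position 67108864, i.e. a task reading from the middle of the S3 object rather than from its start.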

Has anyone else experienced non-deterministic problems related to reading from S3 in Spark?

