amazon s3 - Spark execution occasionally gets stuck at mapPartitions at Exchange.scala:44
I am running a Spark job on a 2-node standalone cluster (v1.0.1). Spark execution occasionally gets stuck at the task mapPartitions at Exchange.scala:44. This happens at the final stage of my job, in a call to saveAsTextFile (as expected from Spark's lazy execution). The problem is hard to diagnose because (1) I never experience it in local mode with local IO paths, and (2) the job on the cluster does occasionally complete as expected with the correct output (the same output as in local mode). It seems possibly related to reading from S3 (a ~170 MB file) just prior, since I see the following logging in the console:
DEBUG NativeS3FileSystem - getFileStatus returning 'file' for key '[path_removed].avro'
INFO FileInputFormat - Total input paths to process : 1
DEBUG FileInputFormat - Total # of splits: 3
...
INFO DAGScheduler - Submitting 3 missing tasks from Stage 32 (MapPartitionsRDD[96] at mapPartitions at Exchange.scala:44)
DEBUG DAGScheduler - New pending tasks: Set(ShuffleMapTask(32, 0), ShuffleMapTask(32, 1), ShuffleMapTask(32, 2))
The last logging I see before the task apparently hangs/gets stuck is:

INFO NativeS3FileSystem: Opening key '[path_removed].avro' for reading at position '67108864'

(That position, 67108864 bytes, is exactly 64 MB, so it presumably corresponds to the start of the second of the three input splits.)
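For reference, here is a minimal sketch of the shape of the job, not my actual code: the paths, the Avro field, and the aggregation are placeholders, and the real shuffle goes through Spark SQL's Exchange rather than a plain reduceByKey, but the read-from-S3, shuffle, then saveAsTextFile structure is the same.

    // Rough sketch only: placeholder paths and field names, not the real job.
    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits (needed in Spark 1.0.x)

    object S3AvroJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("s3-avro-job"))

        // ~170 MB Avro file read over s3n:// (NativeS3FileSystem);
        // FileInputFormat reports it as 3 splits.
        val input = "s3n://my-bucket/path/data.avro"  // placeholder

        val records = sc
          .hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](input)
          .map { case (wrapper, _) => wrapper.datum() }

        // A keyed aggregation that forces a shuffle (the stage that hangs),
        // followed by saveAsTextFile, which triggers the whole lazy pipeline.
        records
          .map(r => (r.get("someField").toString, 1L))  // placeholder field
          .reduceByKey(_ + _)
          .saveAsTextFile("s3n://my-bucket/output")     // placeholder
      }
    }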
Has anyone else experienced non-deterministic problems related to reading from S3 in Spark?