amazon s3 - Spark execution occasionally gets stuck at mapPartitions at Exchange.scala:44
I am running a Spark job on a 2-node standalone cluster (v1.0.1). Spark execution occasionally gets stuck at the task mapPartitions at Exchange.scala:44. This happens at the final stage of my job, in a call to saveAsTextFile (as expected from Spark's lazy execution). The problem is hard to diagnose because (1) I never experience it in local mode with local IO paths, and (2) the job on the cluster does occasionally complete as expected with the correct output (the same output as in local mode). It seems possibly related to reading from S3 (a ~170 MB file) just prior, since I see the following logging in the console:
DEBUG NativeS3FileSystem - getFileStatus returning 'file' for key '[path_removed].avro'
INFO FileInputFormat - Total input paths to process : 1
DEBUG FileInputFormat - Total # of splits: 3
...
INFO DAGScheduler - Submitting 3 missing tasks from Stage 32 (MapPartitionsRDD[96] at mapPartitions at Exchange.scala:44)
DEBUG DAGScheduler - New pending tasks: Set(ShuffleMapTask(32, 0), ShuffleMapTask(32, 1), ShuffleMapTask(32, 2))
The last logging I see before the task apparently hangs/gets stuck is:

INFO NativeS3FileSystem: Opening key '[path_removed].avro' for reading at position '67108864'

(That position, 67108864 bytes, is exactly 64 MB, so it presumably corresponds to the start of the second of the three input splits.)
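For reference, here is a minimal sketch of the shape of the job, not my actual code: the paths, the Avro field, and the aggregation are placeholders, and the real shuffle goes through Spark SQL's Exchange rather than a plain reduceByKey, but the read-from-S3, shuffle, then saveAsTextFile structure is the same.

    // Rough sketch only: placeholder paths and field names, not the real job.
    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // pair-RDD implicits (needed in Spark 1.0.x)

    object S3AvroJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("s3-avro-job"))

        // ~170 MB Avro file read over s3n:// (NativeS3FileSystem);
        // FileInputFormat reports it as 3 splits.
        val input = "s3n://my-bucket/path/data.avro"  // placeholder

        val records = sc
          .hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](input)
          .map { case (wrapper, _) => wrapper.datum() }

        // A keyed aggregation that forces a shuffle (the stage that hangs),
        // followed by saveAsTextFile, which triggers the whole lazy pipeline.
        records
          .map(r => (r.get("someField").toString, 1L))  // placeholder field
          .reduceByKey(_ + _)
          .saveAsTextFile("s3n://my-bucket/output")     // placeholder
      }
    }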
Has anyone else experienced non-deterministic problems related to reading from S3 in Spark?