We are trying to capture performance data for a Spark streaming application and are having difficulty configuring the output folder.
The jobs are deployed on a YARN cluster, and the actual run folder is only known at execution time.
The 'dir' parameter should be set to ${spark.yarn.app.container.log.dir}, but it looks like the YourKit agent has no way to resolve such variables.
Besides, we would like to avoid hardcoding a local path on the cluster (multi-tenant infrastructure).
What is the proper way to work with Spark on a cluster?
For reference, a log4j configuration can accept such a parameter:
log4j.appender.RollingFile.File=${spark.yarn.app.container.log.dir}/executor-xxxx.log
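For illustration, this is roughly the invocation we are aiming for (the agent path and application jar are placeholders; the ${...} placeholder is passed through literally, which is exactly the problem):

```shell
# Illustrative sketch only: /opt/yourkit/libyjpagent.so and our-streaming-app.jar
# are placeholders. The intent is for the agent's 'dir' option to end up pointing
# at the per-container YARN log directory, but the ${...} variable is not
# resolved by the agent the way log4j resolves it.
spark-submit \
  --master yarn \
  --conf 'spark.executor.extraJavaOptions=-agentpath:/opt/yourkit/libyjpagent.so=dir=${spark.yarn.app.container.log.dir}' \
  our-streaming-app.jar
```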