We are trying to capture performance data for a Spark streaming application and are having difficulty configuring the output folder.
The jobs are deployed on a YARN cluster, and the actual run folder is only known at execution time.
The 'dir' parameter should be set to ${spark.yarn.app.container.log.dir}, but it looks like the YourKit agent has no way to resolve such variables.
Besides, we would like to avoid hardcoding a local path on the cluster (multi-tenant infrastructure).
What is the proper way to work with Spark on a cluster?
For reference, a log4j configuration can accept such a parameter:
log4j.appender.RollingFile.File=${spark.yarn.app.container.log.dir}/executor-xxxx.log
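For illustration, this is roughly the invocation we are aiming for (the agent path and application jar are placeholders; the ${...} placeholder is passed through literally, which is exactly the problem):

```shell
# Illustrative sketch only: /opt/yourkit/libyjpagent.so and our-streaming-app.jar
# are placeholders. The intent is for the agent's 'dir' option to end up pointing
# at the per-container YARN log directory, but the ${...} variable is not
# resolved by the agent the way log4j resolves it.
spark-submit \
  --master yarn \
  --conf 'spark.executor.extraJavaOptions=-agentpath:/opt/yourkit/libyjpagent.so=dir=${spark.yarn.app.container.log.dir}' \
  our-streaming-app.jar
```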