Spark profiling - env variable resolution?

Questions about YourKit Java Profiler
freds
Posts: 1
Joined: Tue Nov 23, 2021 2:27 pm

Spark profiling - env variable resolution?

Post by freds »

Hi
We are trying to profile a Spark streaming application and are having difficulty setting up the output folder.
The jobs are deployed on a cluster, and the actual run folder is only known at execution time.
The 'dir' option should be set to ${spark.yarn.app.container.log.dir}, but it looks like the YourKit agent has no way to resolve environment variables.
Besides, we would like to avoid hardcoding a local path on the cluster (multi-tenant infrastructure).

What is the proper way to work with Spark on a cluster?

For reference, the log4j configuration accepts such a parameter:

log4j.appender.RollingFile.File=${spark.yarn.app.container.log.dir}/executor-xxxx.log
Anton Katilin
Posts: 6172
Joined: Wed Aug 11, 2004 8:37 am

Re: Spark profiling - env variable resolution?

Post by Anton Katilin »

Hello,

If you set the environment variable YOURKIT_HOME to a non-empty value, the snapshots will be created in $YOURKIT_HOME/Snapshots and logs in $YOURKIT_HOME/.yjp/log.

Does this help?
aeddbali
Posts: 2
Joined: Mon Nov 29, 2021 4:14 pm

Re: Spark profiling - env variable resolution?

Post by aeddbali »

Hi,

This can't actually work, because ${spark.yarn.app.container.log.dir} is set in the process environment.
A value set in YOURKIT_HOME won't be resolved correctly, because we only know the value of ${spark.yarn.app.container.log.dir} once the container is actually running.

Regards,
Anton Katilin
Posts: 6172
Joined: Wed Aug 11, 2004 8:37 am

Re: Spark profiling - env variable resolution?

Post by Anton Katilin »

We'll add a feature request to allow environment variable resolution in such options as "dir" and "logdir" and maybe others. By the way, the option "sessionname" already supports that.

As a possible workaround, maybe there is a way to set the environment variable programmatically in a launcher script that starts the JVM.
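
To illustrate the launcher idea, here is a minimal Python sketch. It assumes the container exposes its log directory in an environment variable; the name LOG_DIRS used as the default is an assumption about the YARN container environment and should be checked on your cluster. The function name with_yourkit_home is hypothetical. Whether Spark lets you interpose such a launcher in front of the executor JVM is not confirmed here.

```python
def with_yourkit_home(base_env, log_dir_var="LOG_DIRS"):
    """Return a copy of base_env with YOURKIT_HOME set to the container log dir.

    log_dir_var is an ASSUMPTION: the name of an environment variable that
    holds the container log directory (possibly a comma-separated list).
    If the variable is missing or empty, the environment is returned unchanged.
    """
    env = dict(base_env)
    raw = env.get(log_dir_var, "")
    # Take the first non-empty entry in case several directories are listed.
    first = next((p.strip() for p in raw.split(",") if p.strip()), None)
    if first:
        # Per the reply above, snapshots then go to $YOURKIT_HOME/Snapshots
        # and logs to $YOURKIT_HOME/.yjp/log.
        env["YOURKIT_HOME"] = first
    return env
```

A launcher could then replace itself with the real JVM command, for example with os.execvpe(cmd[0], cmd, with_yourkit_home(os.environ)), so that the agent sees YOURKIT_HOME already resolved.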

Another idea is to programmatically create a symlink ~/Snapshots which points to the actual directory.
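
The symlink idea could look roughly like the following Python sketch, to be run before the profiled JVM starts. The helper name link_snapshots and its home parameter are hypothetical. Note that if several containers of the same user run on one node, they would share ~/Snapshots, so a single symlink may not be enough in that case.

```python
from pathlib import Path


def link_snapshots(target_dir, home=None):
    """Make <home>/Snapshots a symlink to target_dir and return the link path.

    home defaults to the current user's home directory. An existing symlink
    is replaced; a real file or directory at that path is left alone and
    reported as an error.
    """
    home = Path(home) if home is not None else Path.home()
    link = home / "Snapshots"
    target = Path(target_dir)
    # Make sure the directory the link points to exists.
    target.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # drop a stale link left by a previous run
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return link
```

The target directory would be whatever the container log directory resolves to at run time; how to obtain that value is left open here.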
aeddbali
Posts: 2
Joined: Mon Nov 29, 2021 4:14 pm

Re: Spark profiling - env variable resolution?

Post by aeddbali »

Do you have a ticket number for follow-up?
Anton Katilin
Posts: 6172
Joined: Wed Aug 11, 2004 8:37 am

Re: Spark profiling - env variable resolution?

Post by Anton Katilin »

The tracker is not publicly available. Internally it's #3489.