Fully featured low overhead profiler for Java EE and Java SE platforms.
Ultimate profiling and monitoring solution for Gradle, Maven, Ant, JUnit and TestNG.
Easy to use performance and memory profiler for .NET framework.

Large performance improvement after enabling yourkit. Why?

Moderators: Vladimir Kondratyev, Anton Katilin

Large performance improvement after enabling yourkit. Why?

Postby sanderbos » Tue Mar 01, 2016 4:30 pm

Hello,

I added -agentpath:yjpagent.dll to a test-process I want to profile, which has unexpected performance problems on a certain system.

To my surprise, adding the parameter hugely improved the performance over running the same test-process plainly without a profiler attached. The test-run went from 155 seconds to 45 seconds. I have tested this a number of times in different ways, and the performance improvement consistently appears.

I tried to look with jvisualvm sampling at what the difference between the process with and without the yourkit agent is, but have been unable to figure it out (the performance difference happens in an embedded Cassandra server, and I do not know that codebase well enough (at all) to look at it properly).
I cannot actually use yourkit to inspect the problem, because as described enabling yourkit makes the performance problem go away....

It appears to be in some way related to the specific machine (Win 7 x64, no memory pressure, Java 1.8, yourkit 15084), because the 45 seconds is actually the sort of execution time I was expecting from this hardware. Somehow the yourkit agent makes the process performance more as expected, but I have no idea why.


My question is: Does anybody know of reasons how enabling the yourkit agent could improve performance in a big way?? E.g., does the agent change disk access or network sockets that could make performance go faster (the Cassandra (database) code under test does have local network connections and disk access, although it should be said that the amount of actual data involved is minimal).

Regards,

Sander.
sanderbos
 
Posts: 8
Joined: Tue Mar 01, 2016 4:14 pm

Re: Large performance improvement after enabling yourkit. Wh

Postby Anton Katilin » Tue Mar 01, 2016 4:59 pm

Hi Sander

Looks like some timings changed, implicitly causing different behavior in your tests.

For example, some operations that used to happen in parallel thus spending time in synchronization now happen to work more "sequentially", and synchronization does not occur.

Do you have many threads performing almost empty tasks, very short methods etc.?

1. Could you please try the latest version 2016.02

2. Could you please provide the profiler agent log file ~/.yjp/log/<session name>-<pid>.log

Best regards,
Anton
Anton Katilin
 
Posts: 5574
Joined: Wed Aug 11, 2004 8:37 am

Re: Large performance improvement after enabling yourkit. Wh

Postby Anton Katilin » Tue Mar 01, 2016 7:05 pm

Update: please consider the following changes to check whether they affect the observed behavior:

3. Try playing with biased locking.

(For theory, see for here, for example: http://stackoverflow.com/questions/9439 ... ng-in-java )

Does turning it off with -XX:+UseBiasedLocking (it's on by default since Java 6) affect your test performance with and without the agent?

4. Try running on a different JVM version.
Anton Katilin
 
Posts: 5574
Joined: Wed Aug 11, 2004 8:37 am

Re: Large performance improvement after enabling yourkit. Wh

Postby sanderbos » Wed Mar 02, 2016 10:55 am

Hello Anton,

thank you very much for your reply.

> 1. Could you please try the latest version 2016.02

I would rather not, as I am not sure we have a license for it (big company, I just enter a corporate license key without knowing what versions it is for). Also, the problem is not yourkit, yourkit is working great in that it makes the process much faster.

> 2. Could you please provide the profiler agent log file ~/.yjp/log/<session name>-<pid>.log

I will post one in a comment after this (I anonimized it a little, probably not enough, I took out the classpath)

> 3. Try playing with biased locking

Adding -XX:+UseBiasedLocking made no difference to the performance, it was equally slow without the agentpath and equally fast with the agentpath and this extra option.

> 4. Try running on a different JVM version.

We require Java 8 so my options are limited. I was running with JDK SE 1.8.60, I now also installed the latest JDK SE 1.8.74 and the performance characteristics are identical (equally slow without yourkit agent path, equally fast with).



What I completely forgot to mention yesterday was that the test process hardly takes any CPU, at least during the part of the test I am focusing on. In the test there is always just one thread active really, that is to say it is single threaded for the most part but the thread with activity changes constantly, as within the process there are constant client server calls (both the client and server run in the same process, and communicate over a socket that is also in that same process). So there are in fact many many threads in the process (something of the Embedded Cassandra), but most of the time there is only one active.

What I did mention yesterday was that the process is doing disk IO and network IO (from sockets going from the process to the process, it is both the client and the server). I have now investigated the process more, and more or less ruled out disk IO. There is hardly any disk IO occurring in the part I am investigating. However, I have also seen that the amount of network communication is very very very large, much more than I thought (it is done by Cassandra that we use as black box library). So I have become convinced the issue is in the network communication (there is no disk IO, no real CPU activity, not a lot of multithreadedness, but much much network communication). There are 'only' 170 sockets involved, but procmon (a windows monitoring tool) registered 50.000 TCP send actions (the amount of data is still very small, you can see it as many many small queries).

Somehow, at least as I now look at it, yourkit on this very specific Windows machine (we do not see this behavior on any other machine) appears to make the network communication from this test process go more than 3 times as fast. Where that speed is what we actually expect this process to have with or without yourkit enabled. But I still have no idea why.

I will continue in another post, because there is actually something I intentionally did not tell yesterday...
sanderbos
 
Posts: 8
Joined: Tue Mar 01, 2016 4:14 pm

Re: Large performance improvement after enabling yourkit. Wh

Postby sanderbos » Wed Mar 02, 2016 11:15 am

So there is one thing that I intentionally did not tell yesterday, because I figured that as soon as I would tell it you would immediately not take me serious anymore.

But I really want to know what is going on, so I feel I should mention it because I really want to know what is going on here.

So I did already mention that the application with everything combined is quite complex. You could see it as this

client <--socket--> server <--socket--> Cassandra database.

So we have a client app that communicates with a server through a socket, and then that server component talks to a Cassandra database, with another socket connection. This is important, that between the server and Cassandra there are also socket connections (for completeness, Cassandra then also has many socket connections to talk to itself).

In all descriptions so far in this thread, the client, server, and Cassandra component all run in one process. That is what I specified the yourkit agent on. In this setup there are no external processes whatsoever, all sockets are created from the process but again to the process internally.

I have added timing to the server part of the program, and that is what I am focusing on. I can see there that the performance issue is in the ... <--socket--> Cassandra part

But I can change the test setup to have the client be one process, and the server and Cassandra be another. Ascii art time:
Code: Select all
+---------+                  +-------------------------------+
+ client  +  <--socket-->    + server <--socket--> Cassandra +
+---------+                  +-------------------------------+

So the boxes represent now two processes in this new setup. I mentioned that the performance issue happens completely in box 2.
(this change I could make relatively easily, but I cannot break it up more)

But now the part I am sort of afraid to share: If I now set the -agentpath:yjpagent.dll on the client process only, the code in the server process becomes much faster. It is all running on one machine, but I verified a 1000 times that the 'server process' does not get the agentpath parameter, only the client process. And they only communicate over this socket. And it is not that socket that has slowness, I have pinpointed the issue to the code and sockets that are only in the server process.

Is this possible? I mean I know it is possible because I am seeing this behavior 10 out of 10 times. But I cannot believe it myself.

If you set the yourkit dll as the agent path on a new process, can that affect the already running Java processes on the same machine? (I start the server first) I have brainstormed on how with e.g. jps apparently Java processes can see each other and then perhaps also change each other, but I imagine that the yourkit agent does not do this kind of communication?!
Another thing I could imagine is that if the client->server socket is created in a certain way, it could affect sockets created in that server process after that. I mean conceptually I can imagine that, but I have no idea what that could be in practice...

Can you help me? Do you believe a word I am saying?? :P

Regards,

Sander.
sanderbos
 
Posts: 8
Joined: Tue Mar 01, 2016 4:14 pm

Re: Large performance improvement after enabling yourkit. Wh

Postby sanderbos » Wed Mar 02, 2016 11:20 am

Output from yourkit (so to be clear again, this is then with yourkit enabled, where the performance problem does not occur, this is also with the older JVM and with everything in one process, and I 'redacted' the full classpath and some package names):

15084 0.003: Command line:
C:\programs\Java\jdk1.8.0_60\bin\javaw.exe
-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll
-Djava.io.tmpdir=e:\views\ramdisk\temp
-Dcluster.data.dir=e:\views\ramdisk\somecluster
-Dfile.encoding=Cp1252
-classpath
C:\perforce_ws\Platform\XDB\Database\rel_XDB_11.0\eclipse_build;...redacted...
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner
-version
3
-port
54948
-testLoaderClass
org.eclipse.jdt.internal.junit4.runner.JUnit4TestLoader
-loaderpluginname
org.eclipse.jdt.junit4.runtime
-test
redacted.clustertests.ChangeBindingTest:testScenario1A
15084 0.003: Executable: C:\programs\Java\jdk1.8.0_60\bin\javaw.exe
15084 0.003: Windows 7; version 6.1.7601; 64-bit process
15084 0.004: hid: redacted491964044W2XV91V 64-bit machine
15084 0.004: Today is 20160302
15084 0.004: full_build_stamp: 15084 201512161313
15084 0.004: Agent library path: c:\Program Files (x86)\YourKit Java Profiler 2015 build 15084\bin\win64\yjpagent.dll
15084 0.004: Agent_OnLoad: orig_options=
full_options=
15084 0.004: phase: 1
15084 0.004: jvmtiVersion: 30010203
15084 0.005: vendor: Oracle Corporation
15084 0.005: version: 25.60-b23
15084 0.005: info: mixed mode, sharing
15084 0.005: session_id: 40047731982010 (1456914106,9324)
15084 0.007: line_method: 1
15084 0.011: Reusing existing jar file: C:\Users\boss2\AppData\Local\Temp\yjp201512161313.jar
15084 0.012: JVMTI 30010203; 25.60-b23; Oracle Corporation; mixed mode, sharing
15084 0.090: Successfully registered probe classes: 120
Registered: .AWTEvents$EventQueue_dispatchEvent_Probe
Registered: .AgentLoadingSupport$JBoss
Registered: .AgentLoadingSupport$OSGi1
Registered: .AgentLoadingSupport$OSGi2
Registered: .AgentLoadingSupport$OSGi3
Registered: .Databases$Connection_close_Probe
Registered: .Databases$Connection_createStatement_Probe
Registered: .Databases$Connection_prepareStatement_Probe
Registered: .Databases$DataSource_getConnection_Probe
Registered: .Databases$DataSource_setName_Probe
Registered: .Databases$DriverManager_getConnection_Probe
Registered: .Databases$Driver_connect_Probe
Registered: .Databases$InitialContext_lookup_Probe
Registered: .Databases$PooledConnection_getConnection_Probe
Registered: .Databases$PreparedStatement_addBatch_Probe
Registered: .Databases$PreparedStatement_clearBatch_Probe
Registered: .Databases$PreparedStatement_execute_Probe
Registered: .Databases$Statement_addBatch_Probe
Registered: .Databases$Statement_clearBatch_Probe
Registered: .Databases$Statement_close_Probe
Registered: .Databases$Statement_executeBatch_Probe
Registered: .Databases$Statement_execute_Probe
Registered: .DatabasesLW$Connection_close_Probe
Registered: .DatabasesLW$Connection_createStatement_Probe
Registered: .DatabasesLW$Connection_prepareStatement_Probe
Registered: .DatabasesLW$DataSource_getConnection_Probe
Registered: .DatabasesLW$DriverManager_getConnection_Probe
Registered: .DatabasesLW$Driver_connect_Probe
Registered: .DatabasesLW$PooledConnection_getConnection_Probe
Registered: .DatabasesLW$PreparedStatement_execute_Probe
Registered: .DatabasesLW$Statement_close_Probe
Registered: .DatabasesLW$Statement_execute_Probe1
Registered: .DatabasesLW$Statement_execute_Probe2
Registered: .Files$FileChannel_read_Probe
Registered: .Files$FileChannel_write_Probe
Registered: .Files$FileInputStream_close_Probe
Registered: .Files$FileInputStream_open_Probe
Registered: .Files$FileInputStream_readBytes_Probe
Registered: .Files$FileInputStream_read_Probe
Registered: .Files$FileOutputStream_close_Probe
Registered: .Files$FileOutputStream_open_Probe
Registered: .Files$FileOutputStream_writeBytes_Probe
Registered: .Files$FileOutputStream_write_Probe
Registered: .Files$GetChannel_Probe
Registered: .Files$RandomAccessFile_close_Probe
Registered: .Files$RandomAccessFile_open_Probe
Registered: .Files$RandomAccessFile_readBytes_Probe
Registered: .Files$RandomAccessFile_read_Probe
Registered: .Files$RandomAccessFile_writeBytes_Probe
Registered: .Files$RandomAccessFile_write_Probe
Registered: .FilesLW$FileChannelsTelemetry$FileChannel_read_Probe
Registered: .FilesLW$FileChannelsTelemetry$FileChannel_write_Probe
Registered: .FilesLW$FileChannelsTelemetry$GetChannel_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileInputStream_close_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileInputStream_open_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileInputStream_readBytes_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileInputStream_read_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileOutputStream_close_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileOutputStream_open_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileOutputStream_writeBytes_Probe
Registered: .FilesLW$FileStreamsTelemetry$FileOutputStream_write_Probe
Registered: .FilesLW$RandomAccessFilesTelemetry$RandomAccessFile_close_Probe
Registered: .FilesLW$RandomAccessFilesTelemetry$RandomAccessFile_open_Probe
Registered: .FilesLW$RandomAccessFilesTelemetry$RandomAccessFile_readBytes_Probe
Registered: .FilesLW$RandomAccessFilesTelemetry$RandomAccessFile_read_Probe
Registered: .FilesLW$RandomAccessFilesTelemetry$RandomAccessFile_writeBytes_Probe
Registered: .FilesLW$RandomAccessFilesTelemetry$RandomAccessFile_write_Probe
Registered: .JNDI$DirContext_1
Registered: .JNDI$DirContext_2
Registered: .JNDI$NamingContext_1
Registered: .JNDI$NamingContext_2
Registered: .JUnitTests$Old_Run_Probe
Registered: .JUnitTests$Old_SetUp_Probe
Registered: .JUnitTests$Old_TearDown_Probe
Registered: .JUnitTests$Run_Probe
Registered: .JUnitTests$SetUp_Probe
Registered: .JUnitTests$TearDown_Probe
Registered: .Processes$FileOutputStream_writeBytes_Probe
Registered: .Processes$FileOutputStream_write_Probe
Registered: .Processes$InputStream_readBytes_Probe
Registered: .Processes$InputStream_read_Probe
Registered: .Processes$ProcessImpl_getErrorStream_Probe
Registered: .Processes$ProcessImpl_getInputStream_Probe
Registered: .Processes$ProcessImpl_getOutputStream_Probe
Registered: .Processes$ProcessImpl_start_Probe
Registered: .Processes$ProcessImpl_waitFor_Probe
Registered: .Processes$.
Registered: .Servlets$Filter_doFilter_Probe
Registered: .Servlets$HttpJspPage_jspService_Probe
Registered: .Servlets$Servlet_service_Probe
Registered: .ServletsLW$Filter_doFilter_Probe
Registered: .ServletsLW$HttpJspPage_jspService_Probe
Registered: .ServletsLW$Servlet_service_Probe
Registered: .Sockets$ServerSocketChannel_accept_Probe
Registered: .Sockets$ServerSocket_accept_Probe
Registered: .Sockets$SocketChannel_close_Probe
Registered: .Sockets$SocketChannel_connect_Probe
Registered: .Sockets$SocketChannel_read_Probe
Registered: .Sockets$SocketChannel_write_Probe
Registered: .Sockets$SocketInputStream_read_Probe
Registered: .Sockets$SocketOutputStream_write_Probe
Registered: .Sockets$Socket_close_Probe
Registered: .Sockets$Socket_connect_Probe
Registered: .Sockets$Socket_getInputStream_Probe
Registered: .Sockets$Socket_getOutputStream_Probe
Registered: .SocketsLW$ServerSocketChannel_accept_Probe
Registered: .SocketsLW$ServerSocket_accept_Probe
Registered: .SocketsLW$SocketChannel_close_Probe
Registered: .SocketsLW$SocketChannel_connect_Probe
Registered: .SocketsLW$SocketChannel_read_Probe
Registered: .SocketsLW$SocketChannel_write_Probe
Registered: .SocketsLW$SocketInputStream_read_Probe
Registered: .SocketsLW$SocketOutputStream_write_Probe
Registered: .SocketsLW$Socket_close_Probe
Registered: .SocketsLW$Socket_connect_Probe
Registered: .TestNG$Test_Probe
Registered: .Threads$Thread_constructor_Probe
Registered: .Threads$Thread_run_Probe
Registered: .Threads$Thread_setName_Probe
Registered: .Threads$Thread_start_Probe
15084 0.163: agentStart: 1
15084 0.163: agentStart: 2
15084 0.164: agentStart: OK
15084 0.195: tracing=off: large method: 3945 sun.nio.cs.StandardCharsets$Aliases.init(java.lang.Object[])
15084 0.272: agentInit: 1
15084 0.272: agentInit: Core loaded
15084 0.278: setTriggers:

15084 0.306: setTriggers: to unregister: 0
15084 0.307: setTriggers: method listeners to register: 0
15084 0.307: setTriggers: OK
15084 0.307: ourUnsafe: true
15084 0.307: shmem: Global\YOURKIT SHARE5 G: NULL (1)
15084 0.307: shmem: Local\YOURKIT SHARE5 L: OK, isFirst=1
15084 0.310: shmem: Global\YOURKIT SHARE2 G: NULL (1)
15084 0.310: shmem: Local\YOURKIT SHARE2 L: OK, isFirst=1
15084 0.340: Profiler agent is listening on port 10001
15084 0.341: *** HINT ***: To get profiling results, connect to the application from the profiler UI
15084 0.341: agentInit: Core.init() called
15084 0.341: socket timeout: 30000
15084 0.341: agentInit: OK
15084 0.341: startProfiling: 1
15084 0.341: startProfiling: OK
15084 0.374: SP: pin java.lang.ClassNotFoundException
15084 0.440: SP: pin java.lang.NoSuchFieldException
15084 0.458: SP: pin java.lang..
15084 0.590: [m: 490 62] total classes: processed=1000 instrumented=860 223 ms
15084 0.742: SP: pin java.io.FileNotFoundException
15084 0.742: SP: pin java.security.PrivilegedActionException
15084 0.753: tracing=off: large method: 4108 sun.text.resources.FormatData.getContents()
15084 0.761: tracing=off: large method: 8578 sun.util.resources.CurrencyNames.getContents()
15084 1.198: SP: pin java.lang.IllegalArgumentException
15084 1.575: SP: pin java.lang.NullPointerException
15084 1.660: SP: pin java.lang.NoSuchMethodException
15084 1.699: [m: 511 106] total classes: processed=2000 instrumented=1726 439 ms
15084 1.972: SP: pin org.apache.xerces.impl.XMLEntityScanner$1
15084 1.990: tracing=off: large method: 2370 redacted.statistics.monitors.PageStatisticsMonitor.createStatsAtTransactionStart(java.lang.String)
15084 1.990: tracing=off: large method: 2064 redacted.statistics.monitors.PageStatisticsMonitor.createSystemStats()
15084 2.037: tracing=off: large method: 22393 redacted.kernel.AbstractDataNodeReceiver.handleRequest(redacted.kernel.LineProtocol$RequestType)
15084 2.044: tracing=off: large method: 5365 redacted.kernel.AbstractDataNodeReceiver.$SWITCH_TABLE$redacted$kernel$LineProtocol$RequestType()
15084 2.089: tracing=off: large method: 4690 redacted.kernel.CNReceiver.handleRequest(redacted.kernel.LineProtocol$RequestType)
15084 2.090: tracing=off: large method: 5365 redacted.kernel.CNReceiver.$SWITCH_TABLE$redacted$kernel$LineProtocol$RequestType()
15084 2.655: SP: pin java.lang.UnsatisfiedLinkError
15084 2.722: SP: pin java.io.IOException
15084 2.727: tracing=off: large method: 2785 org.apache.cassandra.config.DatabaseDescriptor.applyConfig(org.apache.cassandra.config.Config)
15084 2.954: tracing=off: large method: 3116 javax.management.MBeanPermission.getMask(java.lang.String)
15084 2.964: SP: pin sun.nio.fs..
15084 2.965: SP: pin java.nio.file.NoSuchFileException
15084 3.028: SP: pin org.hyperic.sigar.SigarException
15084 3.153: [m: 458 175] total classes: processed=3000 instrumented=2661 816 ms
15084 3.217: SP: pin java.lang.SecurityException
15084 3.326: tracing=off: large method: 2371 org.apache.cassandra.cql3.CqlParser.selectStatement()
15084 3.417: warning: cannot instrument method because resulting bytecode is too long (A): org/apache/cassandra/cql3/CqlParser.<clinit>()V
15084 3.417: skipping: bytecode too long: org.apache.cassandra.cql3.CqlParser
15084 4.034: SP: pin javax.management..
15084 5.847: [m: 1314 732] total classes: processed=4000 instrumented=3656 1091 ms
15084 6.311: SP: pin java.lang.InterruptedException
15084 6.338: SP: pin javax.management.NotCompliantMBeanException
15084 7.523: tracing=off: large method: 2252 jdk.internal.org.objectweb.asm.Frame.execute(int, int, jdk.internal.org.objectweb.asm.ClassWriter, jdk.internal.org.objectweb.asm.Item)
15084 7.554: SP: pin java.lang.NoSuchFieldError
15084 7.725: [m: 1812 782] total classes: processed=5000 instrumented=4642 1352 ms
15084 7.735: SP: pin java.lang.IllegalStateException
15084 7.736: SP: pin java.lang.ExceptionInInitializerError
15084 8.263: SP: pin java.net.SocketTimeoutException
15084 8.264: SP: pin org.apache.thrift.transport.TTransportException
15084 8.937: tracing=off: large method: 2560 redacted.cluster.CassandraQueryManager.prepareStatements(com.datastax.driver.core.Session)
15084 9.251: [m: 1880 849] total classes: processed=6000 instrumented=5631 1670 ms
15084 9.297: tracing=off: large method: 5365 redacted.kernel.ClusterReceiver.$SWITCH_TABLE$redacted$kernel$LineProtocol$RequestType()
15084 9.702: tracing=off: large method: 2048 redacted.dom.DomDocumentStore.setSchemaLocationInternalPrivate(java.lang.String)
15084 9.703: tracing=off: large method: 2136 redacted.dom.DomDocumentStore.setSchemaIdInternalPrivate(java.lang.String)
15084 9.703: tracing=off: large method: 2249 redacted.dom.DomDocumentStore.setInputEncodingInternalPrivate(java.lang.String)
15084 9.704: tracing=off: large method: 2268 redacted.dom.DomDocumentStore.setDocumentUriInternalPrivate(java.lang.String)
15084 9.820: tracing=off: large method: 2118 redacted.dom.DomCatalogStore.setDefaultCatalogItemPrivate(java.lang.String)
15084 9.841: tracing=off: large method: 2132 redacted.dom.as.DomSchemaDocumentStore.setSchemaIdInternalPrivate(java.lang.String)
15084 9.842: tracing=off: large method: 2242 redacted.dom.as.DomSchemaDocumentStore.setInputEncodingInternalPrivate(java.lang.String)
15084 9.842: tracing=off: large method: 2328 redacted.dom.as.DomSchemaDocumentStore.setDocumentUriInternalPrivate(java.lang.String)
15084 9.843: tracing=off: large method: 2419 redacted.dom.as.DomSchemaDocumentStore.setSystemIdInternalPrivate(java.lang.String)
15084 9.844: tracing=off: large method: 2482 redacted.dom.as.DomSchemaDocumentStore.setCachedDtdStringPrivate(java.lang.String)
15084 11.417: [m: 1952 920] total classes: processed=7000 instrumented=6570 2043 ms
15084 13.742: tracing=off: large method: 3793 com.sun.crypto.provider.SunJCE$1.run()
15084 13.762: SP: pin javax.crypto.BadPaddingException
15084 13.762: SP: pin java.security.SignatureException
15084 13.816: tracing=off: large method: 7690 com.sun.crypto.provider.DESCrypt.expandKey(byte[])
15084 17.466: [m: 2063 1270] total classes: processed=8000 instrumented=7464 2369 ms
15084 18.090: tracing=off: large method: 2155 redacted.kernel.Backup.restoreSegmentsTransactional(redacted.kernel.LocalSession, redacted.kernel.Backup$RestoreConfig, redacted.kernel.Backup$OnlineRestoreSegmentIdMapper, java.util.List, com.google.common.collect.BiMap, java.util.Map, java.util.Set, java.util.Map, java.util.Map, boolean, redacted.kernel.Backup$ClusterIdInfo)
15084 18.981: SP: pin redacted.error.RedactedException
15084 27.348: SP: pin java.nio.channels.ClosedChannelException
15084 27.348: SP: pin java.io.EOFException
15084 27.497: SP: pin java.net.SocketException
15084 27.506: SP: pin java.nio.channels.AsynchronousCloseException
15084 27.590: f_t_a_d: #75 trn=193 conv=0/1741 ms=1
15084 27.591: f_t_a_d: #76 trn=257 conv=0/1741 ms=1
15084 27.604: f_t_a_d: #102 trn=259 conv=0/1741 ms=1
15084 27.607: f_t_a_d: #107 trn=190 conv=0/1741 ms=1
15084 27.613: f_t_a_d: #119 trn=285 conv=0/1741 ms=1
15084 27.648: SP: gc #0 40 ms=32 sz=49492 p=30 t=115
15084 27.674: f_t_a_d: #181 trn=226 conv=0/1741 ms=1
15084 30.224: SP: pin org.apache.cassandra.exceptions.PreparedQueryNotFoundException
15084 30.263: f_t_a_d: #194 trn=304 conv=0/2360 ms=1
15084 32.371: f_t_a_d: #233 trn=24 conv=0/2369 ms=1
15084 32.428: vmDeath: 1
15084 32.428: vmDeath: 2
15084 32.428: vmDeath: 3
15084 32.428: vmDeath: 4
15084 32.428: vmDeath: OK
sanderbos
 
Posts: 8
Joined: Tue Mar 01, 2016 4:14 pm

Re: Large performance improvement after enabling yourkit. Wh

Postby Anton Katilin » Wed Mar 02, 2016 12:13 pm

Hello Sander

Generally, I still think that using the agent has slightly changed timings of some parts of your program. This affected synchronization, maybe some internal stuff in the JVM or in Windows native socket implementation code - see my thoughts below.

We have once accidentally created an artificial test that was found to run under certain circumstances faster with the agent than without it. That test was creating a huge fixed number of threads with an empty run(). We measured how much time it took for the test to complete, i.e. to let it create N threads and wait them all to finish. And that total time was sometimes bigger without the agent than with it.

We came to the conclusion that adding the agent triggered some events in the JVM associated with thread start and stop.
Even when the event handlers were empty (we did so explicitly), the very fact that JVM issued the events seemed to shift things in time such that the finishing threads came to their end in a more "sequential" way.

Once again, it was an absolutely artificial example with empty threads. Profiling overhead, even the minimal, is nothing in a normal case but may be significant in case of an absolutely "empty" thread. Comparing "almost zero" with "almost zero" is a special case.

Adding some real load in the threads in the mentioned test eliminated the issue.

So, it was not the profiling making the test faster. It was the test such sensitive to any changes that it worked differently even with a minimal change of its environment.

In your particular case, this is not necessarily a threading issue, maybe socket. Anyway, the general idea is that by slightly changing timings of very short calls you can have a significant overall effect.

>> 1. Could you please try the latest version 2016.02
> I would rather not, as I am not sure we have a license for it (big company, I just enter a corporate license key without knowing what versions it is for). Also, the problem is not yourkit, yourkit is working great in that it makes the process much faster.

The agent library itself (yjpagent.dll) is under a BSD license. You can simply download the ZIP, extract yjpagent.dll from it and use it in the -agentpath instead of the one you're currently using.

The reason why I recommend to try it is that we have made several improvements in the idle mode. Perhaps they may affect performance of your example thus helping us to pin point the issue.

Anyway, it's not necessary of course. I think we may investigate with the older agent too.


Could you please try to specify startup options changing the default set of profiling capabilities and activities.

For example, to turn (almost) everything off:

-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll=disableall

In particulal, this will remove socket classes instrumentation, perhaps affecting overall performance.

Another try:

-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll=probe_disable=*

..or even so:

-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll=probe_disable=.Socket*

>> 2. Could you please provide the profiler agent log file ~/.yjp/log/<session name>-<pid>.log
> I will post one in a comment after this (I anonimized it a little, probably not enough, I took out the classpath)

Thank you for the log. We see no issues with it.

>> 3. Try playing with biased locking
> Adding -XX:+UseBiasedLocking made no difference to the performance, it was equally slow without the agentpath and equally fast with the agentpath and this extra option.

If you have only one thread actually doing something at a time, this is expected.

> Is this possible? I mean I know it is possible because I am seeing this behavior 10 out of 10 times. But I cannot believe it myself.

With and without the agent underlying native calls might happen with slightly different latencies, differently triggering native synchronization in the network protocol. This is my theory.

Furthermore, since you observe it 10 out of 100 times instead of 100 of 100, this means that some random factors apply.

> If you set the yourkit dll as the agent path on a new process, can that affect the already running Java processes on the same machine?

No.

The agent can apply to a JVM only if you either specify it in the Java command line with -agentpath,
or if you explicitly (from the profiler UI or a command line tool) attach the agent to a running JVM instance started without the agent.

> I have brainstormed on how with e.g. jps apparently Java processes can see each other and then perhaps also change each other, but I imagine that the yourkit agent does not do this kind of communication?!

YourKit agents do not discover other agents nor JVMs in any way.

> Another thing I could imagine is that if the client->server socket is created in a certain way, it could affect sockets created in that server process after that. I mean conceptually I can imagine that, but I have no idea what that could be in practice...

The agent does not affect the way sockets are created. AFAIK you may do this with some -D stuff, but I believe this is not the case.

Summary:

- please try the suggested agent startup options to see if the effect is still observable

- if possible, please try to change the test itself to let each request require more work, or just add minor Thread.sleep() to emulate heavier calls. It's likely that with such load you will not see the difference with and without the agent anymore.

Best regards,
Anton
Anton Katilin
 
Posts: 5574
Joined: Wed Aug 11, 2004 8:37 am

Re: Large performance improvement after enabling yourkit. Wh

Postby sanderbos » Wed Mar 02, 2016 12:58 pm

Hello Anton,

thank you again for your very elaborate and informative reply.

Anton Katilin wrote:Hello Sander

We have once accidentally created an artificial test that was found to run under certain circumstances faster with the agent than without it. That test was creating a huge fixed number of threads with an empty run(). We measured how much time it took for the test to complete, i.e. to let it create N threads and wait them all to finish. And that total time was sometimes bigger without the agent than with it.


Just to clarify (I understand it does not have to be clarified), the test code in question is not empty, so everything is attempting to perform some database related functionality, and it is in fact in no way a performance test, it is a strictly functional test which happens to have very poor performance on this machine.

Anton Katilin wrote:Once again, it was an absolutely artificial example with empty threads. Profiling overhead, even the minimal, is nothing in a normal case but may be significant in case of an absolutely "empty" thread. Comparing "almost zero" with "almost zero" is a special case.


Right, so as I explained there are many context switches between threads as communication flows through socket client server calls, but that that there is this TCP/IP communication involved in that 'context-switch' hopefully makes clear that we are not talking about very excessive level really testing the JVM threading.

Anton Katilin wrote:>> 1. Could you please try the latest version 2016.02
The agent library itself (yjpagent.dll) is under a BSD license. You can simply download the ZIP, extract yjpagent.dll from it and use it in the -agentpath instead of the one you're currently using.
-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll=disableall
-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll=probe_disable=*
-agentpath:c:/Program Files (x86)/YourKit Java Profiler 2015 build 15084/bin/win64/yjpagent.dll=probe_disable=.Socket*


So I downloaded the zip files and did the experiments with the 2016.02 dll.

Here are some quotes from the logfiles to show I applied the options correctly (so this is from three different logfiles).
16.02-28 0.005: Agent_OnLoad: orig_options=disableall
full_options=disableall
16.02-28 0.004: Agent_OnLoad: orig_options=probe_disable=*
full_options=probe_disable=*
16.02-28 0.004: Agent_OnLoad: orig_options=probe_disable=.Socket*
full_options=probe_disable=.Socket*

In all cases, so all with 2016.02, performance remains great (I see some minor variations but I suppose that is to be expected). And as soon as the agentpath is cleared again performance is bad again.

Anton Katilin wrote:> Is this possible? I mean I know it is possible because I am seeing this behavior 10 out of 10 times. But I cannot believe it myself.
...
Furthermore, since you observe it 10 out of 100 times instead of 100 of 100, this means that some random factors apply.


No, I said 10 out 10, there is no randomness, the performance increase happens every time, and also with two separate processes the expected time matched the actuals every time.

Anton Katilin wrote:> If you set the yourkit dll as the agent path on a new process, can that affect the already running Java processes on the same machine?
No.
YourKit agents do not discover other agents nor JVMs in any way.


Okay, thank you very much, that is very useful information. I just don't know in what way. But I think then I can be sure that the only way the processes communicate is with a socket, so it must be through the establishment of this network connection that the client process gets the server process to go quickly. Somehow.

Anton Katilin wrote:or if you explicitly (from the profiler UI or a command line tool) attach the agent to a running JVM instance started without the agent.


No, I never attached, only started with vm arguments.


Okay, thank you very much for your help. I still don't know what is going on, so maybe I will get back to this at some point, but at the moment I would not know what else to ask you. I will try to investigate further based on the information you gave me, where I guess I have learned:
- Just having the agent dll without any probing is enough to get the desired result.
- Yourkit should not affect processes running on the machine that were not explicitly started with the agent or attached later, so if enabling yourkit on one process makes another process go much faster, it must be communicated in a way not through yourkit.

Thanks again,

Sander.
sanderbos
 
Posts: 8
Joined: Tue Mar 01, 2016 4:14 pm

Re: Large performance improvement after enabling yourkit. Wh

Postby sanderbos » Wed Mar 02, 2016 4:22 pm

I said:
In all cases, so all with 2016.02, performance remains great (I see some minor variations but I suppose that is to be expected). And as soon as the agentpath is cleared again performance is bad again.


I said that adding extra options on yourkit profiler argument did not change the performance. And it is true that the biggest performance difference is on whether the yourkit agent is added at all or not. But now that I have run a larger test I noticed there actually is quite a large difference between the different options (I have run most except the last two of these tests multiple times to see whether it was not a one off time):
Code: Select all
    no yourkit agent:                          1433 seconds
    yourkit agent without additional options    206 seconds
    =disableall                                 318 seconds
    =probe_disable=*                            207 seconds
    =probe_disable=.Socket*                     209 seconds
    =disablealloc                               573 seconds
    =disabletracing                             578 seconds
    =disableexceptiontelemetry                  612 seconds
    =disablestacktelemetry                      788 seconds


So even within yourkit there are still differences depending on the options. And I imagine the differences are odd. For instance I would expect disableall to be faster than without any options specified?

No further response is needed, but I thought I would share a correction on my own post to keep this thread correct.

Regards,

Sander.
sanderbos
 
Posts: 8
Joined: Tue Mar 01, 2016 4:14 pm

Re: Large performance improvement after enabling yourkit. Wh

Postby Anton Katilin » Wed Mar 02, 2016 4:41 pm

> And I imagine the differences are odd. For instance I would expect disableall to be faster than without any options specified?

This is absolutely the same counter-intuitive situation we've seen in our artificial test I've mentioned earlier in this thread.

The more things you disable the lower the profiling overhead of a single operation. This is true. But any change of timings may cause side effects.

With more overhead some things happen later, this changes the moments of time when some (next) events happen, thread events in our case and most likely socket operations in your case. Fortunately (or unfortunately) this may eliminate some lower level synchronization costs as a side effect.

In other words, it looks like the profiler's impact accidentally happened to be an event dispatching improvement by slightly suspending some events.

Of course, such effects can be observed in very special cases only. In a common case less profiling overhead ultimately means higher overall performance.

Just a thought on how to investigate it more: did you try the Windows' Resource monitor? You can observe network activities, throughput, latency. Since we suspect sockets, such measurements may shed some light.

Best regards,
Anton
Anton Katilin
 
Posts: 5574
Joined: Wed Aug 11, 2004 8:37 am


Return to Java Profiler

Who is online

Users browsing this forum: Bing [Bot] and 8 guests