Object allocation recording
YourKit Java Profiler can optionally record object allocations, that is, track method calls where objects are created.
To start allocation recording, connect to the profiled application and use the corresponding toolbar button.
Object allocation recording modes
Recording allocations adds performance overhead, so allocations should not be recorded permanently. Instead, record allocations only when you really need them.
You can also choose between two recording modes to balance the accuracy and completeness of the results against profiling overhead.
Record thread and stack where objects are allocated (traditional recording)
This mode provides the most detail: for each recorded object, the full stack and the thread where the object was created are obtained and remembered. Allocation information for particular objects is available in a memory snapshot. This mode also enables comprehensive analysis of excessive garbage allocation.
Memory snapshots captured when allocations are being recorded, or after object allocation recording has been stopped, contain allocation information.
If an object is created when allocations are not being recorded, or recording is restarted after the object has been created, or allocation results are explicitly cleared after the object has been created, the snapshot will contain no allocation information for that object.
To keep the overhead moderate, it is reasonable to skip allocation events for some percentage of objects. This approach is useful for finding the causes of excessive garbage collection.
Also, you can record allocations for each object whose size exceeds a certain threshold. It is valuable to know where the biggest objects are allocated. Normally there are few such big objects, so recording their allocation should not add significant overhead.
In rare cases you may need to record every created object, for example when allocation information for a particular object must be obtained. To achieve this, set "Record each" to 1.
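The combination of "record each N-th object" and "always record objects above a size threshold" can be pictured with a small decision function. This is only an illustrative sketch: the class `AllocationRecordingPolicy` and its parameters are hypothetical names, not part of the profiler or its API, and the real agent may make this decision differently.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration of the recording policy; not the profiler's code. */
final class AllocationRecordingPolicy {
    private final int recordEach;       // record every N-th object; 1 records all
    private final long sizeLimitBytes;  // always record objects bigger than this
    private final AtomicLong seen = new AtomicLong();

    AllocationRecordingPolicy(int recordEach, long sizeLimitBytes) {
        if (recordEach < 1) {
            throw new IllegalArgumentException("recordEach must be >= 1");
        }
        this.recordEach = recordEach;
        this.sizeLimitBytes = sizeLimitBytes;
    }

    /** Returns true if the allocation of an object of this size should be recorded. */
    boolean shouldRecord(long objectSizeBytes) {
        long n = seen.incrementAndGet();
        if (objectSizeBytes > sizeLimitBytes) {
            return true;                // big objects are always recorded
        }
        return n % recordEach == 0;     // otherwise keep only every N-th event
    }
}
```

With `recordEach` set to 10, roughly 90% of small-object events are skipped, while every object above the size limit is still recorded. Setting `recordEach` to 1 records everything.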
Count allocated objects (quick recording)
This is the most lightweight allocation recording mode.
Use this mode to quickly gain insight into how many objects are created and of which classes. In particular, this identifies excessive garbage allocation problems (lots of temporary objects).
Object counting is specially designed to have the minimal possible, almost zero overhead:
- Object counting provides allocated object counts by class then by immediate allocator method with the exact line number, if available. Unlike the traditional recording mode, it does not provide stack traces and does not track particular instances, i.e. no allocation information for particular live object(s) is available in a memory snapshot.
- Objects allocated in different threads are summed and cannot be distinguished.
Use object counting to initially detect possible problems: thanks to its low overhead you may do this even in production.
Further investigation may involve using traditional allocation recording to get comprehensive profiling results with stack traces (call tree).
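To make the trade-off of the counting mode concrete, the following sketch keeps one counter per class, immediate allocator method and line, and nothing else. It is a simplified model under stated assumptions, not the profiler's implementation: `AllocationCounter` and its methods are hypothetical names. Note that the thread is deliberately not part of the key, so allocations from different threads are summed, and no per-instance information is kept.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical model of object counting; not the profiler's code. */
final class AllocationCounter {
    // Key: class, immediate allocator method and line. No thread, no stack trace.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    private static String key(String className, String allocatorMethod, int line) {
        return className + " <- " + allocatorMethod + ":" + line;
    }

    /** Called for each allocation event; only increments a counter. */
    void onAllocation(String className, String allocatorMethod, int line) {
        counts.computeIfAbsent(key(className, allocatorMethod, line), k -> new LongAdder())
              .increment();
    }

    /** Returns the number of objects counted for the given allocation site. */
    long count(String className, String allocatorMethod, int line) {
        LongAdder adder = counts.get(key(className, allocatorMethod, line));
        return adder == null ? 0 : adder.sum();
    }
}
```

Because each event only increments a counter, the cost per allocation stays small, which is the reason this kind of recording can be cheap enough for an initial check.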
Start and stop recording
You can start and stop recording of allocations during execution of your application as many times as you wish. When allocations are not recorded, memory profiling adds no performance overhead to the application being profiled.
You can control recording of allocations from the profiler UI as described above, or via the Profiler API.
You can record allocations from the start of application execution (see Running applications with the profiler) by using the agent startup options alloceach, allocsizelimit, allocsampled, alloc_object_counting.
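As a rough illustration of how such startup options might be passed, the helper below assembles a JVM `-agentpath` argument from a list of options. Several things here are assumptions to verify against the startup options reference: the agent library path is a placeholder, the `name=value` form and the numeric values shown for `alloceach` and `allocsizelimit` are illustrative, and `AgentOptions` is a hypothetical helper, not part of any profiler API.

```java
import java.util.List;

/** Hypothetical helper that builds a JVM agent argument; not a profiler API. */
final class AgentOptions {
    /**
     * Joins options with commas and appends them to the agent path,
     * following the general JVM form -agentpath:<path>=<options>.
     */
    static String agentArgument(String agentLibraryPath, List<String> options) {
        StringBuilder sb = new StringBuilder("-agentpath:").append(agentLibraryPath);
        if (!options.isEmpty()) {
            sb.append('=').append(String.join(",", options));
        }
        return sb.toString();
    }
}
```

For example, `AgentOptions.agentArgument("/path/to/agent-library", java.util.Arrays.asList("alloceach=10", "allocsizelimit=4096"))` produces a single argument that could be added to the `java` command line, assuming those option values are valid for your profiler version.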
Object allocation recording results in the user interface
The profiler offers two implementations of object allocation recording: bytecode instrumentation-based and heap sampling-based. They provide the same results but may differ in profiling overhead and footprint.
Bytecode instrumentation is available in all supported Java versions. The profiler applies it by default when heap sampling (see below) is not available, i.e. on Java 10 or older. To use bytecode instrumentation on Java 11 or newer, specify the agent startup option disable_heap_sampling.
Bytecode instrumentation imposes almost no overhead while allocation recording is not running.
If you apply the startup option disableall to totally eliminate the overhead, allocation recording will not be possible.
Heap sampling is a new Java profiling capability introduced in Java 11 via JEP 331. It offers a new JVMTI event to record allocated objects without using bytecode instrumentation.
Not instrumenting bytecode for object allocation recording reduces class load time and size of the resulting bytecode.
Also, this approach is extremely useful in the attach mode, because it eliminates the pause on the first attempt to start object allocation recording, which is associated with instrumenting classes loaded before the agent has attached.
By default, the heap sampling event is used instead of bytecode instrumentation whenever available, that is, on Java 11 and newer. To use bytecode instrumentation instead, please specify the agent startup option disable_heap_sampling.
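The selection rule described above can be summarized as a small function. This is a sketch of the documented behavior only: `AllocationRecordingImplementation` and `choose` are hypothetical names, and the boolean parameter simply stands for whether the disable_heap_sampling startup option was specified.

```java
/** Hypothetical summary of the documented selection rule; not the profiler's code. */
final class AllocationRecordingImplementation {
    enum Kind { BYTECODE_INSTRUMENTATION, HEAP_SAMPLING }

    /**
     * @param javaFeatureVersion  major Java version, e.g. 8, 11, 17
     * @param disableHeapSampling true if disable_heap_sampling was specified
     */
    static Kind choose(int javaFeatureVersion, boolean disableHeapSampling) {
        // The heap sampling JVMTI event (JEP 331) exists only on Java 11 and newer.
        boolean heapSamplingAvailable = javaFeatureVersion >= 11;
        return (heapSamplingAvailable && !disableHeapSampling)
                ? Kind.HEAP_SAMPLING
                : Kind.BYTECODE_INSTRUMENTATION;
    }
}
```

On Java 10 and newer, `Runtime.version().feature()` returns the number that would be passed as `javaFeatureVersion`.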
Sampled object allocation recording option (advanced topic)
The greatest contribution to object allocation recording overhead in the traditional mode comes from obtaining the exact stack trace for each recorded new object.
The idea is to estimate the stack trace instead, in order to reduce the overhead. It is similar to how CPU sampling works. The sampling thread periodically obtains the stacks of running threads. When a thread creates a new object, the stack most recently remembered for that thread is used as the estimate of the object's allocation stack.
Just as with CPU sampling, the sampled object allocation recording results are relevant only for methods that run longer than the sampling period.
By default, the synchronous sampler is used to estimate stack traces if no CPU sampling is in progress. When you change the CPU sampling mode, it also affects the sampled object allocation recording.
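The estimation idea can be sketched in a few lines of plain Java. This is a simplified model, not the profiler's implementation: `SampledStackEstimator` is a hypothetical class, and it uses the standard `Thread.getAllStackTraces()` call as a stand-in for whatever sampler the agent really uses. A sampling thread would call `sampleAllThreads()` periodically; an allocation event then reuses the last remembered stack instead of capturing an exact one.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical model of sampled allocation stacks; not the profiler's code. */
final class SampledStackEstimator {
    // Most recent snapshot of all thread stacks, replaced on every sample.
    private volatile Map<Thread, StackTraceElement[]> lastSample = Collections.emptyMap();

    /** Called periodically by the sampling thread. */
    void sampleAllThreads() {
        lastSample = Thread.getAllStackTraces();
    }

    /**
     * Called when a thread allocates an object: returns the stack remembered
     * at the last sample, which is only an estimate of the real allocation stack.
     */
    StackTraceElement[] estimateAllocationStack(Thread allocatingThread) {
        StackTraceElement[] stack = lastSample.get(allocatingThread);
        return stack == null ? new StackTraceElement[0] : stack;
    }
}
```

The estimate is accurate only if the thread is still in the same method as at the last sample, which is why short-lived methods tend to be misattributed in this mode.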
When it should be used
Use this mode to get allocation hot spots, to find top methods responsible for most allocations. Don't use it if you need precise results for all recorded objects.
How to enable
Exact stack traces are gathered by default when you start object allocation recording. To use sampled stacks instead:
- in the profiler UI: choose "Estimated (sampled) stacks instead of exact stacks" in the "Start Object Allocation Recording" toolbar popup window
- if object allocation recording is started with the alloceach or allocsizelimit startup options, specify the startup option allocsampled
- in the Profiler API: use the API method with the parameter boolean sampledAllocationRecording. The old version of the method without that parameter records exact stacks.