bucket distribution for HashMap instances

plethora · Post by **plethora** » Sat Aug 09, 2014 12:08 pm

When I have identified a specific HashMap object in a heap dump, it would be useful to get a distribution count by bucket.
The scenario is one where I'm suspecting a certain hashCode() implementation to result in a very high collision ratio, and I want to explore before/after snapshots.

I'm aware of the related inspection that is available, but the tool I'm wanting would apply to a specific map object instance.

Post by **Anton Katilin** » Sat Aug 09, 2014 1:02 pm

Hi Taras,

When a hash map is opened in object explorer, it looks like this:

Code: Select all

+----------------------------------------------------------------+-----------------+----------------+
|                              Name                              |  Retained Size  |  Shallow Size  |
+----------------------------------------------------------------+-----------------+----------------+
|  +---java.util.HashMap size = 16                               |         47,568  |            48  |
|  | +---table  java.util.HashMap$Entry[32]                      |         47,520  |           144  |
|  | | +---[28]  java.util.HashMap$Entry                         |         44,952  |            32  |
|  | | +---[13]  java.util.HashMap$Entry                         |          1,224  |            32  |
|  | | +---[1]  java.util.HashMap$Entry                          |            288  |            32  |
etc...

As far as I understand your suggestion, you want to see how many objects are in each of the java.util.HashMap$Entry[] elements (java.util.HashMap's field "table"), to find the worst one and then inspect the objects it contains with presumably matching hash codes.

If so, let's discuss possible UI.

One (presumably) simple thing I can offer is to show the number of contained elements for each java.util.HashMap$Entry instance as its custom presentation, like "elements = N". Just like in the line "java.util.HashMap size = 16" the part "size = 16" is a container's special presentation.

The results are immediately seen, this features is easily discoverable
Sorting by the Name column may sort the table array elements by the element count, thus biggest buckets would be at the top.

So, it would look like:

Code: Select all

+----------------------------------------------------------------+-----------------+----------------+
|                              Name                              |  Retained Size  |  Shallow Size  |
+----------------------------------------------------------------+-----------------+----------------+
|  +---java.util.HashMap size = 16                               |         47,568  |            48  |
|  | +---table  java.util.HashMap$Entry[32] elements = 5         |         47,520  |           144  |
|  | | +---[28]  java.util.HashMap$Entry elements = 2            |         44,952  |            32  |
|  | | +---[13]  java.util.HashMap$Entry elements = 1            |          1,224  |            32  |
|  | | +---[1]  java.util.HashMap$Entry  elements = 1            |            288  |            32  |
etc...

I also can suggest possible workaround using existing functionality:

1. Select the hash map of interest in object explorer
2. Press Shift+F4 to open retained objects in a new tab. All Entry of the hash map should be among the opened objects.
3. Select Inspections inside the new tab
4. Run "Objects with biggest distance to nearest GC root". The Entry from the longest bucket will be reported. It's object is one of the bad ones you were looking for. Entries with 2nd, 3rd etc distances are also likely from that bucket.

Best regards,
Anton

plethora · Post by **plethora** » Sun Aug 10, 2014 3:57 pm

Hi Anton,

Your understanding of what I'm looking for is correct.
Having an "element count" rendered for each bucket of a hash map seems a good idea.
The "retained size" by itself is not enough, since the size of the contained objects might be quite varying.

Other questions/comments:
1) If the objects contained in the hash map are also retained by data structures outside of the map, what "retained size" will be shown?
2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?
3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?

Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.

Kind regards,
Taras

Post by **Anton Katilin** » Mon Aug 11, 2014 7:17 am

Hi Taras,

The "retained size" by itself is not enough, since the size of the contained objects might be quite varying.

Absolutely.

1) If the objects contained in the hash map are also retained by data structures outside of the map, what "retained size" will be shown?

Do you ask about the current functionality? If the mapped objects or keys are also accessible from somewhere else, not only from the map, their size won't be added to the retained size of the map. The retained size of the map constitutes of the sizes of HashMap$Entry and retained sizes of only those objects which are retained from the map.

2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?

Sure, it should sort the numbers, not their string representation. Furthermore, it seems that the order should be opposite to the current one: while it's good to see names or array indices ascending, the chunk sizes should descend, to see biggest at the top.

3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?

I wouldn't agree with this assumption. Although the map content objects, keys and values, may indeed be referenced from somewhere else thus having path from root lengths not corresponding to their position in the map, the HashMap$Entry themselves are referenced from their map only, thus their depth is appropriate.

Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.

Should it be per map or total? Can we make it a part of the hash map inspection?

Best regards,
Anton

plethora · Post by **plethora** » Mon Aug 11, 2014 8:40 pm

Anton Katilin wrote:
2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?
Sure, it should sort the numbers, not their string representation. Furthermore, it seems that the order should be opposite to the current one: while it's good to see names or array indices ascending, the chunk sizes should descend, to see biggest at the top.

Yes, that would make sense.

Anton Katilin wrote:
3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?
I wouldn't agree with this assumption. Although the map content objects, keys and values, may indeed be referenced from somewhere else thus having path from root lengths not corresponding to their position in the map, the HashMap$Entry themselves are referenced from their map only, thus their depth is appropriate.

Ah, I see what you mean. The retained size would should have a linear relation with bucket size, assuming all the contents (keys and values) are also retained from outside the map

Anton Katilin wrote:
Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.
Should it be per map or total? Can we make it a part of the hash map inspection?

It would be calculated for a single map instance only, so I don't think integration with the inspection makes much sense.

An example:
* 10 buckets with size 2000 (e.g. the degenerate hashCode() case).
* 5000 buckets with size 4
* 10000 buckets with size 3
* 20000 buckets with size 2
* 50000 buckets with size 1
* 30000 buckets with size 0

Today I have some code that dumps such statistics through reflection, and I invoke it using the "Evaluate expression" feature of the debugger.
However, if the "entry size" feature you describe above would be available, having such a distribution does not add much value on top.

A few other notes on the topic:
* the feature would be useful for HashSet and ConcurrentHashMap as well
* please also note http://openjdk.java.net/jeps/180, which I believe was implemented for Java 8 (have not tested yet).

Kind regards,
Taras

Post by **Anton Katilin** » Fri Aug 15, 2014 10:39 am

Hi Taras,

We've just published build 14092:
http://www.yourkit.com/download
Changes:
http://www.yourkit.com/download/changelist-yjp2014.html

Among the changes:
1. The chunk sizes are shown in object explorer.
2. To get the chunks sorted by size please do the following:
- select a map in objects explorer;
- use new action "Memory | Contained Objects (Shift+F5)" which superseded previously existed "Array Elements";
- choose Entries in the popup;
- in the new tab which has opened switch to object explorer (if not selected by default), and sort by "Name" column;
- you will get chunk first elements sorted by chunk size;
- please note that you can apply the same action to a chunk's first element (which is indicated with "chunk size =") to see elements in this particular chunk only, e.g. to analyze their hash codes.

Best regards,
Anton

plethora · Post by **plethora** » Fri Aug 15, 2014 8:49 pm

Hi Anton,

It seems to work as expected, thanks for implementing this so quickly. This will come in quite useful.

I assume ConcurrentHashMap is not supported yet, is that correct?

On a related note, when I invoke "Contained Objects"->"Keys and Values", I would expect to get a table view that combines what I would get for "Contained Objects"->"Keys" and "Contained Objects"->"Values". In other words, a tabular presentation of the map. This seems not to be the case - is that intentional?

Kind regards,
Taras

Post by **Anton Katilin** » Mon Aug 18, 2014 8:14 am

Hi Taras,

I assume ConcurrentHashMap is not supported yet, is that correct?

Correct. I hope we'll add its support in the next build.

On a related note, when I invoke "Contained Objects"->"Keys and Values", I would expect to get a table view that combines what I would get for "Contained Objects"->"Keys" and "Contained Objects"->"Values". In other words, a tabular presentation of the map. This seems not to be the case - is that intentional?

The tabular presentation is not available yet, and we have to suspend its addition until the next EAP, as it would require serious code changes undesirable in the release branch.
As a workaround, please open entries instead. They show the key and the value together.

Best regards,
Anton

Post by **Anton Katilin** » Mon Aug 18, 2014 2:48 pm

An update:

We've done the following:
- added ConcurrentHashMap in the "HashMap hash code distribution" inspection;
- added ConcurrentHashMap and TreeMap in the action "Memory | Contained Objects"

Please write the following yjp.jar over <build 14092 directory>/lib/yjp.jar to try the changes before the build 14094 is published:
http://www.yourkit.com/download/tmp/20140818/yjp.jar

plethora · Post by **plethora** » Fri Aug 22, 2014 7:59 pm

Hi Anton,

Thanks for the improvements, I'll give the jar a try.
On a related note, it would be very nice to have a "size" label for ConcurrentLinkedQueue as well.

Kind regards,
Taras

Post by **Anton Katilin** » Tue Aug 26, 2014 4:38 pm

Hi Taras,

Build 14094 is ready for download.

On a related note, it would be very nice to have a "size" label for ConcurrentLinkedQueue as well.

Done.

Best regards,
Anton

YourKit Java Profiler and .NET Profiler Forums

bucket distribution for HashMap instances

bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances

Re: bucket distribution for HashMap instances