bucket distribution for HashMap instances
-
- Posts: 314
- Joined: Thu Jun 02, 2005 8:36 pm
bucket distribution for HashMap instances
When I have identified a specific HashMap object in a heap dump, it would be useful to get a distribution count by bucket.
The scenario is one where I'm suspecting a certain hashCode() implementation to result in a very high collision ratio, and I want to explore before/after snapshots.
I'm aware of the related inspection that is available, but the tool I'm wanting would apply to a specific map object instance.
The scenario is one where I'm suspecting a certain hashCode() implementation to result in a very high collision ratio, and I want to explore before/after snapshots.
I'm aware of the related inspection that is available, but the tool I'm wanting would apply to a specific map object instance.
-
- Posts: 6172
- Joined: Wed Aug 11, 2004 8:37 am
Re: bucket distribution for HashMap instances
Hi Taras,
When a hash map is opened in object explorer, it looks like this:
As far as I understand your suggestion, you want to see how many objects are in each of the java.util.HashMap$Entry[] elements (java.util.HashMap's field "table"), to find the worst one and then inspect the objects it contains with presumably matching hash codes.
If so, let's discuss possible UI.
One (presumably) simple thing I can offer is to show the number of contained elements for each java.util.HashMap$Entry instance as its custom presentation, like "elements = N". Just like in the line "java.util.HashMap size = 16" the part "size = 16" is a container's special presentation.
The results are immediately seen, this features is easily discoverable
Sorting by the Name column may sort the table array elements by the element count, thus biggest buckets would be at the top.
So, it would look like:
I also can suggest possible workaround using existing functionality:
1. Select the hash map of interest in object explorer
2. Press Shift+F4 to open retained objects in a new tab. All Entry of the hash map should be among the opened objects.
3. Select Inspections inside the new tab
4. Run "Objects with biggest distance to nearest GC root". The Entry from the longest bucket will be reported. It's object is one of the bad ones you were looking for. Entries with 2nd, 3rd etc distances are also likely from that bucket.
Best regards,
Anton
When a hash map is opened in object explorer, it looks like this:
Code: Select all
+----------------------------------------------------------------+-----------------+----------------+
| Name | Retained Size | Shallow Size |
+----------------------------------------------------------------+-----------------+----------------+
| +---java.util.HashMap size = 16 | 47,568 | 48 |
| | +---table java.util.HashMap$Entry[32] | 47,520 | 144 |
| | | +---[28] java.util.HashMap$Entry | 44,952 | 32 |
| | | +---[13] java.util.HashMap$Entry | 1,224 | 32 |
| | | +---[1] java.util.HashMap$Entry | 288 | 32 |
etc...
If so, let's discuss possible UI.
One (presumably) simple thing I can offer is to show the number of contained elements for each java.util.HashMap$Entry instance as its custom presentation, like "elements = N". Just like in the line "java.util.HashMap size = 16" the part "size = 16" is a container's special presentation.
The results are immediately seen, this features is easily discoverable
Sorting by the Name column may sort the table array elements by the element count, thus biggest buckets would be at the top.
So, it would look like:
Code: Select all
+----------------------------------------------------------------+-----------------+----------------+
| Name | Retained Size | Shallow Size |
+----------------------------------------------------------------+-----------------+----------------+
| +---java.util.HashMap size = 16 | 47,568 | 48 |
| | +---table java.util.HashMap$Entry[32] elements = 5 | 47,520 | 144 |
| | | +---[28] java.util.HashMap$Entry elements = 2 | 44,952 | 32 |
| | | +---[13] java.util.HashMap$Entry elements = 1 | 1,224 | 32 |
| | | +---[1] java.util.HashMap$Entry elements = 1 | 288 | 32 |
etc...
1. Select the hash map of interest in object explorer
2. Press Shift+F4 to open retained objects in a new tab. All Entry of the hash map should be among the opened objects.
3. Select Inspections inside the new tab
4. Run "Objects with biggest distance to nearest GC root". The Entry from the longest bucket will be reported. It's object is one of the bad ones you were looking for. Entries with 2nd, 3rd etc distances are also likely from that bucket.
Best regards,
Anton
-
- Posts: 314
- Joined: Thu Jun 02, 2005 8:36 pm
Re: bucket distribution for HashMap instances
Hi Anton,
Your understanding of what I'm looking for is correct.
Having an "element count" rendered for each bucket of a hash map seems a good idea.
The "retained size" by itself is not enough, since the size of the contained objects might be quite varying.
Other questions/comments:
1) If the objects contained in the hash map are also retained by data structures outside of the map, what "retained size" will be shown?
2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?
3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?
Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.
Kind regards,
Taras
Your understanding of what I'm looking for is correct.
Having an "element count" rendered for each bucket of a hash map seems a good idea.
The "retained size" by itself is not enough, since the size of the contained objects might be quite varying.
Other questions/comments:
1) If the objects contained in the hash map are also retained by data structures outside of the map, what "retained size" will be shown?
2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?
3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?
Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.
Kind regards,
Taras
-
- Posts: 6172
- Joined: Wed Aug 11, 2004 8:37 am
Re: bucket distribution for HashMap instances
Hi Taras,
Best regards,
Anton
Absolutely.The "retained size" by itself is not enough, since the size of the contained objects might be quite varying.
Do you ask about the current functionality? If the mapped objects or keys are also accessible from somewhere else, not only from the map, their size won't be added to the retained size of the map. The retained size of the map constitutes of the sizes of HashMap$Entry and retained sizes of only those objects which are retained from the map.1) If the objects contained in the hash map are also retained by data structures outside of the map, what "retained size" will be shown?
Sure, it should sort the numbers, not their string representation. Furthermore, it seems that the order should be opposite to the current one: while it's good to see names or array indices ascending, the chunk sizes should descend, to see biggest at the top.2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?
I wouldn't agree with this assumption. Although the map content objects, keys and values, may indeed be referenced from somewhere else thus having path from root lengths not corresponding to their position in the map, the HashMap$Entry themselves are referenced from their map only, thus their depth is appropriate.3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?
Should it be per map or total? Can we make it a part of the hash map inspection?Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.
Best regards,
Anton
-
- Posts: 314
- Joined: Thu Jun 02, 2005 8:36 pm
Re: bucket distribution for HashMap instances
Yes, that would make sense.Anton Katilin wrote:Sure, it should sort the numbers, not their string representation. Furthermore, it seems that the order should be opposite to the current one: while it's good to see names or array indices ascending, the chunk sizes should descend, to see biggest at the top.2) In your scenario where I can sort the buckets by name (and thus by size), I assume YK will have some special knowlegde to not use a lexicographic sort, but a numerical one using the bucket size?
Ah, I see what you mean. The retained size would should have a linear relation with bucket size, assuming all the contents (keys and values) are also retained from outside the mapAnton Katilin wrote:I wouldn't agree with this assumption. Although the map content objects, keys and values, may indeed be referenced from somewhere else thus having path from root lengths not corresponding to their position in the map, the HashMap$Entry themselves are referenced from their map only, thus their depth is appropriate.3) Your suggestion to use "Objects with biggest distance to nearest GC root": I assume this will also depend on the various paths to GC roots for the various objects contained in the map. Reachability paths ouside the context of the hash map might pollute this view, I assume?
It would be calculated for a single map instance only, so I don't think integration with the inspection makes much sense.Anton Katilin wrote:Should it be per map or total? Can we make it a part of the hash map inspection?Another thing that came to mind was that it might be useful to get a frequency distribution of bucket sizes.
An example:
* 10 buckets with size 2000 (e.g. the degenerate hashCode() case).
* 5000 buckets with size 4
* 10000 buckets with size 3
* 20000 buckets with size 2
* 50000 buckets with size 1
* 30000 buckets with size 0
Today I have some code that dumps such statistics through reflection, and I invoke it using the "Evaluate expression" feature of the debugger.
However, if the "entry size" feature you describe above would be available, having such a distribution does not add much value on top.
A few other notes on the topic:
* the feature would be useful for HashSet and ConcurrentHashMap as well
* please also note http://openjdk.java.net/jeps/180, which I believe was implemented for Java 8 (have not tested yet).
Kind regards,
Taras
-
- Posts: 6172
- Joined: Wed Aug 11, 2004 8:37 am
Re: bucket distribution for HashMap instances
Hi Taras,
We've just published build 14092:
http://www.yourkit.com/download
Changes:
http://www.yourkit.com/download/changelist-yjp2014.html
Among the changes:
1. The chunk sizes are shown in object explorer.
2. To get the chunks sorted by size please do the following:
- select a map in objects explorer;
- use new action "Memory | Contained Objects (Shift+F5)" which superseded previously existed "Array Elements";
- choose Entries in the popup;
- in the new tab which has opened switch to object explorer (if not selected by default), and sort by "Name" column;
- you will get chunk first elements sorted by chunk size;
- please note that you can apply the same action to a chunk's first element (which is indicated with "chunk size =") to see elements in this particular chunk only, e.g. to analyze their hash codes.
Best regards,
Anton
We've just published build 14092:
http://www.yourkit.com/download
Changes:
http://www.yourkit.com/download/changelist-yjp2014.html
Among the changes:
1. The chunk sizes are shown in object explorer.
2. To get the chunks sorted by size please do the following:
- select a map in objects explorer;
- use new action "Memory | Contained Objects (Shift+F5)" which superseded previously existed "Array Elements";
- choose Entries in the popup;
- in the new tab which has opened switch to object explorer (if not selected by default), and sort by "Name" column;
- you will get chunk first elements sorted by chunk size;
- please note that you can apply the same action to a chunk's first element (which is indicated with "chunk size =") to see elements in this particular chunk only, e.g. to analyze their hash codes.
Best regards,
Anton
-
- Posts: 314
- Joined: Thu Jun 02, 2005 8:36 pm
Re: bucket distribution for HashMap instances
Hi Anton,
It seems to work as expected, thanks for implementing this so quickly. This will come in quite useful.
I assume ConcurrentHashMap is not supported yet, is that correct?
On a related note, when I invoke "Contained Objects"->"Keys and Values", I would expect to get a table view that combines what I would get for "Contained Objects"->"Keys" and "Contained Objects"->"Values". In other words, a tabular presentation of the map. This seems not to be the case - is that intentional?
Kind regards,
Taras
It seems to work as expected, thanks for implementing this so quickly. This will come in quite useful.
I assume ConcurrentHashMap is not supported yet, is that correct?
On a related note, when I invoke "Contained Objects"->"Keys and Values", I would expect to get a table view that combines what I would get for "Contained Objects"->"Keys" and "Contained Objects"->"Values". In other words, a tabular presentation of the map. This seems not to be the case - is that intentional?
Kind regards,
Taras
-
- Posts: 6172
- Joined: Wed Aug 11, 2004 8:37 am
Re: bucket distribution for HashMap instances
Hi Taras,
As a workaround, please open entries instead. They show the key and the value together.
Best regards,
Anton
Correct. I hope we'll add its support in the next build.I assume ConcurrentHashMap is not supported yet, is that correct?
The tabular presentation is not available yet, and we have to suspend its addition until the next EAP, as it would require serious code changes undesirable in the release branch.On a related note, when I invoke "Contained Objects"->"Keys and Values", I would expect to get a table view that combines what I would get for "Contained Objects"->"Keys" and "Contained Objects"->"Values". In other words, a tabular presentation of the map. This seems not to be the case - is that intentional?
As a workaround, please open entries instead. They show the key and the value together.
Best regards,
Anton
-
- Posts: 6172
- Joined: Wed Aug 11, 2004 8:37 am
Re: bucket distribution for HashMap instances
An update:
We've done the following:
- added ConcurrentHashMap in the "HashMap hash code distribution" inspection;
- added ConcurrentHashMap and TreeMap in the action "Memory | Contained Objects"
Please write the following yjp.jar over <build 14092 directory>/lib/yjp.jar to try the changes before the build 14094 is published:
http://www.yourkit.com/download/tmp/20140818/yjp.jar
We've done the following:
- added ConcurrentHashMap in the "HashMap hash code distribution" inspection;
- added ConcurrentHashMap and TreeMap in the action "Memory | Contained Objects"
Please write the following yjp.jar over <build 14092 directory>/lib/yjp.jar to try the changes before the build 14094 is published:
http://www.yourkit.com/download/tmp/20140818/yjp.jar
-
- Posts: 314
- Joined: Thu Jun 02, 2005 8:36 pm
Re: bucket distribution for HashMap instances
Hi Anton,
Thanks for the improvements, I'll give the jar a try.
On a related note, it would be very nice to have a "size" label for ConcurrentLinkedQueue as well.
Kind regards,
Taras
Thanks for the improvements, I'll give the jar a try.
On a related note, it would be very nice to have a "size" label for ConcurrentLinkedQueue as well.
Kind regards,
Taras
-
- Posts: 6172
- Joined: Wed Aug 11, 2004 8:37 am
Re: bucket distribution for HashMap instances
Hi Taras,
Build 14094 is ready for download.
Best regards,
Anton
Build 14094 is ready for download.
Done.On a related note, it would be very nice to have a "size" label for ConcurrentLinkedQueue as well.
Best regards,
Anton