Project to improve hs_err files
mattis.castegren at oracle.com
Mon Feb 17 09:25:23 PST 2014
Thanks for the comments on this thread lately. I have added all comments to my tracking page, and I have filed bugs for the new suggestions.
Now that JDK8 is all but done, I would like to get this project moving again. However, before I ask someone in my team to start working on these bugs, I would like to make one last round on the mailing list to see if anyone has a strong objection to any of the feature requests (all labeled with hs_err_improvements):
I expect there to be some discussion about the robustness of the implementation, but that can be handled in the code reviews. What I want to know is whether anyone has any larger objections that we should sort out before we even start implementation.
The biggest change is https://bugs.openjdk.java.net/browse/JDK-8026324 - Add summary section to hs_err file
Overall, the feedback I have got on this feature has been positive, but the last time I asked, everyone was busy with the JDK8 Zero Bug Bounce, so I thought it best to ask one last time.
We plan to start working on this sometime next week.
PS: Still gathering suggestions, so send them if you have them.
From: John Rose
Sent: den 12 februari 2014 20:15
To: Mattis Castegren
Cc: hotspot-dev at openjdk.java.net
Subject: Re: Project to improve hs_err files
The hs_err file has grown to include lots of handy information. I agree that it would be reasonable to add more, and I'm really glad that you are thinking about it in this level of detail. This is especially good as you are an experienced consumer of these files.
The typical size of such a file is currently about 40kb. As long as the most useful information is kept near the top, there is (IMO) room for this file to grow 2x or more in typical size.
Some of the configuration information you mention may be present at the top of the hotspot.log file, before the big <tty> element. It might be fruitful to ensure that such preamble information is always captured at startup, and dumped into the log file.
I don't think dump-time disassembly is practical, since we don't have an engine bundled in the JVM, but we should make it possible with post-processing to get a good disassembly from hex dumps in the error dump file. This has been done before; perhaps it needs reviving or refinement.
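As a rough illustration of the post-processing idea, the sketch below turns hex-dump lines from a crash file back into raw bytes that could then be handed to an external disassembler. The line shape it expects ("0x<address>:" followed by space-separated byte pairs) is an assumption about the dump format, and the function name parse_hex_dump is hypothetical, not part of any existing tool.

```python
import re

# Assumed line shape: "0x<address>:   <byte> <byte> ..." with two hex digits per byte.
HEX_LINE = re.compile(
    r"^(0x[0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2}\s+)*[0-9a-fA-F]{2})$"
)


def parse_hex_dump(lines):
    """Return (start_address, data) for one contiguous run of hex-dump lines.

    Lines that do not look like a byte dump (section headers, word dumps,
    free text) are skipped. A jump in the addresses raises ValueError.
    """
    start = None
    data = bytearray()
    for line in lines:
        match = HEX_LINE.match(line.strip())
        if not match:
            continue
        address = int(match.group(1), 16)
        if start is None:
            start = address
        elif address != start + len(data):
            raise ValueError(f"gap in hex dump at {address:#x}")
        data.extend(bytes.fromhex(match.group(2)))
    return start, bytes(data)
```

The returned start address and byte string are what a disassembler needs to print instructions at their original addresses; choosing the disassembler and the target architecture is left to the post-processing tool.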
Here's another thought, along the lines of symbolic disassembly, but for data rather than code:
One thing I would like to see more of is memory contents, along with a way to interpret their meaning. The memory blocks around the current PC and SP are supplied. It might be worthwhile dumping additional memory blocks one or two indirections away from the (apparent) pointers in those initial memory blocks. I often wonder, "is that the object I care about?" when looking at those memory dumps. I am guessing that there is a cheap, robust way to put more clues into the dump, without getting entangled in object parsing (which as David points out could cause further crashing). Perhaps there is a way to classify data words in a post-processing tool, like we can pull out disassembled code. At least, we can observe whether an apparent pointer refers into a live part of the heap (assuming we have the right few words of heap boundary info).
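A minimal sketch of such a word classifier, as it might look in a post-processing tool, follows. It assumes the heap boundaries have already been read from the crash file, that objects are aligned to 8 bytes, and that a range check against the heap bounds is an acceptable stand-in for "live part of the heap"; the function name classify_words and the three labels are hypothetical.

```python
def classify_words(words, heap_start, heap_end, alignment=8):
    """Label each dumped word as 'null', 'heap', or 'other'.

    A word is labeled 'heap' only if it falls inside [heap_start, heap_end)
    and is aligned to `alignment` bytes (an assumed object alignment).
    This is a cheap plausibility check: it never dereferences anything,
    so it cannot tell a real object pointer from a lucky integer.
    """
    labels = []
    for word in words:
        if word == 0:
            labels.append("null")
        elif heap_start <= word < heap_end and word % alignment == 0:
            labels.append("heap")
        else:
            labels.append("other")
    return labels
```

Because the check works only on values already present in the dump, it avoids the object parsing that could cause further crashes, at the cost of occasional false positives.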
We could also (maybe) identify Klass pointers in the headers of objects and output a little bit of data in the crash log to make it possible to identify the (apparent) classes of (apparent) object pointers in the regions dumped. At least the values of well-known classes (in SystemDictionary::_something), if they occur as the (apparent) classes of hex dump addresses, could be supplied as an extra hint. Clearly this could scale beyond the reasonable size of a crash dump, so some sort of size limit would need to be applied. (The size limit could be set to zero, or the log file section removed, if customers are nervous about memory dumps.) I think there is scope for tasteful engineering here, especially if we push fancy formatting work into a post-pass tool.
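The hint table with a size limit could be sketched as below. The sketch assumes the crash log supplies a small mapping from addresses of well-known classes to their names; the function name klass_hints, the mapping, and the default limit are all hypothetical.

```python
def klass_hints(words, well_known_klasses, limit=16):
    """Return up to `limit` (address, class_name) hints, in first-seen order.

    `well_known_klasses` maps an address to a class name; it stands in for
    whatever table the crash log would actually provide. Each address is
    reported once. A limit of zero disables the section entirely.
    """
    hints = []
    seen = set()
    for word in words:
        if len(hints) >= limit:
            break
        if word in well_known_klasses and word not in seen:
            seen.add(word)
            hints.append((word, well_known_klasses[word]))
    return hints
```

Setting limit to zero produces no hints at all, which corresponds to the option of suppressing the section for customers who are nervous about memory dumps.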
Perhaps there is a way to join hands with the SA (serviceability agent) infrastructure, and run a tiny SA instance out of a relatively limited supply of hex dump from the crash file, instead of out of the full picture supplied by the core file or a live process. It's at least an interesting thought experiment.
Please keep up this good work!
On Sep 9, 2013, at 10:38 AM, Mattis Castegren <mattis.castegren at oracle.com> wrote:
> Hi. I sent this email to serviceability and runtime, but I got a request to forward the mail to all of hotspot-dev, as hs_err files affect all areas. Please let me know if you have any feedback. Don't worry about whether the suggestions are feasible or not; that will come in a second step.