A couple of days ago I noticed some unreliable output when stepping through the SnailLife server. I use GoLand and Delve debugger, but in the course of looking into this I picked up some tips from trial and error plus from some very helpful people which I think might be useful regardless of the debugger you use.
First, the repro.
- Fedora 28
- Go version 1.11 rc1
- Delve version 1.0.0, build 22af383
- GoLand version 2018.2.1
I started a client test in debug mode stopped at a breakpoint. The test spins up my server binary. I attached a debugger to the server and started stepping through the client breakpoints. Eventually my server breakpoint is hit and I step through the server from there.
Instead of seeing accurate variable information from the debugger, I was getting dodgy output. Variables look fine on one line and then turn
nil the next. Structs which were not empty turn empty. Strings which were not empty turn empty as you step through. Check out the
msg variable here - I get it using
fmt.Sprintf("%v", jarReq). The
msg variable gets an accurate looking value even though
jarReq is displayed as
Then, stepping one more line forward,
msg turns into an empty string:
Also note that if I step through to and past a line that runs
jar := jarReq.Jar I do not get a nil pointer dereference panic and jar looks like it should, so
jarReq is obviously not
One step forward and
jar is now a zero value struct:
This had the signs of debugging an optimized binary all over it, but I thought I built with compiler optimizations and inlining disabled (
-N -l in
I originally posted about this on a Go forum hoping a fresh set of eyes might spot something I’m doing wrong, but no one had any obvious insights there and the suggestion was to file a Delve bug. I was kind of reluctant to file a Delve bug because I was almost confident that this was not a Delve issue. After digging around some more I resigned myself to filing a Delve bug.
As suspected, it was not a Delve bug. I never did get to the root of exactly what caused this as it required some random crap like a machine reboot and a Go update to 1.11 release to fix along with the stuff that was obviously wrong (which I go into below). So there was definitely some local weirdness going on (and no, it is extremely unlikely that this was some unreported Go issue related to rc1 which required an upgrade). Nevertheless, I learned some things that I was doing wrong and I also learned some tips to better troubleshoot this kind of issue going forward.
-gcflags=‘all=-N -l’, NOT -gcflags ‘-N -l’ (if you are on Go >= 1.10)
This was my biggest mistake. I thought I was running
go build (and alternatively
go install) with compiler optimizations disabled by running
go build -gcflags '-N -l'. This is wrong. This command only disabled optimizations and inlining for the main package. To disable these for the entire program, you need to run
go build -gcflags='all=-N -l'.
Update: A reader emailed and pointed out to me that this behaviour was implemented in Go 1.10 and the above advice is only relevant if you are on 1.10 or newer. If you are on older versions of Go, this does not apply. You can read more about the change here. I dug up the original discussion about this also, which can be found here
Remember to pass the compiler flags to
Luckily I did not make this error this time, but I did have a scare for a minute thinking I might have! When I started learning Go I had a misconception about
go build and
go install. I would always run
go build && go install, thinking build will build and output the binary to my current location and install will basically just move it into my
$GOPATH. This is not what happens.
go install also builds, it just happens to output the binary into
$GOPATH. So if one were to run
go build with all the necessary flags to disable optimizations and then just run
go install right after that, they’d be left with another - optimized - version of the binary. If you use
go install, you need to also give it the same compiler flags that you would if you were using
objdump –dwarf your-exe | grep DW_AT_producer
I was using binary size to roughly try and see if my binary was built with optimizations disabled or not. After I went through the magical machine-reboot-and-Go-update steps this did indeed work - the binary was now a couple of MB larger. But that’s not a very reliable test, really. Alessandro Arzilli of Delve pointed out this command, which I could use to confirm if my binary was built with the aforementioned
-N -l flags or not. Running
objdump --dwarf your-exe | grep DW_AT_producer should produce output in which every line is followed by
-N -l, if the compilation had the expected result.
Of course just parroting back a command given to you isn’t where we want to be. Instead of using the command totally blindly it would be nice to know the general gist of what it actually does:
objdump displays information about specified object files.
--dwarf displays contents of the file’s debug sections. DWARF (Heh, heh, get it? To go with ELF? :D) is a debugging data format which stands for “Debugging With Attributed Record Format” (although the name was apparently made up after the acronym!) It explained in more detail here and here, I recommend checking these out.
Go adds compilation flags to the
DW_AT_producer attribute in the DWARF debug information. After a few minutes of digging around, I found this Go issue about recording the optimization level in the binary, filed and resolved in 2017. There were a few changes associated with this but here’s where they seem to add the actual information to the attribute: https://go-review.googlesource.com/c/go/+/71430/5/src/cmd/link/internal/ld/dwarf.go (check out the rest of the commit as well)
Better output from GoLand
Although the issue has nothing to do with GoLand itself, Florin Pățan of GoLand joined in to discuss the insufficient output one gets from GoLand when debugging an optimized binary with Delve. Delve apparently provides a warning when you debug an optimized binary, but GoLand does not consume this or report it to the user. Instead, you get the kinds of output I saw - nil and otherwise zero-value variables. This is pretty misleading. Florin provided and followed up on an issue already in existence to improve this on the JetBrains side: https://youtrack.jetbrains.com/issue/GO-5311
Overall, even though I didn’t end up with all the answers in the end, I learned some interesting things when looking into this problem. I also appreciated how members of both the Delve and JetBrains communities jumped on this issue so quickly to provide useful insights and follow up on potential IDE improvements here, even though this was obviously not directly a Delve or GoLand problem.
I guess I can remove all those
fmt.Println()s that I fell back on to track down that original bug I was debugging now…