I continually get reports of segfaults and kernel traps in my syslog: every time shortly after boot, and then periodically while the system is up. The offending process seems to be related to the xfce4 panel, but its PID isn't that of the actual panel process, and I can't work out which process it is in order to convince it to dump core or to attach to it. There's never a core dump lying around in my home folder, which would be the CWD for the panel (and yes, ulimit is set to allow core dumps). Here's a sample:
> Jan 21 17:02:49 gentoo kernel: traps: panel-18-xfce4-[3001] general protection ip:7f590dfedabb sp:7ffe743a4630 error:0 in libxfce4sensors.so.4.0.0[7f590dfe9000+b000]
> Jan 21 17:12:49 gentoo kernel: panel-18-xfce4-[5701]: segfault at 21 ip 00007f554cb3b466 sp 00007ffdac822fd8 error 4 in libc-2.26.so[7f554ca9d000+1ba000]
> Jan 21 17:22:49 gentoo kernel: panel-18-xfce4-[6509]: segfault at 550068746469 ip 00007f58af6b7466 sp 00007ffd7ac8b038 error 4 in libc-2.26.so[7f58af619000+1ba000]
> Jan 21 17:32:49 gentoo kernel: panel-18-xfce4-[7258]: segfault at 1 ip 00007f72466e5466 sp 00007ffe823cc6b8 error 4 in libc-2.26.so[7f7246647000+1ba000]
> Jan 21 17:42:49 gentoo kernel: panel-18-xfce4-[8400]: segfault at 71 ip 00007f7d9d302ab8 sp 00007fff9d8d61b0 error 4 in libxfce4sensors.so.4.0.0[7f7d9d2fe000+b000]
> Jan 21 17:52:49 gentoo kernel: panel-18-xfce4-[9421]: segfault at 31 ip 00007f7ff1a16466 sp 00007ffff5700c78 error 4 in libc-2.26.so[7f7ff1978000+1ba000]
> Jan 21 17:58:25 gentoo kernel: traps: panel-18-xfce4-[10886] general protection ip:7fab77cb2466 sp:7fff163973d8 error:0 in libc-2.26.so[7fab77c14000+1ba000]
I could possibly provide more information if someone could give me a clue how to work out what these PIDs relate to, since they're obviously not the panel's main process:
> famine@gentoo ~ (0) $ ps -eo pid,lstart,cmd |grep panel
> 2911 Sun Jan 21 18:08:50 2018 xfce4-panel
> 3026 Sun Jan 21 18:08:51 2018 /usr/lib64/xfce4/panel/wrapper-2.0 /usr/lib64/xfce4/panel/plugins/libsystray.so 6 16777225 systray Notification Area Area where notification icons appear
> 6038 Sun Jan 21 18:18:51 2018 /usr/lib64/xfce4/panel/wrapper-2.0 /usr/lib64/xfce4/panel/plugins/libxfce4-sensors-plugin.so 18 16777224 xfce4-sensors-plugin Sensor plugin Show sensor values.
> 6429 Sun Jan 21 18:22:01 2018 grep --colour=auto panel
> famine@gentoo ~ (2) $ ls -l /proc/{2911,3026,6038}/cwd
> lrwxrwxrwx 1 famine users 0 Jan 21 18:22 /proc/2911/cwd -> /home/famine
> lrwxrwxrwx 1 famine users 0 Jan 21 18:20 /proc/3026/cwd -> /home/famine
> lrwxrwxrwx 1 famine users 0 Jan 21 18:22 /proc/6038/cwd -> /proc/acpi
> famine@gentoo ~ (0) $ cat /proc/{2911,3026,6038}/limits |grep core
> Max core file size unlimited unlimited bytes
> Max core file size unlimited unlimited bytes
> Max core file size unlimited unlimited bytes
> famine@gentoo ~ (0) $
Since this is so repeatable, I'm in a good position to test any fixes you come up with.
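The kernel lines quoted above already say where inside each library the fault happened: the bracketed part is the mapping's base address and size, so the instruction pointer minus the base gives an offset that stays the same from one run to the next. A minimal sketch of that arithmetic follows; the helper name fault_offset and the regular expression are illustrative and only cover the two message formats seen in this report.

```python
import re

# Covers both formats in the log: "ip 0000..." (segfault) and "ip:7f..." (traps).
FAULT_RE = re.compile(
    r"\bip[: ]([0-9a-f]+)\b.*? in (\S+?)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (object_name, offset) for a kernel fault line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    ip, name, base, size = m.groups()
    offset = int(ip, 16) - int(base, 16)
    # Sanity check: the instruction pointer must fall inside the mapping.
    if not 0 <= offset < int(size, 16):
        return None
    return name, offset


# Two lines copied from the sample above.
samples = [
    "Jan 21 17:02:49 gentoo kernel: traps: panel-18-xfce4-[3001] general protection "
    "ip:7f590dfedabb sp:7ffe743a4630 error:0 in libxfce4sensors.so.4.0.0[7f590dfe9000+b000]",
    "Jan 21 17:12:49 gentoo kernel: panel-18-xfce4-[5701]: segfault at 21 "
    "ip 00007f554cb3b466 sp 00007ffdac822fd8 error 4 in libc-2.26.so[7f554ca9d000+1ba000]",
]
for s in samples:
    name, off = fault_offset(s)
    print(f"{name} +{off:#x}")
```

Applied to the sample, every libc-2.26.so line resolves to the same offset, which suggests a single failing call site rather than random corruption of the instruction pointer. With symbols available, an offset like this could be handed to a symbolizer such as addr2line.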
I should point out that the error messages in the first post seem to be from before a reboot (I just copied them out of my mailbox). However, as usual, it logged at least one after the reboot, which may be more relevant to the process list above:
> Jan 21 18:18:52 gentoo kernel: panel-18-xfce4-[3017]: segfault at 3d0000007f ip 00007fcc09b9e466 sp 00007ffe54fd6538 error 4 in libc-2.26.so[7fcc09b00000+1ba000]
One interesting thing is that this is timestamped within a second of the start of the sensors plugin wrapper, which has a start time 10 minutes after boot time, so perhaps it is the sensors plugin crashing and restarting? If so, maybe that explains why I can't find a core file: its CWD appears to be /proc/acpi, to which it won't be able to write.
From my own hints above, I guessed it was the panel wrapper that has /proc/acpi as its CWD, so I set kernel.core_pattern to dump to somewhere writable in /tmp, and indeed I now have a core file. Sadly, even though I use the split debug feature on Gentoo to keep debug symbols separate from the executables, in this case the debug file is empty; it must be that the Gentoo ebuild is forcing symbol stripping regardless of my settings. These new failures:
> Jan 21 19:10:17 gentoo kernel: traps: panel-18-xfce4-[3032] general protection ip:7fe0f2956466 sp:7ffdde254df8 error:0 in libc-2.26.so[7fe0f28b8000+1ba000]
> Jan 21 19:20:17 gentoo kernel: traps: panel-18-xfce4-[7002] general protection ip:7f730dc36466 sp:7ffc4277fa18 error:0 in libc-2.26.so[7f730db98000+1ba000]
> Jan 21 19:30:17 gentoo kernel: panel-18-xfce4-[9001]: segfault at 21 ip 00007f5ad38a9466 sp 00007fffecbb5b98 error 4 in libc-2.26.so[7f5ad380b000+1ba000]
seem to match these cores:
> gentoo /tmp (0) # ls -l /tmp/core*
> -rw------- 1 famine users 22536192 Jan 21 19:10 /tmp/core-panel-18-xfce4--1000-3032-1516561817
> -rw------- 1 famine users 22274048 Jan 21 19:20 /tmp/core-panel-18-xfce4--1000-7002-1516562417
> -rw------- 1 famine users 22274048 Jan 21 19:30 /tmp/core-panel-18-xfce4--1000-9001-1516563017
> gentoo /tmp (0) #
But unfortunately, the debug symbol file is an empty object :(
> gentoo /tmp (0) # gdb -c /tmp/core-panel-18-xfce4--1000-3032-1516561817
> GNU gdb (Gentoo 8.0.1 vanilla) 8.0.1
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-pc-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://bugs.gentoo.org/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> [New LWP 3032]
> [New LWP 3040]
> [New LWP 3038]
> Core was generated by `/usr/lib64/xfce4/panel/wrapper-2.0 /usr/lib64/xfce4/panel/plugins/libxfce4-sens'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007fe0f2956466 in ?? ()
> [Current thread is 1 (LWP 3032)]
> (gdb) symbol-file /usr/lib/debug/usr/lib64/xfce4/panel/wrapper-2.0.debug
> Reading symbols from /usr/lib/debug/usr/lib64/xfce4/panel/wrapper-2.0.debug...(no debugging symbols found)...done.
> (gdb) shell objdump -a /usr/lib/debug/usr/lib64/xfce4/panel/wrapper-2.0.debug
>
> /usr/lib/debug/usr/lib64/xfce4/panel/wrapper-2.0.debug: file format elf64-x86-64
> /usr/lib/debug/usr/lib64/xfce4/panel/wrapper-2.0.debug
>
> (gdb) bt
> #0 0x00007fe0f2956466 in ?? ()
> #1 0x00007fe0f2f07405 in ?? ()
> #2 0x0000000000000000 in ?? ()
> (gdb)
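The exact kernel.core_pattern value is not given above, but the file names are consistent with a template along the lines of /tmp/core-%e-%u-%p-%t (command name, UID, PID, epoch time of the dump); that template is an inference from the names, not something stated in this report. Under that assumption, a small sketch for matching a core file back to a kernel log line by PID and time could look like this (parse_core_name is a made-up helper):

```python
import re
from datetime import datetime, timezone

# Assumed layout: core-<comm>-<uid>-<pid>-<epoch seconds>; the comm part may itself contain dashes.
CORE_RE = re.compile(r"core-(?P<comm>.*)-(?P<uid>\d+)-(?P<pid>\d+)-(?P<ts>\d+)$")


def parse_core_name(path):
    """Split a core file name into (comm, uid, pid, dump time in UTC), or None if it does not match."""
    m = CORE_RE.search(path)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc)
    return m.group("comm"), int(m.group("uid")), int(m.group("pid")), when


comm, uid, pid, when = parse_core_name("/tmp/core-panel-18-xfce4--1000-3032-1516561817")
print(comm, uid, pid, when.isoformat())
```

For the first core this gives PID 3032 and 19:10:17 UTC on Jan 21, which lines up with the first of the three kernel messages above, assuming the syslog timestamps are in UTC.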
I had to manually force "-O1 -ggdb" in the makefiles, since configuring with debug enabled apparently doesn't compile (it generates warnings, but also sets -Werror). After far too much faffing around, I found that __strlen_sse2 is being passed bad data from sensors_write_config. A couple of core dumps show different data both times, but it appears that the ptr_sensors passed to sensors_write_config is either bad, or the data it points to is completely corrupt.
> (gdb) bt
> #0 0x00007fdc6c854466 in __strlen_sse2 () from /lib64/libc.so.6
> #1 0x00007fdc6ce05405 in g_string_chunk_insert_len () from /usr/lib64/libglib-2.0.so.0
> #2 0x00007fdc6d0b243f in simple_add_entry.isra () from /usr/lib64/libxfce4util.so.7
> #3 0x00007fdc6d0b2600 in _xfce_rc_simple_write_entry () from /usr/lib64/libxfce4util.so.7
> #4 0x00007fdc648a6a4c in sensors_write_config (ptr_sensors=0x56240a9c8260) at configuration.c:138
> #5 0x00007fdc6d4cdccd in g_closure_invoke () from /usr/lib64/libgobject-2.0.so.0
> #6 0x00007fdc6d4dfaee in signal_emit_unlocked_R () from /usr/lib64/libgobject-2.0.so.0
> #7 0x00007fdc6d4e81b5 in g_signal_emit_valist () from /usr/lib64/libgobject-2.0.so.0
> #8 0x00007fdc6d4e8b7a in g_signal_emit () from /usr/lib64/libgobject-2.0.so.0
> #9 0x00007fdc6ec5e570 in xfce_panel_plugin_save () from /usr/lib64/libxfce4panel-2.0.so.4
> #10 0x0000562408b5fa88 in wrapper_gproxy_g_signal ()
> #11 0x00007fdc67fc1342 in ffi_call_unix64 () from /usr/lib64/libffi.so.6
> #12 0x00007fdc67fc03a1 in ffi_call () from /usr/lib64/libffi.so.6
> #13 0x00007fdc6d4ce493 in g_cclosure_marshal_generic () from /usr/lib64/libgobject-2.0.so.0
> #14 0x00007fdc6d4cdccd in g_closure_invoke () from /usr/lib64/libgobject-2.0.so.0
> #15 0x00007fdc6d4dfaee in signal_emit_unlocked_R () from /usr/lib64/libgobject-2.0.so.0
> #16 0x00007fdc6d4e81b5 in g_signal_emit_valist () from /usr/lib64/libgobject-2.0.so.0
> #17 0x00007fdc6d4e8b7a in g_signal_emit () from /usr/lib64/libgobject-2.0.so.0
> #18 0x00007fdc6d7d5bac in on_signal_received () from /usr/lib64/libgio-2.0.so.0
> #19 0x00007fdc6d7c5b74 in emit_signal_instance_in_idle_cb () from /usr/lib64/libgio-2.0.so.0
> #20 0x00007fdc6cde3645 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> #21 0x00007fdc6cde39e8 in g_main_context_iterate.isra () from /usr/lib64/libglib-2.0.so.0
> #22 0x00007fdc6cde3cf2 in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
> #23 0x00007fdc6e521fc5 in gtk_main () from /usr/lib64/libgtk-3.so.0
> #24 0x0000562408b5f396 in main ()
> (gdb)
> (gdb) frame 4
> #4 0x00007fdc648a6a4c in sensors_write_config (ptr_sensors=0x56240a9c8260) at configuration.c:138
> 138
> (gdb) print ptr_sensors
> $9 = (t_sensors *) 0x56240a9c8260
> (gdb) info symbol 0x56240a9c8260
> No symbol matches 0x56240a9c8260.
> (gdb) print *ptr_sensors
> $10 = {plugin = 0x56240abc9010, eventbox = 0x2, widget_sensors = 0x56240aac4100, panel_label_data = 0x56240a9c8170,
> panel_label_text = 0x56240a9c8150, timeout_id = 178028864, str_fontsize = 0x56240a9c8130 "", val_fontsize = 178028736,
> scale = (unknown: 22052), panel_size = 178029696, lines_size = 22052, cover_panel_rows = 0,
> orientation = XFCE_PANEL_PLUGIN_MODE_HORIZONTAL,
> bars_created = 0, tachos_created = 884, show_title = 16779268, show_labels = 16711680,
> show_units = 65280, show_smallspacings = 255, show_colored_bars = 0, display_values_type = 885, suppressmessage = 16779268,
> suppresstooltip = 16711680, sensors_refresh_time = 65280, num_sensorchips = 255, panels = {{
I cut the rest because it's all complete junk. Obviously most of these values are invalid (e.g. scale should be 0 or 1, and so should all those booleans), so it's quite easy to see why we get a segfault from here. The question is, what is trashing this pointer (or its contents)? Both the other threads are sat in poll() inside g_main_loop_run(), which doesn't look too suspicious.
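Printing the "junk" integers from the dump above in hex makes them look less random. The sketch below assumes that val_fontsize/scale and panel_size/lines_size are adjacent 32-bit fields on little-endian x86_64 (an assumption about the struct layout, not something gdb states); join_halves is a made-up helper.

```python
def join_halves(low, high):
    """Combine two adjacent 32-bit fields into the 64-bit value they form on little-endian x86_64."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


ptr_sensors = 0x56240A9C8260  # value gdb printed for ptr_sensors

# Pairs of neighbouring int fields taken from the dump (assumed to be adjacent in memory).
pairs = {
    "val_fontsize + scale": (178028736, 22052),
    "panel_size + lines_size": (178029696, 22052),
}
for label, (low, high) in pairs.items():
    value = join_halves(low, high)
    print(f"{label}: {value:#x} (distance from ptr_sensors: {value - ptr_sensors:+#x})")

# The supposed booleans, shown in hex.
for name, v in {"show_labels": 16711680, "show_units": 65280, "show_smallspacings": 255}.items():
    print(f"{name}: {v:#x}")
```

Under that assumption the integer pairs reassemble into addresses a few hundred bytes either side of ptr_sensors itself, and the "booleans" are clean byte masks. That would be consistent with the pointer referring to some other live object rather than to freed or overwritten t_sensors data, though the dump alone does not prove it.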
Adding some debug output to the syslog shows that the pointer is incorrect during the callback:
> Jan 22 23:16:57 gentoo wrapper-2.0[2990]: csc: ptr: 0x5606f3d4a000
> Jan 22 23:26:57 gentoo wrapper-2.0[2990]: swc: ptr: 0x5606f3ad9260
> Jan 22 23:26:57 gentoo kernel: panel-18-xfce4-[2990]: segfault at 21 ip 00007f2081d0a466 sp 00007ffeeaaf70b8 error 4 in libc-2.26.so[7f2081c6c000+1ba000]
where csc is the pointer value at allocation (create_sensors_control) and swc is the value passed to the callback (sensors_write_config). The core dump backs this up, showing sane data at the original location (as opposed to the obvious junk at the passed location):
> (gdb) print ptr_sensors
> $7 = (t_sensors *) 0x5606f3ad9260
> (gdb) print *(t_sensors *)0x5606f3d4a000
> $8 = {plugin = 0x5606f3ad9260, eventbox = 0x5606f3adb130, widget_sensors = 0x5606f3d551a0,
> panel_label_data = 0x5606f3a67590, panel_label_text = 0x5606f3a673f0, timeout_id = 8,
> str_fontsize = 0x5606f3bd9b00 "medium", val_fontsize = 2, scale = CELSIUS, ....
Somehow the pointer on the callback is junk. I need someone who knows this software better than me to provide some insight now; I can't waste any more time on this.
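One detail in the output above may be worth noting: the swc value 0x5606f3ad9260 is the same address as the plugin field of the sane struct. That is what one would expect if the handler receives the emitting plugin as its first argument and the user data as its last, which is the usual GObject convention for handlers attached with g_signal_connect (g_signal_connect_swapped reverses the two). The thread does not confirm that this is the cause; the following is only a plain Python model of the two calling orders, with made-up class and function names, not the plugin's code.

```python
from dataclasses import dataclass


@dataclass
class Plugin:
    name: str


@dataclass
class Sensors:
    plugin: Plugin
    str_fontsize: str


def emit(handler, instance, user_data):
    """Call a handler in the normal signal order: instance first, user data last."""
    return handler(instance, user_data)


def emit_swapped(handler, instance, user_data):
    """Call a handler in the swapped order: user data first, instance last."""
    return handler(user_data, instance)


def write_config(first, _second=None):
    """Hypothetical handler that assumes its first argument is the Sensors state."""
    return first


plugin = Plugin("sensors")
sensors = Sensors(plugin, "medium")

got = emit(write_config, plugin, sensors)
print("normal order hands the handler the plugin:", got is sensors.plugin)

got = emit_swapped(write_config, plugin, sensors)
print("swapped order hands the handler the sensors state:", got is sensors)
```

In C the mismatch would go unnoticed at compile time, because the handler is cast to a generic callback type when it is connected.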
I put lots of time into this; it's a shame nobody seems at all bothered to even look at it!
I can confirm this. It is a bug in the xfce4-sensors-plugin, version 1.2.98, so not directly a bug in the panel. The problem seems to be already fixed in master, as I cannot reproduce it with the latest code from 1.3.0-alpha.
Can you provide any more clarity? A patch I can use, or a pointer where to download the alpha? http://goodies.xfce.org/projects/panel-plugins/xfce4-sensors-plugin#releases doesn't mention it and http://archive.xfce.org/src/panel-plugins/xfce4-sensors-plugin/ doesn't have a 1.3 folder, so I'm not sure where you're getting it from.
You can get the sources from GitHub: https://github.com/xfce-mirror/xfce4-sensors-plugin I have also posted a bug on the gentoo bugtracker: https://bugs.gentoo.org/653964 As mentioned there, I guess the fix for the segfault comes with this commit: https://github.com/xfce-mirror/xfce4-sensors-plugin/commit/f9904f1771b6538ca4b2f7e81b51ef1d8a6fdc6f You can try that as a patch against 1.2.98.
Ah, ok - I found the mirror myself, but I didn't realise it was that up to date - I checked for a 1.3 alpha release, and there isn't one, but since you've identified the commit, I'll give it a go as a patch. Thanks!
Re-assigning to the sensors plugin.
famine, I can only take care of a bug when I am its recipient, which depends on complete information in the bug data. Simon has taken care of it now, so thanks. I will make a release soon, I guess. For the future, please always remember to provide version information, i.e. sensors plugin 1.2.98 in your case. People frequently report already-closed bugs again ... And as noted in the commit message, this fixes two already existing bugs ...
Release 1.3.0 is out now. It does not include all the code cleanup I had initially intended, but this bug made the release seem really necessary.