Created attachment 3135 xfce4-mpc-plugin log when compiled with -DDEBUG=1 Hi, I often have xfce4-mpc-plugin segfaulting on my desktop when I put my mouse on the panel (the crashes at least once per day, and I have to restart the plugin). I have recompiled the plugin with debug information, and this leads me to the attached logs and backtrace. As I understand the log, it seems that mpd didn't return the correct information to an "outputs" request, and this leads to the crash. Should the plugin made more robust against this kind of mpd errors ? Thanks, Jocelyn
Created attachment 3136 gdb backtrace
Yes, it should be made more robust. - what if you compile against libmpd ? This way, simple-libmpd.c won't be used. That's a temporary workaround. - what MPD server version are you running ? I find the returned value strange. What does it reply to an "outputs" command ? (echo 'outputs' | nc <host> 6600)
(In reply to comment #2) > Yes, it should be made more robust. I will try to make a patch so that it works even if g_strsplit() returns null strings > - what if you compile against libmpd ? This way, simple-libmpd.c won't be used. > That's a temporary workaround. I didn't try this. > - what MPD server version are you running ? I find the returned value strange. > What does it reply to an "outputs" command ? (echo 'outputs' | nc <host> 6600) I'm running mpd 0.15.12-1.1 from Debian. What is strange is that the "outputs" command returns a correct result everytime, except in some very rare case, where it gives an incorrect value. I don't know if the problem is somewhere in mpd or in simple-libmpd.c, and I don't know how to debug this. Should I recompile mpd with more debug information ? In any case, I tried to run the command you propose more than 10.000 times, and it always gave correct results... % echo 'outputs' | nc localhost 6600 OK MPD 0.15.0 outputid: 0 outputname: My ALSA Device outputenabled: 1 OK Thanks, Jocelyn
Created attachment 3138 Patch to fix parse_outputs_answer() when response is incorrect This patch only fixes function parse_outputs_answer(). Other functions should also be modified to make the plugin more robust.
Patch is a bit wrong.. i try to be 'compatible' with libmpd, and 18 corresponds to MPD_ERROR_ACK +#define MPD_ERROR_RESPONSE 18 /* incorrect response */ /usr/local/include/libmpd-1.0/libmpd/libmpdclient.h:#define MPD_ERROR_ACK 18 /* ACK returned! */ I'll try to cook a better fix for all functions.
ok, so here's my analysis : - in your case, either mpd returned totally bogus info, or the buffer contained info from a previous mpd reply, which shouldn't happen as the buffer is zeroed at the end of mpd_send_single_cmd()/send_complex_cmd(), but maybe not in error cases. Do you have the previous part of the -DDEBUG=1 log, before "DBG[xfce4-mpc-plugin.c:366] mpc_update_outputs(): !" ? - as it contained bogus info it didn't correctly detect the 'ok' line, tried to parse it and crashed - this can only happen in parse_outputs_answer() and parse_playlistinfo_answer(), which contain two while loops, the inner one doesn't check if it finds an 'ok'. Other parsing functions should be able to handle it.
I've pushed a possible fix. can you try git master, or just manually apply http://git.xfce.org/panel-plugins/xfce4-mpc-plugin/commit/?id=2a58794661890cc004ee8f07bd4e29cfc44b07c1 and see if it still happens ?
Created attachment 3148 larger log from panel debug outputs This took some time, but I have at least the log you wanted. (this is with my patch, which prevents the segfault, but still have xfce4-mpc-plugin existing). If you want a longer log, I will keep the whole log in my home directory, but I think only the last lines are interesting. I can see this line some commands before the faulting one: DBG[simple-libmpd.c:222] mpd_wait_for_answer(): ERROR @select(), timeoute'ed -> err=Ressource temporairement non disponible
(In reply to comment #8) > Created attachment 3148 > larger log from panel debug outputs > > This took some time, but I have at least the log you wanted. (this is with my > patch, which prevents the segfault, but still have xfce4-mpc-plugin existing). > > If you want a longer log, I will keep the whole log in my home directory, but I > think only the last lines are interesting. > > I can see this line some commands before the faulting one: > > DBG[simple-libmpd.c:222] mpd_wait_for_answer(): ERROR @select(), timeoute'ed -> > err=Ressource temporairement non disponible Great, now i have a better understanding of what's wrong. What happens is for an unknown reason select() fails/timeouts when sending 'play' command, and then sends the next command ('status') without waiting for the answer to 'play'. There, it gets the reply 'OK' for the 'play' command, then it sends 'currentsong' and gets the reply for the 'status' command, then it sends 'outputs' and gets the reply for the 'currentsong' command, etc etc etc. The command/reply chain is off by one. The logic is wrong somewhere, either the read buffer should be discarded/flushed in case of error, or the failed command should be resent.. I'll have to think about it, too many codepaths/error handling needed. What if you use git master ? I suppose the desynch will still happen, but it shouldn't crash anymore.
(In reply to comment #9) > What if you use git master ? I suppose the desynch will still happen, but it > shouldn't crash anymore. I will try your change. I'm wondering if a better fix could be to drop the mpd connection and restart another one on each timeout. I see no other way to resynchronize everything. I think I hit this problem because my mp3 radio is sometime quite slow to answer, and this might cause the timeouts. Thanks, Jocelyn
I've just released 0.4.0. Can you test whether your issue is still present in that version ?
Oups, I can see that I forgot to comment since my last comment :) I haven't seen the problem since your patch. I'm now using the latest version from Debian testing (0.3.6-1 at the current time), and it still works correctly. Do you want me to especially try 0.4.0, or can it wait until it enters Debian testing ?
0.4.0 changes nothing wrt those codepaths, it's the same code as 0.3.6, so i'll assume your issue was fixed by the commit in comment #7.