[Thinlinc-technical] vsmserver process unexpectedly died

Rob De Langhe rob.de.langhe at twistfare.be
Tue Oct 20 19:10:57 CEST 2015


  hi Karl,

thanks for your reply, highly appreciated !

I checked:
- no CRON jobs are configured on this master server:
# ls -l /var/spool/cron/crontabs
total 0

- no AT jobs are configured on this master server:
# ls -l /var/spool/cron/at*
/var/spool/cron/atjobs:
total 0

/var/spool/cron/atspool:
total 0

- the logrotate-status shows that there was indeed some log-rotation going
on at 6:25 Oct-18th:
# grep '2015-10-18-6' /var/lib/logrotate/status
"/var/log/auth.log" 2015-10-18-6:25:8
"/var/log/tlwebaccess.log" 2015-10-18-6:25:8
"/var/log/kern.log" 2015-10-18-6:25:8
"/var/log/vsmagent.log" 2015-10-18-6:25:8
"/var/log/vsmserver.log" 2015-10-18-6:25:8
"/var/log/apache2/error.log" 2015-10-18-6:25:8
"/var/log/tlwebadm.log" 2015-10-18-6:25:8

So, despite that I do not understand what is kicking off the logrotation,
it is indeed happening, and was the likely cause of the SIGHUP/stop/start
of the VSM processes.

However, indeed it remains a mistery why the "vsmserver" process did not
correctly startup.
That timestamp in the /var/lib/logrotate/status file is 6:25:08
# grep vsmserver /var/lib/logrotate/status
"/var/log/vsmserver.log" 2015-10-18-6:25:8

At that timestamp, there is nothing in the logs indicating the stop of
"vsmserver" , only at 06:27:00
# tail /var/log/vsmserver.log.1
2015-10-18 06:24:54 DEBUG vsmserver: Scheduled load update of 127.0.0.1 at
Sun Oct 18 06:25:34 2015
2015-10-18 06:25:34 DEBUG vsmserver: Scheduled load update of 127.0.0.1 at
Sun Oct 18 06:26:14 2015
2015-10-18 06:26:14 DEBUG vsmserver: Scheduled load update of 127.0.0.1 at
Sun Oct 18 06:26:54 2015
2015-10-18 06:26:54 WARNING vsmserver.loadinfo: Connection refused
(ECONNREFUSED) talking to VSM Agent 127.0.0.1:904 in request for loadinfo.
Marking as down.
2015-10-18 06:26:55 DEBUG vsmserver: Scheduled load update of 127.0.0.1 at
Sun Oct 18 06:27:35 2015
2015-10-18 06:27:00 INFO vsmserver: Got SIGTERM, signaling process to quit

Its startup is logged in a new file "/var/log/vsmserver.log"
#head /var/log/vsmserver.log
2015-10-18 06:27:03 INFO vsmserver: VSM Server version 4.4.0 build 4775
started
2015-10-18 06:27:03 INFO vsmserver.license: Updating license data from disk
to memory
2015-10-18 06:27:03 INFO vsmserver.license: License summary: 10 concurrent
users. Hard limit of 11 concurrent users.
2015-10-18 06:27:03 ERROR vsmserver: Unable to bind to ('', 9000) - address
already in use. Is another VSM Server running?
2015-10-18 06:27:03 ERROR vsmserver: Exiting

So apparently the SIGTERM did not really stop the program ? Something had
port 9000 still in use.

At this moment, I see that port TCP/9000 is in use by the "vsmserver"
process :
# netstat -an|grep 9000
tcp        0      0 0.0.0.0:9000           
0.0.0.0:*               LISTEN
# lsof +c 15 -i tcp:9000
COMMAND          PID USER   FD   TYPE   DEVICE SIZE/OFF NODE
NAME
python-thinlinc 3150 root    5u  IPv4 22446417      0t0  TCP
*:9000 (LISTEN)
# ps -aef|grep 3150
root      3150     1  0 Oct18 ?        00:00:12
python-thinlinc /opt/thinlinc/sbin/vsmserver

When I send it a TERM signal, it seems to cleanly shutdown :

2015-10-20 19:06:18 INFO vsmserver: Got SIGTERM, signaling process to quit
2015-10-20 19:06:18 INFO vsmserver: Terminating. Have a nice day!

And port tcp/9000 is gone as well, as expected :
# lsof +c 15 -i tcp:9000
#

Startup works normally as well, as expected :
# service vsmserver start
Starting ThinLinc vsmserver
# lsof +c 15 -i tcp:9000
COMMAND          PID USER   FD   TYPE   DEVICE SIZE/OFF NODE
NAME
python-thinlinc 6907 root    5u  IPv4 24693741      0t0  TCP
*:9000 (LISTEN)

However, during the 'logrotate' actions, it did not properly terminate...

=> has anybody else encountered that same problem , where "vsmserver" did
not properly 'terminate' ?

Rob

Quoting Karl Mikaelsson <derfian at cendio.se>:

> Rob De Langhe <rob.de.langhe at twistfare.be> writes:
>> hi all,
>>
>> I found that we could no longer login, because the "vsmserver" process
on
>> the master was gone.
>> Checking its logs (/var/log/vsmserver.log), I found messages about its
>> dying, which I cannot recognize from anything in the documentation :
>>
>> ...
>> 2015-10-18 06:20:54 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:21:34 2015
>> 2015-10-18 06:21:34 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:22:14 2015
>> 2015-10-18 06:22:14 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:22:54 2015
>> 2015-10-18 06:22:54 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:23:34 2015
>> 2015-10-18 06:23:34 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:24:14 2015
>> 2015-10-18 06:24:14 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:24:54 2015
>> 2015-10-18 06:24:54 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:25:34 2015
>> 2015-10-18 06:25:34 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:26:14 2015
>> 2015-10-18 06:26:14 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:26:54 2015
>> 2015-10-18 06:26:54 WARNING vsmserver.loadinfo: Connection refused
>> (ECONNREFUSED) talking to VSM Agent 127.0.0.1:904 in request for
>> loadinfo. Marking as down.
>> 2015-10-18 06:26:55 DEBUG vsmserver: Scheduled load update of 127.0.0.1
>> at Sun Oct 18 06:27:35 2015
>> 2015-10-18 06:27:00 INFO vsmserver: Got SIGTERM, signaling process to
>> quit
>>
>> Any idea what might have caused this ?
>>
>> cheers and thanks all in advance for any feedbacks !
>> Rob
>
> Hi Rob,
>
> That looks like a perfectly normal shutdown (i.e. service vsmserver
> stop). Seeing as the vsmagent also stopped at approximately the same
> time, I'm guessing that logrotate might have something to do with it
> since that's the only thing I know that will restart vsmserver/vsmagent
> in an standard install.
>
> Are there any logs from cron or logrotate that can tell you why the
> services aren't starting again?
>
> regards,
>
> --
> Karl Mikaelsson     derfian at cendio.se
> System Developer    +46 (0)13-290863Cendio AB         
>  https://www.cendio.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cendio.se/pipermail/thinlinc-technical/attachments/20151020/6788f469/attachment-0003.html>


More information about the Thinlinc-technical mailing list