Phone System stops

Posted: Fri Jan 27, 2017 1:09 pm
by mwsmith
Our phone system runs an automatic full backup to our server every night. Recently, a backup failed to complete, and the phone system stopped running. The only way to bring it back was a reboot. The system now appears to run normally, but it can no longer complete backups. Any suggestions?

This is part of the error message we are seeing:

Jan 21 00:00:32 Critical(2) tBackMain: IDE: checkDrvError() DMARW:Device=0 Error Reg=0x40

Jan 21 00:00:32 Critical(2) tBackMain: IDE: checkDrvError() DMARW: Device=0 Error Location LBA=3256410

Jan 21 00:00:32 Debug(7) tBackMain: +++ IEC [8.1.6.4:ideSiI0680.c,3327] 00758d84 00758ed8 0070b3c0 0070ca18 0071447c 0070ad30 0054c230 007bc6fc 007bc6c8

Jan 21 00:00:32 Critical(2) tBackMain: IDE XFER: Disk Error on READ StartLBA=5505130 Secs=16 - Retry 0
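
For anyone comparing logs across several nights, here is a minimal sketch of how the failing disk addresses could be pulled out of lines like the ones above. The line layout (timestamp, severity, task name, message) is assumed from these four samples only, and the names `LINE_RE`, `LBA_RE`, and `disk_error_lbas` are made up for illustration; an exported event log may be formatted differently.

```python
import re

# Sample lines copied from the post above.
LOG = """\
Jan 21 00:00:32 Critical(2) tBackMain: IDE: checkDrvError() DMARW:Device=0 Error Reg=0x40
Jan 21 00:00:32 Critical(2) tBackMain: IDE: checkDrvError() DMARW: Device=0 Error Location LBA=3256410
Jan 21 00:00:32 Critical(2) tBackMain: IDE XFER: Disk Error on READ StartLBA=5505130 Secs=16 - Retry 0
"""

# Assumed layout: "<Mon> <day> <hh:mm:ss> <Level>(<n>) <task>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) "
    r"(?P<level>\w+)\((?P<sev>\d)\) "
    r"(?P<task>\w+): (?P<msg>.*)$"
)
# Matches both "LBA=..." and "StartLBA=..." fields in the message text.
LBA_RE = re.compile(r"\b(?:Start)?LBA=(\d+)")


def disk_error_lbas(text):
    """Return (timestamp, lba) pairs for every LBA field found in the log text."""
    hits = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything in an unexpected format
        for lba in LBA_RE.findall(m.group("msg")):
            hits.append((m.group("ts"), int(lba)))
    return hits


if __name__ == "__main__":
    for ts, lba in disk_error_lbas(LOG):
        print(ts, lba)
```

If the same addresses show up on every failed backup, that would point at a fixed bad region on the storage rather than a one-off glitch. As a side note, if the driver is reporting the standard ATA error register, 0x40 is the "uncorrectable data error" bit, but that is an assumption about this system.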

Re: Phone System stops

Posted: Sat Feb 11, 2017 10:28 am
by tetcom
I would suggest replacing the CF card; make sure the new one is rated for at least 25X read/write speed. Doing a backup every night seems excessive. Once a week would be fine, and saved messages are probably the only thing you would risk losing.

Re: Phone System stops

Posted: Mon Feb 20, 2017 9:51 am
by doom1701
I'd love to get some more detail from what you see in your event log to see if the symptom may be related to something we're seeing. Can you answer two questions for me:

1. Is the system still pingable when it stops running?
2. Can you expand the filter on the event log after booting it back up to include debug messages, and see if you have a ton of debug messages?

I have a theory that there's a bug in at least v8 where, if too many messages get written to the event log in a very short period of time (a couple of seconds, possibly less), it causes the system to basically die. We've seen two instances of this. The first was some sort of Denial of Service attack generating a ton of SIP Debug events in a very short period of time. The second was an SMTP configuration issue that prevented voicemails from going out for a week. Every few hours the system would try sending everything out again, generating an event log message for each email attempt.

In both of these instances (and we seem to deal with the DoS issue at least once a week), the web admin interface becomes inaccessible and phones lose their registration, but calls that are already in progress continue to work and the system is pingable. The only fix is a hard reboot.
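
A rough way to check for that kind of flood after a reboot is to count how many event log lines share the same one-second timestamp. This is only a sketch: it assumes the log has been exported to text with lines that start like `Jan 21 00:00:32 ...`, and the function name and the threshold are placeholders, since nobody in this thread has measured how many messages per second it takes to cause the lockup.

```python
import re
from collections import Counter

# Assumed line prefix: "<Mon> <day> <hh:mm:ss> ", as in the log excerpt earlier in the thread.
TS_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")


def busiest_seconds(lines, threshold=100):
    """Return (timestamp, count) pairs for seconds with at least `threshold` lines.

    The default threshold is an arbitrary placeholder, not a known limit.
    """
    counts = Counter()
    for line in lines:
        m = TS_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    # most_common() sorts by count, highest first
    return [(ts, n) for ts, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    # Made-up sample data: five debug lines in one second, two in the next.
    sample = ["Jan 21 00:00:32 Debug(7) sip: x"] * 5 + ["Jan 21 00:00:33 Debug(7) sip: y"] * 2
    print(busiest_seconds(sample, threshold=3))
```

Running it over a log exported with the debug filter enabled would show whether the seconds right before the lockup are unusually busy compared with the rest of the day.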