Post new topic Reply to topic  [ 17 posts ] 

Board index : TeleFlow Forums : TeleFlow Server & Monitor

Author Message
 Post subject: TFServer crash! How to avoid and force a clean app startup
PostPosted: Mon Jun 08, 2009 1:58 pm 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
I just noticed 4 channels down when observing TFMonitor (screenshot attached).

I reviewed the last messages in one of them:

Quote:
2009/06/08 16:13:40.421: Application restart 6980


2009/06/08 16:13:40.421: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A313335.
2009/06/08 16:13:40.421: Exception occurred! Thread exiting.
2009/06/08 16:13:43.421:


2009/06/08 16:13:43.421: Application restart 6981


2009/06/08 16:13:43.421: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A313335.
2009/06/08 16:13:43.421: Exception occurred! Thread exiting.
2009/06/08 16:13:46.421:


2009/06/08 16:13:46.421: Application restart 6982


2009/06/08 16:13:46.421: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A313335.
2009/06/08 16:13:46.421: Exception occurred! Thread exiting.


I know these messages are not very informative, but I'm not sure where to look for more detailed information about the exception or how to reproduce it.

I tried restarting the failed channels with TFMonitor, by right-clicking on them and selecting "Reload Abort", but then TFMonitor lost its connection to TFServer... actually, TFServer just crashed and I had to restart the service.

Can you help me diagnose this and suggest ways to avoid it?


Attachments:
File comment: TFMonitor showing channels 2, 3, 5 and 6 stalled (crashed).
TFMonitor.png
TFMonitor.png [ 42.7 KB | Viewed 17174 times ]
Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Mon Jun 08, 2009 3:19 pm 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
Exceptions of this sort usually indicate a problem with TFServer itself. I would like to know what build of TFServer you are on. You can get the build number from the top of the TFServer log file. It will be in your TeleFlow log folder, and named similar to the following example:

TFServer_2009-06-08.log

The build number will be at the top of the file. The most recent release build of TFServer is 2009.4.8.

Also, if you could find the point in the log where the exception first occurred, that would be very helpful. I can tell by your application restart count that your log files will be huge. It could take you a while to find it!


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Tue Jun 09, 2009 3:09 pm 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
The log says:

Engenic TeleFlow: Build 2009.3.19
(c) 1994 - 2008 Engenic
All rights reserved.

Where can I download the latest stable build?

I have 5.79 GB taken by log files, stored in a NTFS-compressed folder (consuming just 1.67 GB of disk space), on a data partition with 46.9 GB still free... I hope that will suffice for some time :D

As you may guess it will take me some time to digg into these files to search for the exception...


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Tue Jun 09, 2009 5:11 pm 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
Where you get the new version depends on what edition you have: Open (formerly Hype), Standard or Enterprise. There's no way to tell from the log files. You have to know what you need based on what features you are using. I can see from your TFMonitor screen-shot that you are using faxing, and therefore you have Enterprise. Enterprise updates are usually arranged through your representative here. I will send you a private message with more information. For the benefit of others who may be reading this thread, you can also send an email to sales{at}engenic.com to contact your representative.

We have not fix a lot of these types of errors from 2009.3.19 to the current version (2009.4.8). I honestly don't think it will help. We've rolled out a new Beta more recently than that. I'd like to give you that instead.

On to the log issue. I have a suggestion to make things easier. Try this:
1. Stop TFServer
2. Delete all the logs in your log folder
3. Run TeleFlow LineList
4. For your first few inbound ports, set "Restart Count" to 6 and un-check the "No limit" checkbox
5. Restart TFServer

This will limit the number or restarts to 6. You will want to watch the system carefully, because once an application has restarted for 6th time, that port will stop answering the phone all together. You will see a red light next to it, instead of yellow. The benefit to this method is that the log files will be limited in how large they can get, and you should be able to find the original error easily. They would also be small enough that could probably email them to us. Send to support{at}engenic.com.

After you have the logs you need, you can reverse the changes you made to your TF LineList.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Wed Jun 10, 2009 8:03 am 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
Yes, it's the Enterprise release.

I'll perform the upgrade and the temporary setup you suggest and if it happens again, I'll forward what I find in the logs.

Thanks a lot!


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Wed Jun 10, 2009 9:19 am 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
Hey! I just entered the IVR's console to perform the upgrade and noticed that channel 1 was stalled with the same exception. I was able to capture its log (via right-click / Show Messages) and noticed this:

Code:
2009/06/10 10:51:59.703:  fc_SayAmount<==fc_ReadElementCatalog
2009/06/10 10:51:59.703:     [46] Connect Point
2009/06/10 10:51:59.703:     [2] FINISH
2009/06/10 10:51:59.703:  fc_CreditCardMenu<==fc_SayAmount
2009/06/10 10:51:59.703:     [663] Compare
2009/06/10 10:51:59.703:        First Value '@vCreditAccounts' evaluates to '2'
2009/06/10 10:51:59.703:        Second Value is '2'
2009/06/10 10:51:59.703:           @vCreditAccounts = 2?
2009/06/10 10:51:59.703:           (integer):2 = (integer):2?
2009/06/10 10:51:59.703:        TRUE
2009/06/10 10:51:59.703:     [667] Play
2009/06/10 10:51:59.703:        Script code is 'LB-O'
2009/06/10 10:51:59.703:        Script code 'LB-O' corresponds to file 'Vox\Spn\Prompts\o.WAV'
2009/06/10 10:51:59.703:        Playing 'E:\Applications\IVR_RD\Vox\Spn\Prompts\o.WAV'
2009/06/10 10:51:59.703:        ERROR: Port not off hook
2009/06/10 10:51:59.703:        FATAL
2009/06/10 10:51:59.703:           Attempting to jump to Fatal Error Global Event label: ''
2009/06/10 10:51:59.703:           Fatal label undefined!
2009/06/10 10:51:59.703:  fc_IVR_MenuManager<==fc_CreditCardMenu
2009/06/10 10:51:59.703:        Setting local variable '@VCLIENTFAX' to '0'
2009/06/10 10:51:59.703:        Setting local variable '@VRESULTPROC' to 'OK'
2009/06/10 10:51:59.703:  fc_MainApp<==fc_IVR_MenuManager
2009/06/10 10:51:59.703:        Setting local variable '@RESULTPROC' to ''
2009/06/10 10:51:59.703:  fc_MainApp<==fc_MainApp
2009/06/10 10:51:59.703:  fc_AccountMenu<==fc_MainApp
2009/06/10 10:51:59.703:  16F42E2: EXCEPTION_ACCESS_VIOLATION attempting to  read at location 3A303031.
2009/06/10 10:51:59.703:  Exception occurred!  Thread exiting.
2009/06/10 10:52:02.703: 
                         
                         
2009/06/10 10:52:02.703:  Application restart 31
                         
                         
2009/06/10 10:52:02.703:  370E71: EXCEPTION_ACCESS_VIOLATION attempting to  write at location 3A303035.
2009/06/10 10:52:02.703:  Exception occurred!  Thread exiting.
2009/06/10 10:52:05.703: 


Analysis:

  • This is still v 2009.3.19; I have not yet performed the upgrade
  • Notice that the first error condition arrives when the application, while in the Credit Card Operations submenu (fc_CreditCardMenu), tries to play a WAV file on an on-hook port (the customer hanged up prematurely while listening to some dictated information).
  • The application then tries to jump to the FATAL error handler, but the log says "Fatal label undefined!", which is false because my source does indeed declare a Fatal Global Event and a Jump label to handle it
  • Then the application seems to backtrack to the main flowchart (called fc_MainApp), loops in there ("fc_MainApp<==fc_MainApp") and wrongly jumps to the Debit Account Operations submenu ("fc_AccountMenu<==fc_MainApp") before the definitive exception that disables the port.

I will send the full log along with screenshots of the flowchart where you can see the Global Event defined and its corresponding label, but my guess is that at some point a buffer overflow happened while in fc_CreditCardMenu that wiped the address of the Fatal Error handler and messed up with the application's stack to the point of entering to another submenu (fc_AccountMenu) from the main flowchart, something completely wrong!

Obviously I cannot give you the full source code but I assure you I've tried to program this as fail-safe as possible: Every flowchart has its own Hangup & Fatal error handlers, I try to avoid loops or "spaguettis" when calling flowcharts and getting out of them in an orderly and clean fashion, communicating results or status to the calling flowchart in order to take appropiate actions as hanging up and waiting for more calls, etc.

I will still proceed with the upgrade and monitoring of the new build as soon as my customer gives explicit permission to do so and report back any issues on this thread.

Thanks in advance for your help.


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Wed Jun 10, 2009 10:17 am 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
Quote:
I will send the full log along with screenshots of the flowchart where you can see the Global Event defined and its corresponding label...


Please zip these files and email to support{at}engenic.com

If you include in the message or subject line, "at Tim's request". That will let the others in the support group know this one is for me.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Wed Jun 10, 2009 11:41 am 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
Files received. Thanks.

This log is interesting, but I don't think it really tells the whole story.

I started at the top, auditing the path of the fatal error handler. This is the first time an error appears in the log, but at this point, the application is already on its 29th restart:

Code:
...
2009/06/10 10:51:05.093:  fc_MainApp<==fc_Logger
2009/06/10 10:51:05.093:     [69] Wait for Call
2009/06/10 10:51:05.093:        TIMEOUT is '0'
2009/06/10 10:51:05.093:        WARNING: TIMEOUT '0' evaluates to 0 and is interpreted as no timeout
2009/06/10 10:51:05.093:        Rings is '0'
2009/06/10 10:51:05.093:        Connection Delay is '500'
2009/06/10 10:51:05.093:        Wait for 0 rings and then pickup
2009/06/10 10:51:05.093:        ERROR: Port already in use
2009/06/10 10:51:05.093:        FATAL
2009/06/10 10:51:05.093:           Attempting to jump to Fatal Error Global Event label: 'FINISH_ERR'
2009/06/10 10:51:05.093:        Setting system variable '@FATAL' to ''
2009/06/10 10:51:05.093:     [67] Label
2009/06/10 10:51:05.093:        Label: FINISH_ERR
...


It concerns me that Wait for Call could not access the telephony port, and wondering how it got to that state. The app dutifully jumps to the error handler.

Notice that after TeleFlow finds your error handler, it clears the @FATAL setting. This prevents a subsequent application error from calling the same error handler recursively. At the end of this log, before the access violations start, the warning "Fatal label undefined!" could be an indication that the error handler had been cleared. Ordinarily, this is not a problem, but a catch-all to ensure that an un-handled error winds its way back to the final exit. The larger concern is why that un-winding got confused.

Returning to the log example above, the label 'FINISH_ERR' runs some clean-up code, and then calls the flowchart 'fc_Exit' is called, which sets new a error handler:

Code:
...
2009/06/10 10:51:05.093:  fc_MainApp==>fc_Exit
2009/06/10 10:51:05.093:     [1] START
2009/06/10 10:51:05.093:     [8] Global Event
2009/06/10 10:51:05.093:        Label is 'EXIT_END'
2009/06/10 10:51:05.093:        Setting system variable '@FATAL' to 'EXIT_END'
2009/06/10 10:51:05.093:     [11] Global Event
2009/06/10 10:51:05.093:        Label is 'EXIT_END'
2009/06/10 10:51:05.093:        Setting system variable '@HANGUP' to 'EXIT_END'
...


This is legitimate code, and will only cause you problems if 'EXIT_END' points back into the main application flow. Our recommended best practice is to ensure all Fatal errors find their way to the exit of the application. Based on the name of your label, I suspect all is well here.

Your exit code eventually gets to this point:

Code:
...
2009/06/10 10:51:05.093:     [13] Compare
2009/06/10 10:51:05.093:        First Value '@HOOK_STATUS' evaluates to 'OFF-HOOK'
2009/06/10 10:51:05.093:        Second Value is 'OFF-HOOK'
2009/06/10 10:51:05.093:           @HOOK_STATUS = OFF-HOOK?
2009/06/10 10:51:05.093:           (string):OFF-HOOK = (string):OFF-HOOK?
2009/06/10 10:51:05.093:        TRUE
2009/06/10 10:51:05.093:     [3] Play
2009/06/10 10:51:05.093:        Script code is 'ER-DificulTec'
2009/06/10 10:51:05.093:        Script code 'ER-DificulTec' corresponds to file 'vox\spn\scripts\er-dificultec.WAV'
2009/06/10 10:51:05.093:        Playing 'E:\Applications\IVR_RD\vox\spn\scripts\er-dificultec.WAV'
2009/06/10 10:51:05.109:        ERROR: QueryDigit::adiPeekDigit failed
2009/06/10 10:51:05.109:              (5) CTAERR_INVALID_CTAHD
2009/06/10 10:51:05.109:        FAILURE
2009/06/10 10:51:05.109:        Hanging up
2009/06/10 10:51:08.109: 
                         
                         
2009/06/10 10:51:08.109:  Application restart 30
...


This chunk of code is very concerning. @HOOK_STATUS should not be 'OFF-HOOK' at this point, because Wait For Call never got to process a call. The subsequent Play gets an error because none of the Port setup code got to execute. This is essentially a continuation of the first error above in the Wait For Call step. The line that reads "Hanging up" is an indication that TeleFlow is trying to close the port that it never got to open, and then the app just unceremoniously exits. This, I think, is the source of the later access violation exceptions. At minimum, its a clue.

What we don't know is just how the port got into this state in the first place. To answer that, I need to see the first few application crashes. Please follow my earlier suggestion to capture the first 6 failures. Hopefully that will shed more light on this situation.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Wed Jun 10, 2009 3:44 pm 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
Well, it's ready according to your instructions. I'll be monitoring and report back as soon as I have any crash.


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Thu Jun 11, 2009 10:52 am 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
I just found channel 1 stalled after the sixth restart; so I captured the log file and sent them by e-mail.

You can see all the events prior to the first restart. I'd like to know if I'm doing something wrong in my code, so as to fix it A.S.A.P.

Thanks for your help.


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Thu Jun 11, 2009 12:05 pm 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
I've received the log files. I'm reviewing them now.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Thu Jun 11, 2009 1:32 pm 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
There may be more than one issue in these logs for all I know, but this is the first one I found cronologically in the two logs you sent me. I've clipped some of the log output that is not relevant to this discussion and bolded the lines that have a problem.

Code:
...
2009/06/10 19:31:24.265:     [32] Set Local Variable
2009/06/10 19:31:24.265:        Setting local variable '@VCATALOGO' to '<?xml version="1.0" encoding="utf-8" ?>
                          <MOVTYPES>
                             <CCC12>lb-reverso.wav</CCC12>           <!-- REVERSIONdeDEBITOS                   -->
                             <CCC21>lb-deposito.wav</CCC21>          <!-- DEPOSITO A CUENTAS                   -->
...
                             <PRO68>lb-pagodeprestamo.wav</PRO68>    <!-- Adelanto parcialdeuna cuota          -->
                             <TCO103>sc-REVERSODEB.WAV</TCO103>      <!-- REVERSO DEBITO                       -->
                          </MOVTYPES>'
2009/06/10 19:31:24.265:     [42] Math
2009/06/10 19:31:24.265:        Base value is 'MOVTYPES->'
2009/06/10 19:31:24.265:        Modifying value '@pElementCode' evaluates to ''
2009/06/10 19:31:24.265:           @vTag = MOVTYPES-> CAT @pElementCode
2009/06/10 19:31:24.265:           @vTag = MOVTYPES-> CAT
2009/06/10 19:31:24.265:        Setting local variable '@VTAG' to 'MOVTYPES->'
2009/06/10 19:31:24.265:     [38] Connect Point
2009/06/10 19:31:24.265:     [3] XML Load
2009/06/10 19:31:24.265:        Handle is 'XML001'
2009/06/10 19:31:24.265:        Variable / file to load '@vCatalogo' evaluates to '<?xml version="1.0" encoding="utf-8" ?>
                          <MOVTYPES>
                             <CCC12>lb-reverso.wav</CCC12>           <!-- REVERSIONdeDEBITOS                   -->
                             <CCC21>lb-deposito.wav</CCC21>          <!-- DEPOSITO A CUENTAS                   -->
...
                             <PRO68>lb-pagodeprestamo.wav</PRO68>    <!-- Adelanto parcialdeuna cuota          -->
                             <TCO103>sc-REVERSODEB.WAV</TCO103>      <!-- REVERSO DEBITO                       -->
                          </MOVTYPES>'
2009/06/10 19:31:24.265:        WARNING: Handle 0 still open.  Closing handle.
...
2009/06/10 19:31:24.265:     [7] XML Fetch
2009/06/10 19:31:24.265:        Handle is 'XML001'
2009/06/10 19:31:24.265:        Node '@vTag' evaluates to 'MOVTYPES->'
2009/06/10 19:31:24.265:        Attributes is ''
2009/06/10 19:31:24.265:        Save node value to '@vFileName'
2009/06/10 19:31:24.265:        Variable assignment scope: LOCAL
2009/06/10 19:31:24.265:        Handle 0: Retrieving node 'MOVTYPES->'
2009/06/10 19:31:24.265:        Fetching first node...
2009/06/10 19:31:24.265:  16F28C9: EXCEPTION_ACCESS_VIOLATION attempting to  read at location 0.
2009/06/10 19:31:24.265:  Exception occurred!  Thread exiting.
...


I can see that you've been affected by an old TeleFlow bug that remains to this day: you cannot access the very top node of an XML object.

Your MOVTYPES node doesn't have a value or any attributes to get, so the easiest way to work around this problem is to change your code to initialize your '@pElementCode' to the first known tag under the MOVTYPES node.

It is unfortunate that TeleFlow excepts in this way. Later in this same log, there are other types of errant behaviour related to this issue. TeleFlow did not get a chance to cleanup properly, and the subsequent Wait For Call Step has the same problem as in my earlier post. It cannot access the port again. I hope to see a fix to this XML issue in an upcoming version of TeleFlow, but for now, you will have to implement a work around as I suggested.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Thu Jun 11, 2009 2:21 pm 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
I recommend that you leave your application restarts to a maximum of 6 while you debug this issue. There could be more of these reference to the top-level node of XML, and possibly other reasons why an exception is thrown.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Thu Jun 11, 2009 2:50 pm 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
I see the issue: An XML Fetch for the "movement code" (type of transaction that must exist in a catalog) returns blank (no code).

I made no validation for this condition, so when the fc_ReadElementCatalog flowchart uses the catalog to look up a blank code, bingo!

Ok, the type of transaction should not be blank, that's an error on my part (well, from the webservices guys actually); I should check for the code not to be blank after the fetch, which is an error on my part, one that would require lots of steps everywhere considering the vast amount of XML fetches I perform throughout the application...

That's where a fix for this bug would help.

I will track the missing code so as to avoid it from happening again.


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Fri Jun 12, 2009 9:59 am 
Offline
User avatar

Joined: Fri Feb 06, 2004 11:05 am
Posts: 115
Location: Costa Rica
Just a quick update: We found the omission in the web service code and fixed it, so as to avoid the application exception from happening again. So, I expect a more stable run from now on.

I still expect a new build of TeleFlow Server Enterprise which recovers more gracefully from this bug; just let me know when it's available.

Thanks a lot for your assistance.


Back to top
 Profile  
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Fri Jun 12, 2009 10:10 am 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
This bug will be fixed in the next general release of TeleFlow. There will be an announcement when it is available.


Back to top
 Profile WWW 
 
 Post subject: Re: TFServer crash! How to avoid and force a clean app startup
PostPosted: Thu Jul 16, 2009 10:09 am 
Offline
Site Admin

Joined: Wed Dec 31, 1969 5:00 pm
Posts: 329
Location: Vancouver, BC
The new version of TeleFlow has been released! The stable build is 2009.7.13. It is available from teleflow.org's download page: http://teleflow.org/download.html.


Back to top
 Profile WWW 
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 17 posts ] 

Board index : TeleFlow Forums : TeleFlow Server & Monitor


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Style by Midnight Phoenix & N.Design Studio
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group.