TeleFlow Forums http://teleflow.org/phpbb/ |
|
TFServer crash! How to avoid and force a clean app startup http://teleflow.org/phpbb/viewtopic.php?f=11&t=915 |
Page 1 of 1 |
Author: | fesquivel [ Mon Jun 08, 2009 1:58 pm ] | ||
Post subject: | TFServer crash! How to avoid and force a clean app startup | ||
I just noticed 4 channels down when observing TFMonitor (screenshot attached). I reviewed the last messages in one of them: Quote: 2009/06/08 16:13:40.421: Application restart 6980 2009/06/08 16:13:40.421: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A313335. 2009/06/08 16:13:40.421: Exception occurred! Thread exiting. 2009/06/08 16:13:43.421: 2009/06/08 16:13:43.421: Application restart 6981 2009/06/08 16:13:43.421: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A313335. 2009/06/08 16:13:43.421: Exception occurred! Thread exiting. 2009/06/08 16:13:46.421: 2009/06/08 16:13:46.421: Application restart 6982 2009/06/08 16:13:46.421: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A313335. 2009/06/08 16:13:46.421: Exception occurred! Thread exiting. I know these messages are not very informative, but I'm not sure where to look for more detailed information about the exception or how to reproduce it. I tried restarting the failed channels with TFMonitor, by right-clicking on them and selecting "Reload Abort", but then TFMonitor lost its connection to TFServer... actually, TFServer just crashed and I had to restart the service. Can you help me diagnose this and suggest ways to avoid it?
|
Author: | Tim [ Mon Jun 08, 2009 3:19 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Exceptions of this sort usually indicate a problem with TFServer itself. I would like to know what build of TFServer you are on. You can get the build number from the top of the TFServer log file. It will be in your TeleFlow log folder, and named similar to the following example: TFServer_2009-06-08.log The build number will be at the top of the file. The most recent release build of TFServer is 2009.4.8. Also, if you could find the point in the log where the exception first occurred, that would be very helpful. I can tell by your application restart count that your log files will be huge. It could take you a while to find it! |
Author: | fesquivel [ Tue Jun 09, 2009 3:09 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
The log says: Engenic TeleFlow: Build 2009.3.19 (c) 1994 - 2008 Engenic All rights reserved. Where can I download the latest stable build? I have 5.79 GB taken by log files, stored in a NTFS-compressed folder (consuming just 1.67 GB of disk space), on a data partition with 46.9 GB still free... I hope that will suffice for some time As you may guess it will take me some time to digg into these files to search for the exception... |
Author: | Tim [ Tue Jun 09, 2009 5:11 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Where you get the new version depends on what edition you have: Open (formerly Hype), Standard or Enterprise. There's no way to tell from the log files. You have to know what you need based on what features you are using. I can see from your TFMonitor screen-shot that you are using faxing, and therefore you have Enterprise. Enterprise updates are usually arranged through your representative here. I will send you a private message with more information. For the benefit of others who may be reading this thread, you can also send an email to sales{at}engenic.com to contact your representative. We have not fix a lot of these types of errors from 2009.3.19 to the current version (2009.4.. I honestly don't think it will help. We've rolled out a new Beta more recently than that. I'd like to give you that instead. On to the log issue. I have a suggestion to make things easier. Try this: 1. Stop TFServer 2. Delete all the logs in your log folder 3. Run TeleFlow LineList 4. For your first few inbound ports, set "Restart Count" to 6 and un-check the "No limit" checkbox 5. Restart TFServer This will limit the number or restarts to 6. You will want to watch the system carefully, because once an application has restarted for 6th time, that port will stop answering the phone all together. You will see a red light next to it, instead of yellow. The benefit to this method is that the log files will be limited in how large they can get, and you should be able to find the original error easily. They would also be small enough that could probably email them to us. Send to support{at}engenic.com. After you have the logs you need, you can reverse the changes you made to your TF LineList. |
Author: | fesquivel [ Wed Jun 10, 2009 8:03 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Yes, it's the Enterprise release. I'll perform the upgrade and the temporary setup you suggest and if it happens again, I'll forward what I find in the logs. Thanks a lot! |
Author: | fesquivel [ Wed Jun 10, 2009 9:19 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Hey! I just entered the IVR's console to perform the upgrade and noticed that channel 1 was stalled with the same exception. I was able to capture its log (via right-click / Show Messages) and noticed this: Code: 2009/06/10 10:51:59.703: fc_SayAmount<==fc_ReadElementCatalog 2009/06/10 10:51:59.703: [46] Connect Point 2009/06/10 10:51:59.703: [2] FINISH 2009/06/10 10:51:59.703: fc_CreditCardMenu<==fc_SayAmount 2009/06/10 10:51:59.703: [663] Compare 2009/06/10 10:51:59.703: First Value '@vCreditAccounts' evaluates to '2' 2009/06/10 10:51:59.703: Second Value is '2' 2009/06/10 10:51:59.703: @vCreditAccounts = 2? 2009/06/10 10:51:59.703: (integer):2 = (integer):2? 2009/06/10 10:51:59.703: TRUE 2009/06/10 10:51:59.703: [667] Play 2009/06/10 10:51:59.703: Script code is 'LB-O' 2009/06/10 10:51:59.703: Script code 'LB-O' corresponds to file 'Vox\Spn\Prompts\o.WAV' 2009/06/10 10:51:59.703: Playing 'E:\Applications\IVR_RD\Vox\Spn\Prompts\o.WAV' 2009/06/10 10:51:59.703: ERROR: Port not off hook 2009/06/10 10:51:59.703: FATAL 2009/06/10 10:51:59.703: Attempting to jump to Fatal Error Global Event label: '' 2009/06/10 10:51:59.703: Fatal label undefined! 2009/06/10 10:51:59.703: fc_IVR_MenuManager<==fc_CreditCardMenu 2009/06/10 10:51:59.703: Setting local variable '@VCLIENTFAX' to '0' 2009/06/10 10:51:59.703: Setting local variable '@VRESULTPROC' to 'OK' 2009/06/10 10:51:59.703: fc_MainApp<==fc_IVR_MenuManager 2009/06/10 10:51:59.703: Setting local variable '@RESULTPROC' to '' 2009/06/10 10:51:59.703: fc_MainApp<==fc_MainApp 2009/06/10 10:51:59.703: fc_AccountMenu<==fc_MainApp 2009/06/10 10:51:59.703: 16F42E2: EXCEPTION_ACCESS_VIOLATION attempting to read at location 3A303031. 2009/06/10 10:51:59.703: Exception occurred! Thread exiting. 2009/06/10 10:52:02.703: 2009/06/10 10:52:02.703: Application restart 31 2009/06/10 10:52:02.703: 370E71: EXCEPTION_ACCESS_VIOLATION attempting to write at location 3A303035. 2009/06/10 10:52:02.703: Exception occurred! Thread exiting. 2009/06/10 10:52:05.703: Analysis:
I will send the full log along with screenshots of the flowchart where you can see the Global Event defined and its corresponding label, but my guess is that at some point a buffer overflow happened while in fc_CreditCardMenu that wiped the address of the Fatal Error handler and messed up with the application's stack to the point of entering to another submenu (fc_AccountMenu) from the main flowchart, something completely wrong! Obviously I cannot give you the full source code but I assure you I've tried to program this as fail-safe as possible: Every flowchart has its own Hangup & Fatal error handlers, I try to avoid loops or "spaguettis" when calling flowcharts and getting out of them in an orderly and clean fashion, communicating results or status to the calling flowchart in order to take appropiate actions as hanging up and waiting for more calls, etc. I will still proceed with the upgrade and monitoring of the new build as soon as my customer gives explicit permission to do so and report back any issues on this thread. Thanks in advance for your help. |
Author: | Tim [ Wed Jun 10, 2009 10:17 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Quote: I will send the full log along with screenshots of the flowchart where you can see the Global Event defined and its corresponding label... Please zip these files and email to support{at}engenic.com If you include in the message or subject line, "at Tim's request". That will let the others in the support group know this one is for me. |
Author: | Tim [ Wed Jun 10, 2009 11:41 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Files received. Thanks. This log is interesting, but I don't think it really tells the whole story. I started at the top, auditing the path of the fatal error handler. This is the first time an error appears in the log, but at this point, the application is already on its 29th restart: Code: ... 2009/06/10 10:51:05.093: fc_MainApp<==fc_Logger 2009/06/10 10:51:05.093: [69] Wait for Call 2009/06/10 10:51:05.093: TIMEOUT is '0' 2009/06/10 10:51:05.093: WARNING: TIMEOUT '0' evaluates to 0 and is interpreted as no timeout 2009/06/10 10:51:05.093: Rings is '0' 2009/06/10 10:51:05.093: Connection Delay is '500' 2009/06/10 10:51:05.093: Wait for 0 rings and then pickup 2009/06/10 10:51:05.093: ERROR: Port already in use 2009/06/10 10:51:05.093: FATAL 2009/06/10 10:51:05.093: Attempting to jump to Fatal Error Global Event label: 'FINISH_ERR' 2009/06/10 10:51:05.093: Setting system variable '@FATAL' to '' 2009/06/10 10:51:05.093: [67] Label 2009/06/10 10:51:05.093: Label: FINISH_ERR ... It concerns me that Wait for Call could not access the telephony port, and wondering how it got to that state. The app dutifully jumps to the error handler. Notice that after TeleFlow finds your error handler, it clears the @FATAL setting. This prevents a subsequent application error from calling the same error handler recursively. At the end of this log, before the access violations start, the warning "Fatal label undefined!" could be an indication that the error handler had been cleared. Ordinarily, this is not a problem, but a catch-all to ensure that an un-handled error winds its way back to the final exit. The larger concern is why that un-winding got confused. Returning to the log example above, the label 'FINISH_ERR' runs some clean-up code, and then calls the flowchart 'fc_Exit' is called, which sets new a error handler: Code: ... 2009/06/10 10:51:05.093: fc_MainApp==>fc_Exit 2009/06/10 10:51:05.093: [1] START 2009/06/10 10:51:05.093: [8] Global Event 2009/06/10 10:51:05.093: Label is 'EXIT_END' 2009/06/10 10:51:05.093: Setting system variable '@FATAL' to 'EXIT_END' 2009/06/10 10:51:05.093: [11] Global Event 2009/06/10 10:51:05.093: Label is 'EXIT_END' 2009/06/10 10:51:05.093: Setting system variable '@HANGUP' to 'EXIT_END' ... This is legitimate code, and will only cause you problems if 'EXIT_END' points back into the main application flow. Our recommended best practice is to ensure all Fatal errors find their way to the exit of the application. Based on the name of your label, I suspect all is well here. Your exit code eventually gets to this point: Code: ... 2009/06/10 10:51:05.093: [13] Compare 2009/06/10 10:51:05.093: First Value '@HOOK_STATUS' evaluates to 'OFF-HOOK' 2009/06/10 10:51:05.093: Second Value is 'OFF-HOOK' 2009/06/10 10:51:05.093: @HOOK_STATUS = OFF-HOOK? 2009/06/10 10:51:05.093: (string):OFF-HOOK = (string):OFF-HOOK? 2009/06/10 10:51:05.093: TRUE 2009/06/10 10:51:05.093: [3] Play 2009/06/10 10:51:05.093: Script code is 'ER-DificulTec' 2009/06/10 10:51:05.093: Script code 'ER-DificulTec' corresponds to file 'vox\spn\scripts\er-dificultec.WAV' 2009/06/10 10:51:05.093: Playing 'E:\Applications\IVR_RD\vox\spn\scripts\er-dificultec.WAV' 2009/06/10 10:51:05.109: ERROR: QueryDigit::adiPeekDigit failed 2009/06/10 10:51:05.109: (5) CTAERR_INVALID_CTAHD 2009/06/10 10:51:05.109: FAILURE 2009/06/10 10:51:05.109: Hanging up 2009/06/10 10:51:08.109: 2009/06/10 10:51:08.109: Application restart 30 ... This chunk of code is very concerning. @HOOK_STATUS should not be 'OFF-HOOK' at this point, because Wait For Call never got to process a call. The subsequent Play gets an error because none of the Port setup code got to execute. This is essentially a continuation of the first error above in the Wait For Call step. The line that reads "Hanging up" is an indication that TeleFlow is trying to close the port that it never got to open, and then the app just unceremoniously exits. This, I think, is the source of the later access violation exceptions. At minimum, its a clue. What we don't know is just how the port got into this state in the first place. To answer that, I need to see the first few application crashes. Please follow my earlier suggestion to capture the first 6 failures. Hopefully that will shed more light on this situation. |
Author: | fesquivel [ Wed Jun 10, 2009 3:44 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Well, it's ready according to your instructions. I'll be monitoring and report back as soon as I have any crash. |
Author: | fesquivel [ Thu Jun 11, 2009 10:52 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
I just found channel 1 stalled after the sixth restart; so I captured the log file and sent them by e-mail. You can see all the events prior to the first restart. I'd like to know if I'm doing something wrong in my code, so as to fix it A.S.A.P. Thanks for your help. |
Author: | Tim [ Thu Jun 11, 2009 12:05 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
I've received the log files. I'm reviewing them now. |
Author: | Tim [ Thu Jun 11, 2009 1:32 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
There may be more than one issue in these logs for all I know, but this is the first one I found cronologically in the two logs you sent me. I've clipped some of the log output that is not relevant to this discussion and bolded the lines that have a problem. Code: ... 2009/06/10 19:31:24.265: [32] Set Local Variable 2009/06/10 19:31:24.265: Setting local variable '@VCATALOGO' to '<?xml version="1.0" encoding="utf-8" ?> <MOVTYPES> <CCC12>lb-reverso.wav</CCC12> <!-- REVERSIONdeDEBITOS --> <CCC21>lb-deposito.wav</CCC21> <!-- DEPOSITO A CUENTAS --> ... <PRO68>lb-pagodeprestamo.wav</PRO68> <!-- Adelanto parcialdeuna cuota --> <TCO103>sc-REVERSODEB.WAV</TCO103> <!-- REVERSO DEBITO --> </MOVTYPES>' 2009/06/10 19:31:24.265: [42] Math 2009/06/10 19:31:24.265: Base value is 'MOVTYPES->' 2009/06/10 19:31:24.265: Modifying value '@pElementCode' evaluates to '' 2009/06/10 19:31:24.265: @vTag = MOVTYPES-> CAT @pElementCode 2009/06/10 19:31:24.265: @vTag = MOVTYPES-> CAT 2009/06/10 19:31:24.265: Setting local variable '@VTAG' to 'MOVTYPES->' 2009/06/10 19:31:24.265: [38] Connect Point 2009/06/10 19:31:24.265: [3] XML Load 2009/06/10 19:31:24.265: Handle is 'XML001' 2009/06/10 19:31:24.265: Variable / file to load '@vCatalogo' evaluates to '<?xml version="1.0" encoding="utf-8" ?> <MOVTYPES> <CCC12>lb-reverso.wav</CCC12> <!-- REVERSIONdeDEBITOS --> <CCC21>lb-deposito.wav</CCC21> <!-- DEPOSITO A CUENTAS --> ... <PRO68>lb-pagodeprestamo.wav</PRO68> <!-- Adelanto parcialdeuna cuota --> <TCO103>sc-REVERSODEB.WAV</TCO103> <!-- REVERSO DEBITO --> </MOVTYPES>' 2009/06/10 19:31:24.265: WARNING: Handle 0 still open. Closing handle. ... 2009/06/10 19:31:24.265: [7] XML Fetch 2009/06/10 19:31:24.265: Handle is 'XML001' 2009/06/10 19:31:24.265: Node '@vTag' evaluates to 'MOVTYPES->' 2009/06/10 19:31:24.265: Attributes is '' 2009/06/10 19:31:24.265: Save node value to '@vFileName' 2009/06/10 19:31:24.265: Variable assignment scope: LOCAL 2009/06/10 19:31:24.265: Handle 0: Retrieving node 'MOVTYPES->' 2009/06/10 19:31:24.265: Fetching first node... 2009/06/10 19:31:24.265: 16F28C9: EXCEPTION_ACCESS_VIOLATION attempting to read at location 0. 2009/06/10 19:31:24.265: Exception occurred! Thread exiting. ... I can see that you've been affected by an old TeleFlow bug that remains to this day: you cannot access the very top node of an XML object. Your MOVTYPES node doesn't have a value or any attributes to get, so the easiest way to work around this problem is to change your code to initialize your '@pElementCode' to the first known tag under the MOVTYPES node. It is unfortunate that TeleFlow excepts in this way. Later in this same log, there are other types of errant behaviour related to this issue. TeleFlow did not get a chance to cleanup properly, and the subsequent Wait For Call Step has the same problem as in my earlier post. It cannot access the port again. I hope to see a fix to this XML issue in an upcoming version of TeleFlow, but for now, you will have to implement a work around as I suggested. |
Author: | Tim [ Thu Jun 11, 2009 2:21 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
I recommend that you leave your application restarts to a maximum of 6 while you debug this issue. There could be more of these reference to the top-level node of XML, and possibly other reasons why an exception is thrown. |
Author: | fesquivel [ Thu Jun 11, 2009 2:50 pm ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
I see the issue: An XML Fetch for the "movement code" (type of transaction that must exist in a catalog) returns blank (no code). I made no validation for this condition, so when the fc_ReadElementCatalog flowchart uses the catalog to look up a blank code, bingo! Ok, the type of transaction should not be blank, that's an error on my part (well, from the webservices guys actually); I should check for the code not to be blank after the fetch, which is an error on my part, one that would require lots of steps everywhere considering the vast amount of XML fetches I perform throughout the application... That's where a fix for this bug would help. I will track the missing code so as to avoid it from happening again. |
Author: | fesquivel [ Fri Jun 12, 2009 9:59 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
Just a quick update: We found the omission in the web service code and fixed it, so as to avoid the application exception from happening again. So, I expect a more stable run from now on. I still expect a new build of TeleFlow Server Enterprise which recovers more gracefully from this bug; just let me know when it's available. Thanks a lot for your assistance. |
Author: | Tim [ Fri Jun 12, 2009 10:10 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
This bug will be fixed in the next general release of TeleFlow. There will be an announcement when it is available. |
Author: | Tim [ Thu Jul 16, 2009 10:09 am ] |
Post subject: | Re: TFServer crash! How to avoid and force a clean app startup |
The new version of TeleFlow has been released! The stable build is 2009.7.13. It is available from teleflow.org's download page: http://teleflow.org/download.html. |
Page 1 of 1 | All times are UTC - 8 hours [ DST ] |
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |