I am attempting to upgrade our web site to new hardware. I am in a situation where everything works well for a while, then all my FoxWeb channels get stuck in a "Busy" state with 0 (zero) connections. All subsequent hits then hang in the browser, with no return.
Confused, I ran my hits through a diagnostic proxy server, HTTPTracer (which I highly recommend!).
When I had three FoxWeb channels running and hit a single POST page with Firefox, I saw three hits get sent to the server within one second. The first two would be answered with:
Does this appear to be generated by FoxWeb, or the web server (Sun Java Web Server 7.0)?
It appears Firefox automatically resends requests in response to a 408 HTTP status. With three channels running, it would send three requests inside a second. The first two get the 408 back. The third hit never returns. Later, I tried it with eight channels, and with a single click, Firefox occupied all eight within two seconds.
When I do the same thing from Internet Explorer, I get the same result, but it does not resend the request, so only one channel gets locked up, and the browser actually shows the 408 error.
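To take browser retry behavior out of the picture, a single POST can be sent from a script and the raw response inspected. The following is only a minimal sketch in Python: the function names are my own, and the URL and form field in the usage comment are placeholders, not the real test page. Python's http.client does not resend a request on its own, so one call should occupy at most one channel. The Server header and the start of the body may hint at whether the 408 comes from FoxWeb or from the web server.

```python
import http.client
from urllib.parse import urlsplit, urlencode


def describe_response(status, reason, headers, body):
    """Summarize the parts of a response that hint at which layer produced it."""
    lowered = {name.lower(): value for name, value in headers}
    return {
        "status": status,
        "reason": reason,
        "server": lowered.get("server", "(no Server header)"),
        "connection": lowered.get("connection", "(not set)"),
        "body_start": body[:200].decode("latin-1"),
    }


def post_once(url, fields, timeout=30):
    """Send exactly one POST; http.client does not retry on a 408."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request(
            "POST",
            path,
            body=urlencode(fields),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        resp = conn.getresponse()
        return describe_response(resp.status, resp.reason, resp.getheaders(), resp.read())
    finally:
        conn.close()


# Hypothetical usage (placeholder URL and field, substitute the failing form):
# print(post_once("http://testserver.example/path/to/page.fwx", {"field": "value"}))
```

If the request hangs the way the browser does, the timeout argument makes the script give up with an exception instead of waiting indefinitely.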
Looking at the server end of this transaction, I watched the status in FoxWeb Channel Status and had FoxWeb running as an application. Channel Status seemed to be aware of the activity, changing the status of channels to Busy/1 and then Busy/0 accordingly when the cascading hits rolled in.
However, the individual channel VFP windows never show any indication that they are being invoked. The title bar does not change from "FoxWeb1" (or 2, 3, etc.), and nothing appears in the window. Normally, a few dozen debug statements are printed with every hit. In fact, the first line of fw_enter.prg is a print statement. It appears that this lockup occurs before fw_enter.prg starts executing.
I can close the FoxWeb application windows on the server one at a time, and the browser continues to hang, even after every channel has been closed several times, as long as they are never all closed at the same moment. If I close all channels at once by right-clicking the Windows task bar and selecting "Close Group", then the browser immediately gets the standard "No Channels Active" message from FoxWeb. I get a similar result when running as a Service, except that when it gets hung and I restart the service, the browser immediately shows a "FoxWeb Service Has Been Temporarily Paused" error.
I restarted the web server service (Sun Java Web Server) with no change in behavior.
I rebooted the server computer, and the problem persisted.
It appears to me that the Broker is getting the request and having some communications with the channels, but the channel is not quite starting on the request, or aborts early. The Broker remains connected to the browser, as it can send the "No Channels"/"Service Paused" message in the end.
The test page that I can reliably make this fail with happens to be the result of a POST form. Other pages work fine at the same time: I can get 500 FoxWeb hits per minute under load testing with no errors. At least one other POST form works successfully while this page is failing. This is code that has not changed recently, and in fact is running flawlessly on the old hardware, which is still our live server.
I just reduced the Script Timeout setting from 300 seconds to 15 seconds. This didn't change the problem, but at least the hung channels went away sooner. I believe it takes 15 seconds to time out to the browser, then another 15 seconds for the channel to reset.
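As a rough worked example of what that change buys, here is a small Python sketch. It assumes my guess above is right, namely that a hung channel stays occupied for one Script Timeout until the browser is answered plus one more until the channel resets. That model is my assumption, not documented FoxWeb behavior.

```python
def hung_channel_seconds(script_timeout, phases=2):
    """Estimated seconds a hung channel stays unavailable.

    Assumes (unverified) one timeout period before the browser is answered
    and one more before the channel resets, hence phases=2.
    """
    return script_timeout * phases


for timeout in (300, 15):
    print(timeout, "second Script Timeout ->", hung_channel_seconds(timeout), "seconds occupied")
```

Under that assumption the old 300 second setting would tie up a hung channel for about 10 minutes, versus about 30 seconds now.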
...Now it gets even weirder...
I attempted to clarify the double 15-second timeout when, lo and behold, the page started working again! This was after restarting FoxWeb, restarting Sun Java Web Server, and rebooting the computer had all failed to fix it. I tried a variety of hits at this point, and none failed.
Then I turned on our load test script, which keeps 3 FoxWeb channels busy and maintains about 2-6 Pending Requests in the Broker. This went well for quite a while. Then I hit the POST form again, and the site was foobar again.
FWIW, when we first deployed the live site to the new hardware, it ran over 8 hours overnight without a lockup. Under prime-time live traffic, however, it would only last 30-60 minutes before locking up. After the third lockup that first morning, we rolled back to the old hardware, clueless as to what was going on. Most of the information in this trouble ticket I only learned tonight.
I've been working on this for a few weeks and have just about run out of ideas. Can you offer any insight?
Thanks in advance,
Internet Database Administrator
Kentucky Educational Television
(859)258-7164 - (800)333-9764