Hydaelyn Role-Players
Recent forum error pages - Printable Version


Recent forum error pages - FreelanceWizard - 11-04-2014

Some of you have, I'm sure, noticed the seemingly random error pages you've been getting on the forums over the last couple of weeks. This isn't a server performance issue, but rather a PHP incompatibility buried deep in our MyBB installation. I'm currently trying to debug the issue, which is challenging because it's intermittent. PHP crashes in the same place each time, but I haven't been able to reproduce whatever provokes the crash.

In the meantime, I'm tweaking our server to "hide" the issue by forcing it to retry requests that fail, since that generally causes the error to go away (remember, it's random :) ). This tweaking is a bit delicate and might result in some weird errors (like the "out of memory" ones last night :P ). I'll update this thread with additional information as I get it. Meanwhile, if you can reliably reproduce an nginx error page, I'd love to know what you're doing to trigger it. Just drop me a PM with the details.

Thanks as always for your patience!


RE: Recent forum error pages - FreelanceWizard - 11-06-2014

I believe I've isolated the issue and corrected it; we're no longer getting tons of segmentation faults. :) Without going into all the brutal details, it turns out the reputation system, of all things, does some weird stuff with shared memory, and I had to tweak some settings to handle it now that people's reputation numbers are getting into triple digits.

I'm going to do a bit more tweaking on our FastCGI pools to undo the things I did for debugging, and I'll of course continue to monitor the situation. For now, though, I think we're back to normal.


RE: Recent forum error pages - Tiergan - 11-06-2014

Glad to hear that things are okay now! I was really starting to worry.


RE: Recent forum error pages - Unnamed Mercenary - 11-06-2014

Interesting.

If it's not a security hole, could we get the brutal details? I have a fondness for seeing the explanation for these things, even if the only language I can program effectively in is Java.


RE: Recent forum error pages - FreelanceWizard - 11-06-2014

Sure. :) Gory technical details ahoy:

Basically, the post reputation system tries to compute the reputations of everyone it sees on a page, and it does this not by loading a counter from the database, but by pulling in the whole database table for reputation entries and running a computation on that. Since I loosened up the reputation system, this table has grown, and it was exceeding the APC (the opcode/data cache) shared memory limit. PHP's response to this was to throw a fatal error and crash with a segmentation fault (because in the world of PHP, any fatal error is a segmentation fault). It was seemingly random because the reputation code is called in a lot of different places, some rather unexpected, and you could actually stuff the table into APC if not a lot of other things had already been put in there (like, say, user sessions).
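To illustrate why the failure looked random, here is a toy model of a fixed-size cache. It is only a sketch: `FixedSizeCache`, `request_succeeds`, and all of the byte sizes are made-up names and numbers for illustration, not APC's real API or this server's settings. The point is just that the same large table fits when the cache is nearly empty and fails when enough sessions are already stored.

```python
class FixedSizeCache:
    """Toy stand-in for a fixed-size shared memory cache (hypothetical, not APC)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.entries = {}  # key -> size in bytes

    def used_bytes(self):
        return sum(self.entries.values())

    def store(self, key, size_bytes):
        # Refuse the entry if it would push the cache past its limit.
        if self.used_bytes() + size_bytes > self.capacity_bytes:
            return False
        self.entries[key] = size_bytes
        return True


def request_succeeds(session_count, table_bytes,
                     capacity_bytes=32_000_000, session_bytes=50_000):
    """Fill the cache with sessions, then try to store the reputation table.

    All sizes are assumptions chosen to make the effect visible.
    """
    cache = FixedSizeCache(capacity_bytes)
    for i in range(session_count):
        cache.store(f"session:{i}", session_bytes)
    return cache.store("reputation_table", table_bytes)


if __name__ == "__main__":
    # Quiet forum: few sessions cached, so the table fits.
    print(request_succeeds(session_count=10, table_bytes=20_000_000))
    # Busy forum: sessions already occupy the space, so the same table fails.
    print(request_succeeds(session_count=300, table_bytes=20_000_000))
```

In this model, whether a page loads depends on what else happens to be cached at that moment, which matches the "seemingly random" behavior described above.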

I was only able to work this out by tweaking the FastCGI process manager and nginx to try to capture those errors and the core dumps. The solution was to increase PHP's maximum memory and the size of the shared memory space for APC. Merely turning off APC wasn't sufficient (since the board was grabbing the table more than once and blowing itself up), nor was just turning up the maximum script memory.

Now, in terms of how I'd do it if I were writing the code, I'd do one of two things, depending on how my database server behaved. Keeping the reputation entries in a table is a fine idea, and if the DB server can do quick computations at query time (as, say, MS SQL Server and Oracle can, not that anything in Oracle is "quick" per se :) ), I'd just have it do the aggregation and return a single number to me, something like:

-- where has to come before group by; count(*) counts the matching reputation rows
select count(*) from users
inner join postrep on users.user_id = postrep.user_id
where postrep.user_id = @user
group by postrep.user_id

I could then kick that over to the server using ExecuteScalar or the equivalent and quickly get an answer back. For getting multiple reputation counts at once, I might instead put postrep.user_id in the select statement and filter the results in the script. LINQ, for instance, makes that super-easy, depending on whether you want to use the Join, Intersect, or Where operators.
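A minimal sketch of the server-side aggregation idea, using Python's built-in sqlite3 module rather than the MS SQL Server / ExecuteScalar setup mentioned above. The schema here (`users`, `postrep`, `rep_id`, `giver_id`) and the helper names are assumptions for illustration, not MyBB's actual tables. It shows both the single-user scalar query and the multi-user variant with user_id in the select list.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical users/postrep schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table users (user_id integer primary key, name text);
        create table postrep (
            rep_id integer primary key,
            user_id integer,   -- who received the reputation
            giver_id integer   -- who gave it
        );
        insert into users values (1, 'alice'), (2, 'bob'), (3, 'carol');
        insert into postrep (user_id, giver_id) values (1, 2), (1, 3), (2, 1);
    """)
    return conn


def reputation_for(conn, user_id):
    """Let the database count one user's entries and return a single number."""
    row = conn.execute(
        "select count(*) from postrep where user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


def reputations_for(conn, user_ids):
    """Return {user_id: count} for several users in one aggregated query."""
    user_ids = list(user_ids)
    if not user_ids:
        return {}
    placeholders = ",".join("?" for _ in user_ids)
    rows = conn.execute(
        "select users.user_id, count(postrep.rep_id) from users "
        "left join postrep on users.user_id = postrep.user_id "
        f"where users.user_id in ({placeholders}) "
        "group by users.user_id",
        user_ids,
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(reputation_for(conn, 1))
    print(reputations_for(conn, [1, 2, 3]))
```

The left join in the multi-user query is a design choice in this sketch: it keeps users with no reputation entries in the result with a count of zero instead of dropping them.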

For something like MySQL, where aggregations at the database aren't fast, or a NoSQL/in-memory store where you really can't do any aggregations, I'd create a field in the users table to hold the current post reputation count for each user and increment it whenever someone got a new reputation bump. Then, I'd just pull the reputation counts out of the user table.
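The counter-column approach could look roughly like this. Again this is a sketch in Python with sqlite3, and the `rep_count` column, the table layout, and the function names are hypothetical rather than anything MyBB ships. The entry insert and the counter increment run in one transaction so the two stay in step.

```python
import sqlite3


def build_counter_db():
    """In-memory database where users carries a denormalized rep_count column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table users (
            user_id integer primary key,
            name text,
            rep_count integer not null default 0  -- hypothetical counter field
        );
        create table postrep (
            rep_id integer primary key,
            user_id integer,
            giver_id integer
        );
        insert into users (user_id, name) values (1, 'alice'), (2, 'bob');
    """)
    return conn


def add_reputation(conn, user_id, giver_id):
    """Record the reputation entry and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "insert into postrep (user_id, giver_id) values (?, ?)",
            (user_id, giver_id),
        )
        conn.execute(
            "update users set rep_count = rep_count + 1 where user_id = ?",
            (user_id,),
        )


def reputation_counts(conn):
    """Read the counts straight from the users table, with no aggregation."""
    return dict(conn.execute("select user_id, rep_count from users").fetchall())


if __name__ == "__main__":
    conn = build_counter_db()
    add_reputation(conn, user_id=1, giver_id=2)
    add_reputation(conn, user_id=1, giver_id=2)
    add_reputation(conn, user_id=2, giver_id=1)
    print(reputation_counts(conn))
```

The trade-off is the usual one for denormalized counters: reads become a plain column lookup, but every code path that adds or removes a reputation entry has to remember to update the counter.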

Of course, this being MyBB, I can't really do it how I'd want to and still be able to upgrade the code, so I'm stuck with watching logs and debugging when weird stuff happens. :)


RE: Recent forum error pages - FreelanceWizard - 11-07-2014

And to follow up...

The errors and extreme slowness we experienced this afternoon were due to a MySQL thing I've fixed. (As they say in the Lean world, every time you lower the water level, you end up hitting some new rocks. :) )



RE: Recent forum error pages - Blue - 12-06-2014

I seem unable to access the wiki in any shape or form at the moment. Does that have anything to do with these errors?


RE: Recent forum error pages - FreelanceWizard - 12-06-2014

It's likely related to our switch to https. What browser and OS are you using?


RE: Recent forum error pages - Blue - 12-06-2014

(12-06-2014, 07:18 PM)FreelanceWizard Wrote: It's likely related to our switch to https. What browser and OS are you using?

Opera 12.17, Windows 7

I noticed that if I copy and paste the wiki address into Google Chrome, I can browse it, but if I copy a link to a picture on my wiki (in Chrome) and paste it into Opera, it won't display.


RE: Recent forum error pages - FreelanceWizard - 12-07-2014

Yep, it's related to the TLS cipher issue I discussed on the suggestions thread. I'm currently collecting a list of ciphers to use and will be updating the settings in short order, now that I'm back in town and have the time to do that. :)


RE: Recent forum error pages - FreelanceWizard - 12-07-2014

Give it a go now and let me know if it works or not. The server should now fall back to a cipher suite that's compatible with Opera 12 and older versions of Android and IE while simultaneously offering super-high security on the latest versions of IE, Chrome, and Firefox.


RE: Recent forum error pages - Blue - 12-11-2014

Yup, it's doing great now, thanks!


RE: Recent forum error pages - Warren Castille - 12-11-2014

Not sure if it's just me, but this keeps popping up randomly today when I try to click things.

Quote: MyBB has experienced an internal SQL error and cannot continue.

SQL Error:
2002 - php_network_getaddresses: getaddrinfo failed: Name or service not known
Query:
[READ] Unable to connect to MySQL server



RE: Recent forum error pages - Atoli - 12-11-2014

Just got that as well, but it seems to have gone away.


RE: Recent forum error pages - FreelanceWizard - 12-11-2014

Yeah, that's an AWS SQL error. I don't see anywhere where our RDS instance has fallen over, but random RDS network outages aren't unknown over in the AWS world.