Antidote du jour (and 9/19/09 Links to come)

Dear patient readers, this is a bit of sentence first, verdict afterwards, but it is 6:00 AM and I have spent all day and all night dealing with copy edits and am still behind the eight ball and need to sleep too, or my productivity will go from poor to non-existent.

So you get an antidote now, and if you check back later, I will fill in some links for your delectation, and hopefully at least a wee post too.

Sorry about this, I feel bad about neglecting the blog, particularly after the technical difficulties of last week, but the WordPress problems put me further behind schedule. And to be honest, they weren’t just WP.

A brief recap: even though WP has a good reputation in the field, few people run it on blogs with this level of traffic. Even when optimized, it takes more resources than the vast majority of webhosts will tolerate, their marketing claims to the contrary. Get a spike and your bandwidth or database use will get you shut down.

Now this is something that never occurred to me. I ran on Blogger for free. Blogger admittedly has less functionality, but if Blogger can handle my traffic for nothing, it is quite surprising that WP is so badly designed (which it is as far as the way it uses database resources is concerned) that a high traffic blog using it overloads a webhost.

We kept trying to troubleshoot to buffer the spikes, but got shut down at my initial webhost. The tech guy who was doing the implementation put it on his server, in part as an interim solution, in part so he could monitor usage better and figure out options.

Things were fine for a week, he left town, the blog went down, and it was not clear that he would get my message or be able to resolve the problem remotely (and he was not returning for 3 days).

He did restore the blog after about 18 hours. But separately, I had a comments problem due to some rogue hidden characters in a single post, which somehow messed up comments, broke my links, and interfered with my RSS and Atom feeds. A very helpful reader spent HOURS isolating the problem. A sample of his correspondence:

This has been the most bizarre bug to date.

I found non-printing characters in two places in the text from the quoted article.

That said, there are clearly other non-printing characters contained within the post, which I cannot locate and cannot get to display. I even pulled the paste into bitmaps (binary type of files) trying to display whatever is hidden there, but couldn’t dig it out.

Attached is what I managed to get to display and work with the EP RSS feeds.

I pulled out entire paragraphs from the quoted article to isolate these special characters, but I couldn’t find out what they are explicitly.

I checked for newlines, tabs, all of the XHTML is correct, etc. I even put it into Word with “display all formatting tags, outline mode” and Word only could catch two of these non-printing chars contained within this text. But freaky non-printing chars are like roaches, you found two, you know there are more! That is just too freaky because I do copy and paste in almost every single post!

Below is what I managed to parse to get to display with the RSS feed and not give an XML parsing error: (I also removed all HTML special characters when I was checking this so you’ll have to model this from your own post vs. copy the below).

What I would suggest doing is either try limiting the article quote to the below or retyping the actual text by hand. Or plain don’t quote them hardly at all (bastards with weird stuff in their copy!)
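For readers who want to hunt for this sort of thing themselves, here is a minimal sketch in Python of one way to list characters that are likely to be invisible in pasted text. It is not what my reader actually did, and the sample string is made up for illustration. The choice of which Unicode categories to flag is an assumption; the actual culprits in my post were never identified. Note that XML 1.0 does not allow control characters other than tab, line feed, and carriage return, so a stray one could plausibly produce a feed parsing error.

```python
import unicodedata

def find_hidden_chars(text):
    """Return (index, code point, name) for characters that are likely invisible."""
    hits = []
    for i, ch in enumerate(text):
        if ch in "\n\r\t":
            continue  # ordinary whitespace, allowed in XML
        cat = unicodedata.category(ch)
        # Cc = control, Cf = format (zero-width space, BOM, soft hyphen),
        # Zl/Zp = line/paragraph separators; also flag the no-break space.
        if cat in ("Cc", "Cf", "Zl", "Zp") or ch == "\u00a0":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# Hypothetical pasted text containing three hidden characters.
sample = "Quoted\u200b text with a\u00a0hidden\x0b character"
for hit in find_hidden_chars(sample):
    print(hit)
```

Running something like this over the raw post text (not a Word document or a bitmap) reports the position of each suspect character, which can then be deleted or retyped.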

I also had a guest post with an ampersand, and that is a forbidden character in XML, so that messed up my feed.
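To illustrate the ampersand point: a raw "&" inside XML has to be written as "&amp;", or a parser will reject the document. The sketch below uses Python's standard library to show the failure and the fix. The title "Smith & Jones" and the `build_item` helper are hypothetical, and this is not how WordPress builds its feeds; it only shows the general escaping step.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def build_item(title):
    """Build a tiny feed-like fragment with the title safely escaped."""
    # escape() rewrites &, < and > as &amp;, &lt; and &gt;
    return f"<item><title>{escape(title)}</title></item>"

# A raw ampersand makes the fragment ill-formed XML.
raw = "<item><title>Smith & Jones</title></item>"
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# The escaped version parses, and the reader still sees a plain "&".
fixed = build_item("Smith & Jones")
parsed_title = ET.fromstring(fixed).find("title").text

print(raw_ok)
print(fixed)
print(parsed_title)
```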

You can infer what my week was like.

And if you think other people who have a lot of traffic have sorted out WP, I would not be so certain. I think they just compensate better than I have managed to thus far. For instance, I pinged Barry Ritholtz while this was afoot to find out who his webhost was. He had got shut down by his host over the weekend, and like me, was running on a temporary basis on his techie’s server. And I infer he has had a fair number of WP problems and feels it was misrepresented to him.

Antidote du jour:



28 comments

  1. Ina Pickle

    Great antidote, at least for me, as the cat in question looks just like one I loved dearly as a child – right down to the brown spot below the nose.

    Good luck with the copy edits! And the sleep.

  2. Marc

    Sorry about the ampersand, that was me. I wasn’t aware of the problem. I hate to have caused you grief. I’ll try to remember it for the future.

  3. Gerry

    Don’t worry about it, this is a free site and to my mind all free sites are on a best efforts basis. It’s the weekend, get out and enjoy the end of summer and everyone can find their own links for this one day.

  4. tim

    it’s a hosting thing, not a wordpress thing.
    the ampersand issue i don’t think is “accurate”.
    sounds like you need the weekend off to recharge!

  5. David

    I agree w/ Tim about this sounding like a hosting issue. WP may be a resource pig but it can be scaled…Techcrunch still runs on WP. You might try switching to a virtual dedicated server from their host, MediaTemple (BTW, I’ve also had excellent experience w/ MT). There’s also a plugin or app (sorry I’m not an expert on WP) for high traffic WP sites that caches much of the content to improve performance and reduce DB hits…if you’re not using that already, probably worth a look. BTW, love the site and the antidotes…so don’t sweat the downtime and hang in there!

  6. LeeAnne

    Luv the pink highlighter. Sorry about your WP hassle Yves and relieved to learn a few days ago you weren’t out with the flu. Thank you so much for keeping us up to date.

    I had just begun setting up WP and being technically challenged ran into problems immediately. It’s good to know I can just drop it with nothing lost and go back to my Blogger challenges.

  7. jimmy james

    I don’t know how cats get comfortable sometimes. An answering machine as a pillow? Well, whatever works, kitty.

    Good luck on the problems, Yves.

  8. lark

    I too am skeptical about the character issue.

    Too bad you don’t have access to a linux system, it is easier to look at the actual text, free of file formats and multibyte character handling. Consider getting a linux person to help you.

  9. David Merkel

    Yves, I run on WordPress. Now, granted, as blogs go, you are roughly ten times my size. I’m not sure that WordPress itself is the difficulty — I suspect it is your hosting service.

    I use Netfirms, and though they have not been trouble free for me, they have been pretty good. I don’t use a bare bones package for Alephblog, but paid up for something that an intermediate-sized business would use.

    Just my thoughts. I’m my own tech guy, so I deal with the WP issues myself. Sincerely, David

  10. K.

    With all due respect Yves, it’s free software, and if you find some part of the functionality lacking then you are _obliged_ to modify the code and post the fixes back to WP. This is the “price” of the free software. Given that you are a financial analyst and not a programmer, perhaps a professional paid CMS would be better suited to your needs?

  11. emca

    I came across this related observation from a Hank Williams:

    “Venture capital has totally distorted the market. VCs are investing billions of dollars in companies with instructions to get big fast and to worry about advertising revenue later. As a result the competition is for users and not paying customers.”

    http://whydoeseverythingsuck.com/2008/04/free-is-killing-us-blame-vcs.html

    Although dated from last year, this post is relevant to yours and the general WordPress situation. It appears the best source of revenue on the Internet is to become a household name through offers of free “stuff”, something of a middle to late 90’s redux or an ideal that just wasn’t tanked in glorious inequity with the dot com bust.

    This leaves me to question, “Is Finance really the source of all evil?” or if you want to find the source of blame, follow the money.

  12. Robert Oak

    Firstly, your webhost should have certain features, or find another one. It should have enough memory and disk space for the site as well as fast enough mySQL servers to handle the site. It should have unlimited bandwidth, or “transfer”. So if you get a hit spike, that would be no problem with the webhost for they do not limit bandwidth, including over a period of time (i.e. hit spikes). Then, both NC and BP are big enough they should consider using mirrors. These are “synced up” sites (servers) which have real time exact duplicates of the content but in different locations. So, if one server crashes or something happens, the mirrored server is there and your site is still online.

    You should have control over php.ini and make sure your application does not eat up too much memory past the account limits OR use up too much CPU. You must check mySQL queries to make sure those are optimized and not taking up too much CPU or are too slow.

    Finally, a good webhost should give warning if they are shutting down a site and it should only be something like a DDoS attack. They should notify you or have at least 24/7 phone support in case of a shut down so you can quickly fix the problem.

    But if they are limiting bandwidth, it is time to find another hosting service.

    WordPress is good, it can scale but like any open source distribution, you must have a competent technical person (or people) behind it to understand all of these issues.

    This is way beyond modifying WordPress itself, one must do all sorts of things (listed above) to scale a website to handle large traffic.

    Google has technical teams behind blogger who fix all of these technical issues and as a result you have little control over the blog as well as options by using Google. But you are getting all of this technical stuff for free (how nice of ’em!)

    But it can be done, WP can scale. Just today, for example, Intuit bought mint.com for a heavy chunk of millions: a free “personal finances” site that is done in WordPress.

  13. Average John

    You are probably already aware of this, but maybe you need to validate your xml post-backs and your content against an xsd schema setting on the server before program processing? If an error is thrown during validation you can at least trap it and respond with a user friendly error message.

    There are also simple pre-built javascript validation snippets on the client side for escaping illegal characters and mark-up before form submission, but a hacker can alter the script if he or she is intent on attacking your site?

  14. Robert Oak

    Firstly, if you’re trying to help NC with technical stuff, it needs to be WordPress specific. So for example, this plugin, validated, might do the job
    http://wordpress.org/extend/plugins/validated/

    If you just say “some script” that isn’t going to help a finance expert and economics blogger, who has been busy digging around in toxic assets, therefore does not have a PhD in computer science….to figure out which javascript to add. ;)

    But on this particular bug, the hidden chars were in another article, which was a copy & paste excerpt. It passed the XHTML validation that goes into the post creation not only on WP, but also in other corrector and validation scripts. So, this particular weirdness flew right by the existing validation schemes when creating content.

    Very strange indeed and implies that the major press site isn’t doing a very good job with their own “clean up crew” to have it even posted in a published article.

    The other was a weird fluke, how many people use an & in their name? Minor bug, although WP should check that and modify for HTML special characters frankly (bad WP….bad WP)….

  15. K Ackermann

    I am amazed anyone who offered database hosting services would put constraints on scale. Some of the virtualization services charge pennies per hour for hosting.

    Some division of Amazon is starting to do this, and I heard their rate is 7 cents an hour for some ridiculously large amount of storage. They provide a virtual machine with various resources such as a DB, but your tech guy has to configure.

  16. David R

    WordPress can scale, wordpress can’t scale, it’s your host, it’s your tech guy, there were problem characters, you should use linux, you should use Joomla (!).

    Yves, it’s funny that you get the same range of opinions RE: your blog’s technical problems that one finds in the economics press… some really good, some laughably bad. Unfortunately, it’s just as hard to figure out who is for real, and going to give you good advice.

    My 2 cents, if it’s not clear yet: It’s the combination, WordPress + High-traffic blog + a SHARED web host. Feel free to ping me if you find you still need help, not that you should trust me any more than the rest of these folks. (Sounds like I should have said something earlier, if you’ve got someone going through your posts trying to manually find nonprinting characters. Not a good sign…)

  17. Mario

    WordPress is not the problem. With all due respect, I don’t think you have the traffic of Techcrunch or other heavy traffic sites running on WP. Try using a plugin like WP-Supercache, it can manage Digg-like traffic.

  18. Yves Smith Post author

    To address the comments briefly:

    1. I have implemented Hypercache, which I am told is better than Supercache. Even with Hypercache, I got shut down by my webhost almost immediately for going over my database resource allotment.

    2. My bandwidth and DB use are too high for a dedicated virtual server.

    3. I am told by two readers who have given this a once over and run in high volume environments that my traffic level is indeed pushing the envelope for WP. It is a resource hog even when optimized.

    4. Techcrunch may run WP on its own servers, and have its own T1…not comparable to my situation. I have no interest in being in the IT business.

    5. Even in a lower traffic setting, WP takes much more babysitting than Blogger.

    6. Drupal is probably a better idea, but I am not up for learning another program or putting myself through another transition.

  19. William Mitchell

    Sorry to hear about all the trouble, and thanks for working so hard on it. We really appreciate it.

    Data point: FourHourWorkWeek (http://fourhourworkweek.com/blog) runs reliably on WP with much higher traffic (says Alexa). The home page says they use MediaTemple dedicated hosting.

    Page caching: running over your database allotment suggests caching isn’t working. Properly cached pages are served by Apache before the request ever reaches WordPress or the database.

    More generally, dedicated hosting is a simpler debugging problem, because there are fewer variables. You control all resources except bandwidth, so performance profiles are easy to interpret.

    Then again, if Blogger works reasonably well, for free, maybe that’s not so bad!

  20. Michael Z

    Another site I visit has had mountains of problems with WordPress but for some reason they continue. Partly because of the investment already undertaken I’m sure (also for legal reasons they wish to ensure they own the data). It mostly works now for normal usage, and usually it is only DoS attacks which bring it down. But it is an on-going effort not for the faint of heart nor one not willing or able to invest the time or money in the expertise required.

    I’m afraid your reader trying to diagnose technical issues using ‘bitmaps’ and ‘microsoft word’ was wasting his or her time, well-intentioned as it was. In a unix or similar environment viewing/scanning binary data is much easier, although all the same tools are available for microsoft windows too. I would’ve started with ‘less’ which highlights control characters and gets most such cases (very quickly), and failing that ‘od’ or ’emacs’ to view the hexadecimal character codes (which is more tedious but thorough).

    Those blaming ‘free’ software for its shortcomings almost certainly haven’t tried proprietary products themselves, or sell them (‘commercial’ software can be free software too). Scalability is fairly hard and has to be designed in from the start no matter who does it. e.g. apache is free and it has no problems, compared with iis which has always been an also-ran (and that’s being generous). Free software is often better simply because it is being written to serve a purpose, and not concerned with adding features (which _always_ means more bugs) simply to try to compete for popularity and sales (plenty of free software projects seem to be in the popularity race too of course).

    Google can do it with blogger because they have giant data centres and a particularly scalable architecture that simply makes it easier to do – actually you can’t even write something unscalable within it. Remember also that whilst your WP setup is running a single blog and can’t even manage that, blogger is running all the blogger blogs at once with nary a hiccup.

    Having said that, WP is a pretty crappy 2-tier database application that simply isn’t designed with scalability in mind. It’s fine for a moderate-volume blog or information site. But you shouldn’t need a ‘plugin’ to try to address a fundamental issue like scalability (and boy, one for input validation? ouch!!) – if you do it’s a pretty definitive indicator that scalability (security!) wasn’t designed in from the start and it WILL be an issue.

    Personally, as a rule I don’t trust anything written in php or python or perl (or visual basic) for scalability (amongst other things); basically these are simple languages designed to be easy to learn and write. The problem is when someone learns them they think they’re ready to take on any task, not realising that you need to know what you’re doing with anything that scales or involves security (e.g. funny characters in the input) or internet standards (e.g. people just use what works for them and don’t follow the documented procedures). And if you learnt a simple language simply because it was ‘easy’ you probably don’t know or even acknowledge some problems are actually ‘hard’ (and thus beyond your current skill level). This applies to proprietary and free software alike. Not to say that you cannot write good software in these languages, it’s just that many of the people with enough skills to would choose another language first.

    The downside of ‘fog’ services like blogger is you effectively no longer own your own data. And there’s no-one you can really turn to in the case of problems.

Comments are closed.