Tuesday, March 01, 2005

InfoWorld: Value servers feel the strain

ENTERPRISE STRATEGIES: AHEAD OF THE CURVE
http://www.infoworld.com

VALUE SERVERS FEEL THE STRAIN

By Tom Yager

Posted February 25, 2005 3:00 PM Pacific Time

In the off-the-shelf world of value servers, surmounting challenges to high availability is your job. Management solutions make remote observation easier, and clustering is getting closer to standard fare for OSes.

But I wonder: Will a rack of value servers with the equivalent computing power of a large, multiprocessor monolithic server ever be able to sense and respond to availability problems the way big iron and its OSes can?

The OS on a value server is likely to blast alerts out to applications when something goes wrong in the hardware, drivers, or OS. No application should hear about an availability issue that it didn't create or can't address.

IBM knows exactly what's in every Power5 server it sells. That level of understanding of each component results in a major boost in availability -- and in the ability to capture errors. If something goes wrong with an IBM server at the OS or hardware platform level, an underlying layer protects against data corruption, connection loss, and errant behavior. An application should typically run forever. Mastery of the art of graceful failure under all conditions shouldn't be a prerequisite for enterprise application developers. Yet it's work that gets redone for project after project.

Consider that user-mode applications are often saddled with the responsibility of sorting out arcane, platform-peculiar errors related to availability. Does a disk write fault mean that the OS has already done all it can, or might it be in the midst of fail-over and I'm supposed to keep trying? What do I do in my application code when I get a memory parity error? If a TCP socket closes and the same IP requests a new connection a few seconds later, how can I tell whether it's a new session or the continuation of an old one after a fail-over? I've got two files open and the system says it's out of file handles. It isn't my fault so it shouldn't be my problem! With so many possibilities -- I like to ponder the reason an OS notifies an application of a fatal error -- it's not surprising that the standard response to a problem is log and abort. That's a cop-out in some large projects, but it's the right way to get an application through QA in most.
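To make the burden concrete, here is a minimal sketch of the sorting-out an application ends up doing for itself: deciding which OS errors might clear up on a retry and which call for log and abort. The helper name `call_with_retry` and the choice of error codes in `TRANSIENT_ERRNOS` are assumptions made for illustration, not anything a particular OS prescribes.

```python
import errno
import logging
import time

# Assumption: treating these error codes as "possibly transient" is an
# illustrative choice; the right set depends on the platform and workload.
TRANSIENT_ERRNOS = {errno.EMFILE, errno.ENFILE, errno.EAGAIN, errno.EINTR}


def call_with_retry(operation, attempts=3, delay=0.5):
    """Run operation(); retry errors listed in TRANSIENT_ERRNOS, re-raise the rest."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Anything not on the list, or a transient error on the final
            # attempt, gets the standard treatment: log it and give up.
            if exc.errno not in TRANSIENT_ERRNOS or attempt == attempts:
                logging.error("giving up after %d attempt(s): %s", attempt, exc)
                raise
            logging.warning("possibly transient error, retrying: %s", exc)
            time.sleep(delay)
```

Something like `call_with_retry(lambda: open("data.log", "a"))` would retry an out-of-file-handles error a few times before giving up. Note that the guesswork lives entirely in the application: nothing in the error itself says whether the OS is mid-fail-over or has already done all it can.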

One example of lousy handling of an availability challenge is the sequence of events that occurs when a network interface fails over. Network hardware and drivers do very smart things that make the spare network interface a perfect duplicate of the one that failed. But it takes time, and in applications throughout the system, network connections close and it's left to the application to decide how to respond. It would be smarter to pause threads that have open network connections so they at least have a chance of staying connected. Or perhaps the OS should attempt to reconnect on the application's behalf after the fail-over.
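Until the OS reconnects on the application's behalf, the application has to do it. The following is a rough sketch of that workaround, assuming a plain TCP client; the function name `reconnect`, its parameters, and the doubling delay are hypothetical choices for illustration.

```python
import socket
import time


def reconnect(address, attempts=5, base_delay=0.5, connect=socket.create_connection):
    """Try to re-establish a TCP connection, waiting longer after each failure.

    `connect` defaults to socket.create_connection; it is a parameter only so
    the retry logic can be exercised without a real network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(address)
        except OSError as exc:  # refused, reset, unreachable, timed out, ...
            last_error = exc
            if attempt < attempts - 1:
                # Give the spare interface time to take over before retrying.
                time.sleep(base_delay * (2 ** attempt))
    raise ConnectionError(f"could not reconnect to {address!r}") from last_error
```

Even when this succeeds, the application gets back a brand-new socket. Any session state has to be rebuilt by the application, and the peer still can't tell whether it is seeing a new session or the continuation of an old one.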

An availability-related error that makes it to an application is a sign that a system designer, device driver developer, or OS engineer punted. But I can't rag on them too much.

The fact that value server OSes and drivers can't rely on minimum capabilities in underlying hardware is an unintended consequence of open systems. That tradeoff means that, for now, value servers and portable OSes will burden applications with availability issues more than they should. Value servers can't be referred to as high-availability systems until they see to availability themselves.

Tom Yager is technical director of the InfoWorld Test Center.

Copyright (C) 2005 InfoWorld Media Group, 501 Second St., San Francisco, CA 94107
