You know how it goes, where you'll sometimes put in 'assert(!"blarg");' or somesuch in your code to get irrefutable evidence of what code path is being taken? Rumor has it that a certain reputable database company had to fly a senior engineer, at great expense, out to a German bank to placate them when their database started aborting with the singularly helpful message of "BONER" :)
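For anyone who hasn't seen the trick: a string literal is never a null pointer, so `!"blarg"` is always 0, the assert always fails, and the literal lands in the abort message. A minimal sketch in C follows; the enum, the function name, and the message text are all made up for illustration, not taken from anyone's real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: map a request kind to the name of the code path
 * that handles it. All names here are invented for illustration. */
enum request_kind { REQ_READ, REQ_WRITE };

const char *code_path_for(enum request_kind kind)
{
    switch (kind) {
    case REQ_READ:
        return "read path";
    case REQ_WRITE:
        return "write path";
    }

    /* A string literal is a non-null pointer, so !"..." is always 0 and
     * this assert fires whenever control gets here. The literal appears
     * in the diagnostic, so whoever hits it reads exactly this text. */
    assert(!"code_path_for: unknown request kind");
    return NULL; /* reached only when NDEBUG compiles the assert out */
}
```

Calling `code_path_for(REQ_READ)` returns normally. Passing a value outside the enum aborts with the message in the diagnostic, unless the build defines `NDEBUG`, in which case the assert disappears and the function returns `NULL`. Either way, the string is customer-visible the day the "impossible" happens.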
I remember a big "to do" a lonnnnng time ago when I had only been with IBM about a year or so and was working in AIX tech support. For some reason, a customer had decided to do an octal dump of the AIX error report (errpt command). As it turns out, they had a lot of errors from this particular device driver for an ethernet card. When you looked at the error data with octal dump (normally, this is in hex format) the data read like this:
"Knock Knock. Whos there. Pig. Pig who."
Well, the customer first called in about this because they thought that a hacker had compromised their system. So development got involved, trying to figure out where this message was coming from. As it turns out, it was not a hacker at all. Some device driver developer at IBM put that message there on purpose. I guess maybe they thought that no one would ever look at this with octal dump, since the AIX error report information is intended to be looked at with the errpt command, which formats everything in a particular way based on the flag options. The customer was also very upset because they had systems all over the world, the message would be offensive in some cultures, and they didn't want one of their customers to accidentally run into it.
Imagine trying to explain that one to the customer. Needless to say, a defect was opened and a "fix" was delivered quite promptly once they figured out what was going on. I'm not sure if they ever really admitted directly what the problem was, but I would imagine that they had to in order to convince the customer that their system had not been hacked.
To this day (about 7 years later)... I can't think of a single instance of someone trying to look at this log with octal dump.
I was once accused of perpetrating a virus. The overlay manager we wrote for NetHack contained a last-ditch out-of-RAM error message with the text (not unduly irrational in context, if you've played the game) "The little dog eats your memory ... you die!" It really was one of those impossible assertion messages; there was no way for NetHack to get itself into the state where you'd see it anyway.
Well, guess what? WordPerfect used an overlay manager, too. But it didn't just install its handler, nooooooo. It looked to see if a handler was installed, and just blithely used it if there was one there already, only using its own if none was found. I suppose that might have made sense if it was a standard system facility; but it wasn't. WP's wasn't like ours, ours wasn't like MS's, MS's wasn't like WP's.
So ... if you started NetHack, recursively shelled out, and fired up a WordPerfect (as if you're going to have enough memory for that to work ... this was back in the days of the 640K limit and NetHack had 2M of executable code - whence the need for the custom overlay manager), it would get halfway loaded and then try to invoke our overlay manager with its parameter block and BLAMMO!
And even then "The little dog eats your memory. You die!" is the closest I can come to an accurate non-technical explanation of what went wrong. Other than perhaps "optimism kills...".
hahahaa. what an obscure situation. how could you have ever anticipated something like that happening? i guess the only "politically correct" type of message that an overlay manager could have asserted (since you should never be able to get your program into that state) would be something like "unexpected nethack error. process killed". but really, it's these kinds of errors that make one scratch their head.
the only reason that i mention the 'what could an acceptable message have been' thing is because I spent the day analyzing why we found the defects that we did in our last test cycle, how we could have caught them by DCUT, and doing root cause analysis of a post-release defect found by the customer. along the same line as scratching one's head wondering how you could have ever anticipated such a problem... we recently had one of those situations that i had to analyze. basically, we (well, the guy who actually codes things) were updating our programs to support a new level of the OS. Everything was working well at the new level, detailed function tests were done on the new level, and then regression tests were done on the earlier OS levels. Well, in one of our programs the programmer used '==', which i am sure is common syntax in programming. However, as it turns out, we had not happened to use that syntax before in that particular program in a place that mattered to the OS. The syntax change was in a very common program (well, really more like a library) that many of our functions call. And in this scenario, for the part of the OS that it operates in... the OS doesn't like '==' at the two earlier release levels. It was just one of those obscure scenarios that I don't think we could have ever anticipated or created a test case to look for. Normally, such a mistake would not be a big deal... but because of where this syntax was located, it kept people from restoring their system from our backup images. *sigh*. such a tiny thing caused such a big problem and took all of about 30 seconds to fix. go figure
no subject
Date: 2005-04-27 08:07 pm (UTC)
no subject
Date: 2005-04-27 08:11 pm (UTC)
Hooray for co-op students!
no subject
Date: 2005-04-28 12:21 am (UTC)
Good Lord. Where did you get that icon? :^O
no subject
Date: 2005-04-28 02:21 am (UTC)
no subject
Date: 2005-04-27 08:20 pm (UTC)
no subject
Date: 2005-04-27 09:39 pm (UTC)
And we once supported a web-based product where the Help link took you to a porn site.
no subject
Date: 2005-04-27 10:04 pm (UTC)
no subject
Date: 2005-04-27 10:40 pm (UTC)
no subject
Date: 2005-04-28 01:08 am (UTC)
no subject
Date: 2005-04-28 01:10 am (UTC)