silver linings
Aug. 23rd, 2001 01:52 am
On the good side... I now have some good, solid evidence as to why running a production server on redhat/sparc is not the brightest idea ever.
Geek ranting. Don't bother reading if you don't like reading technical stuff. I'll save the non-technical parts (i.e. how this relates to the future of my job) for another time.
Our lovely Samba server, which had been crashing every four days, decided today that it now felt like crashing about every four hours. Joy.
So I tried switching to Samba 2.0.10 compiled from source (rather than the RPM version I'd been using). It hasn't been four hours, or four days, so I can't say whether it helped or not.
Reading through the log files in hopes of discovering what was causing these crashes, I found what looked like extensive filesystem corruption on one of the disks. I also found that this guy I know in another department had tried to log into the box (there are no user accounts on it, and I can't think of any reason he'd try). He has no relevance to the rest of the story. It's just odd.
Since it was conveniently the middle of the night, I went ahead, stopped NFS, unmounted the partition, and fsck'd it. Or attempted to, anyways. It died with a bus error partway through. Tried again, with the same result. Did a web search and discovered that this was an e2fsprogs problem on big-endian systems that was fixed back in May. Checked the redhat site for updates. Did they have any? Of course not. Why would they bother updating when there are absolutely critical bugfixes in programs?
So, downloaded the source. I was a bit nervous (and still am - I haven't rebooted yet, as there isn't a whole lot I can do right now if it doesn't come back up correctly) about removing critical programs and reinstalling them from source, which may or may not drop everything needed into the right places. For fsck'ing purposes, it seems to have worked fine, though.
fsck'd the drive, and another one that had been whingeing a bit a few days ago. I just hope that there was no actual file damage. It would quite suck having to reload all those CDs. Oh well, if I do, it's my fault for copying the files over to that partition without checking it first.
You know, I bet this was caused by that loose cable back a few months ago. I bet all the other partitions that haven't already been checked will have similar issues. *Mental note to self - fsck the rest of the unused partitions one of these days soon*
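For that mental note, here is a rough Python sketch of how the remaining checks could be lined up. It assumes the partitions are ext2, that the device names (hypothetical here) get filled in by hand, and that the mount table is text in the /proc/mounts format. It only builds read-only `e2fsck -f -n` commands for devices that are not currently mounted. It doesn't run anything.

```python
def mounted_devices(mounts_text):
    """Return the set of device names found in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields:
            # The first field of each line is the device (or pseudo-device).
            devices.add(fields[0])
    return devices


def fsck_commands(candidates, mounts_text):
    """Build read-only e2fsck commands for candidates that are not mounted.

    -f forces a check even if the filesystem is marked clean;
    -n opens it read-only and answers "no" to every repair prompt.
    """
    mounted = mounted_devices(mounts_text)
    return [
        ["e2fsck", "-f", "-n", dev]
        for dev in candidates
        if dev not in mounted
    ]


# Hypothetical mount table and device names, for illustration only.
# On a real box the text would come from reading /proc/mounts.
sample_mounts = "/dev/sda1 / ext2 rw 0 0\n/dev/sdb1 /export ext2 rw 0 0\n"
for cmd in fsck_commands(["/dev/sdb1", "/dev/sdc1"], sample_mounts):
    print(" ".join(cmd))
```

Skipping anything that is still mounted is the point: the partition has to come offline before it gets checked, same as above. Actual repairs would still be a separate, deliberate run once the read-only pass shows what's wrong.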
Anyways, the e2fsprogs deal should be enough grounds to prove that redhat/sparc is not being maintained well enough to be an appropriate platform for a production server.
Though I just realized that since we're rethinking our software distribution plans anyways, it's quite possible that she'd rather leave well enough alone (though "well enough" is highly subjective).
Now I must sleep. I'm already running on a sleep deficit that is getting progressively worse.