After a quick stop at Dunkin' Donuts to grab a box of munchkins for today's crew, I
got in about 7:10 am and opened up the workstations, ready to begin the day! And I
immediately discovered that one of the two workstations was in a hung state; it was
totally unresponsive. Ugh. And I got in early today; the SOGSMGR (the system manager for
the SOGS computer system used by the Operations Center here) won't be in for another 40
minutes or so.
I begin prepping the windows in the workstation that is responsive, only to discover
two of them are hung! Wunnerful. Okay, so there are two windows I can't work with just
yet; I put those away. In one of the windows that I can use I read the shift report from
last night. Hmmmm....new datadisk for data evaluation was put in...they received science
data and processed it...no new engineering telemetry received last night...oops, problems
with processing one of last night's observations...hmmmm, problems with repairing another
observation that went to trouble (make a note of that to work on today).... problems with
the pipeline cleanup last night, need to investigate that this morning, too...and problems
with one of our software tools that needs trouble-shooting. Well, this could prove to be
a fun morning.
I start up the shift report, check the disk space, and check when the STRs are (STR =
Science Tape Recorder dumps; when the spacecraft dumps the recorded science data to the
ground and ultimately up to us). Hmmmm...it seems that the new datadisk isn't showing up
on our little disk-check tool (it actually has another name, but for simplicity we'll
just refer to it as the "disk-check" tool). A quick investigation reveals that the
tool wasn't properly updated when the new disk was installed, and therefore isn't
picking up the new disk. Great, only a minor problem! A
simple fix.
Slowly my co-workers filter in, and the SOGSMGR arrives! Woo hoo! I immediately corner
her about the problem with the hung workstation and with the hung windows in the other
workstation. She gets back to us a few minutes later. Apparently the workstation that's
hung has gotten itself into a Bad State (tm) due to over-allocation of resources. I try
logging out and back into the workstation, but this doesn't alleviate the problem. The
workstation's resources are still overtaxed. This is Not Good (tm), as we need this
workstation to assist in processing the data. I'm expecting to get a dump of science data
in the next half hour or so. We try logging out of the workstation and back in again, but this
doesn't help. Okay, the SOGSMGR decides to perform the ultimate act, and reboots the
machine. This works!
Quickly I bring up the workstation, and get the process manager up and going. The
process manager is the pipeline through which our science data is processed; it
needs to be up and operating to process any data we might get. And a few minutes later,
the data from the spacecraft hits our system and processes through, with nary a hitch.
While that occurs, I turn my attentions next to the data that is sitting in trouble,
waiting to be fixed (you see, when an observation fails to process properly, for whatever
reason, it is sent to a trouble area for us to attend to as time allows; we try to
do this quickly so we can get the data out to the person who originally
requested it as soon as possible). A couple of the support team members join in and we
investigate the different observations in trouble for an hour or so. A couple of them we
were able to readily fix and reinsert for processing, and they processed just fine. The
others had a problem that needed more detailed investigation. The two support team members
drifted off in thought, promising to get back to me about those later. On to the next
item on the list!
The SOGSMGR came back to tell me that the problem with the two hung windows had been
corrected (there was a problem with the disk driver, which was hung). My two frozen
windows were freed up. Great! More windows to work with (I just love multi-tasking).
I reread the shift report, to make sure I didn't miss anything. And then I read the
message waiting from PASSOPS (down at Goddard) that there is more engineering telemetry
data from last night ready to be copied. PASSOPS deals with the engineering telemetry
data processing for us, in addition to a half dozen other things (such as satellite
uplink/downlink requests). Okay, I call up another window that I'm not using and start
the tool to copy up the data. Unfortunately, I get the following message:
-SYSTEM-F-UNREACHABLE, remote node is not currently reachable
Ack!! Okay, from past experience I know something's up with the line. I jog over to the
SOGSMGR area to ask if she knows anything about line problems, and overhear her on the
phone with someone else, explaining to them that there is a problem with the data line
and the repair crew from Bell Atlantic is looking into it. Okay, so, no data copying for
me at the moment. I'll check back on the line status later.
I return my attentions to one of the observations in trouble as one of the support
guys comes back with a potential solution. We implement his solution, and try reinserting
the one observation...and wait...in the background I can hear the SOGSMGR discussing with
the system manager down at Goddard the line problem...the data enters the pipeline...
processes...and goes through! Whew.
Well, that was the morning. It's time for lunch!
I strolled out with a couple other co-workers and wandered onto campus for a quick
pizza lunch. After enjoying the warm spring sun and discussing various classes we had
taken in our past (or currently were taking), both good and bad aspects, we returned to
work.
I slid back in and coordinated with one of the support team members about a potential
fix for the other observations that were in trouble. I spent the next hour or so going
through the repair procedure and...voilà! Data processed.
Throughout the day I kept tabs on the status of the lines between here and Goddard.
With them down, there's no data. No data means...well, you can figure that out. But
finally the lines came up, and I began copying up the engineering telemetry data. At the
same time I checked what data had gotten through the DADS archive system and could be
archived. Quite a bit of it, it turned out. So I started a batch of data archiving.
Finally I gathered my notes from the shift, and finished editing my shift report, and
handed everything over to the evening shift. I let them know that the PASSOPS data was
still coming up, and that archiving was going on, and that they should expect to have an
otherwise quiet evening. All the problems from last night and today amazingly were taken
care of; nothing for the evening shift to do but normal activities.
Instead of going straight home, being that it was *such* a nice day, I opted to go out
for an hour or so climbing at a local crag. After that, I returned home, logged on to
check what email I had accumulated during the evening, and began dinner. Oh, yeah,
foooood....mmmmm!
Finally I kicked back to relax and watch the latest episode of Babylon 5 that I had recorded
last week (and hadn't had a chance to view yet), and then turned in to sleep. It was
late, and I had to be back in early again tomorrow...