Field Journal from Mark Kochte - 3/13/96


After a quick stop at Dunkin' Donuts to grab a box of munchkins for today's crew, I got in about 7:10 am and opened up the workstations, ready to begin the day! And I immediately discovered that one of the two workstations was in a hung state; it was totally unresponsive. Ugh. And I got in early today; the SOGSMGR (the system manager for the SOGS computer system used by the Operations Center here) won't be in for another 40 minutes or so.

I begin prepping the windows in the workstation that is responsive, only to discover two of them are hung! Wunnerful. Okay, so there are two windows I can't work with just yet; I put those away. In one of the windows that I can use I read the shift report from last night: datadisk for data evaluation was put in...they received science data and processed new engineering telemetry received last night...oops, problems with processing one of last night's observations...hmmmm, problems with repairing another observation that went to trouble (make a note of that to work on today)...problems with the pipeline cleanup last night, need to investigate that this morning, too...and problems with one of our software tools that needs troubleshooting. Well, this could prove to be a fun morning.

I start up the shift report, check the disk space, and check when the STRs are (STR = Science Tape Recorder dumps; when the spacecraft dumps the recorded science data to the ground and ultimately up to us). Seems that the new datadisk isn't showing up on our little disk-check tool (it actually has another name, but for simplicity we'll just refer to it as the "disk-check" tool). A quick investigation revealed that our disk-check tool wasn't properly updated when the new disk was installed, and therefore the disk-check tool isn't picking up on the new disk. Great, only a minor problem! A simple fix.

Slowly my co-workers filter in, and the SOGSMGR arrives! Woo hoo! I immediately corner her about the problem with the hung workstation and with the hung windows in the other workstation. She gets back to us a few minutes later. Apparently the workstation that's hung has gotten itself into a Bad State (tm) due to over-allocation of resources. I try logging out of and back into the workstation, but this doesn't alleviate the problem. The workstation's resources are still overtaxed. This is Not Good (tm), as we need this workstation to assist in processing the data. I'm expecting to get a dump of science data in the next half hour or so. We try logging out of and back into the workstation again, but this doesn't help. Okay, the SOGSMGR decides to perform the ultimate act, and reboots the machine. This works!

Quickly I bring up the workstation, and get the process manager up and going. The process manager is the pipeline through which our science data is processed; it needs to be up and operating to process any data we might get. And a few minutes later, the data from the spacecraft hits our system and processes through, with nary a hitch.

While that occurs, I turn my attention next to the data that is sitting in trouble, waiting to be fixed (you see, when an observation fails to process properly, for whatever reason, it is sent to a trouble area for us to attend to as time will allow; we try to do this as quickly as possible so we can get the data out to the person who originally requested it as soon as possible). A couple of the support team members join in and we investigate the different observations in trouble for an hour or so. A couple of them we were able to readily fix and reinsert for processing. They processed just fine. The others had problems that needed further detailed investigation. The two support team members drifted off in thought, and would try to get back to me on them later. On to the next item on the list!

The SOGSMGR came back to tell me that the problem with the two hung windows had been corrected (there was a problem with the disk driver, which was hung). My two frozen windows were freed up. Great! More windows to work with (I just love multi-tasking).

I reread the shift report, to make sure I didn't miss anything. And then I read the message waiting from PASSOPS (down at Goddard) that there is more engineering telemetry data from last night ready to be copied. PASSOPS deals with the engineering telemetry data processing for us, in addition to a half dozen other things (such as satellite uplink/downlink requests). Okay, I call up another window that I'm not using and start the tool to copy up the data. Unfortunately, I get the following message:

-SYSTEM-F-UNREACHABLE, remote node is not currently reachable

Ack!! Okay, from past experience I know something's up with the line. I jog over to the SOGSMGR area to ask if she knows anything about line problems, and overhear her on the phone with someone else, explaining to them that there is a problem with the data line and the repair crew from Bell Atlantic is looking into it. Okay, so, no data copying for me at the moment. I'll check back on the line status later.

I return my attention to one of the observations in trouble as one of the support guys comes back with a potential solution. We implement his solution, and try reinserting the one observation...and in the background I can hear the SOGSMGR discussing the line problem with the system manager down at Goddard...the data enters the pipeline...processes...and goes through! Whew.

Well, that was the morning. It's time for lunch!

I strolled out with a couple other co-workers and wandered onto campus for a quick pizza lunch. After enjoying the warm spring sun and discussing various classes we had taken in our past (or currently were taking), both good and bad aspects, we returned to work.

I slid back in and coordinated with one of the support team members about a potential fix for the other observations that were in trouble. I spent the next hour or so going through the repair procedure and...voilà! Data processed.

Throughout the day I kept tabs on the status of the lines between here and Goddard. Lines being down means no data. No data means...well, you can figure that out. But finally the lines came up, and I began copying up the engineering telemetry data. At the same time I checked what data had gotten through the DADS archive system and could be archived. Quite a bit of it, it turned out. So I started a batch of data archiving.

Finally I gathered my notes from the shift, finished editing my shift report, and handed everything over to the evening shift. I let them know that the PASSOPS data was still coming up, that archiving was going on, and that they should expect to have an otherwise quiet evening. All the problems from last night and today amazingly were taken care of; nothing for the evening shift to do but normal activities.

Instead of going straight home, being that it was *such* a nice day, I opted to go out for an hour or so climbing at a local crag. After that, I returned home, logged on to check what email I had accumulated during the evening, and began dinner. Oh, yeah, foooood....mmmmm!

Finally I kicked back to relax and watch the latest episode of Babylon 5 that I recorded last week (and hadn't had a chance to view yet), and then turned in to sleep. It was late, and I had to be back in early again tomorrow...