I had just packed away my wife's snow boots (we're moving at the end of this month)
and this morning it snowed. Two inches or so, but enough to mess things up and close some
of the schools.
We think the hawks are back! Last year we had a small family of red-tailed hawks that
lived in a nearby park. A couple times a week they would come over onto the campus and go
hunting. We'd see them perched halfway up a tree, or up on the gutters just below the roof
line of the building. The female was pretty big, the male was about the size of a large
crow, and the juveniles (there were one or two; we were never sure which) got to be the
size of the male by the end of the summer. A couple of people think they saw the female
return in the last few days. I've been keeping an eye out for her.
Today I'm doing two things at once: testing the changes I described in my last
journal, and doing some database work.
The optical platters we use cost about $300 each, so we try to make sure things are
working pretty well before we actually start writing data. I've been "burning aluminum"
all day, and I'm pretty confident that things are working correctly. Suzanne (another
DADS developer) and I have been working on this project since about Halloween, and we're
both relieved to see it coming close to the end. And we're anxious to get on to the next
phase of work for SM-97.
I mentioned "burning aluminum" above. I call it that because when we write to an
optical disk, a high-powered laser in the drive blows little pits in a very thin sheet of
aluminum trapped between two layers of transparent plastic. When we read, a lower-powered
laser scans the surface and detects those pits. Once you've burned the pits into the
aluminum, they're permanent. You can't erase them, and we expect the disks to be good for
at least 20 years, and maybe as much as 100 years.
CD-ROMs (and music CDs) work basically the same way, except that the pits are
"stamped" with a pressing machine rather than blown out with a laser. The low-power laser
in your CD player reads the disk the same way our optical disk drives do.
To test a new version of the programs that add data to the archive, we run a standard
set of test data through the system. There are about 700 files, for a total of 305
megabytes of data. That's about half of a typical CD-ROM, or about one twentieth of our
big disks. It takes a couple hours to run all the data through, and that leaves me time
to work on my other problem.
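Taking those ratios at face value gives a quick sense of scale. Here's a back-of-the-envelope sketch; the 650-megabyte CD-ROM capacity is an assumed typical value, not a number from my notes:

```python
# Rough scale check of the test-run size mentioned above.
test_run_mb = 305                # ~700 files of standard test data
cdrom_mb = 650                   # assumed typical CD-ROM capacity
platter_mb = test_run_mb * 20    # implied capacity of one of our big disks

print(round(test_run_mb / cdrom_mb, 2))  # roughly half a CD-ROM
print(platter_mb)                        # about 6100 MB, i.e. ~6 GB per platter
```

So "one twentieth of our big disks" works out to platters in the neighborhood of 6 gigabytes.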
The database work I'm doing involves figuring out how much space we should reserve
when we are making tapes to send to astronomers.
I'm trying to figure out just how big HST Datasets are. A dataset is a collection of
files that together hold all the data for an image or spectrum. For WFPC-II (Wide-Field
and Planetary Camera Two - the camera that takes most HST pictures) this is a pretty
constant number: about 25 megabytes in 10 files. It's pretty constant because the
camera takes the same sort of pictures all the time. Each picture comes from four
"chips", each an 800x800 array. (A typical PC screen has 1024x768 pixels -- a single WFPC-II chip is just
slightly smaller, but square.) There are a total of about 40 bytes of information about
each pixel, including calibrated and uncalibrated values, quality information, and other
stuff. Since the size of the picture doesn't change, the size of the dataset doesn't
change either.
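As a sanity check, the numbers above hang together if you read "40 bytes per pixel" as 40 bytes for each of the 800x800 pixel positions, summed over the chips and the various files in the dataset. That reading is my assumption:

```python
# Sanity check of the WFPC-II dataset size quoted above.
# Assumes "40 bytes per pixel" means 40 bytes total per 800x800 pixel
# position, spread across the chips and the ~10 files in a dataset.
pixels = 800 * 800
bytes_per_pixel = 40
dataset_mb = pixels * bytes_per_pixel / 1_000_000

print(dataset_mb)  # 25.6, matching the "about 25 megabytes" figure
```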
For the spectrographs, the size of the dataset can vary a lot. This is because a
single dataset can contain multiple spectra. In the case of the Goddard High Resolution
Spectrograph, it can vary from just 38 kilobytes to over 300 megabytes!
But what I want is a "pretty good" estimate of each kind of dataset, and I can use
that to plan how much space I'll need to retrieve a particular set of data. To get a
statistical look at the data, I have this nice complicated query that gets the minimum,
maximum, and average size of "Z-CAL" datasets. "Z-CAL" datasets are CALibrated science
data for the GHRS. (Each instrument has a letter associated with it: U is for WFPC-II,
X is for FOC, Z is for GHRS.) Once I have all that data, I can also compute the "standard
deviation", which is, roughly, the average distance of the sizes from their mean. That
gives me an idea of how much variation there is in size.
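I can't reproduce the real archive query here, but the shape of it looks something like this sketch, using SQLite and made-up table and column names (the real archive database and its schema are different):

```python
# A sketch of the kind of query I mean. Table name, column names, and the
# three sample rows are all invented for illustration.
import sqlite3
import math

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE datasets (dataset_name TEXT, archive_class TEXT, size_kb REAL)")
con.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?)",
    [("Z001", "Z-CAL", 38.0), ("Z002", "Z-CAL", 512.0), ("Z003", "Z-CAL", 300000.0)],
)

# Minimum, maximum, and average size of the Z-CAL datasets.
row = con.execute(
    "SELECT MIN(size_kb), MAX(size_kb), AVG(size_kb), COUNT(*) "
    "FROM datasets WHERE archive_class = 'Z-CAL'"
).fetchone()
print(row)

# SQLite has no built-in standard-deviation aggregate, so compute it
# from the individual rows.
sizes = [r[0] for r in con.execute(
    "SELECT size_kb FROM datasets WHERE archive_class = 'Z-CAL'")]
mean = sum(sizes) / len(sizes)
stdev = math.sqrt(sum((s - mean) ** 2 for s in sizes) / len(sizes))
print(stdev)
```

Even in this toy version you can see the problem: one 300-megabyte dataset drags the average and the standard deviation way up.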
Here's another example: If ten people take a test, and they all score between 40
and 60 points, with an average of 50 points, that's a pretty low standard deviation.
If another group of ten takes the test, and half of them score about 20, while the
other half score about 80, the average is still 50, but the standard deviation is
much bigger.
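Python's statistics module makes that two-classroom example concrete. The individual scores below are invented to fit the description:

```python
# The two hypothetical classes from the example above (scores invented).
import statistics

group_a = [42, 45, 48, 50, 50, 51, 53, 55, 58, 48]  # everyone between 40 and 60
group_b = [20, 20, 20, 20, 20, 80, 80, 80, 80, 80]  # half near 20, half near 80

# Both averages come out to 50, but the spreads are very different.
print(statistics.mean(group_a), statistics.pstdev(group_a))
print(statistics.mean(group_b), statistics.pstdev(group_b))
```

The second group's standard deviation is 30 points against the first group's handful, even though the averages are identical.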
When you see a large standard deviation like that, you have to decide if you're seeing
different "populations". For example, if you have a test aimed at eighth graders, and you
get five people who score about 20, and five who score about 80, the fact that you have a
large deviation makes you wonder if maybe the five who scored 20s were second
graders!
In my case, I've discovered there are two types of GHRS observations: short
observations with one or a few spectra, and large observations that have many spectra.
The "mode" I see for those observations is "RAPID", and I'll have to get one of the
astronomer types to explain that operating mode to me.
That's the kind of math I do pretty regularly: Statistical analysis of the contents of
the archive. I rarely need to do any calculus, though I know enough to understand how the
mathematical "tools" I use work. But I do a lot of algebra, and use programs that have
statistical functions.
Well, my big test is finished, and while most things are working, there are a couple
of problems I need to work on. I'm going to take a break, get something to drink, and see
if I can spot that hawk before I tackle those problems.