The HyperReader script attempts to read
Bookreader format documents and present them as browser-based documents.
Sometimes it is very successful, other times less so! The reasons are few
but significant:
- Bookreader documents are not plain text
They are not even marked-up text (in the same sense as HTML documents).
Essentially they are a series of images pasted onto an X window. Most of
the images happen to be from fonts, but even then characters are not
handled as text. There is essentially no information within the file on
the documentary function of any text in a book. This may be one reason
for the poor performance of the internal Bookreader print
functionality (it must adopt a similar approach, mapping the "images" into
PostScript "pages").
- Bookreader format is not public
At least not that I could find. Hence the considerable effort spent
reverse-engineering the structure (most of it originally during January
1992; fortunately, as I wouldn't have the patience, time or interest now!).
In any case, the structure is complex and I don't pretend to understand it
all ... so the server tends to break occasionally.
- Development time available
Zero. However, not being able to get to the wealth of Bookreader
documentation from the hypertext environment in some fashion (no matter how
primitive) seemed too great a deficiency not to make some effort.
Corollary: the server is not as fully developed (or debugged) as
might be desired.
Limitations
- Client/Server Use
Bookreader documents were never really designed to be accessed using
client/server technology. There are some "book opening"
overheads best done only once (as with the original Bookreader
application). With the HTTP (hypertext) environment these must be
performed with each access, sometimes introducing noticeable latency.
- Mapping of document text
The server attempts to map a bit-addressed format (Bookreader) onto a
character-cell format (text-page). Bookreader formatting permits parts of
a page to be generated non-sequentially, for example to place normal text,
then place bolded text within it. This can occur on a per-line, per-
section or per-page basis. So the entire "page" for a hypertext
environment must be created by mapping into a "page" in memory. Sometimes
this mapping is not perfect and character strings get garbled or poorly
proportioned.
- Fixed-Font presentation
This is due to two requirements:
  - The mapping from bit-addressed to cell-addressed described above
    requires a fixed character-cell matrix.
  - Maintaining correct layout for examples, tables, etc. As the Bookreader
    format specifies the positioning of the document's components, rather
    than information on their documentary function, relative location must
    be maintained, generally eliminating the use of the more aesthetically
    pleasing proportional fonts.
- Image conversion and compression
Figures and graphics within Bookreader documents are stored as
single-plane bitmaps. HyperReader supports these using the GIF
format. Figure positioning on a page can sometimes be clumsy due to the
character-cell layout.
- Page layout
Apart from the text-mapping process mentioned above, page layout can
sometimes be a little odd, particularly if graphics are involved.
Some massaging of the resulting page occurs. This often involves
educated guesswork. For instance, single blank lines preceding a line
beginning with a lower-case alphabetic are most often an artifact of the
mapping process rather than intended line breaks, and can be eliminated to
produce a better layout. Similarly, bulleted and numbered lists can be
enhanced by ensuring a blank line exists between elements. Unfortunately
there are not many instances where the original intent can be clearly
identified.
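
The mapping and massaging steps described in the list above can be pictured
with a small sketch. This is not the HyperReader code; it is only an
illustration of the idea, written in Python. The Fragment record, the 8x16
pixel cell size, the 80-column page and the list-marker pattern are all
assumptions made for the example, since the real Bookreader structures are
not documented here.

```python
import re
from dataclasses import dataclass

# Assumed fixed cell size in pixels (hypothetical values for illustration).
CELL_WIDTH = 8
CELL_HEIGHT = 16

# Assumed list markers: "-", "*" or "1." style (hypothetical).
LIST_ITEM = re.compile(r"^\s*(?:[-*]|\d+\.)\s+")


@dataclass
class Fragment:
    """A run of text placed at a pixel address (hypothetical record layout)."""
    x: int
    y: int
    text: str


def map_to_cells(fragments, columns=80):
    """Map pixel-addressed fragments onto a character-cell page in memory."""
    rows = {}
    for frag in fragments:  # fragments may arrive in any order
        row = frag.y // CELL_HEIGHT
        col = frag.x // CELL_WIDTH
        line = rows.setdefault(row, [" "] * columns)
        for offset, ch in enumerate(frag.text):
            if col + offset < columns:
                line[col + offset] = ch  # later fragments overwrite earlier cells
    if not rows:
        return []
    # Rows that received no fragment come out as blank lines.
    return ["".join(rows.get(r, [])).rstrip() for r in range(max(rows) + 1)]


def drop_artifact_blanks(lines):
    """Drop a single blank line that precedes a line starting in lower case."""
    out = []
    for i, line in enumerate(lines):
        if line.strip() == "" and 0 < i < len(lines) - 1:
            prev_text = lines[i - 1].strip()
            next_text = lines[i + 1].lstrip()
            # Only a lone blank line (text on both sides) is a candidate.
            if prev_text and next_text[:1].islower():
                continue
        out.append(line)
    return out


def space_list_items(lines):
    """Insert a blank line before a list item that directly follows text."""
    out = []
    for line in lines:
        if LIST_ITEM.match(line) and out and out[-1].strip():
            out.append("")
        out.append(line)
    return out


if __name__ == "__main__":
    page = map_to_cells([
        Fragment(0, 0, "Use the" + " " * 7 + "qualifier"),  # normal text with a gap
        Fragment(64, 0, "/FULL"),                           # bolded text placed later
        Fragment(0, 32, "to see everything."),
    ])
    for text in space_list_items(drop_artifact_blanks(page)):
        print(text)
```

The second fragment lands inside the gap left by the first, which is the
non-sequential placement described under "Mapping of document text". The
two clean-up functions are the sort of educated guesswork mentioned under
"Page layout", and like any guess they will sometimes remove or add a blank
line the original author did not intend.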
Finally ...
Even considering all the limitations of the approach, it generally works
remarkably well, and also gives access to Bookreader documentation
from a character-cell terminal (via a browser such as Lynx).
I hope it's useful.