A dynamic book reading system for computers
Since the invention of the printing press, human reading habits have seen little progress. A book, whether printed on paper or imitated on a computer, does not allow its content to interact with the reader. No book, for example, lets the reader turn one of its tables into an interactive chart, even though large bodies of numbers are quite difficult for humans to interpret. This immutable nature of paper curbs the mental development of humans, but the arrival of computers is changing the situation. Humans (and other animals) learn by interacting with their environment, and an environment rich in live elements would boost the intellectual abilities of its reader. A computer provides the tools needed to build a dynamic environment for any situation, and there is an urgent need for lush, rich, and dynamic learning environments.
readaratus is a tiny proof of concept and computerized experiment to introduce some dynamism into the realm of books. Ultimately, readaratus aims to be a "self-decoding", "dynamic", and "machine-friendly" protocol for learning.
The following list demonstrates the current features of readaratus:
A minimal mechanism for converting between SI and Imperial units of measurement.
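As an illustration only, a unit converter of this kind can be sketched with a small table of conversion factors. The table `FACTORS` and the function `convert` are hypothetical names chosen for this sketch and are not taken from the readaratus source.

```python
# Hypothetical factor table: multiply a value in the first unit by the
# factor to obtain the value in the second unit.
FACTORS = {
    ("m", "ft"): 3.280839895,
    ("km", "mi"): 0.621371192,
    ("cm", "in"): 0.393700787,
    ("kg", "lb"): 2.20462262,
}


def convert(value, src, dst):
    """Convert value from unit src to unit dst using the table above."""
    if src == dst:
        return value
    if (src, dst) in FACTORS:
        return value * FACTORS[(src, dst)]
    if (dst, src) in FACTORS:
        # The reverse direction divides by the stored factor.
        return value / FACTORS[(dst, src)]
    raise ValueError(f"no conversion known from {src!r} to {dst!r}")
```

Storing each pair once and dividing for the reverse direction keeps the table small; a real converter would also need to recognize quantities and units inside the page text.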
Pieces of text are often devoted to a graphical representation. If a piece of text references a figure that does not reside on the same page, the reader has to find that figure manually in order to understand the point of the text. In physical books this is done by hand: fingers are kept on particular pages while the reader flips back and forth to the page of interest. In typical reading software this is done by following links from the text to the figure and back, provided that the author has supplied the links and the software offers a way to return to where the user was. This process disrupts concentration whether the book is physical or digital. We have developed a mechanism for on-demand invocation of figures wherever they are referenced. The mechanism is activated whenever the user moves the mouse pointer over text that references a figure. Upon activation, the figure is displayed in an overlaid box.
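One ingredient of such a mechanism is locating the places in the text that reference a figure. A minimal sketch follows, assuming references look like "Figure 3.2" or "Fig. 7"; the pattern `FIGURE_REF` and the function `find_figure_references` are hypothetical and do not describe how readaratus actually detects references.

```python
import re

# Assumed reference style: "Figure 3.2", "figure 4", "Fig. 7".
FIGURE_REF = re.compile(r"\b(?:Figure|Fig\.)\s+(\d+(?:\.\d+)*)", re.IGNORECASE)


def find_figure_references(text):
    """Return (label, start, end) for each figure reference found in text."""
    return [(m.group(1), m.start(), m.end()) for m in FIGURE_REF.finditer(text)]
```

The character offsets could then be mapped to regions on the page, so that hovering over such a region brings up the figure carrying the matching label.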
Users have the freedom to jump to various objects within the book: page numbers, page labels, TOC items (introduction, epilogue, chapter I, Section 2.4, ...), immediate neighbor TOC items (next chapter, previous subsection, ...), and figures. A dialog lets the user find any such object.
We have developed an enhanced TOC subsystem that lets users navigate efficiently within the book. The TOC is read, decoded, and grouped, and then absolute and relative means of navigation are provided. By absolute navigation we mean tools that facilitate jumping, for example, from "chapter 3" to "subsection 8.2", and by relative navigation we mean tools that aid in navigating to the neighboring "chapters", "sections", and so on.
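The difference between absolute and relative navigation can be sketched on a flattened TOC. The `TocItem` type and both functions below are assumptions made for illustration; the actual data structures in readaratus may differ.

```python
from dataclasses import dataclass


@dataclass
class TocItem:
    label: str  # e.g. "chapter 3" or "section 8.2"
    depth: int  # 0 = chapter, 1 = section, 2 = subsection, ...
    page: int   # page the item starts on


def find_absolute(toc, label):
    """Absolute navigation: look an item up by its label, ignoring case."""
    for item in toc:
        if item.label.lower() == label.lower():
            return item
    return None


def find_relative(toc, current_index, step):
    """Relative navigation: nearest item at the same depth as the current one.

    step is +1 for "next" and -1 for "previous"; None means no neighbor.
    """
    depth = toc[current_index].depth
    i = current_index + step
    while 0 <= i < len(toc):
        if toc[i].depth == depth:
            return toc[i]
        i += step
    return None
```

In this sketch "next section" may cross a chapter boundary; stopping at the enclosing chapter instead would be an equally reasonable design.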
Dual-page text lookup
A thin probabilistic proof of concept is provided in the "Find text" dialog, where the user has the option of extending a textual search to two pages, since it is common for a sentence to have some of its words on page "X" and the rest on page "X+1".
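The idea of a two-page lookup can be sketched by joining the text of each page with that of its successor. This sketch uses plain deterministic substring matching, not the probabilistic approach mentioned above, and the function `find_across_pages` is a hypothetical name; words hyphenated across the page break are not handled.

```python
def find_across_pages(pages, query):
    """Return (page_index, spans_two_pages) for each page where query starts.

    pages is a list of page texts; whitespace and case are normalized.
    """
    q = " ".join(query.split()).lower()
    hits = []
    for i, text in enumerate(pages):
        current = " ".join(text.split()).lower()
        if q in current:
            hits.append((i, False))
            continue
        if i + 1 < len(pages):
            following = " ".join(pages[i + 1].split()).lower()
            joined = current + " " + following
            # Found only in the joined text: the match straddles the break.
            if q in joined and q not in following:
                hits.append((i, True))
    return hits
```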
Real page numbers
Page numbers in PDF documents are either provided by the authors or are indexed from 1. Indexed page numbers are often some pages ahead of or behind the real page number (e.g. you are on page 45 but your software reports 72). We have developed means to overcome this issue, so the page numbers you see or request are the real ones, as they appear on paper.
Some PDFs lack a digitized "Table of Contents". If a book fails to provide a TOC, we try to create a synthetic TOC for it. This is achieved by analyzing the initial pages of the PDF.
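When the authors do provide page labels, mapping a requested page number to a page index amounts to a lookup table. The sketch below assumes the labels have already been read into a list, one entry per page, with `None` where a label is missing; `build_label_index` is a hypothetical helper and covers only this labeled case.

```python
def build_label_index(labels):
    """Map each displayed page label to the first 0-based page index using it.

    Pages without a label fall back to their 1-based index as a string.
    """
    index = {}
    for i, label in enumerate(labels):
        shown = label if label else str(i + 1)
        # setdefault keeps the first page when a label occurs more than once.
        index.setdefault(shown, i)
    return index
```

For a book whose front matter is numbered "i", "ii" and whose body starts at "1", requesting page "2" then resolves to the fourth page of the file rather than the second.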
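One possible form of such an analysis, shown purely as an assumption, is to scan the text of the first few pages for lines that end in dot leaders followed by a page number. The pattern `TOC_LINE`, the function `synthesize_toc`, and the `max_pages` limit are hypothetical and are not claimed to match what readaratus does.

```python
import re

# Assumed line shape: "<title> ..... <page>" or "<title> . . . . <page>".
TOC_LINE = re.compile(r"^\s*(?P<title>\S.*?)\s*(?:\.\s?){2,}(?P<page>\d+)\s*$")


def synthesize_toc(page_texts, max_pages=10):
    """Collect (title, page) pairs from the first max_pages page texts."""
    entries = []
    for text in page_texts[:max_pages]:
        for line in text.splitlines():
            match = TOC_LINE.match(line)
            if match:
                entries.append((match.group("title"), int(match.group("page"))))
    return entries
```

Printed tables of contents without dot leaders would need a different heuristic, for example one based on text position or font size.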
In future versions we intend to introduce the following features (mainly for PDF documents):
Table to chart
Charts are easier to understand compared to tables. We need to have mechanisms that would allow us to convert a table into an interactive chart.
Live code snippets
A book about programming is full of code snippets. Someone learning to program in a computer language should be able to tinker with the code she is studying. Books accompanied by tiny on-demand interpreters for programming languages would make it easier to learn any programming language.
Keyboard-averse gesture commanding
Keyboards are low-pass filters that translate the rich variety of signals issued from our hands into mild keystrokes. We are working on a gesture input system that goes beyond simple shortcut keys.
What formats do you support?
As of version 2.1 we only support the PDF format.
What is PDF?
PDF is an ISO standard for document storage and stands for "Portable Document Format". PDF has a page description scheme (borrowed from PostScript) that treats the page as a 2-D coordinate system in which, by default, the bottom-left corner is the origin, and objects (text, figures, ...) are positioned across the page by their coordinates. For more information please take a look at the PDF standard specification.
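In the default user space of the PDF specification the origin is the bottom-left corner of the page, y grows upward, and one unit is 1/72 inch, whereas many display toolkits place the origin at the top-left with y growing downward. A minimal sketch of the conversion follows; the function names are hypothetical, and page rotation and crop boxes are ignored.

```python
def pdf_to_top_left(x, y, page_height):
    """Convert a point from bottom-left-origin to top-left-origin coordinates."""
    return x, page_height - y


def pdf_rect_to_top_left(x1, y1, x2, y2, page_height):
    """Convert a rectangle given by lower-left (x1, y1) and upper-right (x2, y2).

    The result is (left, top, right, bottom) with y measured from the top.
    """
    return x1, page_height - y2, x2, page_height - y1
```

On a US Letter page (612 by 792 units), a point one inch from the left and one inch below the top edge is (72, 720) in PDF user space and (72, 72) in top-left coordinates.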
Is readaratus a software for reading books or a protocol?
Ultimately, readaratus would be a protocol for learning. This means that it stores the content and the decoder for that content in the same unit of storage. Think of it as a content-ready virtual computer that can run on any machine and eliminates the need for an external reading tool.
For now, it is a reading program, since we already have a lot of books in the PDF format that need to be dealt with.
What do you mean by "machine-friendly"?
Current content storage systems are not designed to be read by computerized routines. The PDF standard, for example, provides no directions for retrieving the figures of a page, no way of distinguishing between the header, body, and footer of a page, and no way of extracting drawn charts; tables are difficult to extract as well. To fully grasp the tyranny of the PDF standard, just look at how many projects are out there trying to extract information buried in PDFs. Word processors and typesetting systems, which are in turn used to produce PDFs, discard all this useful metadata about objects simply because the PDF scheme does not provide any means for storing it.
A protocol that provides the content is obliged to aid potential readers in decoding the content.
What do you mean by "self-decoding"?
A readaratus unit of storage is divided into several parts. One of the important parts contains the instructions used to prepare a decoding tool. This decoding tool is similar to mainstream book reading software and is in turn used to display the book contained in the package.
What common features of mainstream PDF readers are missing in readaratus?
Annotations, form filling, password protection of files, and object (text, figure, ...) selection/copying are not incorporated in readaratus. We need to discuss these issues with people before incorporating them in our protocol.
What languages are supported?
As of version 2.1, we support books written in English.
What is the license?
GNU General Public License V3 or later.
How do I run the downloaded program?
Extract the file you've downloaded, make `readaratus` executable, and run it. You also need to install the poppler-glib and Gtk3 runtime libraries.
Source code: https://github.com/rezahsnz/readaratus