Digital Scholarship in the Humanities:
I've had a longstanding, friendly debate with a colleague about whether it is sufficient to provide page images of books, or whether text should be converted to a machine- and human-readable format such as XML. She argues that converting scanned books to text is expensive and that the primary goal should be to provide access to more material. True, but converting books into a textual format makes them much more accessible, allowing users to search, manipulate, organize, and analyze them. Here's my summary of what you can do with an electronic text. Most of these advantages are pretty obvious, but worth articulating.

It's not digital text if it's an image file. It's just an image that might contain anything at all. Vannevar Bush's Memex was an idea for a text storage-and-retrieval system that worked by storing and linking microfilm images of pages of text, but his vision was purely analog. Page images do provide a certain amount of information, and today it's not too hard to find tools that convert page images to text, but an archival project is incomplete if the digitization process stops at simply supplying images of the material to be archived.