Text to Speech (Jerz’s Literacy Weblog)

Over the summer, when I spend little time in the office and a lot of time outdoors, I often fall behind in my reading. The past few weeks I have been using TextAloud, a fairly simple but interesting program that converts text files to MP3s. I then put the MP3s on my PDA, and have listened to student papers that were submitted to finish off incomplete grades; a dissertation chapter that touched on a subject I know a little bit about; an administrative planning document on assessment; and a 93-page article of mine that I’ve been developing, on and off, for about five years. Today, when I drive to work briefly, I’ll be listening to a Gamasutra article on Zork.

TextAloud offers a free version, which was good enough for short and routine stuff, but the AT&T professional voices sound excellent, far better than anything I had ever experienced before, and I figure they’re well worth about the cost of a DVD movie each (one male, one female).

I have been toying with the idea of having my journalism students practice taking notes from audio recordings, and I figure a tool like this will let me work a little more efficiently, since I won’t have to get a voice actor to record the dialogue each week. Of course, once I get a sense of what kinds of mistakes the students make, I can firm up the scripts and get someone to record them more dramatically.

I can imagine, with this text-to-speech program, setting up an RSS feed of all my students’ overnight blogging on a given topic, converting it to an audio file, and then listening to it on the drive in to work.
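For the curious, here is a minimal sketch of the first half of that idea: gathering the posts in a feed into one plain-text script that a text-to-speech program could then convert. It is only an illustration under some assumptions, namely that the feed is ordinary RSS 2.0 with `<item>`, `<title>`, and `<description>` elements. The function name `feed_to_script`, the "Next post:" cue, and the sample feed are all made up for this example, and the conversion to audio is deliberately left to whatever tool you use.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape


def feed_to_script(rss_xml: str) -> str:
    """Turn an RSS 2.0 document into one plain-text script for a TTS tool.

    Hypothetical helper: assumes each post is an <item> with a <title>
    and a <description>.
    """
    root = ET.fromstring(rss_xml)
    parts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "Untitled").strip()
        body = item.findtext("description") or ""
        # Feeds often carry HTML in <description>; drop the tags and decode
        # entities so the synthetic voice does not read markup aloud.
        body = unescape(re.sub(r"<[^>]+>", " ", body))
        body = " ".join(body.split())  # collapse runs of whitespace
        # A spoken cue between posts, since audio has no visual headings.
        parts.append(f"Next post: {title}. {body}")
    return "\n\n".join(parts)


# Made-up sample feed, for illustration only.
SAMPLE = """<rss version="2.0"><channel>
<item><title>On Zork</title><description>&lt;p&gt;Mazes are &lt;em&gt;fun&lt;/em&gt;.&lt;/p&gt;</description></item>
<item><title>Notes</title><description>Short post.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    print(feed_to_script(SAMPLE))
```

The resulting text could be saved to a file and handed to the text-to-speech program like any other document. The spoken cue before each title matters more than it might seem, because in audio there is no other way to tell where one student's post ends and the next begins.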

It almost makes me wish I had a longer commute.

Post was last modified on 26 Jan 2018 10:58 am


  • I'd say that reading a book is as much a spatial experience as a temporal one, and that fixing in time is more what I expect from an audio recording (from the Latin recordare, "to bring back to the heart," with "heart" meaning something similar to what we mean when we say we know something "by heart" -- more deeply than knowing it by rote). Just as reading moved from being a public experience (i.e. the masses being read to in formal situations) to a private one, music has in the last generation moved from being a communal experience to a private one (first with the Sony Walkman a generation ago, and of course now with those ubiquitous white iPod earbuds).

    Since the MLA style format standardizes the form of an academic essay, students in a lit class don't get to do much with typefaces and page layout, though, so I don't think a text-to-speech conversion is as violent a remediation as you seem to suggest. It's only a slightly more exaggerated version of what happens when a student reads a paper word-for-word and calls it an oral presentation, or when a speaker fills each slide of a presentation with a paragraph of text, and then fills up the presentation time by reading each and every word from the slide.

    In class I regularly have students get together in groups of three, and student A is supposed to listen in silence while student B reads student A's draft aloud to student C, and student C and student B have a conversation based on the draft. I like this arrangement because even a student who comes to class without a draft can still participate meaningfully, and the source of much of the exercise's value is the very decontextualization that you warn about.

    I'm enjoying this, too... you have a wonderfully penetrating way of getting me to examine my assumptions, and I value your candor and wit. I'm also looking forward to the return of Pedablogue.

  • THAT AUDIO FILE MADE ME DO A SPIT TAKE!!! Hilarious!
    (I love the Kirk moment...hah!)

    Great counter examples about great poets. Poetry always "doesn't count" because it always already experiments with form and medium, however, and is inherently assumed to find truth value in the oral first (ergo it rhymes) and the written second (ergo, there are line breaks)...though the modernists opened their eyes to the page, so to speak.

    This is fascinating stuff.

    In any case, since you asked about paper on screen vs. on paper, that's obviously a shifting variable. But what remains constant between them? Text as a visual symbol (e.g. font) that embodies the ideation, or even the utterance. I'm not sure where literary theory comes into play here, but I'm all in favor of erring on the side of an open definition of text... however, students might not be, when it comes to evaluating their work.

    So off the top of my head... While e-text on screen or paper is obviously mutable, if characters, etchings, or cuneiform are involved, it's writing, as far as I'm concerned. To de-inscribe that into audio is to remove it from its context to some degree, whereas to transcribe it into braille is not. The act of inscription comes into play when writing is involved. The urge to fix in time. I'm sure Derrida would have something to say about privileging utterance in this dialogue, but if we open that can of worms, then anything goes and you might as well grade papers by having a computer tap them out in Morse code or graphically depicting semaphore with a dancing, flag-waving icon.

    Hey, whatever floats your boat or steers your airplane....if you're listening in conjunction with reading, then I can see the usefulness that you find in it. But I think there's some wishful thinking involved here. As with all media -- which we know literally means to come "between" -- you're inviting something to mediate between the writer and the reader that wasn't present when the paper was written or submitted. Obviously, I think relying on text-to-speech audio only would make a poor substitute for critiquing the papers (or the screens) themselves. Podteaching, as it were, may not be the best way to help a writing student improve, though it would be perfectly fine for a communications class or some other course where information was more important than writing skills. In any case, I don't mean to be too argumentative; I'm actually enjoying thinking about this out loud with you. I just bought a microphone that plugs into my iPod and am likewise enthralled by all these audio possibilities (as my upcoming CD, Audiovile, clearly attests). You've clarified your points quite well now and I see your viewpoint. Or should I say 'I hear you' loud and clear?

    Flag left, flag right....flag shrug. (I am travelling this week so might not be able to reply for a while.)

    -- Mike A.

  • Am I still streaming through your mp3 player? How does this sound...

    A film reviewer doesn't need access to the script, but access to the script makes a different kind of cinema criticism possible. And indeed, a good music reviewer does not "just" read the lyrics -- he or she uses the lyrics to supplement the experience that comes from listening to the music.

    I should clarify that I didn't listen to the student papers *instead* of reading them. I listened to them *and* read them.

    After listening to the audio, I did read through and annotate a Word file that showed the student's rough draft, my comments, and the changes that the student made in response to those comments. We all have different "reading" modes -- sometimes reading in order to plunder a text for information, sometimes to probe it for vulnerabilities we can exploit and beauty we can treasure. Given that I had access to several layers of the student's work, I feel that I needed the linearity of the audio file in order to help me keep track of the student's current argument, so that it wouldn't get lost in all the layers.

    I find that it's very easy to catch redundancy and word omissions when a voice (even a monotonous one) forces you to listen to every word.

    I agree that text has an important visual component. Blind Homer can get away with being blind since he lived in an oral culture. But would I be nudging you too far down the slippery slope if I asked you whether we should only study work that Borges and Milton created before they went blind? If Samuel Johnson had two weak eyes, would his dictionary be more or less relevant than if he had one sighted eye and one blind eye? Does Helen Keller's autobiography count as a literary text?

    Text-to-speech and speech-to-text are not quite the same things. A human speaker who communicates irony through facial expressions or shades a speech through timing is giving out far more information than can be encoded in a text file, at least in a machine-readable form. (My knowledge of Samuel Clemens might inform me that a particular passage is meant to be ironic, but a literary work that is printed and bound for a novel-reading public does not contain the coded information that a computer program would need in order to convey "irony" in a given line.)

    I don't have students read word-for-word from papers that they are about to hand in. I do often ask students to give an oral presentation on a particularly thorny or risky area of a subject that they're working on for a formal paper.

    I'm curious as to your response to some of my points about reading a paper on screen vs. on paper, or on the tiny screen of a PDA.

  • Please don't read ahead...transcode this to an mp3 file and then listen to it and see if it still sounds like me, from the snarky sarcastic barbs to the concerned intellectual appeals.
    ********

    Oh come on: Why don't we throw out the novels and just teach audiobooks in our literature classes, too, while we're at it? Heresy, Jerz! Heresy!!! :-)

    Okay, I can see the value in what you're doing because it may help you, in particular, to process text in a way that's fun and enlightening for you. But I think a lot of my concerns remain, and I might as well argue with you awhile. After all, we're on summer break!

    I guess the reason I'm motivated to argue stems from your example of being maimed and having to compose via dictation. That almost swayed me...until I realized it actually supports my position: why metaphorically maim the student writer via this technology when grading them?

    I guess what my position comes down to is the subtle shifting of the ground that occurs when you process and evaluate a text. A movie reviewer doesn't review a film by reading the script. A music reviewer doesn't review a new CD by just reading the lyrics. These are examples where evaluation would be problematic because they've shifted the ground from audible to written text. You're proposing the converse, but I think that if the content is the same both ways, then it shouldn't matter which side of the equation you start with.

    If a student writes a paper, they should have the expectation that their paper will be evaluated by a reader who casts his eyes upon the page (whether on the monitor or in print). If they give a speech, then their speech should be evaluated by a listener. You have said that you do both...but then I wonder, why bother with both?

    Reverse your assumptions and see if they're equal: Would you likewise evaluate a speech based mostly on how well it was written, if you had a voice-to-text encoder, eliding the delivery entirely?

    You see: I believe you've conflated the two processes, and even though you may be a skilled and experienced listener -- one capable of taking notes while, say, driving with the CD player on, or surfing the web while streaming an mp3 -- I'm not sure that's an entirely fair way to assess the student who "intended" to be read, not heard. And the way things sound will also colorize your feedback in some way, whether subtle or overt.

    Perhaps it just comes down to my knuckleheaded assumption that audio doesn't really "count" as e-text. Does it? To what degree are visual characters and paragraph structures necessary for something to constitute 'writing'? I've read my Walter Ong, but I'm not sure that this is really about our culture entering a second oral phase. It might be more about trying to cut corners that ought not be cut.

    However, I do assign speeches and have poetry writing students recite poems and perform plays when we do those in lit classes, so maybe I'm just having trouble seeing what you're proposing as being on the same level as these oral-based actions. Probably because the students themselves aren't doing the speaking in your grading process? Why not have them read their papers on the microphone and collect/listen to them that way first? That seems, at the very least, more fair to me. And it might even help them to hear the problems in their writing for themselves.

    ***
    Am I still streaming through your mp3 player? How does this sound:

    I am a robot. I am a robot. Moop. Meep. Beelzebub. One. Zero. Zero. One. Zero. Zero. One.

  • I definitely read the final paper before I assigned the grade, so I think of the audio version as something extra. I should note that the class in question was an upper-level lit crit course, where the bulk of the weekly classwork was writing a short analytical essay and then discussing our methods in class. Since the papers in question were all from students who took incomplete grades, the context in which I evaluate them is already different from the context in which I graded all the others. I felt that listening to all their drafts before I marked the papers would help me return to the mental state I was in when the class was still going on.

    Listening to the essays was part of my attempt to focus on ideas, without being distracted by MLA formatting minutiae or punctuation errors (which loom larger in the printed text than they would in the audio version).

    Relying on the skills I developed when I worked in radio news (an environment where I couldn't pause an event in order to ponder a thought), when I had a thought while listening to the text, I jotted down the elapsed time and then just a few words, and then when I had the time to review the paper again, the quick note was enough to recall the rest of the thought to mind.

    I was surprised at how well I followed the dissertation chapter.

    I wouldn't have tried this if I hadn't within the past few weeks listened to two different drafts of my own 95-page article, and noticed how listening to a text that I was overly familiar with made me notice new things.

    I have experimented with listening to the student's new draft while holding a copy of the student's old draft, and vice-versa; I also used Word's "compare documents" feature to see exactly what the student changed between drafts.

    My voice recorder does have very simple controls for pausing and rewinding a few seconds at a time -- I can access those controls without even taking the recorder out of its case.

    If I lost the use of my hands, and had to compose via dictation (to a computer or human), that would certainly affect the composition process, but I'd still say it was writing.

    Yes, I'm sure that I do "read" a student paper differently when I hear it, but I also "read" it differently when I see it on my big LCD monitor at work, my small monitor at home, or the teeny monitor on my PDA. I "read" it differently when I'm holding a printout from my smudgy home inkjet printer, or from my crisp printer at work. I am thinking of a "history of writing" theme for "Media Aesthetics" in SP2008, something that focuses on the history of thought as it changes from classical Greek oratory, to Roman bureaucracy, medieval scriptoria, the printing press, and of course e-text.

    As for the clever students who are skilled at writing for the ear -- I don't think I'm any more or less susceptible to that kind of style than any other kind of style that might compete with substance for my attention.

    I did find it psychologically important to choose a voice that matches the gender of the author. (I haven't yet tried listening to an essay co-authored by a male and female.)

  • I admire your creative attempt to manage work and time, and I continue to be impressed by your experimentation. But I have trouble seeing this as a proper and practical means for evaluating student writing. Maybe I'm just more old school than I realized. But...I'm riddled with questions.

    Can you really evaluate writing when it's transcoded to an audio file, where you can't really see the structured composition of the paragraph all laid out before you?

    Will the program's voice colorize (or, alternatively, rob the color from) the student's own writing voice?

    Is it still "reading" when you're "listening"?

    Is it fair to grade "writing" when you don't even read it in your evaluation?

    Will students who are clever try to play to your ears rather than your intellect? Will you punish those who don't?

    Will you seriously take the time to rewind when you miss a beat, when you could otherwise just flick your eyeball up in a microsecond to review what you missed? Will you stop the stream of sound to contemplate an idea?

    etc. etc.

Published by
Dennis G. Jerz