LLMs and the Problem of Meaning
If you put garbage into an LLM, you may get (fancy-looking) garbage out
The Important Work is a space for writing instructors at all levels—high school, college, and beyond—to share reflections about teaching writing in the era of generative AI.
This week’s post is by Adrian Rowland. Adrian teaches scientific writing and EAP in the Center for Language Education at the Southern University of Science and Technology (SUSTech) in Shenzhen, China. His PhD is in chemistry, and he had an earlier career as a secondary school (high school) teacher of chemistry.
If you’re interested in sharing a reflection for The Important Work, you can find all the information here. —Jane Rosenzweig
“Basically unreadable.” This is how my former student Lu Hongyi, who took my scientific writing course in 2022 and is currently completing his computer science PhD, recently described some papers by colleagues of his. These colleagues “extensively use LLMs” when writing, and their papers were poor, he said, because the LLM use had disguised bad logic rather than improved it; his colleagues had “fed in simple texts... too simple for LLMs/people to fully understand, and used LLMs to expand into lengthy stuff with a seemingly good look... but it just made it really hard to get the gist.”
Did he feel that LLMs had no place in scientific writing? Not at all. As he also said, they are useful for “improving” and “polishing” writing, for example by fixing grammar problems.
Hongyi’s account is a helpful route into my chosen topic—LLMs and their use in writing assessments—because such a discussion must be grounded in a realistic view of what LLMs can and cannot do. In my experience of applying them to scientific text (text written by my students and by members of the chemistry faculty for whom I have provided editing assistance), LLMs are highly effective at improving the cosmetic aspects of prose, but even with specific prompting they cannot be relied upon to fix[1] problems of meaning: to identify that a paragraph has failed to state its real point or that an explanation has made an excessively large leap in the middle, or to correct sentences that are misemphasized, ambiguous, or outright wrong.
I shall give an example. Everyone reading this is likely familiar with the principle of “given before new” in sentence construction: the idea that sentences should start with information the reader already knows and introduce new information later. There’s a great piece of student text I like to use in class as an example of the importance of this principle in scientific writing. Here it is:
In the past three decades, lithium-ion batteries (LIBs) have become the most successful rechargeable batteries due to their high energy density, long cycle life and high working voltage. However, the energy density and rate performance of LIBs cannot meet the demands created by the fast development of electric vehicles and portable devices. Ni-rich layered oxide materials with the chemical formula of LiNixCoyMnzO2 (x+y+z=1, x≥6) has the advantages of high specific capacity (200 mAh g-1) and low cost, which has made it the most promising candidate for the next generation of cathode materials for LIBs.
Hard to read, I think you’ll agree. The main problem is that the only familiar item in the third sentence is LIBs, and they appear right at the very end of the sentence. It’s not actually possible to get them all the way to the beginning of the sentence (have a try!)—and this implies there must be some step of reasoning missing from the explanation. Here’s an improved version with the missing step added:
In the past three decades, lithium-ion batteries (LIBs) have become the most successful rechargeable batteries due to their high energy density, long cycle life and high working voltage. However, the energy density and rate performance of LIBs cannot meet the demands created by the fast development of electric vehicles and portable devices. One of the factors that most severely limits LIB performance is the cathode material. The most promising candidates for the next generation of cathode materials are Ni-rich layered oxide materials with the chemical formula of LiNixCoyMnzO2 (x+y+z=1, x≥6), which have the advantages of high specific capacity (200 mAh g-1) and low cost.
Paying attention to “given before new” is a great way to notice and fix broken explanations like this one, and I try to help my students become accustomed to applying it.
Though the process of fixing this particular text is not hard to reason through, ChatGPT has consistently failed to achieve it. I’ve given this paragraph to several versions of ChatGPT and asked it to act as an editor, specifically telling it to insert any missing information and to ask questions if anything is unclear. Every time, it has asked no questions and returned text that looks smoother than the original but still lacks the critical piece of information about cathode performance. Here’s one of ChatGPT’s attempts:
Over the past three decades, lithium-ion batteries (LIBs) have emerged as the premier rechargeable battery technology, lauded for their remarkable attributes such as high energy density, extended cycle life, and elevated operating voltage. Nonetheless, the burgeoning demand spurred by the rapid evolution of electric vehicles and portable electronics has exposed limitations in LIBs’ energy density and performance capabilities. To address this challenge, researchers have turned to Ni-rich layered oxide materials, characterized by the chemical formula LiNixCoyMnzO2 (where x+y+z=1, and x≥6). These materials offer a compelling proposition with their combination of high specific capacity (up to 200 mAh g-1) and cost-effectiveness, positioning them as the front runners for the next generation of cathode materials in LIBs.
This may be more pleasant to read—though even this is debatable in my view, given the advertising-copy-like tone that has appeared out of nowhere—but the main problem remains unsolved. The reader still has to work harder than they should to appreciate the overall argument being made, because they are still being left to work out for themselves that cathode material performance is a major problem. Indeed, I suspect many people would move on from this smooth-reading paragraph without having comprehended this point at all.
Seeing this example, one can appreciate my former student Hongyi’s comment about papers written with too much reliance on LLMs. It is easy to imagine that the cumulative effect of multiple LLM-edited paragraphs like this one, paragraphs that appear fluid but fail to state key points, might well be a paper that has “a seemingly good look” but is nonetheless “basically unreadable.”

No doubt someone will put the text above into an LLM and manage to generate a response that solves the problem, but in my view that’s beside the point: if I gave ChatGPT a reasonable prompt and it failed to give me a good answer, that is sufficient to demonstrate that it is not reliable as a writing aid when it comes to fixing explanations that have made excessive leaps.
It also cannot be relied upon to resolve ambiguous sentences. Another piece of text I like sharing with my students in class is a sentence from some chemical writing I edited for a member of faculty, presented here simplified and anonymised but retaining its essential structure:
Next, we explain the challenges and benefits of using X molecules in roles Q and R to replace the rare and more expensive Y molecules and Z molecules in reaction D with a certain amount of chemical T.
When I first read this, I found the sentence structure a little difficult to disentangle. Discussion with the author made it apparent that a clearer expression might be as follows:
Next, we explain the challenges and benefits of using reaction C (with X molecules in roles Q and R) to replace reaction D (which requires rare and more expensive molecules Y and Z, as well as a certain amount of chemical T).
In my view, the original sentence could be read to mean this, but it is by no means clear that it should mean this. The original sentence was not a successful expression of the author’s ideas.
Now, when I ran the original sentence (with several hundred words of surrounding text as context) past ChatGPT and asked it to act as an editor, the results were interesting. It was given three bites of the cherry. Once, it successfully rewrote the sentence...and twice it left it basically unchanged and just as unclear.
The outcomes described here should be no surprise, given that LLMs function not by reasoning but by “estimating the likelihood that a particular word will appear next, given the text that has come before,” leading to output that is created to appear convincing rather than created to be true; they are, ultimately, bullshit generators, in the philosophical sense of bullshit as communication indifferent to truth. There is no reason to expect such a tool to infallibly intuit the unexpressed intent of an author. To adapt a common computer science expression: it’s quite possible that if you put garbage into an LLM, you’ll get (fancy-looking) garbage out. This means that if your aim is to truthfully communicate exact ideas, rather than merely produce some text that is largely correct and appears reasonable, the quality of the input does matter—as does the expertise to effectively assess the nuances of LLM output. It follows that independent proficiency in the writing skills that allow an author to clearly and unambiguously express what they actually mean, with their intended emphasis, at the sentence, paragraph, and document level, remains of value for LLM users.
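For anyone curious about what “estimating the likelihood that a particular word will appear next” looks like in practice, here is a deliberately tiny, purely illustrative sketch of my own invention. Real LLMs use neural networks trained on vast corpora rather than the toy word counts below, but the core move is the same: pick whatever continuation is statistically likely, with no step that asks whether the result is true.

```python
# A toy illustration (not a real LLM): pick each next word purely by how
# often it followed the previous word in a tiny "training" corpus.
# Nothing in this procedure checks whether the generated text is true;
# it only checks what is statistically likely to come next.
from collections import Counter, defaultdict

corpus = (
    "lithium ion batteries have high energy density . "
    "lithium ion batteries have long cycle life . "
    "lithium ion batteries have low cost ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_likely_next(word):
    """Return the continuation that most often followed `word` in the corpus."""
    return following[word].most_common(1)[0][0]

# Generate by repeatedly appending the most likely next word.
words = ["batteries"]
for _ in range(4):
    words.append(most_likely_next(words[-1]))

print(" ".join(words))  # e.g. "batteries have high energy density"
```

The output sounds plausible because it is built from plausible continuations; whether it happens to be correct is entirely an accident of the input.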
If we want our students to develop these writing skills, we need to be assessing their proficiency in them. This suggests a problem with allowing the use of LLMs during writing assessments, because LLMs are very good indeed at giving the appearance of most writing skills in the sorts of relatively simple writing tasks that are suitable for assessment: tasks that can plausibly be attempted by developing writers and marked by a teacher without the need for detailed discipline-specific expertise. I’m thinking here of tasks like the classic five-paragraph argumentative essay, for example, or reflective writing, or critical commentary. In other words, LLMs are not powerful enough to remove our students’ need for strong writing skills, at least not if those students wish one day to be good at scientific writing (or, I suspect, any form of complex writing where strict adherence to truth is desirable); but they are more than powerful enough to destroy the integrity of the assessments that measure whether students are building those skills. What to do?
In my own writing course, I have taken the most obviously robust route to assessing my students’ own skills and switched most of my assessment to a traditional, pen-and-paper final exam. This approach also has the advantage of encouraging responsible LLM use during the course. Students can approach their formative coursework and other practice exercises however they like, with whatever level and mode of LLM use they prefer, but they are motivated to make their LLM use genuinely educational by the knowledge that at the end of the course, the skills they are supposed to be developing will be tested in an LLM-free exam.[2]
LLMs should be tools for empowerment, especially for people writing in their second language (my own particular area of teaching interest). An LLM is without doubt a tremendously powerful means of helping second-language authors find the right vocabulary, explore alternative phrasings, and, perhaps most importantly, smooth out comparatively minor issues of grammar and expression that might otherwise, unfairly, make their writing appear less sophisticated and lead others to take their ideas less seriously. But an LLM will be a more precise and truthful tool for all of these things in the hands of an author educated to a higher level of writing proficiency than in the hands of one educated to a lower. We should be careful not to let LLMs undermine writing assessment, and with it the careful nurturing of writing skill; if we do, they risk becoming tools of disempowerment, as well as a menace to the integrity of the scientific literature.
I thank my friend Lu Hongyi for several stimulating discussions about scientific writing, for his permission to relate the conversation summarized above, and for his comments on an earlier version of this essay.
As mentioned above, I’ve done editing work for several members of faculty in SUSTech’s departments of chemistry and computer science. I believe the draft manuscripts I hold would (with the permission of their authors) form the basis of an interesting research paper: patterns of LLM strength and weakness in editing could be examined by asking an LLM to edit these drafts and comparing the results to my own editing and/or the final published versions. If anyone reading this has thoughts on how such a study might rigorously be approached and would like to collaborate on research for publication, please feel free to get in contact via adrian@sustech.edu.cn.
1. Not “cannot ever fix” but “cannot be relied upon to fix.”
2. I do wonder if this only works because I am teaching adults; if I were teaching children, I imagine it might be necessary to restrict LLM use in the classroom.
As a non-scientist who sometimes teaches STEM-oriented writing classes, I found this immensely helpful. You're drawing attention to something that's a key dynamic in student writing: developing the ability to articulate a clear chain of thought building from things that we know to things that we're figuring out. If LLMs can't reliably engage with this work, that represents a very significant limitation to their performance when it comes to clearly communicating technical information. I'd just add that even if they *could*, turning this work over to the LLM would still have a negative impact on student learning.
I do have questions about using AI as a grammar-checker, especially for people who are writing in a language that they're not fully fluent in. As the AI rewrite of the battery passage suggests, AI tools tend to make changes that go beyond merely correcting errors of grammar and usage. In this case, the AI rewrite slathers on purple prose. The passage becomes less clear, the tone becomes less context-appropriate, and the writing takes on the characteristic style of AI slop in a way that expert writers are likely to pick up on but novice and especially language-learning writers are likely to miss. In fact, I'd argue that the AI rewrite is actually substantially worse than the original version, and it's worse in ways that distract from the subject at hand, are harder to fix, and expose the writer to greater risks. This seems like a poor trade-off, especially when there are other tools that can help students catch mechanical errors.