The Important Work is a space for writing instructors at all levels—high school, college, and beyond—to share reflections about teaching writing in the era of generative AI. We hope to spark conversations and share ideas about the challenges ahead, both through regular posts and through comments on these posts. If you have comments or questions, or just want to start a conversation about this week’s post, please share them in the comments at the end of the post.
This week’s post is by Faun Rice, who is a researcher and writer based in Vancouver, Canada. She has worked in language revitalization with the Canadian Indigenous Languages & Literacy Development Institute (CILLDI) and in museum visitor research with the Smithsonian Institution. At the University of Alberta, she spent many hours as a writing coach and TA, helping students hate essay-writing just a little less. Today, she is the Manager of Research and Evaluation at the Information & Communications Technology Council (ICTC), where she explores the intersections of technology, policy, and society. You can find Faun on LinkedIn.
If you’re interested in sharing a reflection for The Important Work, you can find all the information here. Your reflection does not have to be about using AI in the classroom—we are interested in any way that you’re thinking about the important work these days. If you’ve redesigned assignments to avoid using AI, if you have strong feelings about when and how AI should or should not be used in the classroom, if you do something that you think works very well without AI, we want to hear about that too. —Jane Rosenzweig
About a decade ago, when I started my first social science research internship, I used a clunky transcription pedal to pause and replay slowed-down audio while transcribing recorded interviews by hand. Now, I increasingly use Artificial Intelligence (AI) tools to speed up tedious tasks like transcription, and I’m willing to bet that the same is true for other social scientists. One very tedious task that is nevertheless foundational to most studies is the literature review.
To make sure I’ve found the most relevant literature that informs my research question, I still log the keyword search strings and databases I’ve tried, and manually discern from titles and abstracts what I should include. The process of discovery can take days. For some researchers, the traditional, labor-intensive method of conducting systematic keyword searches is being supplemented—or even replaced—by AI-powered software like Perplexity Deep Research, ScholarGPT (a custom ChatGPT for open-access scholarly materials) and Elicit. These tools promise to streamline literature reviews by identifying and summarizing relevant information. If we could trust AI tools to search exhaustively, identify relevant material, and summarize it, we’d get back many hours of our lives.
But can we trust AI tools to perform well, especially regarding niche topics?
A thorough literature review helps identify gaps in existing knowledge—unanswered research questions that present opportunities for new contributions. If even one little-known study has already addressed a question, it's vital that AI-powered literature review tools can uncover it just as effectively as a manual keyword search would. This ensures that researchers build upon existing work rather than inadvertently duplicating efforts.
As a social researcher working in the non-profit sector in Canada, I wanted to evaluate AI literature review tools to help professional researchers understand their uses and limitations. But I am also a former writing coach and research instructor, and I think the results of this evaluation also raise important questions for how we teach student researchers and non-fiction writers.
Testing AI Tools Against a Known Benchmark
To evaluate AI literature review tools, I revisited a manual literature review I conducted in 2017. This was a knowledge synthesis project aimed at gathering, digitizing, and summarizing all available information about an Indigenous Dene language in the Sahtú (Great Bear Lake) region of northern Canada. We created a database hosted on a regional organization's website. We also published a final report, which exists on a Government of Canada website and on ResearchGate. Knowing this specific body of information is openly accessible, I wanted to see if AI tools could locate it based on the research questions from our original funding application.
Why Are Niche Topics a Problem for AI? The “Long Tail Data” Problem Explained
Imagine all the world's written content stacked in a vast warehouse. English-language recipe blogs from the 2010s would form towering piles. In contrast, 18th-century Chinese recipes would create shorter stacks. Other topics, such as traditional foods in Lakota—a language primarily oral until colonization—might barely form a pile at all. Arranged from tallest to shortest, these stacks create a slide-shaped curve with a "long tail" of tiny stacks representing topics with minimal data.
Large Language Model (LLM) tools like ChatGPT underpin most AI literature review software. LLMs are trained on vast amounts of internet data, mirroring the distribution of information in our hypothetical warehouse. They excel at generating responses based on commonly associated words and phrases, but they struggle, and are more likely to “hallucinate,” when the data is scarce.1
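To make the warehouse analogy concrete, here is a minimal Python sketch of a Zipf-distributed corpus. The topic and document counts are invented for illustration, not drawn from any real index:

```python
# A toy "warehouse" of topics following Zipf's law: the k-th most popular
# topic accounts for roughly 1/k of all documents. Illustrative numbers only.

NUM_TOPICS = 100_000          # hypothetical count of distinct topics
TOTAL_DOCS = 1_000_000_000    # hypothetical total documents in the corpus

weights = [1 / rank for rank in range(1, NUM_TOPICS + 1)]
total_weight = sum(weights)   # harmonic number H(100,000), about 12.1

def expected_docs(rank: int) -> float:
    """Expected number of documents for the topic at a given popularity rank."""
    return TOTAL_DOCS * (1 / rank) / total_weight

for rank in (1, 100, 10_000, 100_000):
    print(f"topic rank {rank:>7,}: ~{expected_docs(rank):,.0f} documents")

# Approximate output (rounded):
#   topic rank       1: ~82,700,000 documents
#   topic rank     100: ~827,000 documents
#   topic rank  10,000: ~8,270 documents
#   topic rank 100,000: ~827 documents
```

The top-ranked topic ends up with tens of millions of documents, while a topic at rank 100,000 has a few hundred: an LLM trained on such a corpus has ample signal for the former and almost none for the latter.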
How Did the AI Tools Perform?
I provided three AI tools with a prompt mirroring our original research objectives2 and then assessed what they found.
Perplexity Deep Research
Perplexity found our final knowledge synthesis paper. It also found a relevant thesis by Laura Tutcho, a Sahtúgot’ı̨nę speaker and expert. However, like the other tools, it lacked discernment between different Dene languages: its second most relevant resource was a master’s thesis on Dene Tha, and its fourth was a Government of Canada funding announcement about Dene Zhatié.
ScholarGPT
ScholarGPT found our final paper but missed resources from our database. Its first three resources were relevant to the Sahtú, its fourth was a thesis about Cree (though on a very relevant topic), and then it turned to continent-wide resources like Nancy Turner’s book on ethnobotany.
Elicit
This tool primarily surfaced academic articles, including some relevant work on biocultural diversity in the Sahtú region. It also ventured into theoretical discussions linking language, culture, and environment, but did not find our paper—likely because it was a government-funded report rather than a peer-reviewed journal article. To be fair, there is a version of Elicit that claims to do systematic reviews, which I did not try this time because of the subscription fee.
What Did They Miss?
Each of the tools successfully found a small number of relevant articles about language revitalization in the Sahtú. Then they turned to less-relevant articles about other Dene or Indigenous communities to fill in the gaps. The problem is that there are thousands of relevant documents about the Sahtú. These tools missed:
Digitized Archival Documents: Scanned documents in the open access community archive were overlooked.
Key Academic Works: Important resources like Sahtú linguist Fibbie Tattie's contributions and books like the original Sahtú grammar were not found.
Government Publications: None of the tools prioritized Dene curriculum documents or NWT annual reports on official languages.
Sources in Physical Archives but with Digital Metadata: A better tool would have identified relevant materials in locations like the University of Alberta Circumpolar Institute or the Prince of Wales Northern Heritage Centre and recommended contacting staff.
Why the Shortfall?
The shortcomings point to a big problem: AI tools struggle with discernment when dealing with long-tail data. With less information to draw from, it's harder for AI literature review tools to determine relevance. Additionally, algorithms that prioritize sources based on citation counts or prestige inadvertently reinforce existing biases, sidelining voices from the communities in question—such as local teachers' archives and publications, and other expert products coming from within the Sahtú.
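As a hypothetical illustration of that second point, consider how blending topical relevance with citation-based prestige can bury a community source. This is a toy scoring function with invented example sources, not the actual ranking logic of Perplexity, ScholarGPT, or Elicit:

```python
import math

def score(similarity: float, citations: int, prestige_weight: float = 0.5) -> float:
    """Toy relevance score: topical similarity blended with log-scaled citations."""
    prestige = math.log1p(citations) / 10   # rough normalization toward 0-1
    return (1 - prestige_weight) * similarity + prestige_weight * prestige

# Invented example sources: (name, topical similarity, citation count)
sources = [
    ("Local teachers' archive report on Dene kedǝ", 0.95, 2),
    ("Widely cited continent-wide language survey", 0.55, 4000),
]

for name, sim, cites in sorted(sources, key=lambda s: score(s[1], s[2]), reverse=True):
    print(f"{score(sim, cites):.2f}  {name}")
# 0.69  Widely cited continent-wide language survey
# 0.53  Local teachers' archive report on Dene kedǝ
```

The on-topic community report loses to the off-topic but heavily cited survey, which is exactly the kind of bias described above.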
The Risk of Reinforcing Data Gaps
Used uncritically, AI research assistants risk perpetuating a cycle where only easily discoverable sources are read and cited in research products. Qualitative research methodologist Margaret Roller recently argued that AI adoption belongs in research “to the extent that AI is facilitating processes but not dominating the thinking directly related to the data itself.” “The problem,” she continues, “arises when we allow AI to shut off our thinking.”
Takeaways for Teachers
While AI literature review tools are tempting supplements to a time-intensive process, they currently fall short in navigating the long tail of data, the very place where many researchers aim to contribute new knowledge. We rarely expect students, especially those learning to do research for the first time, to contribute new knowledge. However, as teachers or mentors, I think we have a few opportunities:
To treat the “long tail” of hard-to-find data as a treasure hunt, challenging students to find and evaluate sources that aren’t appearing in online aggregation tools. (Even better if we can send students to a physical library or archive and demonstrate just how much material isn’t digitized.)
To challenge students to find patterns in sources that are available via AI-assisted search tools, and to think critically about why those resources appear while others do not.
To challenge students to shape research questions that fill a real gap in what is readily available via AI-assisted synthesis.
AI search tools have improved considerably over the past year, and their capabilities will likely continue to evolve. Poor performance with sparse data is also a known problem in AI, one that many are seeking to improve. However, uncritical reliance on AI tools for literature discovery could cause a self-reinforcing problem if newly published papers continue to miss harder-to-find work. To keep contributing novel, inclusive, and comprehensive research, it’s essential that we remain critical users of these technologies.
1. A “classic” long tail data problem arises when a machine learning algorithm encounters an uncommon input. For example, an autonomous vehicle might not be able to classify a pedestrian pushing a bike at an unusual angle. Both issues are related to data sparsity, but the problem described in this article is more about retrieval failure than failure to accurately classify or generalize.
2. Prompt text (March 7, 2025):
I’d like your help doing a systematic literature review. I am going to give you text from a grant proposal with the objectives of your literature review. Can you help me (1) by providing me with a list of the most relevant resources you find and (2) by providing a 300-word synthesis of these documents?
Here is the text of the grant proposal:
“For the first time, three previously isolated areas of literature and data—Dene kedǝ (North Slavey language), Dene ts'ı̨lı̨ (being Dene, Dene ways of life), and Dene ts’ǫ́ dane (Dene youth)—will be brought together in order to address uniquely Dene integrative questions and strategies in sustaining biocultural diversity in Northern Canada. The project undertakes three key objectives with respect to community visions for Dene kedǝ and Dene ts'ı̨lı̨ revitalization in the Sahtú Region of the Northwest Territories, considering Dene ts’ǫ́ dane as a focal point for each objective: 1) synthesize knowledge about the status of Dene kedǝ and Dene ts'ı̨lı̨ revitalization programs; 2) assess current data on indicators related to status and trends in Dene kedǝ and Dene ts'ı̨lı̨ variation, continuity and change; and 3) disseminate and mobilize lessons learned in indigenous language and cultural revitalization within the context of regional and cross-regional aboriginal governance, collaborative resource management (co-management), and territorial policy frameworks.”