Robots Replacing Readers

The non-profit Allen Institute for Artificial Intelligence (AI2) publicly released an AI tool for the scientific journal search engine Semantic Scholar.

Reading Time: 4 minutes

As our lives have shifted from the physical world to the virtual one, our definition of a “normal” day has undoubtedly changed. Instead of spending hours in school filling out worksheets, we are using online platforms to complete assignments. Despite the change in environment, there are still tasks we must perform. As students, we have all needed to do research. Whether it’s for an English paper or a biology lab report, these lengthy assignments demand long stretches of researching and writing. Often the process drags on because certain articles are difficult to analyze: students may find themselves scrolling through countless pages packed with technical jargon. What if there was a solution to this problem?

Recently, the non-profit Allen Institute for Artificial Intelligence (AI2) publicly released an AI tool for the scientific journal search engine Semantic Scholar. AI refers to any task performed by a program or machine that would otherwise be deemed to require human intelligence. Though this definition has been debated, tasks involving planning, learning, reasoning, knowledge representation, perception, or creativity are generally considered within the scope of AI. After analyzing a research paper, the free tool generates a few short sentences to summarize it. The team refers to these summaries as “TLDRs,” after the common acronym for “Too Long; Didn’t Read.” Though the technology has so far been applied only to articles covered by Semantic Scholar, the results have proved promising. Dan Weld, manager of the Semantic Scholar group at AI2, says that “people seem to really like it,” as the tool helps readers sift through papers faster than reading titles and abstracts would, especially on a mobile device.

A preprint (a journal article made public before formal peer review) describing the tool was first posted on the arXiv server in April 2020. The developers have also published the code along with a demo website that lets users try out the technology. Because the tool is still being fine-tuned, new versions are expected to summarize articles from other publishers with equally impressive efficiency.

This tool isn’t the first of its kind, though; AI technologies with similar abilities have been developed before. In 2018, the website Paper Digest started providing short summaries of the papers it covered. These summaries consisted of key sentences extracted from the article rather than newly generated ones. As a result, they weren’t as thorough and often focused on one particular topic rather than the article as a whole.
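The extractive approach Paper Digest used can be sketched in a few lines. The snippet below is a generic illustration of extractive summarization, not Paper Digest’s actual algorithm: it scores each sentence by the document-wide frequency of its words and returns the top-scoring sentences verbatim.

```python
from collections import Counter
import re

def extractive_summary(text, n_sentences=2):
    """Score each sentence by how frequent its words are across the
    whole document, then return the top-scoring sentences verbatim,
    in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = set(ranked[:n_sentences])
    return " ".join(s for s in sentences if s in chosen)
```

Because the output is stitched together from existing sentences, it inherits exactly the weakness described above: a few high-scoring sentences about one topic can dominate the summary while the rest of the article goes unrepresented.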

On the other hand, TLDR can generate novel sentences from a paper’s abstract, introduction, and conclusion. This advanced capability comes from the use of deep learning as opposed to basic machine learning. Deep learning is arguably a more sophisticated subset of machine learning, as it “structures algorithms in layers to create a ‘[deep] neural network’ that can learn and make intelligent decisions on its own.” In contrast, basic machine learning simply “uses algorithms to parse data, learn from that data, and make informed decisions based on what it has learned.” The former can run without guidance, since it can judge the accuracy of its outputs and correct them accordingly, while the latter depends on a human engineer to step in and make adjustments. As a result, deep learning tends to be more capable, more efficient, and a closer mimic of the human brain. The deep neural networks behind TLDR were first trained on tens of thousands of research papers, using software that relied on frequently occurring phrases and concepts to summarize them. Once the networks had learned to generate concise sentences by scanning a paper and relating it to its brief title, the team fed the software thousands of computer science papers, along with summaries (written either by the papers’ authors or by students) as a means of self-checking. The deep learning software could then identify its errors and adjust itself whenever its output diverged from the summary provided. The team will continue to feed the tool training examples from different fields so that it can learn from diverse writing styles and analyze papers more efficiently.
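The self-checking step described above boils down to comparing the model’s output against a reference summary and turning the mismatch into a correction signal. As an illustration only (this is not AI2’s code or loss function), a crude unigram-overlap score, in the spirit of the ROUGE-1 recall metric commonly used to evaluate summaries, can quantify how close a generated summary comes to its reference:

```python
from collections import Counter

def unigram_overlap(generated: str, reference: str) -> float:
    """Fraction of the reference summary's words that also appear in
    the generated summary, counting a repeated word at most as many
    times as it occurs on each side."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    matched = sum(min(gen[w], count) for w, count in ref.items())
    total = sum(ref.values())
    return matched / total if total else 0.0
```

In real training, a differentiable loss plays this role: a large mismatch produces a large gradient, and the network’s weights are nudged so that its next attempt lands closer to the reference, with no engineer stepping in between updates.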

Currently, the tool is limited to the 10 million computer science papers covered by Semantic Scholar. The summaries are usually constructed from key phrases within the article and are aimed more toward individuals who understand the paper’s jargon. Though the original intended audience was scientists or professionals in the field, it doesn’t seem far-fetched that versions of the tool may someday be designed for non-expert audiences. Perhaps the tool will eradicate the need for the “middle man” media that interprets information in complicated papers before conveying it to the public. If all it takes is a browser extension, one may not need to search through countless websites for a simplified version of an article. The tool may even be helpful for educators who now spend hours reading through their students’ research papers or essays. It is no secret that some end up skimming through the papers regardless, so the tool may result in more meaningful grades as the analysis is done via AI.

Overall, this advancement in technology has the potential to improve the lives of people from a variety of fields. As Jevin West, an information scientist at the University of Washington in Seattle, said, “I predict that this kind of tool will become a standard feature of scholarly search in the near future. Actually, given the need, I am amazed it has taken this long to see it in practice.”