Philly ETE 2023 — Tips For Detecting The Use Of AI-Generated Text — Dr. Jake Ryland Williams



Few researchers have access to the resources needed to train the state-of-the-art language models (LMs) used in cutting-edge technologies. Processing “big data” over computational frameworks and expensive GPUs also carries substantial environmental implications: in 2019, one team of researchers estimated that producing one model’s parameters (GPT-2’s) generated 626,000 pounds of carbon dioxide, roughly the lifetime emissions of five cars. Its developers, OpenAI, reported in 2018 that “since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.4-month doubling time”. After OpenAI released GPT-3 in 2020, a report estimated that, with “10,000 GPUs and 400 gigabits per second of network connectivity per server” and months spent processing “45 Terabytes of text data from all over the internet”, “GPT-3 could have easily cost 10 or 20 million dollars to train”.
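To make the quoted 3.4-month doubling time concrete, here is a minimal sketch of the arithmetic it implies. It assumes idealized exponential growth at a constant doubling time; the function name `growth_factor` and the 12-month horizon are illustrative choices, not figures from the cited report.

```python
def growth_factor(months: float, doubling_months: float = 3.4) -> float:
    """Multiplicative growth in compute after `months`,
    assuming a constant doubling time (an idealization)."""
    return 2.0 ** (months / doubling_months)


# Growth over one year under the assumed constant doubling time.
per_year = growth_factor(12)
print(f"Compute multiplies by about {per_year:.1f}x per year")
```

Under that assumption, a 3.4-month doubling time works out to somewhere between an 11-fold and a 12-fold increase in compute every year.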

Staring down this trend in 2018, OpenAI even suggested: “it’s worth preparing for the implications of systems far outside today’s capabilities”.

This talk will demonstrate software aspiring to address the profound need for more efficient systems. The presented NLP framework operates shallow neural networks that are optimized locally, using only a single pass over training data, and without the need for gradient descent, the de facto standard algorithm underlying the training of most modern applied AI technologies. The framework is in the early stages of development and draws on conclusions from unpublished discoveries that extend the same statistical theories the presenter developed to detect when AI, or bots, generate text.
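The abstract does not describe how the framework works, so the following is not the presenter's method. It is only a generic illustration of what "a single pass over training data, without gradient descent" can look like: a next-token predictor whose weights are set in closed form from co-occurrence counts. All names (`train_single_pass`, `weights_from_counts`, `predict`), the add-alpha smoothing, and the toy sentence are assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def train_single_pass(tokens):
    """Read the data once, counting which token follows which."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def weights_from_counts(counts, vocab, alpha=1.0):
    """Closed-form weights: log of add-alpha smoothed conditional
    probabilities. No gradients or iterative updates are involved."""
    weights = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + alpha * len(vocab)
        weights[prev] = {
            nxt: math.log((counts[prev][nxt] + alpha) / total)
            for nxt in vocab
        }
    return weights


def predict(weights, prev):
    """Return the highest-scoring next token for `prev`."""
    return max(weights[prev], key=weights[prev].get)


# Toy data, purely illustrative.
tokens = "the cat sat on the mat and the cat ran".split()
vocab = sorted(set(tokens))
counts = train_single_pass(tokens)
weights = weights_from_counts(counts, vocab)
print(predict(weights, "the"))
```

The point of the sketch is only the training pattern: every parameter is computed directly from statistics gathered in one sweep over the text, so there is no loss function to descend and no second pass over the data.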

A review of tips and tricks used to identify bots will then be followed by a Q/A session, likely to consider ChatGPT’s implications.

About Dr. Jake Ryland Williams

Dr. Jake Ryland Williams is a natural scientist who is trained in physics, mathematics, and scientific programming. Their work includes a decade-long progression of critical works and theories for quantitative linguistics. These theories lead to applications in NLP, computational social science, and data engineering, including government-funded projects building NLP tools for forum moderation on social media platforms whose users discuss mass media news content.

About the Conference

Philly Emerging Technologies for the Enterprise (ETE) is the Mid-Atlantic’s premier developer conference. Now entering its 17th year, the conference brings world-class speakers, including some local favorites, to speak about leading-edge technologies being used today and emerging technologies that will be important for attendees to know about in the near future.

Watch More

Check out our YouTube playlist to watch all the talks from Emerging Technologies for the Enterprise 2023.