Abstract
Writing serves an essential role in human communication and development. From blogs to scientific essays, writing has conveyed and promoted diverse ideas and cutting-edge innovations. In the era of flourishing artificial intelligence, people increasingly aim to harness AI to assist them in writing. While AI makes writing more convenient and effortless, AI writing assistants often overshadow the development of writing skills, which are particularly important for students and authors. Additionally, many existing AI writing assistants, such as Wordcraft, Effidit, and PEER, focus on text completion and correction but give limited attention to creativity and stylistic nuance. To address these issues, this study develops a writing assistant that enhances user creativity through character-driven narratives and visual aids. This approach may inspire the writer and drive the story forward. In addition, the model can generate scene descriptions in text and illustrate them as images, adding an engaging setting to the story. Our model also incorporates external texts and can be fine-tuned to fit their specific style and humanized nuance, allowing users to write in the manner of F. Scott Fitzgerald or to craft engaging children's book content. The experiments demonstrated consistent convergence, with training loss decreasing from 3.44 to 0.94 over 825 steps, highlighting the model's robustness and precision. Although the number of trials and samples was limited by the cost of API calls, the metrics nonetheless indicate the model's capacity to generate stylistically accurate and contextually coherent outputs. In short, our research demonstrates an alternative way to assist writers without hindering the development of writing skills, paving the way for English learners and authors to use emerging AI writing technology responsibly.
Keywords: Generative AI Writing Assistant, Fine-Tuning, Collaborative Simulative Writing, Transformer-based Models, Style Mimicry, Interactive Storytelling
Introduction
The ability to write can pave the way for countless opportunities by allowing the writer to express ideas concisely and develop critical thinking skills. In the age of flourishing artificial intelligence, writing assistants have become increasingly popular, and people are exploring ways to embed AI into their daily tasks. AI writing assistants in particular have become prevalent and influential, partly due to the large number of English learners across America. In fact, there were 5.3 million English learner (EL) public school students in the U.S. in fall 2021, roughly 10% of the total K-12 student population1. English language learners have consistently reported struggling to convey their ideas in a second language, producing considerable cognitive stress that can impede the development of writing proficiency and essential writing skills2. Native English speakers, students in particular, can struggle with writing as well. Nevertheless, AI writing assistants like Grammarly, ChatGPT, and Jasper can significantly improve writers' performance. For example, the popular AI-powered English writing assistant Grammarly has been reported to boost a user's writing performance by 43%3.
However, with the widespread adoption of AI writing tools, plagiarism has become an issue. In numerous instances, students are not using AI as a writing tool that helps them improve their writing skills; instead, they take AI's work and submit it as their own4. This highlights the need for responsibly designed AI writing assistants that serve as aids to learning rather than as shortcuts that compromise the integrity of one's work. Our research aims to meet this need by leveraging Large Language Models (LLMs).
LLMs are advanced AI systems designed to understand and generate human-like writing, trained on extensive data drawn from diverse sources such as books, articles, websites, and other text corpora5. A well-trained LLM can capture linguistic patterns and contextual nuances. The main research directions in this field include developing methods to fine-tune these models for specific tasks and styles, integrating user feedback to improve output quality, and ensuring the systems are effective in real-world applications6,7,8,9. By building on and improving over other current AI writing assistants, this research contributes a comprehensive, user-centered AI assistant that genuinely enhances the user's writing skill rather than providing a shortcut with short-term benefits, while limiting the possibility of cheating. In addition, our model incorporates findings from previous studies on refinement, user input, and user-friendliness, enabling it to produce high-quality creative content while remaining efficient and convenient for writers. Furthermore, although our model does not delve deeply into every area of previous research, the use of LLMs allows us to leverage and incorporate their functionalities.
Previous research on AI writing assistants includes Wordcraft, Effidit, and PEER. Ippolito et al. examined the impact of Natural Language Generation (NLG) on writing assistance by collaborating with professional writers to evaluate Wordcraft, a text editor with built-in AI-powered writing assistance and in-context learning suggestions7. They found that NLG technologies often struggle to preserve the style and authorial voice. The model also lacks a deep understanding of longer narratives because user input often exceeds its maximum sequence length of 1024 tokens. Thus, for AI-powered writing assistants to achieve their full potential, it is essential to take into account the diverse goals and expertise of human writers.
Moreover, the research of Seßler et al. informed how our model incorporates human feedback to improve its outcomes8. They used LLMs with Reinforcement Learning from Human Feedback (RLHF) to generate writing suggestions for essays. PEER, the resulting model, employs a prompt-based approach that lets users select personalized feedback options and improves its output based on user preferences and needs. Since its launch in 2023, PEER has successfully motivated students to improve their writing skills through positive feedback and specific suggestions, and it has gained considerable public interest, with over 4000 essays uploaded to the model. However, the researchers also noted that the current model lacked fine-tuning to meet students' specific requirements.
While many other AI writing assistants provide only limited text completion and polishing functions, another AI writing assistant, Efficient Intelligent Editing (Effidit), offers numerous additional features, such as generation-based sentence completion, retrieval-based sentence completion, phrase polishing, and sentence expansion9. It also focuses on expanding its capacities in text completion, error checking, text polishing, keywords-to-sentences (K2S), and cloud input methods (cloud IME). Extensive experiments, using 12 Transformer layers and roughly 3 billion training samples constructed from about 1.3 billion English sentences, also earned the model a strongly positive quality rating (4.28 on a 5-point scale). However, it lacks the capability to be fine-tuned to an individual user's style.
In short, the integration of large language models in writing assistants represents a significant advancement in AI applications. By fine-tuning these models for specific styles and tasks, researchers can develop tools that not only reduce the workload but also significantly enhance the quality and coherence of generated content.
The present research aims to advance the field of creative AI assistants by integrating and extending current methodologies and concepts. We build on the foundational work of existing models like GPT-3 and LaMDA by fine-tuning them for specific writing tasks and styles10. This approach enhances the models' ability to generate content that aligns with particular creative requirements, increasing their expressive capacity and humanized nuance. Our model also incorporates a prompt-based approach and leverages user feedback to improve the precision of its output. Furthermore, writers can obtain more detailed images of their scenes through an additional image-generation function that adds illustrations, helping both readers and authors visualize the scenes. By building on and improving over other current AI writing assistants, our research contributes a comprehensive, user-centered AI assistant that genuinely enhances the user's writing skill instead of providing a shortcut with short-term benefits. This integration of fine-tuning, user feedback, and usability ensures that our AI assistant will not only generate high-quality artistic content but also be effective and practical for writers. For instance, a student struggling with creative writing could use our model to create complex characters, fitting dialogue, and visual scenes aligned with their story's theme. Similarly, a user creating a fantasy story could generate a wizard character with vivid scene illustrations.
Model Name | Strengths | Weaknesses |
Wordcraft | In-context learning suggestions; natural language generation | Struggles to preserve style and authorial voice; lacks deep understanding of story contents |
Effidit | Sentence completion; phrase polishing; sentence expansion; error checking | Lacks model fine-tuning; text only |
PEER | Prompt-based approach; personalized feedback options; improves the outcome based on user preferences and needs | Lacks model fine-tuning to meet users' specific requirements |
Proposed Model | Prompt-based approach; personalized feedback; fine-tuning; preserves style and authorial voice of diverse writers; automatic writing revision; scene generation | Number of experiments is limited |
Proposed Model
The model consists of three parts: 1) chatbots with different personas, 2) descriptions of scenes with optional illustrations, and 3) a personalized, domain-enhanced AI agent.
To create the AI agents, we first install the OpenAI Python library and obtain an API key. We then program the agents as objects using a prompt-based approach. We also introduce a new method of collaborative simulative writing, in which multiple agents work together to create a scene from different perspectives. The same API is additionally used to generate vivid scene descriptions, and the scenes can be illustrated with the DALL·E model.
The following code snippet creates a sample chatbot, defines it as a “helpful assistant” (the definition can be changed based on the user’s requirements), and utilizes previous prompts to add cohesiveness to the responses.
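A minimal sketch of such a chatbot, assuming the OpenAI Python SDK, is shown below; the model name and the exact persona wording are placeholders rather than the exact code used in our experiments.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


class ChatAgent:
    """A persona-driven chatbot that keeps prior turns so responses stay cohesive."""

    def __init__(self, persona="You are a helpful assistant.", model="gpt-3.5-turbo"):
        self.model = model
        # The system message defines the agent's persona and can be changed by the user.
        self.history = [{"role": "system", "content": persona}]

    def chat(self, user_message):
        # Previous prompts and replies are resent with each call, so every new
        # response remains consistent with the ongoing conversation.
        self.history.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(model=self.model, messages=self.history)
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply


assistant = ChatAgent()
print(assistant.chat("Help me outline a short mystery story set in a lighthouse."))
```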
Sample code in Figure 3 instantiates the agents and uses Harry Potter as an example.
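A minimal sketch in the spirit of Figure 3, reusing the ChatAgent class above, is given below; the persona prompts and the second (Hermione) agent are illustrative assumptions.

```python
# Character agents are ordinary ChatAgent instances with in-character system prompts.
harry = ChatAgent(
    persona="You are Harry Potter. Stay in character, speak in the first person, "
            "and respond as Harry would within the user's story."
)
hermione = ChatAgent(
    persona="You are Hermione Granger. Stay in character and respond in the first person."
)

# Collaborative simulative writing: both agents contribute to the same scene
# from their own perspectives, each seeing the other's latest line.
scene = "The two of you discover a hidden door in the Hogwarts library at midnight."
harry_line = harry.chat(scene)
hermione_line = hermione.chat(scene + "\nHarry just said: " + harry_line)
print(harry_line)
print(hermione_line)
```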
Code snippet in Figure 4 generates sample scene illustrations based on user input.
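A minimal sketch of this illustration step, assuming the OpenAI Images endpoint with a DALL·E model (the model name and image size are placeholders), is:

```python
def illustrate_scene(description, size="1024x1024"):
    """Generate a scene illustration from a textual description using DALL·E."""
    result = client.images.generate(
        model="dall-e-3",   # placeholder; any available DALL·E model can be substituted
        prompt=description,
        n=1,
        size=size,
    )
    return result.data[0].url  # URL of the generated illustration


print(illustrate_scene("A small gray kitten at the edge of a glowing, magical forest at dusk."))
```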
Another unique feature of our model is its ability to adapt to user-chosen datasets. The LLM is fine-tuned on two datasets: Hugging Face's StorySeed, a dataset specially designed for text generation models in the domain of children's picture book creation, and the first three chapters of F. Scott Fitzgerald's The Great Gatsby. The datasets are reformatted into prompt and response pairs for a particular style of story writing, and each response uses triple hashtags and dashes to separate the different book elements. For instance, a StorySeed prompt starts with "Please help me create a picture book with the theme…", while the response usually includes the following: "### Title:…— ### Cover Page Illustration:…— ### Page 1 Illustration…". Both the prompt and the response are strings, and the prompts are usually shorter than their respective responses.
StorySeed's prompt lengths range from 246 to 440 characters, while response lengths range from 17 to about 5,760 characters. The data frame has 4380 rows and 3 columns: questions, answers, and text, where the text column combines each question and answer pair formatted with hashtags. It has no missing values or duplicated rows.
The Great Gatsby dataset's prompt lengths range from 16 to 210 characters, while response lengths range from 1 to 906 characters. The data frame has 563 rows and 3 columns structured like those of StorySeed, with no missing values or duplicated rows. A sketch of how such prompt and response pairs are assembled is shown below.
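The sketch below, assuming the column names given above and abbreviated placeholder strings, shows how each prompt and response pair can be combined into the hashtag-formatted text column.

```python
import pandas as pd

# Column names follow the dataset description; the strings are abbreviated placeholders.
df = pd.DataFrame(
    {
        "questions": ["Please help me create a picture book with the theme of a curious kitten..."],
        "answers": ["### Title: Whisker's Magical Forest Adventure — ### Cover Page Illustration: ..."],
    }
)

# Combine each prompt/response pair into one training string,
# using hashtags to keep the book elements separated.
df["text"] = "### Prompt:\n" + df["questions"] + "\n### Response:\n" + df["answers"]

print(df["text"].iloc[0])
print(f"Rows: {len(df)}, columns: {list(df.columns)}")
```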
To fine-tune the LLM, we used the Transformer, a class of deep learning models defined by a distinctive attention-based architecture11. The Transformer architecture is the building block of numerous LLMs, including BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer)12,13. First presented in the seminal paper Attention Is All You Need by a team of Google researchers, Transformers have redefined the benchmarks for large language models14. The first Transformer model outperformed the previous best results by over 2 BLEU (Bilingual Evaluation Understudy) points on the WMT 2014 English-to-German translation task.
Transformers work through the following steps (a minimal code sketch follows the list):
- Input/output embeddings: This step occurs in the lowest encoder layer, which converts input tokens (words or subwords) into numerical vectors. These vectors, known as embeddings in the bottom encoder (in the other encoders, they are the outputs of the encoder directly below), represent the semantic meaning of the tokens.
- Every encoder receives a list of vectors, each of a fixed size (512 dimensions in the original Transformer).
- Multi-head or masked attention layer: This layer is the central design of Transformer models. It computes attention weights that let each token's representation incorporate information from every other token; in the decoder, masking prevents a position from attending to future tokens.
- Feed-forward (linear) layer: These are the conventional dense layers used in neural networks, built from fully connected neurons in which each unit applies a linear transformation followed by a nonlinear activation.
- Add and normalize: The "add" step is a residual connection that adds a sub-layer's input to its output, and layer normalization then standardizes the numbers in the resulting vectors.
- Softmax: A final linear layer followed by a softmax converts the decoder output into a probability distribution over the vocabulary, which is used for classification or next-word prediction.
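The following minimal PyTorch sketch ties these steps together in a single encoder-style block with a next-word prediction head; the dimensions (512-dimensional embeddings, 8 heads) follow the original Transformer, and positional encodings and layer stacking are omitted for brevity.

```python
import torch
import torch.nn as nn


class MiniTransformerBlock(nn.Module):
    """Embeddings, multi-head self-attention, add & norm, feed-forward, and a softmax head."""

    def __init__(self, vocab_size, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)        # tokens -> 512-dim vectors
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)
        self.out = nn.Linear(d_model, vocab_size)             # projection back to the vocabulary

    def forward(self, token_ids):
        x = self.embed(token_ids)
        attn_out, _ = self.attn(x, x, x)                      # multi-head self-attention
        x = self.norm1(x + attn_out)                          # add (residual) & normalize
        x = self.norm2(x + self.ff(x))                        # feed-forward, then add & normalize
        return torch.softmax(self.out(x), dim=-1)             # probabilities for next-word prediction


block = MiniTransformerBlock(vocab_size=1000)
probs = block(torch.randint(0, 1000, (1, 16)))                # a batch of 1 sequence of 16 tokens
print(probs.shape)                                            # torch.Size([1, 16, 1000])
```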
Experiment
First, the user would start writing a story. Users can generate multiple characters with unique backgrounds, including their personalities, roles in the story, and specific conversations that would influence the plot. These characters, or agents, can guide the storyline, and the AI assistant will facilitate interactions between them while also advancing the story and crafting dynamic and coherent dialogues.
The author could also ask for help with certain scenes and illustrations of an imagined setting during the creative process. The LLM then responds with specific scene descriptions that fit the style and nuances of the chosen dataset, and the author and model can repeat this process until the story is finished. Another benefit of our LLM is its ability to assist the author in numerous ways. For example, the writer could use the model fine-tuned on the prepared dataset from The Great Gatsby to write in the classic style of F. Scott Fitzgerald, or use the model fine-tuned on the StorySeed dataset to write children's books with an appropriate tone and style. A sketch of this fine-tuning workflow is shown below.
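The exact training stack is not shown here, so the following sketch assumes the OpenAI fine-tuning API with a gpt-3.5-turbo base model as a stand-in; it converts the prompt and response pairs assembled earlier (the DataFrame df) into the chat-format JSONL that the API expects.

```python
import json

from openai import OpenAI

client = OpenAI()

# Convert each prompt/response pair (from the DataFrame df built earlier)
# into the chat-format JSONL expected by the fine-tuning endpoint.
with open("style_train.jsonl", "w") as f:
    for _, row in df.iterrows():
        record = {
            "messages": [
                {"role": "user", "content": row["questions"]},
                {"role": "assistant", "content": row["answers"]},
            ]
        }
        f.write(json.dumps(record) + "\n")

# Upload the file and launch a fine-tuning job; the base model is a placeholder,
# and the epoch count mirrors the experiments reported below.
training_file = client.files.create(file=open("style_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 6},
)
print(job.id)  # per-step training loss can be monitored while the job runs
```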
Results
With 6 training epochs, The Great Gatsby dataset yields a training loss of 3.4402 at step 25, improving to 0.9409 by step 825. With 2 training epochs, the StorySeed dataset yields a training loss of 1.3470 at step 25, improving to 1.1313 at step 50. Both datasets show generally decreasing training loss as the number of steps increases, indicating successful convergence.
Step | Training Loss (StorySeed, 2 epochs) |
25 | 1.347000 |
50 | 1.131300 |
Our model maintains consistency with the target writing style, as demonstrated by the following sample of user input and generated response in the style of StorySeed:
Prompt:
Please help me create a picture book with the theme of a curious kitten embarks on a whimsical journey through a magical forest, encountering friendly creatures along the way. The format should include a title, followed by a cover page with an illustration only, and the remaining pages should each have both text and illustrations.
Response:
### Title: Whisker’s Magical Forest Adventure
### Cover Page Illustration:
A small, fluffy, gray kitten with wide, curious eyes standing at the edge of a lush, vibrant forest filled with colorful flowers and twinkling lights.
### Page 1 Illustration:
Whiskers peeking out from behind a large oak tree, gazing at the path ahead.
#### Text:
In the heart of a sun-dappled garden, little Whiskers the kitten found a hidden path leading into a mysterious forest. With a twitch of his nose and a flick of his tail, he decided it was the perfect day for an adventure.
### Page 2 Illustration:
Whiskers meeting a cheerful squirrel with a bushy tail, sitting on a tree branch.
#### Text:
Not far into the forest, Whiskers met Squeaky the squirrel. “Where are you off to, little kitten?” Squeaky chirped. “I’m exploring this magical forest,” Whiskers replied with a purr. Squeaky nodded, “Follow me! I’ll show you the way to the Whispering Willows.”
### Page 3 Illustration:
Whiskers and Squeaky walking together along a path lined with twinkling willow trees.
#### Text:
As they walked, the forest seemed to hum with magic. The willow trees whispered secrets of the forest to anyone who would listen. Whiskers listened carefully, learning tales of the trees and the creatures who lived among them.
### Page 4 Illustration:
A sparkling stream with a friendly frog wearing a crown of lilies.
#### Text:
Next, they met Ferdinand the frog, who ruled over a sparkling stream. “Welcome, travelers! Are you thirsty?” he ribbited joyfully. Whiskers lapped up the cool, refreshing water and felt like a little prince himself.
### Page 5 Illustration:
Whiskers looking in awe at a gathering of fireflies lighting up a small clearing in the woods.
#### Text:
Night began to fall, and the forest lit up with a thousand fireflies. “These are the stars of the forest,” Squeaky whispered. Whiskers felt as if he were walking in a dream, surrounded by tiny, glowing friends.
### Page 6 Illustration:
Whiskers curled up in a cozy nest made of soft leaves, surrounded by new friends.
#### Text:
Tired after a long day of adventures, Whiskers found a cozy nest made of leaves. Ferdinand the frog, Squeaky the squirrel, and even the fireflies gathered around to bid him goodnight. “You’re always welcome here, Whiskers,” Squeaky said with a smile.
### Page 7 Illustration:
Whiskers walking back home, glancing back at his friends and the magical forest.
#### Text:
The next morning, Whiskers headed back home, his heart full of wonderful memories. He knew he would return soon, for the magical forest and his new friends were just a small adventure away.
### Back Cover Illustration:
A distant view of the magical forest with Whiskers looking back at it from the garden, a small smile on his face.
#### Text:
“Every adventure brings its own magic.”
Discussion
According to the results, the models demonstrated a solid ability to create content that matches various styles and tones, such as children's stories and classics, showcasing the potential for advancing creative AI assistants. In the future, such assistants can be important tools that provide writers with detailed scene descriptions and examples, spark inspiration, and add stylistic nuance to their books.
To further validate the results, this research also compares our training loss against fine-tuning results on text corpus data from other sources (Figure 7), as shown below:
Figure 11 presents the proposed model's loss against models in the literature on similar tasks15. The proposed model, indicated by the blue line, is on par with most of the algorithms in the literature.
Limitations
Our research has some inevitable limitations. We only scrape data from PDF documents of literature and other texts, so the format of our dataset is limited to text (strings) only. In addition, we could run only a small number of trials (2 trials for StorySeed and 33 trials for The Great Gatsby) due to the monetary cost of a large number of API calls. Future studies should therefore validate the results with more diverse texts and longer trials to ensure broader applicability and credibility. Addressing these limitations is essential for developing a robust AI writing assistant that can be confidently employed in a wide range of circumstances.
Conclusion
We are excited for individuals to utilize our models, contribute their unique datasets, and share any feedback they may have. We also hope that our research offers valuable insights into the potential and implications of AI writing assistants for writers. By leveraging our model, writers can significantly reduce their workload, allowing more people to create higher-quality work and improve their writing skills. In summary, our research has demonstrated the potential for creating writing assistants using LLMs and custom data pipelines. The findings indicate that, at a training loss of 0.8846 after about 800 steps, our model performs similarly to existing models while incorporating additional features. As a result, our research paves the way for further exploration of LLMs in aiding the writing process, demonstrating their ability to develop conversational agents, generate realistic scenes, and add nuanced details that align with various writing styles. This model can revolutionize creative writing workshops by enabling collaborative story generation. However, even though our model is not intended to generate an entire story on the user's behalf, it could still be misused for academic plagiarism, so further investigation into appropriate ways to use AI writing assistants is warranted. Future versions could also support creative writing in non-English languages.
Acknowledgments
We especially thank Yiqiao Yin from Columbia University for his guidance, advice, and mentorship throughout the research process. We also thank Lumiere Education for facilitating the relationship with Yiqiao Yin and for their support in formulating the research paper.
References
1. U.S. Department of Education. "English learners (ELs) enrolled in public elementary and secondary schools, by state or jurisdiction: Fall 2011 through fall 2021." National Center for Education Statistics, Table 204.20, nces.ed.gov/programs/digest/d23/tables/dt23_204.20.asp. (2024).
2. Gayed, John M., et al. "Exploring an AI-based Writing Assistant's Impact on English Language Learners." Computers and Education: Artificial Intelligence, vol. 3, p. 100055, doi.org/10.1016/j.caeai.2022.100055. (2022).
3. Fitria, Tira N. "'Grammarly' As AI-powered English Writing Assistant: Students' Alternative for English Writing." Metathesis Journal of English Language Literature and Teaching, vol. 5, no. 1, doi.org/10.31002/metathesis.v5i1.3519. (2021).
4. Roe, J., Renandya, W., and Jacobs, G. "A Review of AI-Powered Writing Tools and Their Implications for Academic Integrity in the Language Classroom." Journal of English and Applied Linguistics, vol. 2, iss. 1, article 3, doi.org/10.59588/2961-3094.1035. (2023).
5. Liu, V., and Yin, Y. "Green AI: exploring carbon footprints, mitigation strategies, and trade offs in large language model training." Discover Artificial Intelligence, vol. 4, 49, doi.org/10.1007/s44163-024-00149-w. (2024).
6. Rangan, K., and Yin, Y. "A Fine-tuning Enhanced RAG System with Quantized Influence Measure As AI Judge." arXiv:2402.17081. (2024).
7. Ippolito, D., Yuan, A., Coenen, A., and Burnam, S. "Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers." arXiv:2211.05030. (2022).
8. Seßler, K., Xiang, T., Bogenrieder, L., and Kasneci, E. "PEER: Empowering Writing with Large Language Models." Springer, doi.org/10.1007/978-3-031-42682-7_73. (2023).
9. Shi, S., Zhao, E., Tang, D., Wang, Y., Li, P., Bi, W., Jiang, H., Huang, G., Cui, L., Huang, X., Zhou, C., Dai, Y., and Ma, D. "Effidit: Your AI Writing Assistant." arXiv:2208.01815. (2022).
10. Rangan, K., and Yin, Y. "A Fine-tuning Enhanced RAG System with Quantized Influence Measure As AI Judge." arXiv:2402.17081. (2024).
11. Amatriain, Xavier, et al. "Transformer Models: An Introduction and Catalog." arXiv:2302.07730. (2023).
12. Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv:1810.04805. (2018).
13. Zhang, Emma Y., et al. "From Turing to Transformers: A Comprehensive Review and Tutorial on the Evolution and Applications of Generative Transformer Models." Sci, vol. 5, no. 4, p. 46, doi.org/10.3390/sci5040046. (2023).
14. Vaswani, Ashish, et al. "Attention Is All You Need." arXiv:1706.03762. (2017).
15. Venturmini02. "Exploring the Intricacies of Fine-Tuning Through Loss Metrics." OpenAI Developer Forum, community.openai.com/t/exploring-the-intricacies-of-fine-tuning-through-loss-metrics/820403/2. (2024).