Abstract
Neural story generation models face two significant challenges: (1) maintaining coherence over narrative structure, especially long-range dependencies, and (2) maintaining emotional consistency, as they often produce redundant or emotionally incoherent narration. A new, emotionally intelligent two-stage short story generation model is presented that combines GPT-2 with a tailored FNET model, a lightweight transformer architecture that replaces standard self-attention with Fourier transform layers to better capture semantic and emotional relationships in text. The first stage employs GPT-2 to generate a list of candidate sentences from the input question, answer, and emotional state. The candidate sentences are then filtered with a DistilRoBERTa-based emotion classifier so that only those matching the desired emotional tone are retained. The filtered sentences are fed into a fine-tuned FNET model, which examines inter-sentence relationships and enforces emotional coherence to produce a coherent and emotionally engaging narrative. An empirical comparison on three benchmark datasets demonstrates the system's superiority over earlier state-of-the-art approaches. The FNET model achieves a BLEU-1 score of 0.3093, outperforming Plan-and-Write (0.0953) and T-CVAE (0.2574), with improved narrative quality and closer lexical agreement with human-written narratives. Story coherence and emotion retention accuracies reach 85%, 67%, and 60% on the Visual7W, ROCStories, and Cornell Movie Dialogs datasets, respectively.
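For illustration, the sketch below shows how the first stage of such a pipeline could be wired together with Hugging Face `transformers` pipelines: GPT-2 samples candidate sentences, and the DistilRoBERTa emotion classifier cited in the references keeps only candidates that match the target emotion. The prompt format, sampling settings, score threshold, and helper name are assumptions made for illustration rather than the authors' implementation, and the fine-tuned FNET composition stage (stage two) is not shown.

```python
# Minimal sketch of stage one only (candidate generation + emotion filtering),
# assuming Hugging Face `transformers` pipelines. Prompt format, sampling settings,
# and the 0.5 threshold are illustrative assumptions, not the authors' configuration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
emotion_clf = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

def generate_filtered_candidates(question, answer, target_emotion="joy",
                                 n_candidates=8, min_score=0.5):
    """Hypothetical helper: sample candidate story sentences with GPT-2 and
    keep those whose predicted emotion matches the desired tone."""
    prompt = f"Question: {question} Answer: {answer} Story:"  # assumed prompt format
    outputs = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9,
                        num_return_sequences=n_candidates)
    kept = []
    for out in outputs:
        # The pipeline returns prompt + continuation; strip the prompt prefix.
        sentence = out["generated_text"][len(prompt):].strip()
        pred = emotion_clf(sentence)[0]  # top emotion label and its score
        if pred["label"] == target_emotion and pred["score"] >= min_score:
            kept.append((sentence, pred["score"]))
    # The retained sentences would then be passed to the fine-tuned FNET model
    # (stage two), which orders them and enforces inter-sentence coherence.
    return sorted(kept, key=lambda pair: -pair[1])

if __name__ == "__main__":
    print(generate_filtered_candidates("Who is at the door?", "An old friend.", "joy"))
```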
References
- Jurafsky, D., & Martin, J. H. (2000). Speech and language processing. Pearson Education India.
- ISBN-13: 9780131873216
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- ISBN: 9781510860964
- Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., & Others. (2018). Improving language understanding by generative pre-training.
- Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., … Others. (2023). Opinion Paper: "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 102642.
- DOI: https://doi.org/10.1016/j.ijinfomgt.2023.102642
- Brown, P. F., Della Pietra, V. J., deSouza, P. V., Lai, J. C., & Mercer, R. L. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–479.
- Link: https://aclanthology.org/J92-4003/
- Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … Others. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
- ISBN: 9781713829546
- Zhang, H., Song, H., Li, S., Zhou, M., & Song, D. (2023). A survey of controllable text generation using transformer-based pre-trained language models. ACM Computing Surveys, 56, 1–37.
- DOI: https://doi.org/10.1145/3617680
- Meehan, J. R. (1976). The metanovel: writing stories by computer. Yale University.
- ISBN: 0824044096
- Turner, S. R. (2014). The creative process: A computer model of storytelling and creativity. Psychology Press.
- DOI: https://doi.org/10.4324/9781315806464
- Bringsjord, S., & Ferrucci, D. (1999). Artificial intelligence and literary creativity: Inside the mind of brutus, a storytelling machine. Psychology Press.
- DOI: https://doi.org/10.4324/9781410602398
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27.
- ISBN: 9781510800410
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., & Others. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1, 9.
- Sepúlveda-Torres, R., Bonet-Jover, A., & Saquete, E. (2023). Detecting Misleading Headlines Through the Automatic Recognition of Contradiction in Spanish. IEEE Access, 11, 72007–72026.
- DOI: https://doi.org/10.1109/ACCESS.2023.3295781
- Lebowitz, M. (1985). Story-telling as planning and learning. Poetics, 14, 483–502.
- DOI: https://doi.org/10.1016/0304-422X(85)90015-4
- Pérez y Pérez, R., & Sharples, M. (2001). MEXICA: A computer model of a cognitive account of creative writing. Journal of Experimental & Theoretical Artificial Intelligence, 13, 119–139.
- DOI: https://doi.org/10.1080/09528130010029820
- Riedl, M. O., & Young, R. M. (2010). Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research, 39, 217–268.
- DOI: https://doi.org/10.1613/jair.2989
- Cavazza, M., Charles, F., & Mead, S. J. (2002). Character-based interactive storytelling. IEEE Intelligent Systems, 17, 17–24.
- DOI: https://doi.org/10.1109/MIS.2002.1024747
- Fan, A., Lewis, M., & Dauphin, Y. (2018). Hierarchical Neural Story Generation. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
- DOI: https://doi.org/10.48550/arXiv.1805.04833
- Xu, J., Ren, X., Zhang, Y., Zeng, Q., Cai, X., & Sun, X. (2018). A skeleton-based model for promoting coherence among sentences in narrative story generation. arXiv preprint arXiv:1808.06945.
- DOI: https://doi.org/10.48550/arXiv.1808.06945
- Yao, L., Peng, N., Weischedel, R., Knight, K., Zhao, D., & Yan, R. (2019). Plan-and-write: Towards better automatic storytelling. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 7378–7385.
- DOI: https://doi.org/10.1609/aaai.v33i01.33017378
- Wang, T., & Wan, X. (2019). T-CVAE: Transformer-based conditioned variational autoencoder for story completion. IJCAI, 5233–5239.
- DOI: https://doi.org/10.24963/ijcai.2019/727
- Chen, G., Liu, Y., Luan, H., Zhang, M., Liu, Q., & Sun, M. (2020). Learning to generate explainable plots for neural story generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 585–593.
- DOI: https://doi.org/10.1109/TASLP.2020.3039606
- Zhang, Y., Shi, X., Mi, S., & Yang, X. (2021). Image captioning with transformer and knowledge graph. Pattern Recognition Letters, 143, 43-49.
- DOI: https://doi.org/10.1016/j.patrec.2020.12.020
- Brahman, F., & Chaturvedi, S. (2020). Modeling protagonist emotions for emotion-aware storytelling. arXiv preprint arXiv:2010.06822.
- DOI: https://doi.org/10.48550/arXiv.2010.06822
- Tan, B., Yang, Z., Al-Shedivat, M., Xing, E. P., & Hu, Z. (2020). Progressive generation of long text with pretrained language models. arXiv preprint arXiv:2006.15720.
- DOI: https://doi.org/10.48550/arXiv.2006.15720
- Min, K., Dang, M., & Moon, H. (2021). Deep learning-based short story generation for an image using the encoder-decoder structure. IEEE Access, 9, 113550–113557.
- DOI: https://doi.org/10.1109/ACCESS.2021.3104276
- Wu, C., Wang, J., Yuan, S., Wang, L., & Zhang, W. (2021). Generate classical Chinese poems with theme-style from images. Pattern Recognition Letters, 149, 75–82.
- DOI: https://doi.org/10.1016/j.patrec.2021.05.016
- Liu, Y., Huang, Q., Li, J., Mo, L., Cai, Y., & Li, Q. (2022). SSAP: Storylines and sentiment aware pre-trained model for story ending generation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 686–694.
- DOI: https://doi.org/10.1109/TASLP.2022.3145320
- Jin, Y., Kadam, V., & Wanvarie, D. (2022). Plot writing from pre-trained language models. arXiv preprint arXiv:2206.03021.
- DOI: https://doi.org/10.48550/arXiv.2206.03021
- Chen, Y., Li, R., Shi, B., Liu, P., & Si, M. (2023). Visual story generation based on emotion and keywords. arXiv preprint arXiv:2301.02777.
- DOI: https://doi.org/10.48550/arXiv.2301.02777
- Khan, L. P., Gupta, V., Bedi, S., & Singhal, A. (2023). StoryGenAI: An Automatic Genre-Keyword Based Story Generation. 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), 955–960.
- DOI: https://doi.org/10.1109/CISES58720.2023.10183482
- Hartmann, J. (2022). Emotion English DistilRoBERTa-base.
- Link: https://huggingface.co/j-hartmann/emotion-english-distilroberta-base
- Lee-Thorp, J., Ainslie, J., Eckstein, I., & Ontanon, S. (2021). FNet: Mixing tokens with Fourier transforms. arXiv preprint arXiv:2105.03824.
- DOI: https://doi.org/10.48550/arXiv.2105.03824
- Fu, K., Li, H., & Shi, X. (2024). An encoder-decoder architecture with Fourier attention for chaotic time series multi-step prediction. Applied Soft Computing, 156, 111409.
- DOI: https://doi.org/10.1016/j.asoc.2024.111409
- Dittakan, K., Prompitak, K., Thungklang, P., & Wongwattanakit, C. (2023). Image caption generation using transformer learning methods: a case study on Instagram image. Multimedia Tools and Applications, 83(15), 46397–46417.
- DOI: https://doi.org/10.1007/s11042-023-17275-9
- Danescu-Niculescu-Mizil, C., & Lee, L. (2011). Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. arXiv preprint arXiv:1106.3077.
- DOI: https://doi.org/10.48550/arXiv.1106.3077
- Zhu, Y. (2024). Visual7W dataset [Data set].
- DOI: https://doi.org/10.57702/zqariweh
- Mostafazadeh, N. (2024). ROCStories [Data set].
- DOI: https://doi.org/10.57702/26yy027v
- Lee, S., Lee, J., Moon, H., Park, C., Seo, J., Eo, S., … Lim, H. (2023). A survey on evaluation metrics for machine translation. Mathematics, 11(4), 1006.
- DOI: https://doi.org/10.3390/math11041006
- Kaptein, F., & Broekens, J. (2015, August). The affective storyteller: Using character emotion to influence narrative generation. In International Conference on Intelligent Virtual Agents (pp. 352–355). Cham: Springer International Publishing.
- DOI: https://doi.org/10.1007/978-3-319-21996-7_38
- Rashkin, H., Celikyilmaz, A., Choi, Y., & Gao, J. (2020). PlotMachines: Outline-conditioned generation with dynamic plot state tracking. arXiv preprint arXiv:2004.14967.
- DOI: https://doi.org/10.48550/arXiv.2004.14967
- Li, Y., Gan, Z., Shen, Y., Liu, J., Cheng, Y., Wu, Y., ... & Gao, J. (2019). StoryGAN: A sequential conditional GAN for story visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6329–6338).
- DOI: https://doi.org/10.48550/arXiv.1812.02784
- Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., & Socher, R. (2019). CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
- DOI: https://doi.org/10.48550/arXiv.1909.05858