Abstract
The proliferation of artificial intelligence (AI) and the Internet of Things (IoT) has positioned smart kitchens as a frontier for innovation in personalized nutrition, safety monitoring, and sustainable consumption. Despite rapid progress, existing approaches remain fragmented: vision-based systems struggle with occlusion, speech-driven interfaces are vulnerable to noise, and IoT sensor networks, while reliable, often lack semantic integration with user preferences. Personalized recommender systems further suffer from static designs that fail to adapt to evolving contexts. Addressing these limitations, this study introduces a multimodal deep learning framework that unifies cross-modal attention and reinforcement learning to achieve context-aware personalization. Visual, auditory, and sensor streams are embedded into a shared representation, fused via attention mechanisms, and subsequently optimized through a reinforcement learning agent that balances nutritional goals, user satisfaction, and safety requirements. Empirical evaluation across three multimodal datasets demonstrates significant improvements over strong baselines, with gains of +8.4% in Top-1 accuracy, +14.0% in F1-score for safety monitoring, and a 23.5% reduction in nutritional prediction error. Interpretability modules employing SHAP and Integrated Gradients further provide transparent explanations, enhancing trust and accountability. The findings underscore the practical value of the framework in promoting healthier diets, improving energy efficiency, and ensuring domestic safety, while laying the groundwork for future applications in healthcare, adaptive living, and sustainable human-AI interaction.
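The fusion and optimization steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the architecture, so the embedding size, the random projection matrices, the mean pooling step, and the reward coefficients below are all assumptions chosen for illustration. The sketch applies scaled dot-product attention across three modality embeddings (visual, auditory, sensor) and combines nutrition, satisfaction, and safety scores into one scalar reward of the kind a reinforcement learning agent could optimize.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_modalities(embeddings, w_q, w_k, w_v):
    """Fuse modality embeddings with scaled dot-product attention.

    embeddings: (M, d) array, one row per modality in a shared space.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, untrained here).
    Returns the fused (d,) vector and the (M, M) attention weights.
    """
    q = embeddings @ w_q
    k = embeddings @ w_k
    v = embeddings @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    attended = weights @ v              # every modality attends to all others
    # Mean pooling over modalities is an assumption, not taken from the paper.
    return attended.mean(axis=0), weights


def scalar_reward(nutrition, satisfaction, safety, coeffs=(0.4, 0.4, 0.2)):
    """Weighted sum of the three objectives; the coefficients are illustrative."""
    a, b, c = coeffs
    return a * nutrition + b * satisfaction + c * safety


# Toy inputs: three modalities (visual, auditory, sensor), embedding size 8.
rng = np.random.default_rng(0)
d = 8
emb = rng.normal(size=(3, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

fused, weights = fuse_modalities(emb, w_q, w_k, w_v)
reward = scalar_reward(nutrition=1.0, satisfaction=0.5, safety=0.0)
```

In a trained system the projection matrices would be learned, and the reward coefficients would be tuned to reflect how strongly safety should outweigh the other objectives. A hard safety constraint may be more appropriate than a weighted term.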