IMAGE CAPTION GENERATOR USING CNN AND LSTM


Mrs. Busani Sravani
S. Sreepragna
R. Madhuri
V. Roja

Abstract

Machine learning is now central to artificial intelligence, and it has recently been used to build systems with exceptional performance. Deep learning, a subset of machine learning, produces highly accurate results. Our study applies deep learning to image description: the task of automatically generating a textual description of a picture's content. The idea rests on detecting the objects and actions in the input image. There are two main approaches to describing images: bottom-up and top-down. Bottom-up methods assemble a caption from individual pieces of visual information detected in the input picture. Top-down methods first derive a semantic representation of the input picture, using architectures such as recurrent neural networks, and then translate that representation into a caption. One potential benefit of image description is that it could help people with visual impairments understand what online images show. The specifics are explained in what follows. Looking at the image below, what can you make out?
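The top-down pipeline described above can be sketched in a few lines: a feature vector from a pretrained CNN conditions the initial state of an LSTM decoder, which then emits one word at a time. This is a minimal, illustrative sketch only; the vocabulary, layer sizes, and random initial weights are assumptions for demonstration, not the trained configuration used in the paper.

```python
import numpy as np

# Toy top-down captioning sketch: a (pretend) CNN feature vector seeds an
# LSTM decoder that greedily emits words. All sizes and the vocabulary are
# illustrative assumptions, and the weights are random rather than trained.

rng = np.random.default_rng(0)

VOCAB = ["<start>", "<end>", "a", "dog", "runs"]
V, D, H = len(VOCAB), 8, 16          # vocab, embedding, hidden sizes

W_img = rng.normal(0, 0.1, (2048, H))       # CNN features -> initial hidden state
E     = rng.normal(0, 0.1, (V, D))          # word embeddings
W     = rng.normal(0, 0.1, (D + H, 4 * H))  # LSTM gate weights (i, f, o, g)
b     = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (H, V))          # hidden state -> vocabulary logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One LSTM cell update: input, forget, output, and candidate gates."""
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def caption(cnn_features, max_len=10):
    """Greedy decoding: feed the previous word back in until <end>."""
    h = np.tanh(cnn_features @ W_img)   # the image conditions the decoder
    c = np.zeros(H)
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h, c = lstm_step(E[word], h, c)
        word = int(np.argmax(h @ W_out))
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

feats = rng.normal(0, 1, 2048)   # stand-in for a pretrained CNN's output
print(caption(feats))
```

In a real system the CNN (e.g. a pretrained image classifier with its final layer removed) supplies `cnn_features`, and all weights are learned by maximising the likelihood of reference captions; here the point is only the data flow from image features through the recurrent decoder to words.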


Article Details

How to Cite
Sravani, B., Sreepragna, S., Madhuri, R., & Roja, V. (2024). IMAGE CAPTION GENERATOR USING CNN AND LSTM. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 15(3), 266–277. https://doi.org/10.61841/turcomat.v15i3.14800

References

Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. Every picture tells a story: Generating sentences from images. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV’10, pages 15–29, Berlin, Heidelberg, 2010. Springer-Verlag.

Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, and Yejin Choi. Collective generation of natural image descriptions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL ’12, pages 359–368, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.

Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg, and Yejin Choi. Composing simple image descriptions using web-scale n-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL ’11, pages 220–228, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

Xinlei Chen and C. Lawrence Zitnick. Learning a recurrent visual representation for image caption generation. CoRR, abs/1411.5654, 2014.

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). CoRR, abs/1412.6632, 2014.

Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. CoRR, abs/1411.4555, 2014.

Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. Image captioning with semantic attention. CoRR, abs/1603.03925, 2016.

Ryan Kiros, Ruslan Salakhutdinov, and Richard S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. CoRR, abs/1411.2539, 2014.

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389, 2014.

Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735–1780, November 1997.

Andrej Karpathy and Fei-Fei Li. Deep visual-semantic alignments for generating image descriptions. CoRR, abs/1412.2306, 2014.

Hao Fang, Saurabh Gupta, Forrest N. Iandola, Rupesh Kumar Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, and Geoffrey Zweig. From captions to visual concepts and back. CoRR, abs/1411.4952, 2014.

Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, and Richard Socher. Ask me anything: Dynamic memory networks for natural language processing. CoRR, abs/1506.07285, 2015.

Alex Graves. Generating sequences with recurrent neural networks. CoRR, abs/1308.0850, 2013.

Karol Gregor, Ivo Danihelka, Alex Graves, and Daan Wierstra. DRAW: A recurrent neural network for image generation. CoRR, abs/1502.04623, 2015.