Enhancing Image Captioning and Auto-Tagging Through a FCLN with Faster R-CNN Integration
Journal Title: Information Dynamics and Applications - Year 2024, Vol 3, Issue 1
Abstract
Automated image captioning, the task of generating descriptive text for images, depends on combining Natural Language Processing (NLP) with computer vision. This study introduces the Fully Convolutional Localization Network (FCLN), an approach that performs localization and description together in a single forward pass. Because the network preserves spatial information, it avoids losing detail, and it can be trained end to end with one consistent optimization procedure. The foundation of the FCLN is a Convolutional Neural Network (CNN) that extracts salient image features. At the center of the architecture is a Localization Layer, which is responsible for detecting objects precisely and supplying the regions to be captioned. The FCLN combines a region detection network, modeled on the Faster Region-based Convolutional Neural Network (Faster R-CNN), with a captioning network, so that the captions it produces are contextually meaningful. The Faster R-CNN framework provides region-based object detection, which captures the context of each object and the relationships between objects, while a Long Short-Term Memory (LSTM) network generates the captions. This integration improves caption accuracy, particularly in complex scenes. Evaluations conducted on the Microsoft Common Objects in Context (MS COCO) test server indicate that the model outperforms existing benchmarks, underscoring its ability to generate precise and context-rich image captions.
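The abstract does not say how the region detection stage chooses which regions to caption. Faster R-CNN-style detectors commonly prune overlapping region proposals with non-maximum suppression (NMS) over Intersection over Union (IoU) scores, so the following is a minimal sketch of that step, not the authors' implementation. The `(x1, y1, x2, y2)` box format, the function names `iou` and `nms`, and the default threshold of 0.7 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.7) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The 0.7 default is an assumed value, not one reported in the abstract.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(proposals, confidences, iou_threshold=0.5))
```

In a pipeline of the kind the abstract describes, the surviving regions would then be passed to the captioning network, one caption per region.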
Authors and Affiliations
Shalaka Prasad Deore, Taibah Sohail Bagwan, Prachiti Sunil Bhukan, Harsheen Tejindersingh Rajpal, Shantanu Bharat Gade