Please use this identifier to cite or link to this item: http://ir.lib.seu.ac.lk/handle/123456789/6773
Title: Automated text summarization of Sinhala online articles
Authors: Akmal Jahan, M. A. C.; Wijesekara, K. K. C.
Keywords: Text Summarization; Text-Rank; TF-IDF; Sinhala Article
Issue Date: Jun-2023
Publisher: Faculty of Applied Sciences, South Eastern University of Sri Lanka, Sammanthurai.
Citation: Journal of Science, Faculty of Applied Sciences, South Eastern University of Sri Lanka, Vol. 4, (No.1), June 2023, pp. 1-14.
Abstract: Information retrieval is one of the major tasks in natural language processing applications. In a digitalized world, more and more information is retrieved from online platforms, and abundant information on any specific subject is available online. Amid the hustle and bustle of daily life, readers need to decide within a very short time whether a piece of information is important for their needs. Automated text summarization plays a key role in natural language processing applications. Many studies have explored summarization for languages such as English, Bengali, Hausa, Chinese, and Hindi; however, work on local languages such as Sinhala is still at an early stage. Moreover, Sri Lanka is a diverse country with many communities and languages, so some people are less fluent in Sinhala because their mother tongue is another local language, such as Tamil. Social media platforms such as Facebook provide translation of content into a different language, but other online platforms do not offer such translation. In such scenarios, a short summary of an article would help readers easily understand its main idea. This work therefore aims to build an online platform that provides good summaries of Sinhala online articles. It investigates extractive text summarization of Sinhala online articles using state-of-the-art NLP algorithms in order to select the most suitable method, and comparatively analyses the performance of the TF-IDF (Term Frequency-Inverse Document Frequency) and Text-Rank algorithms for Sinhala.
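As an illustration only, the following sketch shows how the two extractive methods compared above can score sentences and select a summary. It is not the authors' implementation. The sentence splitter (ending on '.', '?' or '!'), whitespace tokenization, the length-normalized TF-IDF score with each sentence treated as a "document", the cosine-similarity edge weights (the original TextRank paper uses a normalized word-overlap measure instead), the damping factor d = 0.85 and the fixed 50 iterations are all assumptions, and no Sinhala stop-word removal or stemming is applied.

```python
import math
import re
from collections import Counter

# ASCII punctuation trimmed from token edges (assumption: articles use standard punctuation).
PUNCT = ".,!?;:\"'()[]"


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def tokenize(sentence):
    """Whitespace tokenization. Unlike a regex word-character match, str.split keeps
    Sinhala vowel signs (combining marks) attached to their words."""
    tokens = (t.strip(PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def tfidf_scores(token_lists):
    """Score = sum over words of tf * idf, with tf normalized by sentence length
    and each sentence treated as a 'document' when computing idf."""
    n = len(token_lists)
    df = Counter(w for toks in token_lists for w in set(toks))
    scores = []
    for toks in token_lists:
        tf = Counter(toks)
        scores.append(sum(c / len(toks) * math.log(n / df[w]) for w, c in tf.items()))
    return scores


def cosine(a, b):
    """Cosine similarity of two word-count vectors (Counters)."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_scores(token_lists, d=0.85, iterations=50):
    """PageRank-style power iteration over a sentence graph with cosine edge weights."""
    vecs = [Counter(toks) for toks in token_lists]
    n = len(vecs)
    weights = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [
            (1 - d) + d * sum(weights[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j])
            for i in range(n)
        ]
    return scores


def summarize(text, k=2, method="tfidf"):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    token_lists = [tokenize(s) for s in sentences]
    scorer = {"tfidf": tfidf_scores, "textrank": textrank_scores}[method]
    scores = scorer(token_lists)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(article_text, k=3, method="textrank")` returns three sentences, where `article_text` is a placeholder for a Sinhala article string. Because this TF-IDF score amounts to the average inverse document frequency of a sentence's tokens, it favours sentences built from rare words, whereas the TextRank variant favours sentences that are similar to many others.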
The performance of the algorithms is evaluated against human-generated summaries from online sources using ROUGE (Recall-Oriented Understudy for Gisting Evaluation), which measures the rate of n-gram overlap between the human-generated reference summary and the automated summary; higher ROUGE scores indicate a more accurate automated summary of the article. The results show that the TF-IDF algorithm performs better than Text-Rank for summarizing Sinhala online articles of medium content size.
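The ROUGE-N measure described above can be sketched as follows. This is a simplified, hypothetical helper rather than the official ROUGE toolkit or the authors' evaluation code: it assumes whitespace tokenization, a single reference summary and no stemming, and it reports recall (the quantity in the original ROUGE-N definition) together with precision and their F1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Return (recall, precision, f1) of n-gram overlap between a reference
    summary and an automated (candidate) summary, tokenized on whitespace."""
    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    # Counter & keeps the minimum count of each shared n-gram (clipped matches).
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * recall * precision / (recall + precision) if overlap else 0.0
    return recall, precision, f1
```

For instance, `rouge_n(human_summary, automated_summary, n=2)` gives ROUGE-2 scores; both argument names are placeholders.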
URI: https://www.seu.ac.lk/jsc/; http://ir.lib.seu.ac.lk/handle/123456789/6773
ISSN: 2738-2184
Appears in Collections: Volume 04 No.1

Files in This Item:
Automated text summarization of Sinhala 1-15.pdf (805.04 kB, Adobe PDF)

