Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Farig Sadeque, Ruhul Amin (Editors)


Anthology ID:
2023.banglalp-1
Month:
December
Year:
2023
Address:
Singapore
Venue:
BanglaLP
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2023.banglalp-1
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/2023.banglalp-1.pdf

pdfbib
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
Firoj Alam | Sudipta Kar | Shammur Absar Chowdhury | Farig Sadeque | Ruhul Amin

pdfbib
Offensive Language Identification in Transliterated and Code-Mixed Bangla
Md Nishat Raihan | Umma Tanmoy | Anika Binte Islam | Kai North | Tharindu Ranasinghe | Antonios Anastasopoulos | Marcos Zampieri

Identifying offensive content in social media is vital to create safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT, achieve the best performance on this dataset.

pdfbib
BSpell: A CNN-Blended BERT Based Bangla Spell Checker
Chowdhury Rahman | Md. Hasibur Rahman | Samiha Zakir | Mohammad Rafsan | Mohammed Eunus Ali

Bangla typing is usually performed using an English keyboard and can be highly erroneous due to the presence of compound and similarly pronounced letters. Spelling correction of a misspelled word requires understanding of the word typing pattern as well as the context of the word usage. A specialized BERT model named BSpell has been proposed in this paper targeted towards word for word correction at sentence level. BSpell contains an end-to-end trainable CNN sub-model named SemanticNet along with a specialized auxiliary loss. This allows BSpell to specialize in highly inflected Bangla vocabulary in the presence of spelling errors. Furthermore, a hybrid pretraining scheme has been proposed for BSpell that combines word level and character level masking. Comparison on two Bangla and one Hindi spelling correction dataset shows the superiority of our proposed approach.

pdfbib
Advancing Bangla Punctuation Restoration by a Monolingual Transformer-Based Method and a Large-Scale Corpus
Mehedi Hasan Bijoy | M Fatema Afroz Faria | Mahbub E Sobhani | Tanzid Ferdoush | Swakkhar Shatabda

Punctuation restoration is the endeavor of restoring and rectifying missing or improper punctuation marks in a text, thereby eradicating ambiguity in written discourse. The Bangla punctuation restoration task has received little attention and exploration, despite the rising popularity of textual communication in the language. The primary hindrances in the advancement of the task revolve around the utilization of transformer-based methods and a freely accessible extensive corpus, challenges that we found remained unresolved in earlier efforts. In this study, we propose a baseline by introducing a monolingual transformer-based method named Jatikarok, in which the effectiveness of transfer learning has been meticulously scrutinized, and a large-scale corpus containing 1.48M source-target pairs to resolve the previous issues. Jatikarok attains accuracy rates of 95.2%, 85.13%, and 91.36% on the BanglaPRCorpus, Prothom-Alo Balanced, and BanglaOPUS corpora, thereby establishing itself as the state-of-the-art method through its superior performance compared to BanglaT5 and T5-Small. Jatikarok and BanglaPRCorpus are publicly available at: https://github.com/mehedihasanbijoy/Jatikarok-and-BanglaPRCorpus

pdfbib
Pipeline Enabling Zero-shot Classification for Bangla Handwritten Graphemes
Linsheng Guo | Md Habibur Sifat | Tashin Ahmed

This research investigates Zero-Shot Learning (ZSL), and proposes CycleGAN-based image synthesis and accurate label mapping to build a strong association between labels and graphemes. The objective is to enhance model performance in detecting unseen classes by employing advanced character image categorization and a CycleGAN-based generator. The resulting representations of distinctive character structures demonstrate a considerable improvement in recognition, accommodating both seen and unseen classes. This examination addresses the complex issues of Optical Character Recognition (OCR) in the specific context of the Bangla script. Bangla script is renowned for its intricate properties, consisting of a total of 49 letters, which include 11 vowels, 38 consonants, and 18 diacritics. The combination of letters in this complex arrangement provides the opportunity to create almost 13,000 unique variants of graphemes, which exceeds the number of graphemic units found in the English language. Our investigation presents a new strategy for ZSL in the context of Bangla OCR. This approach combines generative models with careful labeling techniques to enhance the performance of Bangla OCR, specifically focusing on grapheme categorization. Our goal is to make a substantial impact on the digitization of educational resources in the Indian subcontinent.

pdfbib
Low-Resource Text Style Transfer for Bangla: Data & Models
Sourabrata Mukherjee | Akanksha Bansal | Pritha Majumdar | Atul Kr. Ojha | Ondřej Dušek

Text style transfer (TST) involves modifying the linguistic style of a given text while retaining its core content. This paper addresses the challenging task of text style transfer in the Bangla language, which is low-resourced in this area. We present a novel Bangla dataset that facilitates text sentiment transfer, a subtask of TST, enabling the transformation of positive sentiment sentences to negative and vice versa. To establish a high-quality base for further research, we refined and corrected an existing English dataset of 1,000 sentences for sentiment transfer based on Yelp reviews, and we introduce a new human-translated Bangla dataset that parallels its English counterpart. Furthermore, we provide multiple benchmark models that serve as a validation of the dataset and baselines for further research.

pdfbib
Intent Detection and Slot Filling for Home Assistants: Dataset and Analysis for Bangla and Sylheti
Fardin Ahsan Sakib | A H M Rezaul Karim | Saadat Hasan Khan | Md Mushfiqur Rahman

As voice assistants cement their place in our technologically advanced society, there remains a need to cater to the diverse linguistic landscape, including colloquial forms of low-resource languages. Our study introduces the first-ever comprehensive dataset for intent detection and slot filling in formal Bangla, colloquial Bangla, and Sylheti languages, totaling 984 samples across 10 unique intents. Our analysis reveals the robustness of large language models for tackling downstream tasks with inadequate data. The GPT-3.5 model achieves an impressive F1 score of 0.94 in intent detection and 0.51 in slot filling for colloquial Bangla.

pdfbib
BEmoLexBERT: A Hybrid Model for Multilabel Text Emotion Classification in Bangla by Combining Transformers with Lexicon Features
Ahasan Kabir | Animesh Coy | Zaima Taheri

Multilabel textual emotion classification involves the extraction of emotions from text data, a task that has seen significant progress in high resource languages. However, resource-constrained languages like Bangla have received comparatively less attention in the domain of emotion classification. Furthermore, the availability of a comprehensive and accurate emotion lexicon specifically designed for the Bangla language is limited. In this paper, we present a hybrid model that combines lexicon features with transformers for multilabel emotion classification in the Bangla language. We have developed a comprehensive Bangla emotion lexicon consisting of 5336 carefully curated lexicons across nine emotion categories. We experimented with pre-trained transformers including mBERT, XLM-R, BanglishBERT, and BanglaBERT on the EmoNoBa (Islam et al., 2022) dataset. By integrating lexicon features from our emotion lexicon, we evaluate the performance of these transformers in emotion detection tasks. The results demonstrate that incorporating lexicon features significantly improves the performance of transformers. Among the evaluated models, our hybrid approach achieves the highest performance using BanglaBERT (large) (Bhattacharjee et al., 2022) as the pre-trained transformer along with our emotion lexicon, achieving an impressive weighted F1 score of 82.73%. The emotion lexicon is publicly available at https://github.com/Ahasannn/BEmoLex-Bangla_Emotion_Lexicon

pdfbib
Assessing Political Inclination of Bangla Language Models
Surendrabikram Thapa | Ashwarya Maratha | Khan Md Hasib | Mehwish Nasim | Usman Naseem

Natural language processing has advanced with AI-driven language models (LMs) that are applied widely from text generation to question answering. These models are pre-trained on a wide spectrum of information sources, enhancing accuracy and responsiveness. However, this process inadvertently entails the absorption of a diverse spectrum of viewpoints inherent within the training data. Exploring political leaning within LMs due to such viewpoints remains a less-explored domain. In the context of a low-resource language like Bangla, this area of research is nearly non-existent. To bridge this gap, we comprehensively analyze biases present in Bangla language models, specifically focusing on social and economic dimensions. Our findings reveal the inclinations of various LMs, which will provide insights into ethical considerations and limitations associated with deploying Bangla LMs.

pdfbib
Vio-Lens: A Novel Dataset of Annotated Social Network Posts Leading to Different Forms of Communal Violence and its Evaluation
Sourav Saha | Jahedul Alam Junaed | Maryam Saleki | Arnab Sen Sharma | Mohammad Rashidujjaman Rifat | Mohamed Rahouti | Syed Ishtiaque Ahmed | Nabeel Mohammed | Mohammad Ruhul Amin

This paper presents a computational approach for creating a dataset on communal violence in the context of Bangladesh and West Bengal of India, along with a benchmark evaluation. In recent years, social media has been used as a weapon by factions of different religions and backgrounds to incite hatred, resulting in physical communal violence and causing death and destruction. To avoid such abusive use of online platforms, we propose a framework for classifying online posts using an adaptive question-based approach. We collected more than 168,000 YouTube comments from a set of manually selected videos known for inciting violence in Bangladesh and West Bengal. Using both unsupervised and later semi-supervised topic modeling methods on those unstructured data, we discovered the major word clusters to interpret the related topics of peace and violence. Topic words were later used to select 20,142 posts related to peace and violence, of which we annotated a total of 6,046 posts. Finally, we applied different modeling techniques based on linguistic features and sentence transformers to benchmark the labeled dataset, with the best-performing model attaining ~71% macro F1 score.

pdfbib
BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech
Alvi Khan | Fida Kamal | Mohammad Abrar Chowdhury | Tasnim Awam | Md Tahmid Rahman Laskar | Sabbir Ahmed

Online health consultation is steadily gaining popularity as a platform for patients to discuss their medical health inquiries, known as Consumer Health Questions (CHQs). The emergence of the COVID-19 pandemic has also led to a surge in the use of such platforms, creating a significant burden for the limited number of healthcare professionals attempting to respond to the influx of questions. Abstractive text summarization is a promising solution to this challenge, since shortening CHQs to only the information required to answer them reduces the amount of time spent parsing unnecessary information. The summarization process can also serve as an intermediate step towards the eventual development of an automated medical question-answering system. This paper presents 'BanglaCHQ-Summ', the first CHQ summarization dataset for the Bangla language, consisting of 2,350 question-summary pairs. It is benchmarked on state-of-the-art Bangla and multilingual text summarization models, with the best-performing model, BanglaT5, achieving a ROUGE-L score of 48.35%. In addition, we address the limitations of existing automatic metrics for summarization by conducting a human evaluation. The dataset and all relevant code used in this work have been made publicly available.

pdfbib
Contextual Bangla Neural Stemmer: Finding Contextualized Root Word Representations for Bangla Words
Md Fahim | Amin Ahsan Ali | M Ashraful Amin | Akmmahbubur Rahman

Stemmers are commonly employed in NLP to reduce words to their root form. However, this process may discard important information and yield incorrect root forms, affecting the accuracy of NLP tasks. To address these limitations, we propose a Contextual Bangla Neural Stemmer for the Bangla language to enhance word representations. Our method involves splitting words into characters within the Neural Stemming Block, obtaining vector representations for both stem words and unknown vocabulary words. A loss function aligns these representations with Word2Vec representations, followed by contextual word representations from a Universal Transformer encoder. Mean Pooling generates sentence-level representations that are aligned with BanglaBERT's representations via an MLP layer. The proposed model also tries to build good representations for out-of-vocabulary (OOV) words. Experiments with our model on five Bangla datasets show around 5% average improvement over the vanilla approach. Notably, our approach avoids BERT retraining, focusing on root word detection and addressing OOV and sub-word issues. By incorporating our approach into a large corpus-based Language Model, we expect further improvements in aspects like explainability.

pdfbib
Investigating the Effectiveness of Graph-based Algorithms for Bangla Text Classification
Farhan Dehan | Md Fahim | Amin Ahsan Ali | M Ashraful Amin | Akmmahbubur Rahman

In this study, we examine and analyze the behavior of several graph-based models for Bangla text classification tasks. Graph-based algorithms create different graphs from text data. Each node represents either a word or a document, and each edge indicates a relationship between two words, or between a word and a document. We applied the BERT model and different graph-based models including TextGCN, GAT, BertGAT, and BertGCN on five different datasets: SentNoB, Sarcasm detection, BanFakeNews, Hate speech detection, and Emotion detection datasets for Bangla text. The BERT model beats the TextGCN and GAT models by a large margin in terms of accuracy, Macro F1 scores, and weighted F1 scores. BertGCN and BertGAT are shown to outperform both the standalone graph models and the BERT model. BertGAT excelled in the Emotion detection dataset and achieved a 1%-2% performance boost over BERT in the Sarcasm detection, Hate speech detection, and BanFakeNews datasets. Whereas BertGCN outperformed BertGAT by 1% on the SentNoB and BanFakeNews datasets, it beat BertGAT by 2% on the Sarcasm detection, Hate Speech, and Emotion detection datasets. We also examined different variations in graph structure and analyzed their effects.

pdfbib
SynthNID: Synthetic Data to Improve End-to-end Bangla Document Key Information Extraction
Syed Mostofa Monsur | Shariar Kabir | Sakib Chowdhury

End-to-end Document Key Information Extraction models require a lot of compute and labeled data to perform well on real datasets. This is particularly challenging for low-resource languages like Bangla where domain-specific multimodal document datasets are scarcely available. In this paper, we have introduced SynthNID, a system to generate domain-specific document image data for training OCR-less end-to-end Key Information Extraction systems. We show the generated data improves the performance of the extraction model on real datasets and the system is easily extendable to generate other types of scanned documents for a wide range of document understanding tasks. The code for generating synthetic data is available at https://github.com/dv66/synthnid

pdfbib
BaTEClaCor: A Novel Dataset for Bangla Text Error Classification and Correction
Nabilah Oshin | Syed Hoque | Md Fahim | Amin Ahsan Ali | M Ashraful Amin | Akmmahbubur Rahman

In the context of the dynamic realm of Bangla communication, online users are commonly prone to bending the language or making errors due to various factors. We attempt to detect, categorize, and correct those errors by employing several machine learning and deep learning models. To contribute to the preservation and authenticity of the Bangla language, we introduce a meticulously categorized organic dataset encompassing 10,000 authentic Bangla comments from a commonly used social media platform. Through rigorous comparative analysis of distinct models, our study highlights BanglaBERT's superiority in error-category classification and underscores the effectiveness of BanglaT5 for text correction. BanglaBERT achieves accuracy of 79.1% and 74.1% on binary and multiclass error-category classification when fine-tuned and validated on our proposed dataset. Moreover, BanglaT5 achieves the best Rouge-L score (0.8459) when fine-tuned and tested on our corrected ground truths. Beyond algorithmic exploration, this endeavor represents a significant stride in enhancing the quality of digital discourse in the Bangla-speaking community, fostering linguistic precision and coherence in online interactions. The dataset and code are available at https://github.com/SyedT1/BaTEClaCor.

pdfbib
Crosslingual Retrieval Augmented In-context Learning for Bangla
Xiaoqian Li | Ercong Nie | Seng Liang

The promise of Large Language Models (LLMs) in Natural Language Processing has often been overshadowed by their limited performance in low-resource languages such as Bangla. To address this, our paper presents a pioneering approach that utilizes cross-lingual retrieval augmented in-context learning. By strategically sourcing semantically similar prompts from high-resource languages, we enable multilingual pretrained language models (MPLMs), especially the generative model BLOOMZ, to successfully boost performance on Bangla tasks. Our extensive evaluation highlights that the cross-lingual retrieval augmented prompts bring steady improvements to MPLMs over the zero-shot performance.

pdfbib
Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition
Rabindra Nath Nandi | Mehadi Menon | Tareq Muntasir | Sagor Sarker | Quazi Sarwar Muhtaseem | Md. Tariqul Islam | Shammur Chowdhury | Firoj Alam

One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data, among others. Our results demonstrate the efficacy of the model trained on pseudo-label data for the designed test set along with publicly available Bangla datasets. The experimental resources will be publicly available at https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR

pdfbib
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bangla
Saumajit Saha | Albert Nanda

This paper presents the system that we have developed while solving the shared task on violence inciting text detection in Bangla. We explain both the traditional and the recent approaches that we have used to make our models learn. Our proposed system helps to classify whether the given text contains any threat. We studied the impact of data augmentation when there is only a limited dataset available. Our quantitative results show that finetuning a multilingual-e5-base model performed the best in our task compared to other transformer-based architectures. We obtained a macro F1 of 68.11% on the test set and our performance in this shared task is ranked at 23 in the leaderboard.
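
The macro F1 reported by this and most other Task 1 systems in this volume averages per-class F1 scores with equal weight, regardless of class frequency. A minimal sketch of the metric (the label values are illustrative, not from any paper's code):

```python
def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical VITD-style labels: 0 = non-violence, 1 = passive, 2 = direct
gold = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 2, 2]
print(round(macro_f1(gold, pred), 4))  # → 0.6556
```

Because every class contributes equally, a model that ignores the rare direct-violence class is penalized more under macro F1 than under accuracy or weighted F1.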

pdfbib
Team CentreBack at BLP-2023 Task 1: Analyzing performance of different machine-learning based systems for detecting violence-inciting texts in Bangla
Refaat Mohammad Alamgir | Amira Haque

Like everything else in the world, the rapid growth of social media comes with its own merits and demerits. While it provides a platform for the world to easily communicate with each other, on the other hand, the room it has opened for hate speech has led to a significant impact on the well-being of its users. These kinds of texts have the potential to result in violence, as people with similar sentiments may be inspired to commit violent acts after coming across such texts. Hence, the need for a system to detect and filter such texts is increasing drastically with time. This paper summarizes our experimental results and findings for the shared task at the First Bangla Language Processing Workshop at EMNLP 2023 - Singapore. We participated in shared task 1: Violence Inciting Text Detection (VITD). The objective was to build a system that classifies the given comments as either non-violence, passive violence, or direct violence. We tried out different techniques, such as fine-tuning language models, few-shot learning with SBERT, and a 2-stage training where we performed binary violence/non-violence classification first, then did a fine-grained classification of direct/passive violence. We found that the best macro-F1 score of 69.39 was yielded by fine-tuning the BanglaBERT language model, and we attained a position of 21 among 27 teams in the final leaderboard. After the competition ended, we found that with some preprocessing of the dataset, we can get the score up to 71.68.
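
The 2-stage training mentioned here composes two independent classifiers: a binary violence screen followed by a direct/passive refinement. A minimal sketch of that control flow, with simple keyword rules standing in for the fine-tuned models (the keywords and labels are invented for illustration):

```python
def stage1_is_violent(text):
    # Stand-in for a fine-tuned binary violence/non-violence model.
    return any(w in text.lower() for w in ("attack", "burn", "kill"))

def stage2_direct_or_passive(text):
    # Stand-in for a fine-tuned direct-vs-passive model; first-person
    # threats are treated as direct purely for illustration.
    return "direct" if any(w in text.lower() for w in ("i will", "we will")) else "passive"

def classify(text):
    """Two-stage pipeline: screen for violence, then refine the label."""
    if not stage1_is_violent(text):
        return "non-violence"
    return stage2_direct_or_passive(text)

print(classify("lovely weather today"))         # non-violence
print(classify("we will burn it all down"))     # direct
print(classify("they deserve to be attacked"))  # passive
```

Splitting the task this way lets each stage specialize on an easier binary decision, at the cost that stage-1 false negatives can never be recovered by stage 2.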

pdfbib
EmptyMind at BLP-2023 Task 1: A Transformer-based Hierarchical-BERT Model for Bangla Violence-Inciting Text Detection
Udoy Das | Karnis Fatema | Md Ayon Mia | Mahshar Yahan | Md Sajidul Mowla | Md Fayez Ullah | Arpita Sarker | Hasan Murad

The availability of the internet has made it easier for people to share information over social media. People with ill intent can use this widespread accessibility of the internet to share violent content easily. A substantial portion of social media users prefer using their regional language, which makes it quite difficult to detect violence-inciting text. The objective of our research work is to detect Bangla violence-inciting text from social media posts. A shared task on Bangla violence-inciting text detection has been organized by the First Bangla Language Processing Workshop (BLP) co-located with EMNLP, where the organizer has provided a dataset named VITD with three categories: non-violence, passive violence, and direct violence texts. To accomplish this task, we have implemented three machine learning models (RF, SVM, XGBoost), two deep learning models (LSTM, BiLSTM), and two transformer-based models (BanglaBERT, Hierarchical-BERT). We have conducted a comparative study among the different models by training and evaluating each model on the VITD dataset. We have found that Hierarchical-BERT provided the best result with an F1 score of 0.73797 on the test set and ranked 9th among all participants in shared task 1 of the BLP Workshop co-located with EMNLP 2023.

pdfbib
nlpBDpatriots at BLP-2023 Task 1: Two-Step Classification for Violence Inciting Text Detection in Bangla - Leveraging Back-Translation and Multilinguality
Md Nishat Raihan | Dhiman Goswami | Sadiya Sayara Chowdhury Puspo | Marcos Zampieri

In this paper, we discuss the nlpBDpatriots entry to the shared task on Violence Inciting Text Detection (VITD) organized as part of the first workshop on Bangla Language Processing (BLP) co-located with EMNLP. The aim of this task is to identify and classify the violent threats that provoke further unlawful violent acts. Our best-performing approach for the task is two-step classification using back translation and multilinguality, which ranked 6th out of 27 teams with a macro F1 score of 0.74.
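
Back-translation augmentation of the kind named in this entry's title round-trips each training sentence through a pivot language and keeps the resulting paraphrase as an extra sample with the same label. A schematic sketch with stand-in translation functions (a real system would call an MT model in both directions; the string transformations here only simulate paraphrase drift):

```python
def to_pivot(sentence):
    # Stand-in for Bangla -> English machine translation.
    return sentence.upper()

def from_pivot(sentence):
    # Stand-in for English -> Bangla machine translation; the replace
    # simulates the wording drift a real round trip introduces.
    return sentence.lower().replace("very ", "really ")

def back_translate(samples):
    """Round-trip each (text, label) pair through the pivot language and
    keep changed paraphrases with the original label."""
    augmented = list(samples)
    for text, label in samples:
        paraphrase = from_pivot(to_pivot(text))
        if paraphrase != text:
            augmented.append((paraphrase, label))
    return augmented

data = [("this is very bad", "passive"), ("stay calm", "non-violence")]
print(len(back_translate(data)))  # → 3: two originals plus one paraphrase
```

Only sentences whose round trip actually changes are kept, so the augmented set grows without exact duplicates.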

pdfbib
Score_IsAll_You_Need at BLP-2023 Task 1: A Hierarchical Classification Approach to Detect Violence Inciting Text using Transformers
Kawsar Ahmed | Md Osama | Md. Sirajul Islam | Md Taosiful Islam | Avishek Das | Mohammed Moshiul Hoque

Violence-inciting text detection has become critical due to its significance in social media monitoring, online security, and the prevention of violent content. Developing an automatic text classification model for identifying violence in languages with limited resources, like Bangla, poses significant challenges due to the scarcity of resources and complex morphological structures. This work presents a transformer-based method that can classify Bangla texts into three violence classes: direct, passive, and non-violence. We leveraged transformer models, including BanglaBERT, XLM-R, and m-BERT, to develop a hierarchical classification model for the downstream task. In the first step, BanglaBERT is employed to identify the presence of violence in the text. In the next step, the model classifies violent texts that incite violence as either direct or passive. The developed system scored 72.37 and ranked 14th among the participants.

pdfbib
Mavericks at BLP-2023 Task 1: Ensemble-based Approach Using Language Models for Violence Inciting Text Detection
Saurabh Page | Sudeep Mangalvedhekar | Kshitij Deshpande | Tanmay Chavan | Sheetal Sonawane

This paper presents our work for the Violence Inciting Text Detection shared task in the First Workshop on Bangla Language Processing. Social media has accelerated the propagation of hate and violence-inciting speech in society. It is essential to develop efficient mechanisms to detect and curb the propagation of such texts. The problem of detecting violence-inciting texts is further exacerbated in low-resource settings due to sparse research and less data. The data provided in the shared task consists of texts in the Bangla language, where each example is classified into one of three categories defined based on the types of violence-inciting texts. We try and evaluate several BERT-based models, and then use an ensemble of the models as our final submission. Our submission is ranked 10th in the final leaderboard of the shared task with a macro F1 score of 0.737.

pdfbib
VacLM at BLP-2023 Task 1: Leveraging BERT models for Violence detection in Bangla
Shilpa Chatterjee | P J Leo Evenss | Pramit Bhattacharyya

This study introduces the system submitted to the BLP Shared Task 1: Violence Inciting Text Detection (VITD) by the VacLM team. In this work, we analyzed the impact of various transformer-based models for detecting violence in texts. BanglaBERT outperforms all the other competing models. We also observed that the transformer-based models are not adept at classifying the Passive Violence and Direct Violence classes but can better detect violence in texts, which is the task's primary objective. In the shared task, we secured a rank of 12 with a macro F1-score of 72.656%.

pdfbib
Aambela at BLP-2023 Task 1: Focus on UNK tokens: Analyzing Violence Inciting Bangla Text with Adding Dataset Specific New Word Tokens
Md Fahim

The BLP-2023 Task 1 aims to develop a Natural Language Inference system tailored for detecting and analyzing threats from Bangla YouTube comments. Bangla language models like BanglaBERT have demonstrated remarkable performance in other Bangla natural language processing tasks across different domains. We utilized BanglaBERT for the violence detection task, employing three different classification heads. As BanglaBERT's vocabulary lacks certain crucial words, our model incorporates some of them as new special tokens, based on their frequency in the dataset, and their embeddings are learned during training. The model achieved the 2nd position on the leaderboard, boasting an impressive macro-F1 score of 76.04% on the official test set. With the addition of new tokens, we achieved a 76.90% macro-F1 score, surpassing the top score (76.044%) on the test set.
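
The token-addition idea described here can be sketched independently of any particular library: find frequent out-of-vocabulary words in the corpus, append them to the vocabulary, and grow the embedding table with fresh rows that are then learned during fine-tuning. A toy sketch (the vocabulary, corpus, and dimensions are invented; a real system would use the tokenizer's own token-addition API):

```python
import random
from collections import Counter

def add_frequent_unk_tokens(vocab, embeddings, corpus, top_k=2, dim=4):
    """Append the top_k most frequent out-of-vocabulary words to the
    vocab and give each a freshly initialized embedding row."""
    counts = Counter(w for text in corpus for w in text.split() if w not in vocab)
    for word, _ in counts.most_common(top_k):
        vocab[word] = len(vocab)
        # New rows start random and are learned during fine-tuning.
        embeddings.append([random.uniform(-0.1, 0.1) for _ in range(dim)])
    return vocab, embeddings

vocab = {"[UNK]": 0, "good": 1, "bad": 2}
emb = [[0.0] * 4 for _ in vocab]
corpus = ["good dhamaka dhamaka", "bad dhamaka jhamela", "jhamela again"]
vocab, emb = add_frequent_unk_tokens(vocab, emb, corpus)
print(sorted(w for w in vocab if w not in ("[UNK]", "good", "bad")))
```

Without this step, every such word collapses to the single `[UNK]` embedding, so the model cannot distinguish dataset-specific terms that may carry the violence signal.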

pdfbib
SUST_Black Box at BLP-2023 Task 1: Detecting Communal Violence in Texts: An Exploration of MLM and Weighted Ensemble Techniques
Hrithik Shibu | Shrestha Datta | Zhalok Rahman | Shahrab Sami | Md. Sumon Miah | Raisa Fairooz | Md Mollah

In this study, we address the shared task of classifying violence-inciting texts from YouTube comments related to violent incidents in the Bengal region. We seamlessly integrated domain adaptation techniques by meticulously fine-tuning pre-existing Masked Language Models on a diverse array of informal texts. We employed a multifaceted approach, leveraging Transfer Learning, Stacking, and Ensemble techniques to enhance our model's performance. Our integrated system, combining the refined BanglaBERT model through MLM and our Weighted Ensemble approach, showcased superior efficacy, achieving macro F1 scores of 71% and 72%, respectively, while the MLM approach secured the 18th position among participants. This underscores the robustness and precision of our proposed paradigm in the nuanced detection and categorization of violent narratives within digital realms.
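
A weighted ensemble of the kind named here can be reduced to a weighted average of the per-class probabilities produced by each member model, followed by an argmax. A minimal sketch (the weights and probability vectors are invented for illustration):

```python
def weighted_ensemble(prob_lists, weights):
    """Combine per-class probability vectors from several models by a
    weighted average, then pick the argmax class."""
    n_classes = len(prob_lists[0])
    total = sum(weights)
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined

# Three hypothetical models scoring one comment over the classes
# (non-violence, passive, direct); weights favor the stronger model.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.5, 0.2],
]
label, probs = weighted_ensemble(model_probs, weights=[0.5, 0.3, 0.2])
print(label)  # → 0: the weighted vote keeps the strongest model's choice
```

Weighting the average, rather than taking a plain majority vote, lets a single well-calibrated model outvote two weaker ones while still letting them break near-ties.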

pdfbib
the_linguists at BLP-2023 Task 1: A Novel Informal Bangla Fasttext Embedding for Violence Inciting Text Detection
Md. Tariquzzaman | Md Wasif Kader | Audwit Anam | Naimul Haque | Mohsinul Kabir | Hasan Mahmud | Md Kamrul Hasan

This paper introduces a novel informal Bangla word embedding for designing a cost-efficient solution for the task “Violence Inciting Text Detection”, which focuses on developing classification systems to categorize texts that can potentially incite further violent actions. We propose a semi-supervised learning approach based on our informal Bangla FastText embedding, which is further fine-tuned on lightweight models on the task-specific dataset and yielded results competitive with our initial method using BanglaBERT, which secured the 7th position with an f1-score of 73.98%. We conduct extensive experiments to assess the efficiency of the proposed embedding and how well it generalizes in terms of violence classification, along with its coverage of the task’s dataset. Our proposed Bangla IFT embedding achieved a competitive macro average F1 score of 70.45%. Additionally, we provide a detailed analysis of our findings, delving into potential causes of misclassification in the detection of violence-inciting text.

pdfbib
UFAL-ULD at BLP-2023 Task 1: Violence Detection in Bangla Text
Sourabrata Mukherjee | Atul Kr. Ojha | Ondřej Dušek

In this paper, we present the UFAL-ULD team’s system, designed as a part of the BLP Shared Task 1: Violence Inciting Text Detection (VITD). This task aims to classify text, with the particular challenge of categorizing incitement to violence into Direct, Indirect, or Non-violence levels. We experimented with several pre-trained sequence classification models, including XLM-RoBERTa, BanglaBERT, Bangla BERT Base, and Multilingual BERT. Our best-performing model was based on the XLM-RoBERTa-base architecture, which outperformed the baseline models. Our system was ranked 20th among the 27 teams that participated in the task.

pdfbib
Semantics Squad at BLP-2023 Task 1: Violence Inciting Bangla Text Detection with Fine-Tuned Transformer-Based Models
Krishno Dey | Prerona Tarannum | Md. Arid Hasan | Francis Palma

This study investigates the application of Transformer-based models for violence threat identification. We participated in the BLP-2023 Shared Task 1, and in our initial submission, BanglaBERT large achieved 5th position on the leaderboard with a macro F1 score of 0.7441, approaching the highest baseline of 0.7879 established for this task. In contrast, the top-performing system on the leaderboard achieved an F1 score of 0.7604. Subsequent experiments involving m-BERT, XLM-RoBERTa base, XLM-RoBERTa large, BanglishBERT, BanglaBERT, and BanglaBERT large models revealed that BanglaBERT achieves an F1 score of 0.7441, which closely approaches the baseline. Remarkably, m-BERT and XLM-RoBERTa base also approached the baseline with macro F1 scores of 0.6584 and 0.6968, respectively. A notable finding from our study is the under-performance of larger models on the shared task dataset, which requires further investigation. Our findings underscore the potential of transformer-based models in identifying violence threats, offering valuable insights to enhance safety measures on online platforms.

pdfbib
LowResourceNLU at BLP-2023 Task 1 & 2: Enhancing Sentiment Classification and Violence Incitement Detection in Bangla Through Aggregated Language Models
Hariram Veeramani | Surendrabikram Thapa | Usman Naseem

Violence incitement detection and sentiment analysis hold significant importance in the field of natural language processing. However, in the case of the Bangla language, there are unique challenges due to its low-resource nature. In this paper, we address these challenges by presenting an innovative approach that leverages aggregated BERT models for two tasks at the BLP workshop at EMNLP 2023, specifically tailored for Bangla. Task 1 focuses on violence-inciting text detection, while task 2 centers on sentiment analysis. Our approach combines fine-tuning with textual entailment (utilizing BanglaBERT), Masked Language Model (MLM) training (making use of BanglaBERT), and the use of standalone Multilingual BERT. This comprehensive framework significantly improves the accuracy of sentiment classification and violence incitement detection in Bangla text. Our method achieved the 11th rank in task 1 with an F1-score of 73.47 and the 4th rank in task 2 with an F1-score of 71.73. This paper provides a detailed system description along with an analysis of the impact of each component of our framework.

pdfbib
Team Error Point at BLP-2023 Task 1: A Comprehensive Approach for Violence Inciting Text Detection using Deep Learning and Traditional Machine Learning Algorithms
Rajesh Das | Jannatul Maowa | Moshfiqur Ajmain | Kabid Yeiad | Mirajul Islam | Sharun Khushbu

In the modern digital landscape, social media platforms have the dual role of fostering unprecedented connectivity and harboring a dark underbelly in the form of widespread violence-inciting content. Pioneering research in Bengali social media aims to provide a groundbreaking solution to this issue. This study thoroughly investigates violence-inciting text classification using a diverse range of machine learning and deep learning models, offering insights into content moderation and strategies for enhancing online safety. Situated at the intersection of technology and social responsibility, the purpose is to empower platforms and communities to combat online violence. By providing insights into model selection and methodology, this work makes a significant contribution to the ongoing dialogue about the challenges posed by the darker aspects of the digital era. Our system scored 31.913 and ranked 26th among the participants.

pdfbib
NLP_CUET at BLP-2023 Task 1: Fine-grained Categorization of Violence Inciting Text using Transformer-based Approach
Jawad Hossain | Hasan Mesbaul Ali Taher | Avishek Das | Mohammed Moshiul Hoque

The amount of online textual content has increased significantly in recent times through social media posts, online chatting, web portals, and other digital platforms, due to the significant increase in internet users and their unrestricted access via various devices. Unfortunately, the misuse of textual communication via the Web has led to violence-inciting texts. Despite the availability of various forms of violence-inciting materials, text-based content is often used to carry out violent acts. Thus, developing a system to detect violence-inciting text has become vital. However, building such a system in a low-resourced language like Bangla is challenging. Therefore, a shared task has been arranged to detect violence-inciting text in Bangla. This paper presents a hybrid approach (GAN+Bangla-ELECTRA) to classify violence-inciting text in Bangla into three classes: direct, passive, and non-violence. We investigated a variety of deep learning (CNN, BiLSTM, BiLSTM+Attention), machine learning (LR, DT, MNB, SVM, RF, SGD), transformer (BERT, ELECTRA), and GAN-based models to detect violence-inciting text in Bangla. Evaluation results demonstrate that the GAN+Bangla-ELECTRA model gained the highest macro f1-score (74.59), which secured us the 3rd position at the BLP-2023 Task 1.

pdfbib
Team_Syrax at BLP-2023 Task 1: Data Augmentation and Ensemble Based Approach for Violence Inciting Text Detection in Bangla
Omar Riyad | Trina Chakraborty | Abhishek Dey

This paper describes our participation in Task 1 (VITD) of the BLP Workshop at EMNLP 2023, focused on the detection and categorization of threats linked to violence, which could potentially encourage further violent actions. Our approach involves fine-tuning of pre-trained transformer models and employing techniques like self-training with external data, data augmentation through back-translation, and ensemble learning (bagging and majority voting). Notably, self-training improves performance when applied to data from external sources but not when applied to the test set. Our analysis highlights the effectiveness of ensemble methods and data augmentation techniques in Bangla Text Classification. Our system initially scored 0.70450 and ranked 19th among the participants, but post-competition experiments boosted our score to 0.72740.

pdfbib
BLP-2023 Task 1: Violence Inciting Text Detection (VITD)
Sourav Saha | Jahedul Alam Junaed | Maryam Saleki | Mohamed Rahouti | Nabeel Mohammed | Mehmet Ruhul Amin

We present the comprehensive technical description of the outcome of the BLP shared task on Violence Inciting Text Detection (VITD). In recent years, social media has become a tool for groups of various religions and backgrounds to spread hatred, leading to physical violence with devastating consequences. To address this challenge, the VITD shared task was initiated, aiming to classify the degree of violence incitement in various texts. The competition garnered significant interest, with a total of 27 teams consisting of 88 participants successfully submitting their systems to the CodaLab leaderboard. During the post-workshop phase, we received 16 system papers on VITD from these participants. In this paper, we discuss the VITD baseline performance, the error analysis of the submitted models, and provide a comprehensive summary of the computational techniques applied by the participating teams.

pdfbib
BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts
Saumajit Saha | Albert Nanda

Bangla is the 7th most widely spoken language globally, with a staggering 234 million native speakers primarily hailing from India and Bangladesh. This morphologically rich language boasts a rich literary tradition, encompassing diverse dialects and language-specific challenges. Despite its linguistic richness and history, Bangla remains categorized as a low-resource language within the natural language processing (NLP) and speech community. This paper presents our submission to Task 2 (Sentiment Analysis of Bangla Social Media Posts) of the BLP Workshop. We experimented with various Transformer-based architectures to solve this task. Our quantitative results show that transfer learning really helps in better learning of the models in this low-resource language scenario. This became evident when we further finetuned a model that had already been finetuned on Twitter data for the sentiment analysis task, and that finetuned model performed the best among all other models. We also performed a detailed error analysis where we found some instances where the ground truth labels need to be revisited. We obtained a micro-F1 of 67.02% on the test set, and our performance in this shared task was ranked 21st on the leaderboard.

pdfbib
Knowdee at BLP-2023 Task 2: Improving Bangla Sentiment Analysis Using Ensembled Models with Pseudo-Labeling
Xiaoyi Liu | Mao Teng | Shuangtao Yang | Bo Fu

This paper outlines our submission to the Sentiment Analysis Shared Task at the Bangla Language Processing (BLP) Workshop at EMNLP2023 (Hasan et al., 2023a). The objective of this task is to detect the sentiment of each text by classifying it as Positive, Negative, or Neutral. This shared task is based on the MUltiplatform BAngla SEntiment (MUBASE) (Hasan et al., 2023b) and SentNoB (Islam et al., 2021) datasets, which consist of public comments from various social media platforms. Our proposed method for this problem is based on the pre-trained Bangla language model BanglaBERT (Bhattacharjee et al., 2022). We trained an ensemble of BanglaBERT models on the original dataset and utilized it to generate pseudo-labels for data augmentation. This expanded dataset was then used to train our final models. During the evaluation phase, 30 teams submitted their systems, and our system achieved the second highest performance with an F1 score of 0.7267. The source code of the proposed approach is available at https://github.com/KnowdeeAI/blp_task2_knowdee.git.
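The pseudo-labeling step described above generally works by keeping only the unlabeled examples the current model is confident about and folding them back into the training data. The following is a minimal sketch under assumed parameters; the scorer and the 0.9 confidence threshold are illustrative stand-ins, not the paper's actual ensemble or hyperparameters.

```python
# Select confident model predictions on unlabeled data as pseudo-labels.
# The toy scorer below stands in for the real BanglaBERT ensemble.

def select_pseudo_labels(unlabeled_texts, predict_proba, threshold=0.9):
    """Keep only examples whose top predicted label clears the threshold."""
    pseudo = []
    for text in unlabeled_texts:
        probs = predict_proba(text)                # dict: label -> probability
        label, conf = max(probs.items(), key=lambda kv: kv[1])
        if conf >= threshold:
            pseudo.append((text, label))           # becomes extra training data
    return pseudo

def toy_predict_proba(text):
    # Hypothetical cue word just for demonstration.
    if "valo" in text:
        return {"Positive": 0.95, "Negative": 0.03, "Neutral": 0.02}
    return {"Positive": 0.40, "Negative": 0.35, "Neutral": 0.25}

augmented = select_pseudo_labels(["khub valo", "hmm"], toy_predict_proba)
# -> [("khub valo", "Positive")]; the uncertain "hmm" is discarded
```

The final models are then retrained on the union of the gold-labeled and pseudo-labeled examples.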

pdfbib
M1437 at BLP-2023 Task 2: Harnessing Bangla Text for Sentiment Analysis: A Transformer-based Approach
Majidur Rahman | Ozlem Uzuner

Analyzing public sentiment on social media is helpful in understanding the public’s emotions about any given topic. While several studies have been conducted in this field, there has been limited research on Bangla social media data. Team M1437 from George Mason University participated in the Sentiment Analysis shared task of the Bangla Language Processing (BLP) Workshop at EMNLP-2023. The team fine-tuned various BERT-based Transformer architectures to solve the task. This article shows that BanglaBERT-large, a language model pre-trained on Bangla text, outperformed other BERT-based models. The model achieved an F1 score of 73.15% and top position in the development phase, was further tuned with external training data, and achieved an F1 score of 70.36% in the evaluation phase, securing fourteenth place on the leaderboard. The F1 score on the test set, when BanglaBERT-large was trained without external training data, was 71.54%.

pdfbib
nlpBDpatriots at BLP-2023 Task 2: A Transfer Learning Approach towards Bangla Sentiment Analysis
Dhiman Goswami | Md Nishat Raihan | Sadiya Sayara Chowdhury Puspo | Marcos Zampieri

In this paper, we discuss the nlpBDpatriots entry to the shared task on Bangla Sentiment Analysis. This is a shared task of the first workshop on Bangla Language Processing (BLP) organized under EMNLP. The main objective of this task is to identify the sentiment polarity of social media content. 30 teams of NLP enthusiasts participated in this shared task, and our best-performing approach for the task is transfer learning with data augmentation. Our group ranked 12th in the competition with this methodology, securing a micro F1 score of 0.71.

pdfbib
Ushoshi2023 at BLP-2023 Task 2: A Comparison of Traditional to Advanced Linguistic Models to Analyze Sentiment in Bangla Texts
Sharun Khushbu | Nasheen Nur | Mohiuddin Ahmed | Nashtarin Nur

This article describes our analytical approach designed for BLP Workshop-2023 Task-2: Sentiment Analysis. During the actual task submission, we used DistilBERT. However, we later employed rigorous hyperparameter tuning and pre-processing, improving the result to 68% accuracy and a 68% micro F1 score with vanilla LSTM. Traditional machine learning models were also applied, where 75% accuracy was achieved with traditional SVM. Our contributions are a) data augmentation using the oversampling method to remove data imbalance and b) attention masking for data encoding with a masked language model to capture representations of language semantics effectively, further demonstrated with explainable AI. Initially, our system scored 0.26 micro-F1 in the competition and ranked 30th among the participants with a basic DistilBERT model, which we later improved to 0.68 and 0.65 with LSTM and XLM-RoBERTa-base models, respectively.

pdfbib
EmptyMind at BLP-2023 Task 2: Sentiment Analysis of Bangla Social Media Posts using Transformer-Based Models
Karnis Fatema | Udoy Das | Md Ayon Mia | Md Sajidul Mowla | Mahshar Yahan | Md Fayez Ullah | Arpita Sarker | Hasan Murad

With the popularity of social media platforms, people are sharing their individual thoughts by posting, commenting, and messaging with their friends, which generates a significant amount of digital text data every day. Conducting sentiment analysis of social media content is a vibrant research domain within the realm of Natural Language Processing (NLP), and it has practical, real-world uses. Numerous prior studies have focused on sentiment analysis for languages that have abundant linguistic resources, such as English. However, limited prior research has been done on automatic sentiment analysis in low-resource languages like Bangla. In this research work, we apply different transformer-based models for Bangla sentiment analysis. To train and evaluate the models, we have utilized a dataset provided in a shared task organized by the BLP Workshop co-located with EMNLP-2023. Moreover, we have conducted a comparative study among different machine learning models, deep learning models, and transformer-based models for Bangla sentiment analysis. Our findings show that the BanglaBERT (Large) model has achieved the best result with a micro F1-Score of 0.7109 and secured 7th position on the shared task 2 leaderboard of the BLP Workshop at EMNLP 2023.

pdfbib
RSM-NLP at BLP-2023 Task 2: Bangla Sentiment Analysis using Weighted and Majority Voted Fine-Tuned Transformers
Pratinav Seth | Rashi Goel | Komal Mathur | Swetha Vemulapalli

This paper describes our approach to submissions made for Shared Task 2 at the BLP Workshop - Sentiment Analysis of Bangla Social Media Posts. Sentiment Analysis is an active research area in the digital age. With the rapid and constant growth of online social media sites and services and the increasing amount of textual data, the application of automatic Sentiment Analysis is on the rise. However, most of the research in this domain is based on the English language. Despite being the world’s sixth most widely spoken language, little work has been done in Bangla. This task aims to promote work on Bangla Sentiment Analysis by identifying the polarity of social media content, determining whether the sentiment expressed in the text is Positive, Negative, or Neutral. Our approach consists of experimenting with and finetuning various multilingual and pre-trained BERT-based models on our downstream tasks and using a Majority Voting and Weighted ensemble model that surpasses individual baseline model scores. Our system scored 0.711 for the multiclass classification task and placed 10th among the participants on the leaderboard for the shared task. Our code is available at https://github.com/ptnv-s/RSM-NLP-BLP-Task2 .
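The majority-voting half of such an ensemble is the simplest to state: each fine-tuned model casts one hard label and the most common label wins. A minimal sketch with illustrative votes (the model names and outputs are assumptions, not the paper's runs):

```python
# Hard majority voting across fine-tuned classifiers.
# Votes below are illustrative stand-ins for real model outputs.
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties resolve to the earliest-seen vote."""
    return Counter(predictions).most_common(1)[0][0]

votes = ["Positive", "Neutral", "Positive"]   # one label per model
final = majority_vote(votes)                  # -> "Positive"
```

Using an odd number of models avoids most ties; the weighted variant mentioned in the abstract instead averages class probabilities scaled by each model's validation score.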

pdfbib
Semantics Squad at BLP-2023 Task 2: Sentiment Analysis of Bangla Text with Fine Tuned Transformer Based Models
Krishno Dey | Md. Arid Hasan | Prerona Tarannum | Francis Palma

Sentiment analysis (SA) is an important task in natural language processing, especially in contexts with a variety of linguistic features, like Bangla. We participated in BLP-2023 Shared Task 2 on SA of Bangla text. We examined the performance of six transformer-based models for SA in Bangla on the shared task dataset. We fine-tuned these models and conducted a comprehensive performance evaluation. We ranked 20th on the leaderboard of the shared task with a blind submission that used BanglaBERT Small. BanglaBERT outperformed other models with 71.33% accuracy, and the closest model was BanglaBERT Large, with an accuracy of 70.90%. BanglaBERT consistently outperformed the others, demonstrating the benefits of models developed using sizable datasets in Bangla.

pdfbib
Aambela at BLP-2023 Task 2: Enhancing BanglaBERT Performance for Bangla Sentiment Analysis Task with In Task Pretraining and Adversarial Weight Perturbation
Md Fahim

This paper introduces the top-performing approach of “Aambela” for the BLP-2023 Task 2: “Sentiment Analysis of Bangla Social Media Posts”. The objective of the task was to create systems capable of automatically detecting sentiment in Bangla text from diverse social media posts. Our approach comprised fine-tuning a Bangla Language Model with three distinct classification heads. To enhance performance, we employed two robust text classification techniques. To arrive at a final prediction, we employed a mode-based ensemble approach over the predictions of different models, which ultimately resulted in 1st place in the competition.

pdfbib
Z-Index at BLP-2023 Task 2: A Comparative Study on Sentiment Analysis
Prerona Tarannum | Md. Arid Hasan | Krishno Dey | Sheak Rashed Haider Noori

In this study, we report our participation in Task 2 of the BLP-2023 shared task. The main objective of this task is to determine the sentiment (Positive, Neutral, or Negative) of a given text. We first removed the URLs, hashtags, and other noise and then applied traditional and pretrained language models. We submitted several systems to the leaderboard, and BanglaBERT with tokenized data provided the best result, ranking us 5th in the competition with an F1-micro score of 71.64. Our study also reports that the importance of tokenization is lessening in the realm of pretrained language models. In further experiments, our evaluation shows that BanglaBERT outperforms the others, and predicting the neutral class is still challenging for all the models.

pdfbib
Team Error Point at BLP-2023 Task 2: A Comparative Exploration of Hybrid Deep Learning and Machine Learning Approach for Advanced Sentiment Analysis Techniques
Rajesh Das | Kabid Yeiad | Moshfiqur Ajmain | Jannatul Maowa | Mirajul Islam | Sharun Khushbu

This paper presents a thorough and extensive investigation into the diverse models and techniques utilized for sentiment analysis. What sets this exploration apart is the deliberate and purposeful incorporation of data augmentation techniques with the goal of enhancing the efficacy of sentiment analysis in the Bangla language. We methodically explore various methods, including preprocessing techniques, advanced models like Long Short-Term Memory (LSTM) and a combined LSTM-CNN (Convolutional Neural Network), and traditional machine learning models such as Logistic Regression, Decision Tree, Random Forest, Multinomial Naive Bayes, Support Vector Machine, and Stochastic Gradient Descent. The study highlights the substantial impact of data augmentation on enhancing model accuracy and understanding Bangla sentiment nuances. Additionally, we emphasize the LSTM model’s ability to capture long-range correlations in Bangla text. Our system scored 0.4129 and ranked 27th among the participants.

pdfbib
UFAL-ULD at BLP-2023 Task 2: Sentiment Classification in Bangla Text
Sourabrata Mukherjee | Atul Kr. Ojha | Ondřej Dušek

In this paper, we present the UFAL-ULD team’s system for the BLP Shared Task 2: Sentiment Analysis of Bangla Social Media Posts. Task 2 involves classifying text into Positive, Negative, or Neutral sentiments. As a part of this task, we conducted a series of experiments with several pre-trained sequence classification models – XLM-RoBERTa, BanglaBERT, Bangla BERT Base, and Multilingual BERT. Among these, the best-performing model was based on the XLM-RoBERTa-base architecture, which outperformed the baseline models. Our system was ranked 19th among the 30 teams that participated in the task.

pdfbib
Embeddings at BLP-2023 Task 2: Optimizing Fine-Tuned Transformers with Cost-Sensitive Learning for Multiclass Sentiment Analysis
S.m Towhidul Islam Tonmoy

In this research, we address the task of Sentiment Analysis for Bangla Social Media Posts, introduced in the first Workshop on Bangla Language Processing (CITATION). Our research encountered two significant challenges in the context of sentiment analysis. The first challenge involved extensive training times and memory constraints when we elected to employ oversampling techniques to address class imbalance in an attempt to boost model performance. Conversely, when opting for undersampling, the training time was optimal, but this approach resulted in poor model performance. These challenges highlight the complex trade-offs involved in selecting sampling schemes to address class imbalances in sentiment analysis tasks. We tackle these challenges through cost-sensitive approaches aimed at enhancing model performance. In our initial submission during the evaluation phase, we ranked 9th out of 30 participants with an F1-micro score of 0.7088. Subsequently, through additional experimentation, we managed to lift our F1-micro score to 0.7186 by leveraging the BanglaBERT-Large model in combination with the Self-adjusting Dice loss function. Our experiments highlight the effect on model performance achieved by modifying the loss function. Our test data and source code can be found online.
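A common cost-sensitive alternative to resampling, of the kind the abstract describes, is to weight the loss by inverse class frequency, so that mistakes on rare classes cost more without duplicating or discarding data. This is a hedged sketch of that general idea, not the paper's implementation; the class counts and the weighting heuristic are illustrative assumptions.

```python
# Cost-sensitive learning via class-weighted cross-entropy:
# rarer classes receive larger weights, so errors on them cost more.
# Class counts below are illustrative, not the task's actual statistics.
import math

def inverse_frequency_weights(class_counts):
    """weight_c = N / (num_classes * count_c), a common heuristic."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items()}

def weighted_cross_entropy(p_true_class, true_class, weights):
    """Per-example loss: gold-class weight times -log of its probability."""
    return -weights[true_class] * math.log(p_true_class)

weights = inverse_frequency_weights(
    {"Positive": 500, "Negative": 400, "Neutral": 100}
)
# At equal confidence, a rare Neutral example is penalized more heavily:
loss_rare = weighted_cross_entropy(0.4, "Neutral", weights)
loss_common = weighted_cross_entropy(0.4, "Positive", weights)
```

Unlike oversampling, this changes only the loss computation, so training time and memory stay those of the original dataset. The Self-adjusting Dice loss used in the paper pursues the same goal by down-weighting easy examples inside the loss itself.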

pdfbib
LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource Sentiment Analysis of Bangla Language
Aunabil Chakma | Masum Hasan

This paper describes the system of the LowResource Team for Task 2 of BLP-2023, which involves conducting sentiment analysis on a dataset composed of public posts and comments from diverse social media platforms. Our primary aim was to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus, using various strategies including fine-tuning, dropping random tokens, and using several external datasets. Our final model is an ensemble of the three best BanglaBert variations. Our system achieved overall 3rd place on the Test Set among 30 participating teams with a score of 0.718. Additionally, we discuss the promising systems that didn’t perform well, namely task-adaptive pretraining and paraphrasing using BanglaT5. Our training code is publicly available at https://github.com/Aunabil4602/bnlp-workshop-task2-2023
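The "dropping random tokens" strategy mentioned above is a simple text augmentation: each training example is copied with a random fraction of its tokens removed, discouraging the model from relying on any single word. A minimal sketch under assumed parameters (the 0.3 drop rate and fixed seed are illustrative, not the paper's settings):

```python
# Random token dropping as a text-augmentation sketch.
# drop_rate and the fixed seed are assumed values for demonstration.
import random

def drop_random_tokens(text, drop_rate=0.3, rng=None):
    """Return a copy of `text` with each token independently dropped."""
    rng = rng or random.Random(0)               # fixed seed for reproducibility
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= drop_rate]
    return " ".join(kept) if kept else text     # never emit an empty text

augmented = drop_random_tokens("this comment is mildly positive")
```

In practice the perturbed copies are added alongside the originals, so the effective training set grows without any external data.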

pdfbib
BLP-2023 Task 2: Sentiment Analysis
Md. Arid Hasan | Firoj Alam | Anika Anjum | Shudipta Das | Afiyat Anjum

We present an overview of the BLP Sentiment Shared Task, organized as part of the inaugural BLP 2023 workshop, co-located with EMNLP 2023. The task is defined as the detection of sentiment in a given piece of social media text. This task attracted interest from 71 participants, among whom 29 and 30 teams submitted systems during the development and evaluation phases, respectively. In total, participants submitted 597 runs. However, only 15 teams submitted system description papers. The range of approaches in the submitted systems spans from classical machine learning models and fine-tuning pre-trained models to leveraging Large Language Models (LLMs) in zero- and few-shot settings. In this paper, we provide a detailed account of the task setup, including dataset development and evaluation setup. Additionally, we provide a succinct overview of the systems submitted by the participants. All datasets and evaluation scripts from the shared task have been made publicly available for the research community, to foster further research in this domain.