Classifying Azerbaijani Medical Questions with a BERT Model

Azerbaijani Medical Forum Question Classification

With the rapid growth of the internet, patients increasingly use it for health information and support. However, given the large number of queries, the limited number of experts, and patients' uncertainty about which doctor to consult, a significant fraction of questions remains unanswered. In addition, when patients contact a hospital online, automatically directing them to the appropriate doctor for their condition is very important.

Automatic question classifiers can address this issue by routing questions to the relevant specialists by topic, so that patients get faster and better responses. In this project, I aim to classify Azerbaijani health forum questions with the BERT multilingual base model (uncased). BERT is a transformer model pretrained on a large multilingual corpus in a self-supervised fashion.

Training a deep-learning model for medical question classification in a supervised way requires a high-quality dataset. Currently, there is no public dataset for Azerbaijani medical classification, and datasets from other fields are not applicable to a medical QA system. To solve this problem, I used Python to scrape the m.tibb.az website, where users have asked 27k questions across 19 medical branches and medical experts have answered them. I will also provide the dataset, which can be used in Azerbaijani medical QA and related fields.

How to use

Here is how to use this model.

First, build a dictionary that maps the encoded label numbers to medical branch names, because the target is label-encoded and the model outputs a number.

branch_dict = {0: 'Endoskopist', 1: 'Nevropatoloq', 2: 'Dermato veneroloq', 3: 'Qastroenteroloq',
               4: 'Psixoloq', 5: 'Pediatr', 6: 'Proktoloq', 7: 'Endokrinoloq',
               8: 'Psixoterapevt', 9: 'Allerqoloq', 10: 'Oftalmoloq', 11: 'Kardioloq', 12: 'Uroloq',
               13: 'Plastik cərrah', 14: 'Cərrah-proktoloq', 15: 'Ümumi cərrah',
               16: 'Hepatoloq', 17: 'LOR həkimi', 18: 'Ginekoloq'}
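
As a quick sanity check, the sketch below confirms that the dictionary covers the 19 branches with contiguous ids from 0 to 18, and builds a reverse lookup from branch name to id. The name `branch_to_id` is introduced here for illustration and is not part of the original model card.

```python
# Same mapping as above, repeated so this snippet runs on its own.
branch_dict = {0: 'Endoskopist', 1: 'Nevropatoloq', 2: 'Dermato veneroloq', 3: 'Qastroenteroloq',
               4: 'Psixoloq', 5: 'Pediatr', 6: 'Proktoloq', 7: 'Endokrinoloq',
               8: 'Psixoterapevt', 9: 'Allerqoloq', 10: 'Oftalmoloq', 11: 'Kardioloq', 12: 'Uroloq',
               13: 'Plastik cərrah', 14: 'Cərrah-proktoloq', 15: 'Ümumi cərrah',
               16: 'Hepatoloq', 17: 'LOR həkimi', 18: 'Ginekoloq'}

# The ids should be exactly 0..18, one per medical branch.
assert sorted(branch_dict) == list(range(19))

# Hypothetical helper mapping: branch name -> encoded id (handy when checking labels).
branch_to_id = {name: idx for idx, name in branch_dict.items()}

print(branch_to_id['Kardioloq'])  # 11
```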

Second, we use a simple Python function to convert the model result to a branch name.

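def result_helper_funct(model_result):
    # model.predict returns (predictions, raw_outputs); take the first predicted label.
    result = model_result[0][0]
    # Return the branch name, or None if the label is not in branch_dict.
    return branch_dict.get(result)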
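
To see what the helper expects without loading the model, the sketch below feeds it a hand-built value shaped like the `(predictions, raw_outputs)` pair that simpletransformers' `predict` returns. The `fake_result` value and the shortened dictionary are made up for illustration only.

```python
# Shortened copy of the branch dictionary, for illustration only.
branch_dict = {10: 'Oftalmoloq', 11: 'Kardioloq', 17: 'LOR həkimi'}

def result_helper_funct(model_result):
    # model_result[0] is the list of predicted class ids; take the first one.
    result = model_result[0][0]
    # Unknown ids yield None instead of raising an error.
    return branch_dict.get(result)

# Hypothetical output for a single input text: (predictions, raw_outputs).
fake_result = ([17], [[0.1, 0.2]])
print(result_helper_funct(fake_result))
```

Note that an id missing from the dictionary produces `None`, so callers may want to handle that case explicitly.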

Then, install the simpletransformers library:

!pip install simpletransformers

After it installs successfully, load the pre-trained model.

from simpletransformers.classification import ClassificationModel
model = ClassificationModel("bert", "nijatzeynalov/azerbaijani-medical-question-classification", use_cuda=False)

Next, write the text we want to classify, run the prediction, and pass the result to our helper function.

sample_text = 'salam menim qulagimda agri var'
result = model.predict([sample_text])

result_helper_funct(result)

Code result:

'LOR həkimi'
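
Since `predict` takes a list of texts, several questions can be classified in one call. The following is a minimal sketch of that idea, not part of the original model card: `classify_questions` and `StubModel` are hypothetical names, and the stub only imitates the `(predictions, raw_outputs)` return shape so the example runs without downloading the model.

```python
branch_dict = {0: 'Endoskopist', 1: 'Nevropatoloq', 2: 'Dermato veneroloq', 3: 'Qastroenteroloq',
               4: 'Psixoloq', 5: 'Pediatr', 6: 'Proktoloq', 7: 'Endokrinoloq',
               8: 'Psixoterapevt', 9: 'Allerqoloq', 10: 'Oftalmoloq', 11: 'Kardioloq', 12: 'Uroloq',
               13: 'Plastik cərrah', 14: 'Cərrah-proktoloq', 15: 'Ümumi cərrah',
               16: 'Hepatoloq', 17: 'LOR həkimi', 18: 'Ginekoloq'}

def classify_questions(model, texts):
    """Return a (text, branch name) pair for every input text."""
    predictions, _raw_outputs = model.predict(texts)
    return [(text, branch_dict.get(int(label), 'unknown'))
            for text, label in zip(texts, predictions)]

class StubModel:
    """Hypothetical stand-in for the real model; always predicts label 17."""
    def predict(self, texts):
        return [17 for _ in texts], [[0.0] for _ in texts]

pairs = classify_questions(StubModel(), ['first question', 'second question'])
print(pairs)
```

With the real classifier, pass the loaded `model` object in place of `StubModel()`.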

Let's try another example.

sample_text = 'üzümdə səpgi var'
result = model.predict([sample_text])

result_helper_funct(result)

Code result:

'Allerqoloq'
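
A single label can hide close calls, such as a rash that could belong to either an allergist or a dermatologist. The sketch below ranks the top candidates from the second value returned by `predict`, assuming those raw outputs are unnormalized per-class scores (logits). The function `top_k_branches` and the toy numbers are made up for illustration.

```python
import math

def top_k_branches(logits, id_to_branch, k=3):
    """Rank branches by softmax probability computed from raw scores."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k highest-probability classes, best first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id_to_branch.get(i, 'unknown'), probs[i]) for i in ranked]

# Toy three-class example with made-up scores (not real model output).
toy_branches = {0: 'Allerqoloq', 1: 'Dermato veneroloq', 2: 'LOR həkimi'}
toy_logits = [2.0, 1.0, 0.1]

for name, prob in top_k_branches(toy_logits, toy_branches, k=2):
    print(f'{name}: {prob:.2f}')
```

With the real model, the scores for the first input would be `model.predict([sample_text])[1][0]`, converted to a plain list and paired with the full `branch_dict`.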

Citation:

@misc {nijatzeynalov_2023,
	author       = { {NijatZeynalov} },
	title        = { azerbaijani-medical-question-classification (Revision ac4fa1e) },
	year         = 2023,
	url          = { https://huggingface.co/nijatzeynalov/azerbaijani-medical-question-classification },
	doi          = { 10.57967/hf/0290 },
	publisher    = { Hugging Face }
}
© 2023 Nijat Zeynalov