Projects

AzVoiceSent

This project aims to develop an automated system that analyzes voice recordings in Azerbaijani and classifies the sentiment expressed by the speakers. AzVoiceSent combines Automatic Speech Recognition (ASR) with machine learning techniques to extract sentiment from Azerbaijani speech data.

Learn more →

New product demand forecasting

Forecasting sales is especially challenging for a new product because there is no past performance on which to base estimates. The proposed work is based on sales data from the Ali and Nino multi-branch book store. The dataset contains 23,345 books, with over 90k unique customers per month and more than 170 orders per day. Although the sales dataset was generated artificially, the approach can be applied to real cases, where even a small improvement in forecast accuracy can significantly increase company profit and let customers buy faster and at a lower price.

Learn more →

Azerbaijani Medical Forum Question Classification

Automatic question classifiers can direct questions to specific experts according to their topic preferences, so that questions get quicker and better responses. In this project, I classify Azerbaijani health forum questions with the BERT multilingual base model (uncased).

Learn more →

Scraping an Azerbaijani real estate website and sending WhatsApp messages with an AWS Lambda function

People who want to buy a house face enough difficulties already. In this project, I scrape the "kub.az" real estate website, which I chose because it aggregates house ads from all real estate websites in Azerbaijan. In brief, an AWS Lambda function scrapes the website every 30 minutes; if a new house has been added, it creates a new CSV file in an S3 bucket and sends this file with a message to my WhatsApp number during that run.

Learn more →

Satellite Images to real maps with Deep Learning (and reverse)

In this project, I developed a Pix2Pix generative adversarial network for image-to-image translation, using the maps dataset from the Pix2Pix paper. The Pix2Pix model is a type of conditional GAN, or cGAN, where the generation of the output image is conditional on an input, in this case a source image. The image translation problem involves converting satellite photos to Google Maps format or, in reverse, Google Maps images to satellite photos.

Learn more →

Identifying and solving Concept Drift detection with two approaches

Data can change over time. This can degrade the performance of predictive models that assume a static relationship between input and output variables. In machine learning, this change in the underlying relationships in the data is called concept drift. In this project, I detect concept drift using adversarial validation and the Kolmogorov-Smirnov test, approaches that can also be used in a deployed system.
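
A minimal sketch of the Kolmogorov-Smirnov part, under stated assumptions: the function names `ks_statistic` and `drift_detected`, the synthetic features, and the 0.05 significance level are all hypothetical and not taken from the project. It compares one feature between a reference sample and a newer sample and flags drift when the gap between their empirical distributions exceeds the usual large-sample critical value. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used instead of this hand-written version.

```python
import bisect
import math
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Assumption: synthetic data standing in for one feature at training time
# and the same feature observed later with a shifted mean.
rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(500)]
shifted_feature = [rng.gauss(1.5, 1.0) for _ in range(500)]

print(ks_statistic(train_feature, shifted_feature))
print(drift_detected(train_feature, shifted_feature))
```

Running the check per feature keeps it cheap enough to repeat on every new batch of data.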

Learn more →

mT5-small based Azerbaijani News Summarization

This model is Google's Multilingual T5-small fine-tuned on the Azerbaijani News Summary Dataset for the downstream summarization task. It was trained for 3 epochs with a batch size of 64 and a learning rate of 10e-4. Training took almost 12 hours on a GPU instance running an Ubuntu Server 20.04 LTS image in Microsoft Azure. The maximum news length is 2048 and the maximum summary length is 128.

Learn more →

Norvig's Spell Checker Algorithm for Azerbaijani Language

The purpose of this project is to build a spell checker for the Azerbaijani language by applying Norvig’s algorithm to an Azerbaijani corpus. In general, spell checking tools learn the correct spelling of words from a corpus and later, when a word is misspelled, take the correct word in the corpus as a reference. Choosing the right corpus is therefore very important. I tried several Azerbaijani corpora available on the Internet, but because they consist of crawled data, most of them contained misspelled words themselves. I decided to create a new corpus based on several books written in Azerbaijani. The corpus I created consists of 1,478,667 words collected from 47 books in 6 fields (biology, geography, detective, literature, encyclopedia, novel).
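
The idea can be sketched as follows. This is a compact version of Norvig's well-known approach, not the project's actual code: the tiny `CORPUS` string is a stand-in for the book-based corpus, and the `ALPHABET` constant (the lowercase Azerbaijani Latin alphabet) is my assumption about how candidate edits would be generated.

```python
import re
from collections import Counter

# Assumption: a tiny stand-in corpus; the project uses a corpus built from books.
CORPUS = "kitab kitab kitab məktəb məktəb şəhər dil dil"

# Assumption: lowercase Azerbaijani Latin alphabet used to generate candidate edits.
ALPHABET = "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"

# Word frequencies. Note that str.lower() does not apply Azerbaijani casing
# rules for I/ı and İ/i, so a real corpus would need dedicated handling.
WORDS = Counter(re.findall(r"\w+", CORPUS.lower()))


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away from word."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """The subset of words that appear in the corpus."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Most frequent known candidate, preferring the smallest edit distance."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORDS[w])


print(correction("kitb"))
print(correction("şəhar"))
```

Because every suggestion is drawn from `WORDS`, a misspelling present in the corpus would be treated as a valid word, which is why the quality of the corpus matters so much.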

Learn more →

Generate Synthetic Images with DCGANs in Keras

In our GAN setup, we want to be able to sample from a complex, high-dimensional training distribution of the Fashion MNIST images. However, there is no direct way to sample from this distribution. The solution is to sample from a simpler distribution, such as Gaussian noise. We want the model to use the power of neural networks to learn a transformation from the simple distribution directly to the training distribution that we care about. The GAN consists of two adversarial players: a discriminator and a generator. We’re going to train the two players jointly in a minimax game theoretic formulation.
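
For reference, the minimax game described above is usually written as the standard GAN value function. The symbols are the conventional ones rather than notation from this project: D is the discriminator, G is the generator, p_data is the training distribution (here, Fashion MNIST images), and p_z is the simple noise distribution.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize this value by telling real images from generated ones, while the generator tries to minimize it by producing images the discriminator accepts as real.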

Learn more →

Easy Recipes bot

Just write down your ingredients and this bot instantly finds the matching recipes! This Telegram bot recommends easy recipes in Azerbaijani using ingredients you already have in the kitchen. I use a text similarity method to determine how ‘close’ the user's message is to the recipe text. The big idea is to represent documents as vectors of features and compare documents by measuring the distance between these features. I tried a few similarity methods and, after some experiments, chose cosine similarity. As a dataset, I fully scraped the dadli.az website (2,400+ recipes). You can find the whole dataset in the data folder.
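
A minimal sketch of this matching step, assuming a plain bag-of-words representation. The `RECIPES` dictionary and the function names are hypothetical illustrations; the bot's real feature extraction and the scraped dataset may differ.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumption: a hypothetical mini recipe list mapping names to ingredient text.
RECIPES = {
    "omlet": "yumurta süd duz yağ",
    "salat": "pomidor xiyar soğan duz",
    "plov": "düyü yağ duz ət",
}


def best_recipe(message):
    """Return the recipe whose ingredient text is closest to the user message."""
    query = vectorize(message)
    return max(RECIPES, key=lambda name: cosine_similarity(query, vectorize(RECIPES[name])))


print(best_recipe("yumurta süd"))
```

Cosine similarity depends only on the direction of the vectors, so a short user message can still match a long recipe text.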

Learn more →

Azerbaijani News Summary Dataset

I present az-news-summary, a comprehensive and diverse dataset comprising 143k (143,448) Azerbaijani news articles extracted using a set of carefully designed heuristics. The dataset covers common news topics, including war, government, politics, education, health, the environment, economy, business, fashion, entertainment, and sport, as well as quirky or unusual events. The dataset is prepared for abstractive and extractive summarization tasks. It can also be used for other tasks such as text generation and title generation.

Learn more →

Supermarket sales dataset

You have started working as a data analyst at a company with a large supermarket chain in Baku, and you are attending your first meeting. Your job is to carry out the tasks given to you in this meeting, which is attended by several people from each department. They will describe the tasks in simple language, as they understand them, and you should come up with the best possible result. To make the simulation of a real business meeting more believable, all the tasks are written as dialogue and have no relation to reality. The dataset contains data on 438,826 Azerbaijani products purchased by 80,000 customers in 20 branches of the supermarket in 2019.

Learn more →

Azerbaijani Fake News Generator

A language model predicts the probability of the next word in a sequence based on the words already observed in that sequence. Neural network models are a preferred method for developing statistical language models because they can use a distributed representation, in which different words with similar meanings have similar representations, and because they can use a large context of recently observed words when making predictions.

Learn more →

Developing a Neural Machine Translation Model (Azerbaijani - English)

In this project, I develop a neural machine translation system for translating Azerbaijani phrases to English. I use a dataset of Azerbaijani-to-English terms that serves as the basis for language-learning flashcards.

Learn more →