Professor Dr. Md Aktaruzzaman was the head of the Biomedical Signal Processing & Machine Learning (BiSPMaL) Lab and the Chairman of the Dept. of Computer Science and Engineering. He has been on study leave since September 2022, pursuing a postdoctoral fellowship at the KCVRC, Indiana University School of Medicine, Indiana.
Research interest: Biomedical Signal Processing, Machine Learning, Sample Entropy, Health Monitoring from Signals Acquired through Wearable Sensors, OCR, Attention Deficit Behavior Disorder Syndromes Analysis
Google Scholar: https://scholar.google.com/citations?user=dp4InSQAAAAJ&hl=en
I am Dr. Md. Robiul Hoque, a faculty member of the Department of Computer Science and Engineering, Islamic University, Kushtia.
Research interest: Context-Aware System, Smart Space, Sensor Network, Image and Speech Processing
Google Scholar: https://scholar.google.com/citations?hl=en&user=GGniDIUAAAAJ
Doctor of Philosophy in Computer Applied Technology, University of Chinese Academy of Sciences, China. Dissertation title: “RGB-D Object Detection and Recognition Based on Deep Learning Technique”. Advisor: Professor Ke Lu, School of Engineering Sciences, University of Chinese Academy of Sciences, China.
Research interest: Computer Vision, Deep Learning, Artificial Intelligence, Pattern Recognition, Machine Learning, Wireless Network
Google Scholar: https://scholar.google.com/citations?user=by41XR8AAAAJ&hl=en
Dr. Md Shohidul Islam, a distinguished academic from Islamic University, Bangladesh, holds a Doctor of Engineering from the University of Science and Technology of China, specializing in Information and Communication Engineering. His doctoral research, titled “Robust Supervised Single Channel Speech Enhancement in the Wavelet Domain”, reflects his expertise in speech signal processing. Prior to this, he completed an M.Sc. and a B.Sc. in Computer Science and Engineering from Islamic University, Bangladesh. Dr. Islam’s research interests span several domains, including speech-controlled technologies for cardiovascular health engineering, non-invasive continuous blood pressure estimation using physiological signals, and various aspects of speech processing such as enhancement, denoising, and separation. He is also involved in audio watermarking, image processing, video image processing, and medical image processing. Dr. Islam has made significant contributions to science through his publications in peer-reviewed journals and presentations at international conferences. His work on speech enhancement and physiological signal processing has advanced the understanding and application of technology in the healthcare and engineering sectors globally.
Research interest: Physiological Signal Processing (PPG, ABP, Pulse, ECG, EMG and MMG), Speech Enhancement, Speech Denoising, Speech Dereverberation, Blind Source Separation, Speech Signal Processing, Audio Watermarking, Image Processing
Google Scholar: https://scholar.google.com/citations?user=9NlKAOIAAAAJ&hl=en
The process of separating individual sound sources from mono audio is a complex yet essential endeavor in audio signal processing and analysis. This article presents an algorithm tailored for bidirectional transformations aimed at effectively isolating speech from single-channel audio. Leveraging the dual-tree complex wavelet transform (DTCWT) on time-domain signals circumvents limitations inherent in the discrete wavelet transform (DWT), such as its incapacity to manage substantial shifts and inability to discern the correct direction. In this process, a series of subband signals is generated and subjected to the short-time Fourier transform (STFT) to create a complex spectrogram, which is then transformed into its absolute value and input into the Bi-directional Long Short-Term Memory (Bi-LSTM) network with a specified number of layers and units. This network utilizes the bidirectional capabilities of LSTM units to understand both the preceding and subsequent contexts of the input data, enabling the identification of specific speech components, aided by the ideal soft mask components that serve as corresponding labels. The final predicted signal is obtained by element-wise multiplication of the complex spectrogram by the estimated mask produced by the model. Subsequently, the inverse STFT is applied with parameters consistent with the initial transform, followed by the inverse DTCWT on the refined source elements with the same decomposition levels and wavelet filters. The improved efficacy of the proposed method for source separation quality was validated through experimental assessments conducted on the GRID audio-visual and TIMIT databases, considering metrics such as SDR, SIR, SAR, SNR, PESQ, and STOI.
Date: 2024-08-10
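As a rough illustration of the masking core described above, the following minimal sketch uses a plain STFT front end and a small Bi-LSTM; the DTCWT subband stage is omitted for brevity, and all sizes (window length, hidden units, layers) are assumptions rather than the paper's settings.

```python
# Minimal sketch of spectrogram masking with a Bi-LSTM (DTCWT stage omitted).
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft, istft

class BiLSTMMasker(nn.Module):
    def __init__(self, n_freq, hidden=256, layers=2):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, hidden, num_layers=layers,
                           batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_freq)

    def forward(self, mag):                    # mag: (batch, time, freq)
        h, _ = self.rnn(mag)
        return torch.sigmoid(self.out(h))      # soft mask in [0, 1]

fs = 16000
mix = np.random.randn(fs)                      # stand-in for a 1 s mixture
_, _, Z = stft(mix, fs=fs, nperseg=512)        # complex spectrogram (freq, time)
mag = torch.tensor(np.abs(Z).T[None], dtype=torch.float32)

model = BiLSTMMasker(n_freq=Z.shape[0])
mask = model(mag)[0].detach().numpy().T        # estimated mask (freq, time)
_, est = istft(Z * mask, fs=fs, nperseg=512)   # masked source estimate
```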
Separating speech is a challenging area of research, especially when trying to separate the desired source from a mixture. Deep learning has arisen as a promising solution, surpassing traditional methods. While prior research has mainly focused on the magnitude, log-magnitude, or a combination of the magnitude and phase portions, a new approach using the Short-time Fourier Transform (STFT) and a deep Convolutional Neural Network named U-NET has been proposed. This method, unlike others, considers both the real and imaginary components for decomposition. During the training stage, the mixed time-domain signal is transformed into a frequency-domain signal using the STFT, producing a mixed complex spectrogram. The spectrogram’s real and imaginary parts are then divided and combined into a single matrix. The newly formed matrix is fed through U-NET to extract the source components. The same process is repeated at testing: the resulting concatenated matrix for the mixed test signal is passed through the saved model to generate two enhanced concatenated matrices, one for each source. These matrices are then transformed back into time-domain signals using the inverse STFT after extracting the magnitude and phase. The proposed approach has been evaluated on the GRID audio-visual corpus, with results showing improved quality and intelligibility compared to existing methods, as demonstrated by objective measurement metrics.
Date: 2024-02-26
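The real/imaginary concatenation step described above can be sketched as follows; stacking along the frequency axis and the chosen STFT parameters are assumptions, since the text does not fix them.

```python
# Sketch of the concatenated real/imaginary input construction.
import numpy as np
from scipy.signal import stft

fs = 16000
mix = np.random.randn(2 * fs)                  # stand-in mixture
_, _, Z = stft(mix, fs=fs, nperseg=512)        # complex spectrogram (freq, time)

# Concatenate real and imaginary parts along the frequency axis.
net_input = np.concatenate([Z.real, Z.imag], axis=0)   # (2*freq, time)

# Recovering a complex spectrogram from a same-layout network output:
F = Z.shape[0]
Z_hat = net_input[:F] + 1j * net_input[F:]
assert np.allclose(Z_hat, Z)
```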
In the modern manufacturing industry, automatic defect detection is becoming an attractive alternative to human inspection. Automatic defect detection on object surfaces is a compelling process, and computer vision and image processing systems have been widely used in manufacturing for accurate automated inspection and classification. In this article, we propose a histogram-based automatic defect detection method that processes three objects at a time. In the first step we collect images from the camera and perform preprocessing and segmentation; we then use histograms and Spearman’s correlation coefficient to classify objects as defective or non-defective. The experimental analysis was evaluated on 300 images containing both defective and non-defective objects.
Date: 2024-04-27
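A minimal sketch of the histogram comparison step, assuming grayscale images, an available reference (non-defective) image, and a hypothetical correlation threshold:

```python
# Histogram + Spearman's rank correlation defect check (threshold is hypothetical).
import numpy as np
from scipy.stats import spearmanr

def histogram(img, bins=256):
    h, _ = np.histogram(img.ravel(), bins=bins, range=(0, 256))
    return h

def is_defective(test_img, ref_img, threshold=0.95):
    rho, _ = spearmanr(histogram(test_img), histogram(ref_img))
    return rho < threshold        # low rank correlation -> likely defect

ref = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
test = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
print(is_defective(test, ref))
```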
Background: Lung cancer is one of the most fatal cancers worldwide, and malignant tumors are characterized by the growth of abnormal cells in the tissues of the lungs. Usually, symptoms of lung cancer do not appear until it is already at an advanced stage. The proper segmentation of cancerous lesions in CT images is the primary method of detection towards achieving a completely automated diagnostic system. Method: In this work, we developed an improved hybrid neural network via the fusion of two architectures, MobileNetV2 and UNET, for the semantic segmentation of malignant lung tumors from CT images. The transfer learning technique was employed, and the pre-trained MobileNetV2 was utilized as the encoder of a conventional UNET model for feature extraction. The proposed network is an efficient segmentation approach that performs lightweight filtering to reduce computation and pointwise convolution for building more features. Skip connections were established with the ReLU activation function for improving model convergence, connecting the encoder layers of MobileNetV2 to the decoder layers in UNET and allowing the concatenation of feature maps with different resolutions from the encoder to the decoder. Furthermore, the model was trained and fine-tuned on the training dataset acquired from the Medical Segmentation Decathlon (MSD) 2018 Challenge. Results: The proposed network was tested and evaluated on 25% of the dataset obtained from the MSD, and it achieved a Dice score of 0.8793, recall of 0.8602, and precision of 0.93. It is pertinent to mention that our technique outperforms the currently available networks, which have several phases of training and testing.
Date: 2023-08-17
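One hedged way to collect encoder skip connections from MobileNetV2, as the fusion described above requires; the tap points (taken wherever the spatial resolution drops) are an assumption, the UNET decoder is omitted, and a recent torchvision is assumed.

```python
# Sketch: gather candidate skip tensors from a MobileNetV2 encoder.
import torch
from torchvision.models import mobilenet_v2

encoder = mobilenet_v2(weights=None).features   # load pretrained weights in practice
x = torch.randn(1, 3, 256, 256)

skips = []
for block in encoder:
    y = block(x)
    if y.shape[-1] != x.shape[-1]:   # resolution dropped: keep the pre-downsample map
        skips.append(x)
    x = y

print([tuple(s.shape) for s in skips])   # feature maps to concatenate in the decoder
```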
This paper proposes an innovative single-channel supervised speech enhancement (SE) method based on UNET, a convolutional neural network (CNN) architecture that builds on a few changes to the basic CNN architecture. In the training phase, the short-time Fourier transform (STFT) is applied to the noisy time-domain signal to build a noisy time-frequency domain signal, called the complex noisy matrix. We take the real and imaginary parts of the complex noisy matrix and concatenate them to form the noisy concatenated matrix. We apply UNET to the noisy concatenated matrix to extract speech components and train the CNN model. In the testing phase, the same procedure is applied to the noisy time-domain signal as in the training phase to construct another noisy concatenated matrix, which is passed through the pre-trained (saved) model to produce an enhanced concatenated matrix. Finally, from the enhanced concatenated matrix, we separate the real and imaginary parts to form an enhanced complex matrix, from which magnitude and phase are extracted. Using that magnitude and phase, the inverse STFT (ISTFT) generates the enhanced speech signal. The proposed method is evaluated using the IEEE databases and various types of noise, including stationary and non-stationary noise. Comparing the experimental results of the proposed algorithm with five other speech enhancement methods (STFT-sparse non-negative matrix factorization (SNMF), dual-tree complex wavelet transform (DTCWT)-SNMF, DTCWT-STFT-SNMF, STFT-convolutional denoising autoencoder (CDAE), and causal multi-head attention mechanism (CMAM)), we find that the proposed algorithm generally improves speech quality and intelligibility at all considered signal-to-noise ratios (SNRs). The suggested approach performs better than the five competing algorithms in every evaluation metric.
Date: 2023-04-20
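The reconstruction step, splitting the enhanced concatenated matrix back into real and imaginary parts and inverting the STFT, might look like this minimal sketch (shapes and STFT parameters are hypothetical):

```python
# Sketch of the final reconstruction from an enhanced concatenated matrix.
import numpy as np
from scipy.signal import istft

def reconstruct(enhanced_concat, fs=16000, nperseg=512):
    F = enhanced_concat.shape[0] // 2
    Z = enhanced_concat[:F] + 1j * enhanced_concat[F:]   # re-form complex matrix
    mag, phase = np.abs(Z), np.angle(Z)                  # magnitude and phase
    _, x = istft(mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return x

x = reconstruct(np.random.randn(514, 100))               # (2*257 bins, 100 frames)
```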
In this article, we propose a new source separation method in which the dual-tree complex wavelet transform (DTCWT) and short-time Fourier transform (STFT) algorithms are used sequentially as dual transforms and sparse nonnegative matrix factorization (SNMF) is used to factorize the magnitude spectrum. STFT-based source separation faces issues related to time and frequency resolution because it cannot exactly determine which frequencies exist at what time. Discrete wavelet transform (DWT)-based source separation faces a time-variation-related problem (i.e., a small shift in the time-domain signal causes significant variation in the energy of the wavelet coefficients). To address these issues, we utilize the DTCWT, which comprises two-level trees with different sets of filters and provides additional information for analysis and approximate shift invariance; these properties enable the perfect reconstruction of the time-domain signal. Thus, the time-domain signal is transformed into a set of subband signals in which low- and high-frequency components are isolated. Next, each subband is passed through the STFT and a complex spectrogram is constructed. Then, SNMF is applied to decompose the magnitude part into a weighted linear combination of the trained basis vectors for both sources. Finally, the estimated signals can be obtained through a subband binary ratio mask by applying the inverse STFT (ISTFT) and the inverse DTCWT (IDTCWT). The proposed method is examined on speech separation tasks utilizing the GRID audiovisual and TIMIT corpora. The experimental findings indicate that the proposed approach outperforms the existing methods.
Date: 2020-10-23
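A small sketch of the NMF masking stage, substituting scikit-learn's KL-divergence NMF for the paper's sparse NMF (the sparsity penalty is omitted); dictionary sizes, iteration counts, and the random data are hypothetical.

```python
# Train per-source spectral dictionaries, then decompose a mixture over their
# concatenation with the dictionary held fixed, and build a soft ratio mask.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
k = 40                                   # basis vectors per source (hypothetical)
V1 = rng.random((257, 200)) + 1e-6       # training magnitudes, source 1 (freq, time)
V2 = rng.random((257, 200)) + 1e-6       # training magnitudes, source 2
Vmix = rng.random((257, 100)) + 1e-6     # mixture magnitude to separate

def basis(V):
    m = NMF(n_components=k, solver="mu", beta_loss="kullback-leibler",
            init="random", max_iter=300, random_state=0)
    return m.fit_transform(V)            # (freq, k) spectral dictionary

W = np.hstack([basis(V1), basis(V2)])    # fixed, concatenated dictionaries

# Activations for the mixture with W held fixed (KL multiplicative updates).
H = rng.random((2 * k, Vmix.shape[1]))
for _ in range(200):
    H *= (W.T @ (Vmix / (W @ H + 1e-9))) / (W.sum(axis=0)[:, None] + 1e-9)

S1, S2 = W[:, :k] @ H[:k], W[:, k:] @ H[k:]
mask1 = S1 / (S1 + S2 + 1e-9)            # soft ratio mask for source 1
```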
In this paper, we propose a novel single-channel speech enhancement algorithm that applies dual-domain transforms comprising the dual-tree complex wavelet transform (DTCWT) and the short-time Fourier transform (STFT) with sparse non-negative matrix factorization (SNMF). The first domain belongs to the DTCWT, which is applied to the time-domain signals to overcome the signal distortions brought about by the downsampling of the discrete wavelet packet transform (DWPT), and delivers a set of subband signals. The second domain refers to the STFT, which is applied to each subband signal to build a complex spectrogram. Finally, we apply SNMF to the magnitude spectrogram to extract speech components. In short, the DTCWT decomposes the time-domain noisy signal into a set of subband signals, the STFT is then applied to each subband signal, and we obtain nonnegative matrices by taking the absolute value of each complex matrix. We then apply SNMF to each nonnegative matrix and identify the speech components. Finally, the estimated signal is obtained through a subband binary ratio mask (SBRM) by applying the inverse STFT (ISTFT) and, subsequently, the inverse DTCWT (IDTCWT). The proposed approach is assessed using the GRID audio-visual and IEEE databases and diverse kinds of noise: stationary, non-stationary, and quasi-stationary. The experimental outcomes demonstrate that the proposed algorithm significantly improves objective speech quality and intelligibility at all considered signal-to-noise ratios (SNRs), compared to seven other speech enhancement methods: STFT-SNMF, STFT-SNMFSE, MLD-STFT-SNMF, STFT-GDL, STFT-CJSR, DTCWT-SNMF, and DWPT-STFT-SNMF.
Date: 2020-05-01
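The subband binary ratio mask can be sketched as follows, assuming per-subband magnitude estimates for the speech and noise components:

```python
# Binary ratio mask: keep a time-frequency bin when speech dominates noise.
import numpy as np

def subband_binary_ratio_mask(S_speech, S_noise, eps=1e-9):
    return (S_speech / (S_speech + S_noise + eps) > 0.5).astype(float)

# Applied per subband: estimated_subband = mask * noisy_subband_spectrogram.
```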
Single-channel speech dereverberation and separation is a challenging problem of high significance in speech processing applications. Many researchers have addressed this problem separately, investigating either speech dereverberation or separation, and most of the available literature on joint dereverberation and separation is for the multichannel case. Moreover, in joint dereverberation and separation research, noise is usually considered as the other source. There is therefore a pressing need for an optimal and efficient solution to the single-channel speech dereverberation and separation problem. In this paper, we work on speech dereverberation and separation for a single channel, and we do not consider any noise source except the reverberation effects. We combine two dominant methods, robust principal component analysis (RPCA) and sparse non-negative matrix factorization (SNMF), for the dereverberation and separation of the underlying speeches from the speech mixture. First, we use the RPCA algorithm to dereverberate the reverberant mixture; then we use the SNMF technique to separate the speeches from the speech mixture. We consider unseen room impulse response (RIR) conditions and compare the results with the baseline. The experimental results show that the proposed algorithm improves speech quality both in the objective evaluation parameters and in listening tests.
Date: 2020-10-01
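A minimal sketch of RPCA via principal component pursuit (an ADMM-style solver), shown for illustration; the paper's exact solver and parameter choices may differ.

```python
# Robust PCA: decompose M into a low-rank part L and a sparse part S.
import numpy as np

def shrink(X, tau):                       # soft-thresholding
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def svt(X, tau):                          # singular-value thresholding
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca(M, iters=100):
    m, n = M.shape
    lam = 1 / np.sqrt(max(m, n))
    mu = m * n / (4 * np.abs(M).sum() + 1e-9)
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
    return L, S

M = np.abs(np.random.randn(257, 200))     # e.g., a reverberant magnitude spectrogram
L, S = rpca(M)                            # L: smeared reverberant part, S: sparse part
```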
In this paper, we propose a novel single-channel speech enhancement approach that combines the Stationary Wavelet Transform (SWT) and Nonnegative Matrix Factorization (NMF) with a Concatenated Framing Process (CFP), and proposes a Subband Smooth Ratio Mask (ssRM). Due to the downsampling process after filtering, the Discrete Wavelet Packet Transform (DWPT) lacks shift invariance, which introduces errors into the signal reconstruction; to mitigate this problem, we first use the SWT and NMF with a KL cost function. Second, we exploit the CFP to build each column of the matrix instead of using NMF directly, to take advantage of smooth decomposition. Third, we apply an Auto-Regressive Moving Average (ARMA) filtering process to the newly formed matrices to make the speech more stable and standardized. Finally, we propose the ssRM by combining the Standard Ratio Mask (sRM) and Square Root Ratio Mask (srRM) with Normalized Cross-Correlation Coefficients (NCCC) to take advantage of all three. In short, the SWT divides the time-domain mixed speech signal into a set of subband signals; after framing and taking the absolute value of each subband signal, we obtain nonnegative matrices. We then form new matrices by applying the CFP, where each column of the formed matrix contains five consecutive frames of the nonnegative matrix, and perform an ARMA filtering operation. After that, we apply NMF to each newly formed matrix and detect the speech components via the proposed ssRM. Finally, the estimated signal is obtained by applying the inverse SWT. Our approach is evaluated using the IEEE corpus and different types of noise. Objective speech quality and intelligibility improve significantly with this approach, which outperforms related methods such as conventional STFT-NMF and DWPT-NMF.
Date: 2019-09-13
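The two base masks can be sketched as below; the srRM formula and the blending rule are placeholders, since the paper's exact NCCC-based combination is not quoted here.

```python
# Common ratio-mask variants; the final ssRM combination is a placeholder blend.
import numpy as np

def standard_ratio_mask(V_s, V_n, eps=1e-9):
    return V_s / (V_s + V_n + eps)

def square_root_ratio_mask(V_s, V_n, eps=1e-9):
    return np.sqrt(V_s / (V_s + V_n + eps))

def ssrm_placeholder(V_s, V_n, alpha=0.5):
    # The paper weights sRM and srRM via NCCC; a fixed convex blend stands in here.
    return (alpha * standard_ratio_mask(V_s, V_n)
            + (1 - alpha) * square_root_ratio_mask(V_s, V_n))
```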
In this paper, a novel array structure exploiting coprime arrays is proposed that is very efficient at increasing the number of consecutive lags in proportion to the number of array elements. The proposed method comprises a novel array structure configured from three subarrays positioned in alignment with prescribed values. By increasing the number of elements in the third subarray while keeping the other subarrays fixed, an explicit number of consecutive lags can be obtained proportionately. The proposed method maximizes the number of consecutive lags remarkably by calculating the fourth-order difference co-array, unified with interpolation. The fourth-order difference co-array is achieved by exploiting the second-order difference co-array twice. The consideration of a third subarray in addition to the two coprime subarrays leads to a novel array structure that can significantly enhance the degrees of freedom. An effective interpolation technique, nuclear norm minimization, is used to fill the holes existing in the virtual co-array, in order to exploit the full virtual co-array length. This interpolation method uses a convex framework that is tractable and simple to implement, and it avoids the need to fix any predefined extra tuning parameter. The proposed method is named VEFODCI, which stands for Virtual Extension of coprime arrays by exploiting the Fourth Order Difference Co-array with Interpolation. Sparse Bayesian learning is used for direction of arrival (DOA) estimation, exploiting the proposed novel array structure with interpolation to fill the holes. The array geometry of the sensor distribution shows that the proposed method is less susceptible to mutual coupling effects. The simulation results indicate that the proposed method performs DOA estimation accurately, even at low angular separation of sources, by achieving a larger number of consecutive lags than the state of the art.
Date: 2018-08-14
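Computing difference co-arrays from sensor positions is straightforward; the sketch below applies the second-order difference twice, as the text describes, on a simple coprime pair rather than the paper's three-subarray geometry.

```python
# Second- and fourth-order difference co-arrays from sensor positions.
import numpy as np

def difference_coarray(positions):
    p = np.asarray(positions)
    return np.unique(p[:, None] - p[None, :])   # all pairwise differences (lags)

M, N = 3, 5                                     # a coprime pair (example values)
subarray = np.union1d(np.arange(N) * M, np.arange(2 * M) * N)
second_order = difference_coarray(subarray)
fourth_order = difference_coarray(second_order)  # second-order difference, twice
print(len(second_order), len(fourth_order))      # virtual lag counts
```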
Transient interferences such as keystrokes, mouse clicks, and hammering pose a significant challenge in single-channel speech enhancement due to their abrupt and non-continuous nature. Traditional noise suppression algorithms, and even many non-stationary noise reduction algorithms, do not adequately suppress transient interference. Therefore, in this work, we propose a semi-supervised single-channel transient noise suppression method that effectively suppresses transient interference without significant audible distortion. The proposed algorithm consists of training and testing stages. In the training stage, the proposed technique first uses the optimally modified log-spectral amplitude (OMLSA) estimator to estimate the transient noise from the noisy speech signal. After that, we eliminate the residual speech components from the noise estimated by OMLSA based on the correlation coefficient, computed between the estimated noise and clean speech data from the dataset that has been passed through a voice activity detector to remove silence zones. Afterwards, we use this noise to train the noise dictionary in sparse non-negative matrix factorization. Clean speech data is used for speech dictionary training. In the enhancement stage, the dictionaries are fixed and concatenated to obtain the corresponding activation matrices. The clean speech dictionary and the corresponding weight matrix are used to reconstruct the estimated speech. The experimental results reveal that the proposed algorithm provides better performance than other existing algorithms on the speech quality evaluation metrics.
Date: 2020-12-15
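The correlation-based screening of the estimated noise might be sketched as follows; the threshold, frame length, and random stand-in data are hypothetical.

```python
# Drop estimated-noise frames that correlate strongly with clean speech.
import numpy as np

def screen_noise_frames(noise_frames, speech_frames, thresh=0.5):
    kept = []
    for nf in noise_frames:
        r = max(abs(np.corrcoef(nf, sf)[0, 1]) for sf in speech_frames)
        if r < thresh:            # weakly speech-like: keep as training noise
            kept.append(nf)
    return np.array(kept)

noise = np.random.randn(20, 512)   # estimated transient-noise frames (OMLSA output)
speech = np.random.randn(10, 512)  # clean speech frames (silences removed by a VAD)
print(screen_noise_frames(noise, speech).shape)
```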
Wireless sensor nodes are deployed with limited energy sources, and the lifetime of a node usually depends on its energy source. The main design challenge in wireless sensor networks is to prolong the network lifetime and prevent connectivity degradation by developing an energy-efficient routing protocol. Much research has been done to extend the network lifetime, but the problem remains because recharging is impossible. In this paper, we present a hierarchical clustering technique for wireless sensor networks called Clustering with Residual Energy and Neighbors (CREN). It is based on two basic parameters: the number of neighbors of a node and its residual energy. We use these properties as weighted factors to elect a node as a cluster head. A well-known method, Low-Energy Adaptive Clustering Hierarchy (LEACH), performs well in energy saving and quality of service in wireless sensor networks; like LEACH, CREN rotates the cluster head among the sensor nodes to balance energy consumption. The simulation results show that the proposed technique achieves much higher performance and energy efficiency than LEACH.
Date: 2020-03-03
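A toy sketch of the weighted cluster-head election; the weighting coefficients and normalization constants are hypothetical, since the paper's exact formula is not quoted here.

```python
# Score each node by residual energy and neighbor count; highest score is elected.
def head_score(residual_energy, n_neighbors, w_energy=0.7, w_neighbors=0.3,
               e_max=1.0, n_max=20):
    return (w_energy * (residual_energy / e_max)
            + w_neighbors * (n_neighbors / n_max))

nodes = {"a": (0.9, 5), "b": (0.4, 12), "c": (0.8, 9)}   # node: (energy, neighbors)
head = max(nodes, key=lambda k: head_score(*nodes[k]))
print(head)   # node with the best energy/neighbor trade-off becomes cluster head
```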
Single-channel speech separation (SS) is highly significant in many real-world speech processing applications such as hearing aids, automatic speech recognition, humanoid robot control, and the cocktail-party problem. The performance of SS is crucial for these applications, but better accuracy has yet to be achieved. Some researchers have tried to separate speech using only the magnitude part, and some have worked in the complex domain. We propose a dual-transform SS method that serially uses the dual-tree complex wavelet transform (DTCWT) and short-time Fourier transform (STFT), and jointly learns the magnitude, real, and imaginary parts of the signal using generative joint dictionary learning (GJDL). First, the time-domain speech signal is decomposed by the DTCWT, which produces a set of subband signals. The STFT is then applied to each subband signal, converting it to the time-frequency domain and building a complex spectrogram that yields three parts (real, imaginary, and magnitude) for each subband signal. Next, we utilize the GJDL approach to build the joint dictionaries, and then the batch least angle regression with a coherence criterion (LARC) algorithm is used for sparse coding. Afterward, the initially estimated signals are computed in two different ways: one considering only the magnitude part, and another considering the real and imaginary components. Finally, we apply the Gini index (GI) to the initially estimated signals to achieve better accuracy. The proposed algorithm demonstrates the best performance in all considered evaluation metrics compared to the baseline algorithms.
Date: 2022-04-02
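The Gini index used as the final selection criterion can be computed with the standard sparsity definition (not necessarily the paper's exact variant):

```python
# Gini index as a sparsity measure: 0 for uniform coefficients, near 1 for sparse.
import numpy as np

def gini(x):
    c = np.sort(np.abs(np.ravel(x)))          # sorted ascending
    n = c.size
    if c.sum() == 0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1 - 2 * np.sum((c / c.sum()) * (n - k + 0.5) / n)

print(gini([0, 0, 0, 1]))   # 0.75: sparser vectors score higher
print(gini([1, 1, 1, 1]))   # 0.0: a uniform vector scores lowest
```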
Alzheimer's disease (AD) is one of the most disabling and burdensome health conditions worldwide and a leading neurodegenerative disease that results in severe dementia. Parkinson's disease (PD) is also a neurodegenerative disease, and the literature suggests pathogenic links between AD and PD, but the key molecular mechanisms that underlie this association and provoke neurodegeneration are not well understood. To address this problem, we aimed to identify common molecular biomarkers and pathways in PD and AD that are involved in the progression of these diseases and deliver clues to important pathological mechanisms. We analyzed microarray gene expression transcriptomic datasets from control, AD, and PD affected individuals. To obtain robust results, we used combinatorial statistical methods to analyze the datasets. Based on standard statistical criteria, we identified 111 up-regulated genes and 20 down-regulated genes overlapping between AD and PD. Pathway and Gene Ontology (GO) analyses indicated that these 111 up-regulated and 20 down-regulated common genes identify several altered molecular and ontological pathways. Further protein-protein interaction (PPI) analysis revealed pathway hub proteins: EGFR, JAK2, MAPK11, EIF3B, WASL, BCL2L1, CDH1, MCM5, RAN, NCOA3, TBL1X, RARA, ARHGEF12, NCOA2, and ESR2. Transcriptional components were then identified, and significant transcription factors (FOXC1, GATA2, YY1, TFAP2A, E2F1, FOXL1, NFIC, NFKB1, TP53, USF2, and CREB1) were identified. We performed protein-drug interaction analysis to reveal drug interactions with these proteins. Thus, we identified novel putative links between the pathological processes of AD and PD, and possible gene and mechanistic expression links between them.
Date: 2020-06-17
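The core of the overlap analysis reduces to set intersections over differentially expressed gene lists; the sketch below uses placeholder gene sets, not the study's actual lists.

```python
# Intersect up- and down-regulated gene sets from two conditions (placeholders).
ad_up = {"EGFR", "JAK2", "MAPK11", "RAN"}          # up-regulated in AD (example)
pd_up = {"EGFR", "JAK2", "BCL2L1", "CDH1"}         # up-regulated in PD (example)
ad_down = {"NCOA2", "ESR2", "TBL1X"}
pd_down = {"ESR2", "RARA", "TBL1X"}

common_up = ad_up & pd_up          # analogous to the 111 shared up-regulated genes
common_down = ad_down & pd_down    # analogous to the 20 shared down-regulated genes
print(sorted(common_up), sorted(common_down))
```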
Blood pressure (BP), a vital sign in cardiovascular disease (CVD) monitoring, is traditionally measured invasively in critically ill patients or using cuff-based devices. Continuous monitoring of cardiovascular signs is crucial for early intervention and diagnosis, facilitated by miniaturized wearable technologies to mitigate further complications. In this study, we propose a modified UNET architecture with attention-based deep learning (DL) for continuous non-invasive BP estimation using only photoplethysmography (PPG) signals. The model incorporates a 1D convolutional neural network to handle sequential 1D data, enabling the capture and reconstruction of essential features for robust estimation. Attention blocks are employed to selectively collect information at different network stages, combining skip connections from the encoding and decoding paths to focus on significant features while filtering extraneous information. Additionally, the model operates in an autoencoder mode, facilitating feature extraction from intermediate layers and learning compact and meaningful representations of input data. Rigorous evaluation demonstrates that the proposed model achieves mean absolute errors (MAE) of 4.661 mmHg and 2.574 mmHg for systolic BP and diastolic BP, respectively. These results are comparable to existing methods and satisfy international standards, such as the BHS and AAMI guidelines for non-invasive BP monitoring in wearable technology. This research contributes to the advancement of accurate and non-invasive BP estimation, enabling early detection and intervention for improved cardiovascular health monitoring.
Date: 2024-03-02
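A 1-D attention gate of the kind used in attention UNETs, sketched in PyTorch; channel sizes are hypothetical and the surrounding encoder/decoder is omitted.

```python
# Attention gate: weight encoder skip features by a decoder gating signal.
import torch
import torch.nn as nn

class AttentionGate1d(nn.Module):
    def __init__(self, ch_skip, ch_gate, ch_inter):
        super().__init__()
        self.w_x = nn.Conv1d(ch_skip, ch_inter, kernel_size=1)
        self.w_g = nn.Conv1d(ch_gate, ch_inter, kernel_size=1)
        self.psi = nn.Conv1d(ch_inter, 1, kernel_size=1)

    def forward(self, x, g):                 # x: skip features, g: gating signal
        a = torch.sigmoid(self.psi(torch.relu(self.w_x(x) + self.w_g(g))))
        return x * a                         # suppress irrelevant skip features

x = torch.randn(1, 64, 250)                  # encoder skip (batch, ch, time)
g = torch.randn(1, 128, 250)                 # decoder gating signal, same length
print(AttentionGate1d(64, 128, 32)(x, g).shape)
```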
Conventional deep learning architectures do not adequately address the requirements of wearable high-precision medical devices such as blood pressure (BP) monitors. This paper presents a novel hybrid deep learning architecture that leverages advancements in sensors and signal processing modules for cuffless and continuous BP monitoring devices, emphasizing enhanced precision in an energy-constrained system. The proposed architecture comprises a combination of a convolutional neural network and a bidirectional gated recurrent unit. The proposed model adopts a data-driven, end-to-end approach to directly process raw photoplethysmography (PPG) signals, enabling simultaneous estimation of systolic BP (SBP) and diastolic BP (DBP) without the need for feature extraction. Performance evaluation was conducted using the Multiparameter Intelligent Monitoring in Intensive Care II dataset, yielding small mean errors of 0.664 mmHg and −0.028 mmHg between the estimated and reference SBP and DBP, respectively.
Date: 2024-03-02
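A compact sketch of a CNN plus bidirectional GRU regressor mapping a raw PPG window to (SBP, DBP); layer sizes and window length are assumptions, not the paper's configuration.

```python
# CNN front end extracts local waveform features; a Bi-GRU models their sequence.
import torch
import torch.nn as nn

class CnnBiGru(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.gru = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, 2)      # -> (SBP, DBP)

    def forward(self, ppg):                   # ppg: (batch, 1, samples)
        h = self.cnn(ppg).transpose(1, 2)     # (batch, time, channels)
        h, _ = self.gru(h)
        return self.head(h[:, -1])            # regress from the last time step

print(CnnBiGru()(torch.randn(4, 1, 1024)).shape)   # torch.Size([4, 2])
```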
Doctor of Philosophy in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences under the University of Chinese Academy of Sciences, Beijing, China. Dissertation title: “Computational Bioinformatics and Machine Learning Models to identify Diseasome and Neurological Disease Comorbidities”. Advisor: Professor Silong Peng, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
Research interest: Bioinformatics, Health Informatics, Machine Learning, Deep Learning, Pattern Recognition and Data Science
ResearchGate: https://www.researchgate.net/profile/Md_Habibur_Rahman4
Google Scholar: https://scholar.google.com/citations?hl=en&user=DzAiISMAAAAJ