Biclustering is crucial for finding co-expressed genes and the conditions under which they are co-expressed in gene expression data. Although various biclustering algorithms (e.g., combinatorial, probabilistic-modelling, and matrix-factorization approaches) have been proposed and steadily improved over the past decade, data noise and bicluster overlaps still make biclustering a challenging task, and it is difficult to improve performance further without resorting to a new approach.

Inspired by recent progress in unsupervised feature learning with deep neural networks, we propose a novel biclustering model, named AutoDecoder (AD), that relates biclusters to features and leverages a neural network to learn those features automatically from the input data. To suppress the severe noise present in gene expression data, we introduce a non-uniform signal recovery mechanism: instead of reconstructing the whole input uniformly to capture bicluster patterns, AD weighs the zero and non-zero parts of the input differently and is thus more flexible in handling different types of noise. AD is also properly regularized to handle bicluster overlaps. To the best of our knowledge, this is the first biclustering algorithm that leverages neural network techniques to recover overlapping biclusters hidden in noisy gene expression data. We compared our approach with four state-of-the-art biclustering algorithms on both synthetic and real datasets. On three of the four real datasets, AD significantly outperforms the other approaches; on controlled synthetic datasets, AD performs best when the noise level exceeds 15%.
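The non-uniform signal recovery mechanism can be illustrated with a small sketch. The function below is a hypothetical rendering (the weight values and function name are our own illustration, not taken from the paper): reconstruction errors on the non-zero entries of the input are weighted differently from errors on the zero entries, so the model can discount missing or suppressed signal without ignoring it entirely.

```python
import numpy as np

def weighted_reconstruction_loss(x, x_hat, w_nonzero=1.0, w_zero=0.3):
    """Sketch of a non-uniform reconstruction objective (illustrative):
    errors on non-zero input entries get weight w_nonzero, errors on
    zero entries get weight w_zero."""
    mask = (x != 0).astype(float)               # 1 where the input is non-zero
    weights = w_nonzero * mask + w_zero * (1.0 - mask)
    return np.sum(weights * (x - x_hat) ** 2)   # weighted squared error
```

Setting `w_zero` below `w_nonzero` makes the model tolerate dropout-like noise (signal recorded as zero), while the reverse penalizes spurious non-zero noise more heavily.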

Model Overview

We relate biclustering to feature learning: the expression pattern along the rows (conditions) of a bicluster can be regarded as a feature. Figure 1 shows a graphical representation of our proposed model, AutoDecoder. Given a data matrix, we feed each column of the matrix to the input layer one at a time, with each element of the column corresponding to one input neuron. The weights W connecting the input layer to a hidden neuron extract one feature from the input column: each element of W represents the contribution of one row to activating that hidden neuron, and thus determines whether the row is a member of the corresponding feature. Combining a data column with W yields the activation values of the hidden neurons for that column, and from these activations we determine the column's membership in each hidden neuron. Once the row and column memberships of the hidden neurons are determined, the row and column members of the same hidden neuron together compose a bicluster. We further propose two strategies to enhance robustness against noise and overlaps.
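The membership rule described above can be sketched as follows. This is a minimal illustration under our own assumptions (the sigmoid activation, the thresholds, and the function name are hypothetical, not specified by the source): rows join a bicluster when their weight to a hidden neuron is large in magnitude, and columns join when their activation of that neuron is high.

```python
import numpy as np

def extract_biclusters(X, W, b, row_thresh=0.5, col_thresh=0.5):
    """Illustrative sketch: derive biclusters from a trained hidden layer.

    X: data matrix (n_rows x n_cols), columns are fed to the input layer.
    W: weight matrix (n_rows x n_hidden); W[i, k] is row i's contribution
       to activating hidden neuron k.
    b: bias vector (n_hidden,).
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Activation of each hidden neuron for each input column: (n_cols, n_hidden)
    H = sigmoid(X.T @ W + b)

    biclusters = []
    for k in range(W.shape[1]):
        rows = np.where(np.abs(W[:, k]) > row_thresh)[0]  # row membership via weights
        cols = np.where(H[:, k] > col_thresh)[0]          # column membership via activations
        biclusters.append((rows, cols))
    return biclusters
```

Each hidden neuron thus yields one candidate bicluster: the rows that drive its activation together with the columns that activate it.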

Figure 1. Graphical Representation of AutoDecoder

For further details on the model and its performance, please refer to the accompanying documents.