AutoDecoder has been evaluated on both real gene expression datasets and simulated datasets. We provide the datasets used in our paper.

Real gene expression datasets

Four real datasets were tested in our paper: Breast Cancer, Multiple Tissue, DLBCL, and Lung Cancer. The original datasets can be downloaded here. The datasets were preprocessed as described in our paper.

Breast Cancer: 1213 genes, 97 samples
Multiple Tissue: 5565 genes, 102 samples
DLBCL: 3795 genes, 58 samples
Lung Cancer: 12625 genes, 56 samples

Synthetic datasets

Users can synthesize the datasets by following the simulation process detailed in our paper. Given a matrix of size 100x500 and the number of biclusters K, the number of rows r in a bicluster is randomly selected from the interval [10,30], and the number of columns c is randomly selected from [50,100]. We then randomly choose r rows from the 100 rows and c columns from the 500 columns as the members of the bicluster. The matrix is initially filled with 0, and each bicluster is filled with 1. In total, K biclusters are generated. We then inject noise into each matrix by flipping the values of elements. Specifically, we flip the 1's inside biclusters to 0 with probability p, and flip the 0's outside biclusters to 1 or -1, each with probability p/2. The flipping probability p is called the noise level.
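
The simulation process above could be sketched in Python with NumPy roughly as follows. This is a minimal illustration, not the script used in our paper. The function name synthesize_dataset and its arguments are hypothetical. The sketch also assumes that the intervals [10,30] and [50,100] are inclusive, that biclusters may overlap, and that a 0 outside the biclusters becomes 1 with probability p/2 and -1 with probability p/2.

```python
import numpy as np


def synthesize_dataset(K, p, n_rows=100, n_cols=500, seed=None):
    """Hypothetical helper: build one noisy matrix with K planted biclusters.

    Returns (clean, noisy, biclusters), where biclusters is a list of
    (row_indices, column_indices) pairs.
    """
    rng = np.random.default_rng(seed)

    # The matrix is initially filled with 0.
    clean = np.zeros((n_rows, n_cols), dtype=int)
    biclusters = []

    for _ in range(K):
        # Assumption: both interval endpoints are inclusive.
        r = int(rng.integers(10, 31))
        c = int(rng.integers(50, 101))
        rows = np.sort(rng.choice(n_rows, size=r, replace=False))
        cols = np.sort(rng.choice(n_cols, size=c, replace=False))
        # Each bicluster is filled with 1 (assumption: overlaps are allowed).
        clean[np.ix_(rows, cols)] = 1
        biclusters.append((rows, cols))

    # Inject noise with one uniform draw per element.
    noisy = clean.copy()
    u = rng.random(clean.shape)
    inside = clean == 1

    # 1's inside biclusters become 0 with probability p.
    noisy[inside & (u < p)] = 0
    # 0's outside biclusters become 1 with probability p/2 ...
    noisy[~inside & (u < p / 2)] = 1
    # ... and -1 with probability p/2 (assumed reading of the noise model).
    noisy[~inside & (u >= p / 2) & (u < p)] = -1

    return clean, noisy, biclusters
```

For example, calling synthesize_dataset(K=3, p=0.1, seed=0) would return the noise-free matrix, its noisy counterpart at noise level 0.1, and the row and column indices of the three planted biclusters, which can serve as ground truth when scoring recovered biclusters.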