High-throughput proteome sequencing projects continue to produce enormous amounts of raw sequence data. However, most of these data are unannotated and are therefore of limited use. One of several approaches to deciphering the function of a protein is to determine its subcellular localization. Experimental approaches to proteome annotation, including determination of a protein's subcellular localization, are costly and labor-intensive. As a complement to the available experimental methods, in silico methods offer an alternative way to accomplish this task. Here, we present two machine learning approaches for predicting the subcellular localization of a protein from its primary sequence.
Two machine learning algorithms, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), were used to assign a protein of unknown localization to one of 11 subcellular localizations. The final prediction is based on a consensus of the predictions made by the two algorithms, and a probability is assigned to it.
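To make the two-classifier consensus scheme concrete, here is a minimal sketch in plain Python. It rests on several assumptions that the abstract does not specify. Each sequence is encoded as its 20-dimensional amino acid composition. k-NN uses squared Euclidean distance with a majority vote. The PNN is implemented as a Parzen-window classifier with a Gaussian kernel of width `sigma`. The consensus rule is hypothetical: when both classifiers agree, it averages their confidences; when they disagree, it keeps the more confident prediction. The function names, `k`, `sigma`, and the toy sequences and labels are illustrative only and do not represent the authors' features, parameters, or data. The toy example uses two classes, but the code handles any number of classes, including 11.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Assumed feature encoding: fraction of each of the 20 standard residues.
    Non-standard residues are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero for empty input
    return [counts[a] / total for a in AMINO_ACIDS]

def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def knn_predict(train, x, k=3):
    """k-NN: majority vote among the k nearest training vectors.
    Returns (label, fraction of votes for that label)."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)

def pnn_predict(train, x, sigma=0.1):
    """PNN as a Parzen-window classifier: average Gaussian kernel per class,
    normalized across classes. Returns (label, normalized class score)."""
    dists = [(sq_dist(v, x), label) for v, label in train]
    d_min = min(d for d, _ in dists)  # common shift for numerical stability
    sums, counts = Counter(), Counter()
    for d, label in dists:
        sums[label] += math.exp(-(d - d_min) / (2 * sigma ** 2))
        counts[label] += 1
    scores = {c: sums[c] / counts[c] for c in counts}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total

def consensus_predict(train, seq, k=3, sigma=0.1):
    """Hypothetical consensus rule combining the k-NN and PNN predictions."""
    x = composition(seq)
    knn_label, knn_conf = knn_predict(train, x, k)
    pnn_label, pnn_conf = pnn_predict(train, x, sigma)
    if knn_label == pnn_label:
        return knn_label, (knn_conf + pnn_conf) / 2
    # Disagreement: keep whichever classifier is more confident.
    if knn_conf >= pnn_conf:
        return knn_label, knn_conf
    return pnn_label, pnn_conf

# Toy training set (illustrative only): K/R-rich vs. hydrophobic sequences.
train_seqs = [
    ("KKRRKKRKPKRK", "nucleus"),
    ("RKKRSKRRKKPR", "nucleus"),
    ("KRKKRRKSKRKP", "nucleus"),
    ("LLIVLLAVLILF", "membrane"),
    ("VLLIALLVFLIL", "membrane"),
    ("ILVLLFAVLLIV", "membrane"),
]
train = [(composition(s), label) for s, label in train_seqs]

label, prob = consensus_predict(train, "KRRKKPKRKKRS")
print(label)  # prints "nucleus" for this toy data
```

With real data, `k` and `sigma` would need to be tuned, for example by cross-validation, and the raw confidence values would likely need calibration before they could be read as probabilities.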