|
High-throughput proteome sequencing projects continue to churn out enormous amounts
of raw sequence data. However, most of this data is unannotated and
hence not very useful. One of the various approaches to deciphering the function of
a protein is to determine its localization. Experimental approaches to proteome
annotation, including determination of a protein's subcellular localization, are very
costly and labor-intensive. In silico methods offer an alternative to the available
experimental methods for accomplishing this task.
Here, we present two machine learning approaches for predicting the subcellular
localization of a protein from its primary sequence. The two algorithms, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), were used to classify an unknown protein into one of 11 subcellular localizations. The final prediction is based on a consensus of the predictions made by the two algorithms, and a probability is assigned to it. |