Efficient generation of conformational ensembles of disordered proteins using residue-local probabilities
The prevalence of proteins with intrinsically disordered regions (IDRs) in eukaryotic genomes [1] and the growing evidence of their important role in different biological processes [2–4], even though they lack a folded three-dimensional structure, explain why their study is nowadays an active research field [1,2,5–8]. IDRs dynamically explore a huge conformational space with many local minima separated by small free-energy barriers that can be crossed under physiological conditions. In IDRs, the picture of a unique native structure must therefore be replaced by a conformational ensemble.
Our research group has recently shown [13] that the probability of an IDR molecular conformation can be properly described as the product of the conformational probabilities of each residue, conditioned on the identity of its neighbouring residues. This allows us to characterize the conformational space of IDRs using a reduced number of probabilities. As a result, we can obtain the conformational probabilities of the central residue of every triad in the IDR molecule from independent MD simulations of the corresponding tripeptides.
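As an illustration of this factorization, the following minimal Python sketch evaluates the log-probability of a chain conformation as a sum of per-residue terms. It rests on assumptions that are not part of the original description: each residue conformation is reduced to a discrete state label, the tripeptide-derived probabilities are stored in a hypothetical dictionary `tables` keyed by the triad sequence, and the two terminal residues, which have no complete triad, are simply skipped.

```python
import math


def triads(sequence):
    """Return the three-residue window centred on each interior residue."""
    return [sequence[i - 1:i + 2] for i in range(1, len(sequence) - 1)]


def conformation_log_probability(sequence, states, tables):
    """Sum log P(state_i | triad_i) over the interior residues.

    sequence: one-letter amino-acid string.
    states:   one conformational state label per residue (assumed discrete).
    tables:   hypothetical mapping triad -> {state label: probability}.
    """
    if len(states) != len(sequence):
        raise ValueError("one state label per residue is required")
    log_p = 0.0
    for i, triad in enumerate(triads(sequence), start=1):
        p = tables[triad].get(states[i], 0.0)
        if p <= 0.0:
            # A state never observed for this triad makes the whole
            # conformation impossible under the product model.
            return float("-inf")
        log_p += math.log(p)
    return log_p
```

Working with logarithms avoids numerical underflow for long chains, since the product of many probabilities below one quickly becomes very small.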
In this project we propose to develop and test an open-source code to build conformational ensembles of IDRs using a strategy largely inspired by the flexible-meccano [14–17] and the hierarchical chain growth [18–20] approaches, but using the results from MD simulations of the tripeptides as the source of the statistical distributions. We refer to this methodology as probabilistic MD chain growth (PMD-CG).
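The sampling step at the core of such a chain-growth strategy can be sketched as follows. This is only a schematic illustration and not the proposed code: it assumes discrete conformational state labels and a hypothetical `tables` dictionary holding the tripeptide-derived probabilities, and it omits the conversion of the sampled states into three-dimensional coordinates and any treatment of steric clashes, which a real builder would need.

```python
import random


def sample_conformation(sequence, tables, rng):
    """Draw one state per interior residue from its triad distribution.

    tables: hypothetical mapping triad -> {state label: probability}.
    Returns len(sequence) - 2 labels (terminal residues are skipped).
    """
    states = []
    for i in range(1, len(sequence) - 1):
        distribution = tables[sequence[i - 1:i + 2]]
        labels = list(distribution)
        weights = [distribution[label] for label in labels]
        # Residues are sampled independently, given their neighbours' identity.
        states.append(rng.choices(labels, weights=weights, k=1)[0])
    return states


def build_ensemble(sequence, tables, n_conformers, seed=0):
    """Generate n_conformers independent conformations (as state lists)."""
    rng = random.Random(seed)  # fixed seed for reproducible ensembles
    return [sample_conformation(sequence, tables, rng)
            for _ in range(n_conformers)]
```

Because each residue is drawn independently from a precomputed table, the cost of generating one conformer in this sketch is proportional to the number of residues.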
The great potential of the proposed PMD-CG methodology comes from its computational performance: its computational effort grows linearly with the number of residues in the molecule, while the effort required by MD-based simulations grows much faster [21]. This is particularly important for mutagenesis studies, because in MD-based approaches every single-point mutation of the protein requires re-running the whole simulation, with the same high computational cost for each mutation. In the PMD-CG method, by contrast, a single-point mutation requires simulating only three new tripeptides.
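The count of three tripeptides follows from the fact that a residue at an interior position belongs to exactly three triads: the one centred on it and the two centred on its neighbours. A short Python sketch makes this explicit; the function names and the `available` set of already simulated tripeptides are hypothetical and serve only as illustration.

```python
def affected_triads(sequence, position, new_residue):
    """Return the mutated sequence and the triads that contain the mutation.

    position is 0-based. Triads are centred on interior residues only, so
    fewer than three are returned for mutations close to the termini.
    """
    mutated = sequence[:position] + new_residue + sequence[position + 1:]
    first_centre = max(1, position - 1)
    last_centre = min(len(mutated) - 2, position + 1)
    centres = range(first_centre, last_centre + 1)
    return mutated, [mutated[c - 1:c + 2] for c in centres]


def tripeptides_to_simulate(sequence, position, new_residue, available):
    """List the affected triads that are not yet in the simulated set."""
    _, changed = affected_triads(sequence, position, new_residue)
    unique = list(dict.fromkeys(changed))  # drop repeats, keep order
    return [triad for triad in unique if triad not in available]
```

For example, mutating the serine of the sequence GASGK to threonine changes the triads GAT, ATG and TGK; any of these already present in a library of simulated tripeptides would not need to be run again.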
The reliability and computational efficiency of the PMD-CG method will be compared with those of different standard MD-based methods used to build conformational ensembles. The quality criterion will be their ability to reproduce NMR [9,10] (chemical shifts (CSs), scalar couplings (SCs), and residual dipolar couplings (RDCs)) and SAXS [22] experimental data, which reflect local (like NMR J-couplings) or global (like SAXS and NMR RDCs) properties. For these quantities, many data are available in specialized databases. A test on directly available data covering many more critical intermediate quantities, such as the average accessibility to a paramagnetic probe or the surface electrostatic potential as measured by charged paramagnetic probes, will either strongly corroborate the methods or suggest possible causes of failure and clues for improvement.
During the last year of the project, the applicability of the developed methodologies will be tested in the fields of molecular biophysics and biomedicine by carrying out an in silico mutation scan of an IDR of medical interest. The most promising mutant candidates will be analyzed experimentally by means of standard and high-resolution NMR techniques.