This page lists all datasets currently available in DV-IMPACT. For each dataset, it gives a short description, the processing and formatting steps applied, its compliance with the DV-IMPACT standard data model, and the published article that corresponds to the dataset.
Phosphokinase-binding Domain Dataset
This dataset describes the assessment of the impact of cancer mutations on PPIs mediated through phosphokinase-binding domains. - Brief description of the data processing steps: The published data in the supplementary materials and on the website associated with the study contain most of the data items required by the DV-IMPACT standard data model. We deposited these data into DV-IMPACT either directly or after minor processing. However, we requested the full sequences of the proteins used in the analysis from the authors, as they used one protein per gene. Furthermore, some information, such as protein names and Ensembl IDs, was obtained from online resources.
Publication: Wagih O. et al, Nature Methods, 2015. doi:10.1038/nmeth.3396
SH2 Domain Dataset
This dataset describes the assessment of the impact of cancer mutations on PPIs mediated through SH2 domains. - Brief description of the data processing steps: To utilize this dataset, we mostly used data requested from the authors. The method used to generate the original data provides a single value that indicates the assessed impact of each mutation. The authors therefore kindly offered to redo the analysis after modifying their pipeline to generate independent wildtype (WT) and mutant (MT) scores. To obtain the full protein sequences, we requested the UniProt IDs of the proteins used and downloaded the full sequences from UniProt. The method used in this study is PWM-independent. Therefore, for visualization purposes, we asked the authors to provide PWMs by converting the PEMs to PWMs. The PWMs provided by the authors were missing the cysteine (C) amino acid due to experimental procedures. We added a dummy C column with values of 0 to each PWM to make its format compatible with the standard formats that work with visualization and scoring tools and algorithms. The variants were reported in a non-standard notation: the position of each mutation was given relative to the start of the binding peptide rather than the start of the protein. We therefore calculated the correct mutation position as Pm = Ppho + Pr - 8, where Pm is the mutation position in the protein, Ppho is the phosphosite position, and Pr is the mutation position reported in the data received from the authors. Furthermore, this study evaluates the impact of mutations that occur in the SH2 domains, not only in the binding peptides. Since DV-IMPACT does not support this type of information, we kept the assessment of mutations in the binding peptides only (~45%) and eliminated the remaining data.
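The two formatting steps described above (converting peptide-relative mutation positions to protein positions, and padding a PWM with a zero-valued cysteine entry) can be sketched as follows. This is only an illustrative sketch, not the code used to build DV-IMPACT: the function names, the `AMINO_ACIDS` ordering, and the representation of a PWM as a dictionary mapping each amino acid letter to its per-position scores are all assumptions made for this example.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not the actual DV-IMPACT processing code.

# Assumed ordering of the 20 standard amino acids (alphabetical by letter).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutation_position(p_pho: int, p_r: int, offset: int = 8) -> int:
    """Apply Pm = Ppho + Pr - 8 to get the mutation position in the protein.

    p_pho: phosphosite position in the protein (Ppho)
    p_r:   mutation position as reported, relative to the peptide start (Pr)
    """
    return p_pho + p_r - offset


def add_dummy_cysteine(pwm: dict[str, list[float]]) -> dict[str, list[float]]:
    """Return a copy of the PWM with a zero-valued 'C' entry added if missing.

    The PWM is assumed to map amino acid letters to per-position scores.
    The input dictionary is left unchanged.
    """
    length = len(next(iter(pwm.values())))  # number of peptide positions
    padded = dict(pwm)
    padded.setdefault("C", [0.0] * length)
    # Re-emit the entries in the assumed standard amino acid order.
    return {aa: padded[aa] for aa in AMINO_ACIDS if aa in padded}


if __name__ == "__main__":
    # Hypothetical values chosen only to exercise the functions.
    print(mutation_position(p_pho=100, p_r=10))
    print(add_dummy_cysteine({"A": [0.1, 0.2], "D": [0.3, 0.4]}))
```

Under this formula, a reported position of 8 maps onto the phosphosite itself, since Pm = Ppho + 8 - 8 = Ppho.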
Publication: AlQuraishi M. et al, Nature Genetics, 2014. doi:10.1038/ng.3138