@@ -77,7 +77,7 @@ Peak files correspond to MS/MS files that can be use to verified the correspondi
## RESULT files
## RESULT files
Result files are standard file formats developed by the HUPO-PSI Consortium to exchange proteomics results. Following the ProteomeXchange guidelines, submissions that provide these files are called **COMPLETE** submissions. These files are especially important in PRIDE because they are the only result files that the PRIDE ecosystem (resources and tools) can read, write and transform. For example, these are the only submissions that are searchable by protein identifiers and peptide sequences in PRIDE Archive ([read more here about searching](../static/documentation/searchinginpridearchive)).
- mzIdentML (versions 1.1 and 1.2, http://www.psidev.info/mzidentml): mzIdentML is one of the standards developed by the Proteomics Informatics working group of the PSI. The submission tool recognizes the file format by the .mzid file extension. mzIdentML contains only the peptide/protein identification information of a proteomics experiment, not the quantitation.
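
Extension-based format recognition can be sketched as below. This is a minimal illustration, not the submission tool's code: the function name `detect_result_format` is hypothetical, and both the `.mztab` mapping and the stripping of a trailing `.gz` are assumptions (only `.mzid` is documented above).

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension -> format table. Only ".mzid" is documented;
# ".mztab" is an assumption for illustration.
RESULT_EXTENSIONS = {
    ".mzid": "mzIdentML",
    ".mztab": "mzTab",
}


def detect_result_format(filename: str) -> Optional[str]:
    """Return the result format implied by the file extension, or None."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Assumption: a trailing ".gz" only marks compression, so inspect the
    # extension that precedes it.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes:
        return None
    return RESULT_EXTENSIONS.get(suffixes[-1])


for name in ["search.mzid", "SEARCH.MZID.gz", "quant.mzTab", "run01.raw"]:
    print(name, "->", detect_result_format(name))
```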
@@ -36,48 +36,48 @@ the corresponding scores from the search engines, the list of modifications, Pre
information of the OMSSA and X!Tandem scores. The table also provides the peptide sequence length and the start and end positions in the protein.
PRIDE Inspector ‘Overview’ panel: ‘Experiment General View’. This tab contains basic metadata about an experimental file: experiment and project titles,
contact information, the software used to generate the file, and the original file format, amongst others.
PRIDE Inspector ‘Overview’ panel: ‘Instrument & Processing View’. This tab contains metadata information about the instrument configuration and software used.
PRIDE Inspector ‘Overview’ panel: ‘Identification Protocol View’. This tab contains metadata information about the peptide/protein identification protocols
such as search parameters, databases, search engines and software used.
@@ -92,7 +92,7 @@ To do this we integrated specific components that access the identifications sou
If the identifier was only updated, the new accession is automatically displayed in the protein table and the updated sequence is retrieved. In some cases, even though a protein’s identifier did not change, its underlying sequence was altered in the protein sequence database. Therefore, PRIDE Inspector automatically fetches each protein’s current sequence and checks whether the reported peptides still match this identification.
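
The "do the reported peptides still fit?" check can be illustrated with a simple substring search. This is a hypothetical sketch (`check_peptides` is not a PRIDE Inspector function). It assumes exact, case-insensitive matching of unmodified peptide sequences. It ignores I/L ambiguity, which the real tool may handle differently.

```python
from typing import Dict, List


def check_peptides(protein_sequence: str, peptides: List[str]) -> Dict[str, List[int]]:
    """Map each reported peptide to its 1-based start positions in the
    current protein sequence; an empty list means it no longer fits."""
    sequence = protein_sequence.upper()
    result: Dict[str, List[int]] = {}
    for peptide in peptides:
        target = peptide.upper()
        positions = []
        start = sequence.find(target)
        while start != -1:
            positions.append(start + 1)  # report 1-based positions
            start = sequence.find(target, start + 1)
        result[peptide] = positions
    return result


current = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical updated sequence
print(check_peptides(current, ["TAYIAK", "QISFVK", "LLDEAR"]))
```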


When using the **Obtain Protein Details** feature in PRIDE Inspector, the status of the protein according to the original database is downloaded
in addition to the protein name and protein sequence. The status can be one of the following:
...
@@ -118,21 +118,21 @@ The PRIDE Inspector provides as home screen were the user can select the option
mzIdentML or mzTab:


If the user provides an mzIdentML file without protein inference information, the tool shows a pop-up message offering to run the protein inference algorithm:
When the algorithm finishes, the protein panel shows the list of identified proteins, including the new protein groups and the proteins that belong to them. The "Show Protein Inference Option" opens a new pop-up with the protein inference visualisation:
The ultimate aim of the pride-protein-inference library and the PRIDE Inspector tool is to present the inference information to end users. In particular, the information for each group and the number of PSMs and peptides shared by the proteins of interest can be seen in the Protein Inference Visualisation:
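
Counting the PSMs and peptides shared by two proteins can be sketched as follows. The `PSM` record, the function name and the example data are hypothetical. This only illustrates the counting and is not the pride-protein-inference library's algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class PSM:
    """Hypothetical minimal PSM record: spectrum id, peptide sequence and
    the proteins that peptide maps to."""
    spectrum_id: str
    peptide: str
    proteins: FrozenSet[str]


def shared_evidence(psms: List[PSM], protein_a: str, protein_b: str) -> Tuple[int, int]:
    """Return (shared peptide count, shared PSM count) for two proteins."""
    shared = [p for p in psms if protein_a in p.proteins and protein_b in p.proteins]
    shared_peptides = {p.peptide for p in shared}
    return len(shared_peptides), len(shared)


psms = [
    PSM("s1", "LVNELTEFAK", frozenset({"PROT_A", "PROT_B"})),
    PSM("s2", "LVNELTEFAK", frozenset({"PROT_A", "PROT_B"})),
    PSM("s3", "AEFAEVSK", frozenset({"PROT_A"})),
    PSM("s4", "YLYEIAR", frozenset({"PROT_A", "PROT_B"})),
]
print(shared_evidence(psms, "PROT_A", "PROT_B"))
```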
To make the scores more human-readable, the result of Equation 1 (Pi) is transformed to a logarithmic scale, giving the final PeptideScore(i) (Equation 2):
This entire process is repeated over 10 iterations. In each iteration a different peak depth (i) is chosen to calculate the cumulative binomial probability, so that
ten different scores are generated in total. A different weight is assigned to each score (1 = 0.5; 2 = 0.75; 3 = 1; 4 = 1; 5 = 1; 6 = 1; 7 = 0.75; 8 = 0.5; 9 = 0.25;
...
@@ -181,7 +181,7 @@ and 10 = 0.25), and then a weighted average score called peptide score is genera
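
The scoring steps described above can be sketched in Python. Equations 1 and 2 are not reproduced in this excerpt, so this sketch makes three assumptions:

- Equation 1 is the cumulative binomial probability of matching at least n of N theoretical fragment ions.
- The random-match probability at peak depth i is p = i/100.
- Equation 2 is the common -10·log10(P) transform.

The weights are the ones listed in the text. All function names are hypothetical.

```python
import math
from typing import Sequence

# Weights for peak depths 1..10, as listed in the text above.
DEPTH_WEIGHTS = [0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.25]


def cumulative_binomial(n_matched: int, n_total: int, p: float) -> float:
    """P(X >= n_matched) for X ~ Binomial(n_total, p) (assumed Equation 1)."""
    return sum(
        math.comb(n_total, k) * p**k * (1 - p) ** (n_total - k)
        for k in range(n_matched, n_total + 1)
    )


def depth_score(n_matched: int, n_total: int, depth: int) -> float:
    """-10*log10(P) at one peak depth (assumed Equation 2)."""
    p = depth / 100.0  # assumption: random-match probability at depth i is i/100
    return -10.0 * math.log10(cumulative_binomial(n_matched, n_total, p))


def peptide_score(matched_per_depth: Sequence[int], n_total: int) -> float:
    """Weighted average of the ten per-depth scores."""
    if len(matched_per_depth) != len(DEPTH_WEIGHTS):
        raise ValueError("expected one matched-ion count per peak depth (1..10)")
    scores = [
        depth_score(n, n_total, depth)
        for depth, n in enumerate(matched_per_depth, start=1)
    ]
    return sum(w * s for w, s in zip(DEPTH_WEIGHTS, scores)) / sum(DEPTH_WEIGHTS)


# Hypothetical matched-ion counts at depths 1..10 for 20 theoretical ions.
print(round(peptide_score([3, 5, 6, 7, 8, 8, 9, 9, 10, 10], n_total=20), 2))
```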
The pipeline:
The pipeline:


### Fragmentation annotation rules
...
@@ -211,7 +211,7 @@ Mass deltas close to zero reflect more accurate identifications and also that th
done accurately. This plot can highlight a systematic bias if it is not centred on zero. Other distributions can reflect modifications that are not reported
properly. It is also easy to see the difference between target and decoy identifications.
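
The mass delta behind this plot can be sketched as the observed precursor m/z minus the theoretical m/z of the [M + zH]z+ ion. This definition is an assumption; the tool may use a different one. The identifications in the example are hypothetical, and the code simply compares target and decoy medians.

```python
from statistics import median
from typing import List, Tuple

PROTON_MASS = 1.007276  # Da


def theoretical_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion from the neutral monoisotopic peptide mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def delta_mz(observed_mz: float, neutral_mass: float, charge: int) -> float:
    """Observed minus theoretical precursor m/z."""
    return observed_mz - theoretical_mz(neutral_mass, charge)


# Hypothetical identifications: (observed m/z, neutral mass, charge, is_decoy)
identifications: List[Tuple[float, float, int, bool]] = [
    (400.6880, 799.35994, 2, False),
    (400.6869, 799.35994, 2, False),
    (401.1900, 799.35994, 2, True),
]
targets = [delta_mz(o, m, z) for o, m, z, decoy in identifications if not decoy]
decoys = [delta_mz(o, m, z) for o, m, z, decoy in identifications if decoy]
print(f"median target delta: {median(targets):.4f}")
print(f"median decoy delta:  {median(decoys):.4f}")
```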


In Figure 1, we can clearly see that for target identifications the distribution for this experiment is centred close to zero, whereas for
decoy identifications the peaks at 0.5 and around 0.7 m/z units indicate wrong identifications. Peptide sequences, charges and modifications,
...
@@ -223,7 +223,7 @@ This is a bar chart displaying the percentage of protein identifications in the
**Note**: To investigate further, in the Protein view, one can sort the proteins by number of peptide identifications.


In the experiment represented in Figure 2, 60% of the proteins were identified by only one PSM. The remaining protein identifications, especially those supported by more peptides, can be considered more reliable.
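
The percentages in such a chart can be reproduced from a protein-to-PSM-count table. This is a hypothetical sketch with made-up data. If the chart actually counts distinct peptides rather than PSMs, the same function applies to peptide counts.

```python
from collections import Counter
from typing import Dict, Mapping


def psm_count_distribution(psms_per_protein: Mapping[str, int]) -> Dict[int, float]:
    """Percentage of proteins identified by 1, 2, 3, ... PSMs."""
    total = len(psms_per_protein)
    if total == 0:
        return {}
    counts = Counter(psms_per_protein.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


# Hypothetical protein -> PSM count mapping
example = {"PROT_A": 1, "PROT_B": 1, "PROT_C": 1, "PROT_D": 4, "PROT_E": 2}
print(psm_count_distribution(example))
```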
...
@@ -233,7 +233,7 @@ This is a histogram representing the percentage of peptides in the experiment wi
In practical terms, this chart has two immediate applications. First, it can be used to check that the search engine is working correctly, i.e. that the number of missed cleavages found in the identified peptides matches the "missed cleavages" parameter used in the search. Second, knowing this distribution, the researcher can adjust the number of missed cleavages allowed in future searches: for example, allowing 4 missed cleavages instead of 1 may produce only a 0.1% increase in peptide identifications while making searches 10 times longer.
Figure 3 shows an example where only about 72% of the target peptides do not have a missed cleavage. However, it is interesting to see that most of the decoy identifications contain missed cleavages.
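
Missed cleavages per peptide can be counted with the standard trypsin rule: cleavage after K or R, but not before P. Trypsin is an assumption here; other enzymes need other rules. The function names and peptides are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def missed_cleavages(peptide: str) -> int:
    """Count internal tryptic sites (K or R not followed by P).

    The C-terminal residue is skipped: cleavage there produced the
    peptide's own end.
    """
    seq = peptide.upper()
    return sum(
        1 for i in range(len(seq) - 1)
        if seq[i] in "KR" and seq[i + 1] != "P"
    )


def missed_cleavage_percentages(peptides: Iterable[str]) -> Dict[int, float]:
    """Percentage of peptides with 0, 1, 2, ... missed cleavages."""
    counts = Counter(missed_cleavages(p) for p in peptides)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


peptides = ["LVNELTEFAK", "KVPQVSTPTLVEVSR", "AKPLLR", "DAHKSEVAHRFK"]
print(missed_cleavage_percentages(peptides))
```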
...
@@ -241,13 +241,13 @@ Figure 3 shows an example where only about 72% of the target peptides do not hav
This graph is obtained by adding up all the MS/MS spectra in a given experiment, which results in an averaged spectrum. The highest peaks reflect peaks that are abundant and intense across the whole set of MS/MS spectra. Only peaks that are both intense and ubiquitous stand out here: contaminants, reagents used in the experiment, and frequent fragments from highly common peptides. Figure 4 shows an example of a public experiment in PRIDE that uses iTRAQ reagents for quantification; the zoom has been used to show the highlighted region in detail.
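
One way to build such an averaged spectrum is to bin all peaks on a fixed m/z grid, sum the intensities per bin and divide by the number of spectra. The binning approach and the 1 m/z default bin width are assumptions, not PRIDE Inspector's documented method. The example spectra are hypothetical, with a recurring peak near m/z 114 similar to an iTRAQ reporter ion.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def averaged_spectrum(spectra: Iterable[List[Peak]], bin_width: float = 1.0) -> Dict[float, float]:
    """Sum peak intensities into fixed-width m/z bins (keyed by lower bin
    edge) and divide by the number of spectra."""
    totals: Dict[int, float] = defaultdict(float)
    n_spectra = 0
    for spectrum in spectra:
        n_spectra += 1
        for mz, intensity in spectrum:
            totals[int(mz // bin_width)] += intensity
    if n_spectra == 0:
        return {}
    return {round(b * bin_width, 6): t / n_spectra for b, t in sorted(totals.items())}


spectra = [
    [(114.11, 100.0), (500.3, 10.0)],
    [(114.12, 80.0), (650.2, 5.0)],
    [(114.10, 90.0)],
]
summary = averaged_spectrum(spectra)
print(max(summary, key=summary.get), summary[114.0])
```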
This is a bar chart representing the distribution of precursor ion charges for a whole experiment. This information can be used to identify potential ionization problems, such as many 1+ charges from an ESI source, or an unexpected distribution of charges. MALDI experiments are expected to contain almost exclusively 1+ charged ions. An unexpected charge distribution may also be caused by specific search engine settings, such as limiting the search to certain precursor charges.


In this ESI experiment there are no singly charged ions, only doubly and triply charged ones.
...
@@ -265,7 +265,7 @@ Experiments that only contained peptides without missed cleavages were ignored a
A curve that lies to the left of the empirical distribution (in a different colour) indicates a disproportionate number of lower-mass peptides being identified/fragmented. Analogously, a curve that lies to the right of the empirical distribution indicates a disproportionate number of higher-mass peptides being identified/fragmented. Such shifts may be caused by the general amino acid composition of the organism being investigated or by the digestion protocol used (e.g. non-tryptic), and do not necessarily indicate a problem in your experiment.


For human, the average tryptic peptide mass is 1,100 Da, and this distribution should encompass that average. Some shift to the right should be expected, because missed cleavages result in higher-mass peptides.
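
Peptide masses for such a distribution can be computed from the sequence as the sum of monoisotopic residue masses plus one water. The sketch below assumes unmodified peptides (no PTMs, cysteines not carbamidomethylated). The helper names and the example peptides are hypothetical.

```python
from statistics import mean

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


peptides = ["PEPTIDE", "LVNELTEFAK", "AEFAEVSK"]
masses = [peptide_mass(p) for p in peptides]
print([round(m, 3) for m in masses], round(mean(masses), 1))
```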
...
@@ -273,19 +273,19 @@ For human, the average tryptic peptide mass is 1,100 Da. This distribution shoul
This chart is a histogram of the number of peaks per MS/MS spectrum in a given experiment, and it assumes centroided data. Too few peaks can indicate poor fragmentation or a detector fault, whereas a large number of peaks can indicate very noisy spectra. The chart depends heavily on the pre-processing steps applied to the spectra (centroiding, deconvolution, peak-picking approach, etc.). The example in Figure 7 shows that poor-quality spectra are more likely to be decoy identifications than target identifications.


### Peak Intensity Distribution
This is a histogram of ion intensity vs. frequency for all MS2 spectra in a given experiment (Figure 8). The information can be filtered to show all, identified or unidentified spectra. This plot gives a general estimate of the noise level of the spectra. Generally, one should expect a high number of low-intensity noise peaks and a low number of high-intensity signal peaks. A disproportionate number of high-intensity peaks may indicate heavy spectrum pre-filtering or potential experimental problems. In the case of data reuse, this plot can help decide whether the spectra need pre-processing before any downstream analysis. The quality of the identifications is not linked to this data, as most search engines perform internal spectrum pre-processing before matching the spectra. Thus, the reported spectra are not necessarily pre-processed, since the search engine may have applied the pre-processing step internally, and this pre-processing is not necessarily reported in the experimental metadata.
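
An intensity histogram of this kind can be sketched by counting peaks per log10(intensity) bin across all MS2 spectra. Logarithmic binning is an assumption made here because intensities span several orders of magnitude. The function name and peaks are hypothetical. A healthy profile would show the lowest bins holding the most peaks.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def log_intensity_histogram(
    spectra: Iterable[List[Tuple[float, float]]], bins_per_decade: int = 1
) -> Dict[float, int]:
    """Count peaks per log10(intensity) bin (keyed by lower bin edge)."""
    counts: Counter = Counter()
    for spectrum in spectra:
        for _mz, intensity in spectrum:
            if intensity <= 0:
                continue  # log10 is undefined for non-positive intensities
            b = math.floor(math.log10(intensity) * bins_per_decade)
            counts[b / bins_per_decade] += 1
    return dict(sorted(counts.items()))


spectra = [
    [(200.1, 5.0), (350.2, 20.0), (410.7, 0.0)],
    [(220.4, 30.0), (600.3, 150.0), (720.9, 2000.0)],
]
print(log_intensity_histogram(spectra))
```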
The Peptide per Ratio chart represents the peptide distribution versus the study variables in a quantitation experiment. It shows the differences between all the replicates and samples for every peptide and, globally, the relationship between different conditions. The following example shows the differences between all the samples in an 8-plex iTRAQ experiment.


## Searching experiments in PRIDE
...
@@ -295,7 +295,7 @@ Users can search (Search box) using metadata information such as species, tissue
If the user selects a project, all the assays (files) corresponding to the selected project are shown. The user can then download the files using the
corresponding download button.


The user can remove search terms on the right side of the screen.
The PRIDE Submission Tool is the main tool used to submit proteomics experiments to [PRIDE Archive](http://www.ebi.ac.uk/pride/archive/). The tool is implemented as a wizard that guides the submitter through a set of simple steps to build the final submission.
## Login Panel
...
@@ -8,7 +8,7 @@ The first step to submit a dataset to PRIDE Archive is to log into PRIDE using a
> Depending on the [files provided](./pridefileformats) and the type of submission, the tool will try to work out the relationships between the files and also the type of each file.
### Complete Submissions based on mzIdentML
...
@@ -35,7 +35,7 @@ type.
When a Complete submission is based on mzIdentML files, the dataset should contain at least one PEAK list file associated with the mzIdentML files.
> By default, the tool attempts to generate the mapping between the ‘RESULT’ files and the other files (most importantly the ‘RAW’ files).
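
A simple way to picture this automatic mapping is to match files on their base name and then flag RESULT files with no linked PEAK list. This is a hypothetical heuristic, not the submission tool's documented logic. For example, the real tool may read the spectra file references inside the mzIdentML. The set of PEAK extensions below is also an assumption.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: extensions treated as PEAK list files in this sketch only.
PEAK_EXTENSIONS = {".mgf", ".mzml", ".mzxml"}


def base_name(filename: str) -> str:
    """Lower-cased name before the first dot ('Run01.mzid.gz' -> 'run01').

    Limitation: names containing extra dots are truncated at the first one.
    """
    return Path(filename).name.split(".", 1)[0].lower()


def map_result_files(result_files: List[str], other_files: List[str]) -> Dict[str, List[str]]:
    """Link each RESULT file to the other files that share its base name."""
    by_name: Dict[str, List[str]] = {}
    for f in other_files:
        by_name.setdefault(base_name(f), []).append(f)
    return {r: by_name.get(base_name(r), []) for r in result_files}


def results_without_peak_list(mapping: Dict[str, List[str]]) -> List[str]:
    """RESULT files that have no linked PEAK list file."""
    return [
        r for r, linked in mapping.items()
        if not any(Path(f).suffix.lower() in PEAK_EXTENSIONS for f in linked)
    ]


mapping = map_result_files(
    ["run01.mzid", "run02.mzid"],
    ["run01.mgf", "RUN01.raw", "run02.raw"],
)
print(mapping)
print(results_without_peak_list(mapping))
```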
...
@@ -83,7 +83,7 @@ If the automatic mapping is partial only or does not apply, the submitter is ask
Additional metadata needs to be provided for each ‘RESULT’ file in the case of a ‘Complete’ submission, and the required metadata is the same for both subtypes
of submission (mzTab and mzIdentML).


The previous figure shows the screen where the _‘Annotate’_ button can be clicked for each ‘RESULT’ file. This information is usually imported automatically
in the case of an mzTab file (if the recommended CVs/ontologies are used). For mzIdentML, the information needs to be annotated manually.
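
Automatic import from mzTab is possible because the metadata section stores these annotations as tab-separated `MTD` lines whose values are CV parameters written as `[label, accession, name, value]`. The sketch below extracts such parameters. The key names follow the mzTab 1.0 convention as an assumption. The parser is hypothetical and also assumes no commas inside CV names. The submission tool's own import logic may differ.

```python
from typing import Dict, List, NamedTuple, Optional


class CvParam(NamedTuple):
    cv_label: str
    accession: str
    name: str
    value: str


def parse_cv_param(text: str) -> Optional[CvParam]:
    """Parse '[label, accession, name, value]'; assumes no commas in names."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return None
    parts = [p.strip() for p in text[1:-1].split(",")]
    return CvParam(*parts) if len(parts) == 4 else None


def extract_metadata(lines: List[str], wanted_keys: List[str]) -> Dict[str, CvParam]:
    """Collect CV-parameter values of selected MTD keys."""
    found: Dict[str, CvParam] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "MTD" and fields[1] in wanted_keys:
            param = parse_cv_param(fields[2])
            if param is not None:
                found[fields[1]] = param
    return found


mztab_lines = [
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tsample[1]-species[1]\t[NEWT, 9606, Homo sapiens (Human), ]",
    "MTD\tsample[1]-tissue[1]\t[BTO, BTO:0000759, liver, ]",
    "MTD\tinstrument[1]-name\t[MS, MS:1000449, LTQ Orbitrap, ]",
]
keys = ["sample[1]-species[1]", "sample[1]-tissue[1]", "instrument[1]-name"]
for key, param in extract_metadata(mztab_lines, keys).items():
    print(key, "->", param.accession, param.name)
```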
...
@@ -93,7 +93,7 @@ The following additional metadata is **Mandatory**:
- Tissue: the tissue (‘not applicable’ should be used in the case of cell line experiments).
- Instrument information.


Each of these items should be provided as Controlled Vocabulary (CV) terms selected from a drop-down menu. Providing information
about the _cell type_, _disease_ and _quantification method_ (if applicable) is optional but recommended.
...
@@ -110,14 +110,14 @@ e.g. the fish Grayling (Thymallus thymallus) the species is not available from t
_Other species_ and search for Thymallus thymallus in the OLS panel.
@@ -129,14 +129,14 @@ In this panel, it is recommended to provide additional metadata in four cases:
- There are other “omics” datasets (for instance transcriptomics, metabolomics data present in other repositories) that can be associated with it. In this case, please provide the accession number of the dataset in the corresponding repository.
This is the last step before the file upload actually starts. You should double-check that all the necessary files are included in the submission summary before continuing to the upload step. See an example of an mzIdentML-based ‘complete’ submission.