mzCloud is an extensively curated database of high-resolution tandem mass spectra that are arranged into spectral trees. MS/MS and multi-stage MSn spectra were acquired at various collision energies, precursor m/z values, and isolation widths using collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD). Each raw mass spectrum was filtered and recalibrated, giving rise to additional filtered and recalibrated spectral trees that are fully searchable. Besides the experimental and processed data, each database record contains the compound name with synonyms, the chemical structure, computationally and manually annotated fragments (peaks), identified adducts and multiply charged ions, molecular formulas, predicted precursor structures, detailed experimental information, peak accuracies, mass resolution, InChI, InChIKey, and other identifiers. mzCloud is a fully searchable library that allows spectra searches, tree searches, structure and substructure searches, monoisotopic mass searches, peak (m/z) searches, precursor searches, and name searches. To preview all mass spectra or conduct searches, please go to Database.
Soft ionization and CID techniques can generate spectra whose appearance depends on the experimental conditions and sample preparation. To manage and search diverse product spectra with an identical precursor ion for a single chemical entity, mzCloud uses spectral trees whose nodes can contain several node items. A node item stands for any product or calculated spectrum of an identical precursor m/z value or m/z range (node spectra), or for a chromatogram.
Node product spectra represent spectra that were acquired at various collision energies and isolation widths or that use wideband activation. They can also be zoom spectra, source CID spectra, or any other spectra that provide more spectral signatures for correct compound identification, much as police use several fingerprints from a single hand. If a node contains more than two parallel spectra, the application automatically calculates the average and composite spectra. In addition to spectra, each node can contain a chromatogram if the reference compound was a component of a mixture.
The spectral node strategy makes all the mathematical processing methods more robust and, unlike simple spectra averaging, does not distort the highly nonlinear progression of peak ratios.
You can access node spectra from a library by clicking the edge of a spectral or chromatographic node item, or by browsing in the box displayed below the tree.
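The spectral tree described above can be pictured as a simple nested data structure. The following is a minimal Python sketch, not mzCloud's actual data model: the class names (`NodeSpectrum`, `TreeNode`), the field names, and the fixed-width m/z binning used for averaging are all assumptions made for illustration. How mzCloud computes its average and composite spectra is not specified here, so only a naive average is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class NodeSpectrum:
    """One node item: a product spectrum acquired under given conditions (hypothetical)."""
    collision_energy: float
    isolation_width: float
    peaks: List[Peak]


@dataclass
class TreeNode:
    """A tree node: all spectra sharing one precursor m/z, plus deeper MSn nodes."""
    precursor_mz: float
    ms_level: int
    spectra: List[NodeSpectrum] = field(default_factory=list)
    children: List["TreeNode"] = field(default_factory=list)


def average_spectrum(node: TreeNode, bin_width: float = 0.01) -> List[Peak]:
    """Naive average of a node's parallel spectra.

    Assumption: peaks are grouped into fixed-width m/z bins and the summed
    intensity of each bin is divided by the number of spectra in the node.
    """
    sums: Dict[int, float] = {}
    for spectrum in node.spectra:
        for mz, intensity in spectrum.peaks:
            key = round(mz / bin_width)
            sums[key] = sums.get(key, 0.0) + intensity
    n = len(node.spectra)
    return [(key * bin_width, total / n) for key, total in sorted(sums.items())]


# Usage: an MS2 node holding two spectra acquired at different collision energies.
node = TreeNode(precursor_mz=301.1, ms_level=2)
node.spectra.append(NodeSpectrum(20.0, 1.0, [(100.001, 10.0), (200.0, 4.0)]))
node.spectra.append(NodeSpectrum(40.0, 1.0, [(100.002, 30.0)]))
print(average_spectrum(node))
```

Keeping the individual spectra in the node, rather than storing only the average, is what lets a search compare a query against each acquisition condition separately.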
mzCloud is the first public spectral database that offers the Precursor Ion Fingerprinting (PIF) technique. This innovative approach identifies substructural information through the comparison of product ion spectra of structurally related compounds. Structural information is derived by utilizing previously characterized ion structures stored in mzCloud and matching them with unknown product ion spectra. PIF is a very powerful technique that heavily depends upon spectra of precursor ions of various chemical classes acquired under various experimental conditions. The more ion fingerprints present in the library, the more substructures can be identified from unknown spectra.
Since it is impossible for a single or a few research groups to acquire the substructural fingerprints represented as spectral trees that would cover a significant portion of biologically, environmentally or toxicologically relevant structural space, contributors from various metabolomics, natural compound, forensic or industrial fields are welcome to contribute to the mzCloud database.
Each mzCloud spectrum contains the precursor ion m/z value, isolation width, a list of product ion m/z values with per-peak mass accuracies, corresponding absolute and relative intensities, ion polarity, charge state, the structure of the precursor ion (in most cases), and the structure of the parent molecule. For the assignment of fragment structures to a precursor ion, proven heuristic and quantum mechanical methods are used.
Even if the acquisition process follows a standardized experimental protocol, reproducible product ion spectra are difficult to achieve, since the probabilities of the various possible ion decomposition products depend on the internal energy of the ions, which can differ for identical ions derived from different parent compounds. Since the standardized criteria cannot be met completely, several product ion spectra for the same precursor ion are acquired under various experimental conditions (collision energies, isolation widths, etc.) to compensate for possible differences between reference and query spectra. Various mathematical methods have been developed to handle spectral dissimilarities and allow the correct identification of fragment structures.
To further address the spectra reproducibility problem and to control the matching times of rich high-resolution, accurate-mass spectra, each raw MSn scan is additionally represented as a filtered and a recalibrated spectrum, separately for each collision energy and isolation width used. Spectra are filtered by novel chemically intelligent peak removal algorithms and manually recalibrated using a series of supervised ion calibration steps. To accommodate filtered and recalibrated spectra and to manage their parallel searches, mzCloud uses advanced database technology.
mzCloud uses a third-generation spectra correlation algorithm: a native high-resolution, accurate-mass matching algorithm (HighRes) that takes into account the different nature of high-resolution spectra when calculating the match factor. The HighChem HighRes algorithm is based on statistical, non-dot-product analysis that considers the accuracy of every individual peak. The measure of dependence between two spectra is expressed by a match factor that is determined from two independent coefficients: the m/z value coefficient and the abundance correlation coefficient.
The m/z value coefficient expresses the similarity of m/z values that are considered to be identical. The coefficient is calculated as a weighted mean of the dimensionless distances between overlapping peaks.
The abundance coefficient defines the correlation between abundance ranks rather than absolute abundances or their transformation (e.g. the square root). An additional parameter, the ratio of abundances of overlapping peaks to the total sum of peak abundances, is considered.
The HighRes algorithm is more tolerant towards abundance variations and tends to elevate the correlation coefficient as the accuracy of overlapping spectral peaks increases. Such a characteristic is beneficial in the identification of unknowns by library searching of high resolution spectra since it exploits a high specificity of accurate m/z values.
[Figure: Critical situations considered by the HighRes algorithm.]
Alternatively, the proven NIST or HighChem low-resolution algorithms can be chosen.
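The two-coefficient idea behind the match factor can be illustrated with a short Python sketch. This is not the HighRes algorithm, whose details are not given here. It is a toy model built on stated assumptions: peaks are paired greedily within a ppm tolerance, the m/z coefficient is one minus the intensity-weighted mean of the ppm error divided by the tolerance, the abundance coefficient is a rank correlation (ties ignored) scaled by the fraction of total abundance carried by overlapping peaks, and the two coefficients are multiplied. All function names and the `tol_ppm` parameter are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, abundance)


def match_peaks(query: List[Peak], ref: List[Peak], tol_ppm: float):
    """Greedily pair each query peak with the closest unused reference peak."""
    pairs, used = [], set()
    for q_mz, q_ab in query:
        best = None
        for i, (r_mz, _) in enumerate(ref):
            err = abs(q_mz - r_mz) / r_mz * 1e6  # mass error in ppm
            if i not in used and err <= tol_ppm and (best is None or err < best[1]):
                best = (i, err)
        if best is not None:
            used.add(best[0])
            pairs.append((q_ab, ref[best[0]][1], best[1]))
    return pairs  # (query abundance, reference abundance, ppm error)


def ranks(values: List[float]) -> List[float]:
    """Rank positions of the values (ties are not averaged in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order):
        result[i] = float(rank)
    return result


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def match_factor(query: List[Peak], ref: List[Peak], tol_ppm: float = 10.0) -> float:
    pairs = match_peaks(query, ref, tol_ppm)
    if len(pairs) < 2:
        return 0.0  # too few overlapping peaks to correlate ranks
    q_ab = [p[0] for p in pairs]
    r_ab = [p[1] for p in pairs]
    # m/z coefficient: 1 minus the weighted mean of dimensionless distances.
    mz_coeff = 1.0 - sum(q * err / tol_ppm for q, _, err in pairs) / sum(q_ab)
    # Abundance coefficient: rank correlation scaled by the overlap fraction.
    rho = max(pearson(ranks(q_ab), ranks(r_ab)), 0.0)
    total = sum(ab for _, ab in query) + sum(ab for _, ab in ref)
    overlap = (sum(q_ab) + sum(r_ab)) / total
    return mz_coeff * rho * overlap


reference = [(100.0, 10.0), (150.0, 50.0), (200.0, 100.0)]
shifted = [(100.0005, 10.0), (150.0, 50.0), (200.0, 100.0)]  # one peak off by 5 ppm
print(match_factor(reference, reference), match_factor(shifted, reference))
```

Because the abundance term uses ranks, rescaling the intensities of a spectrum without changing their order leaves that term unchanged, while tighter m/z agreement raises the score, which mirrors the behavior described for high-resolution matching.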
Annotations not only allow a better understanding of the fragmentation processes which led to a particular peak, but they also provide a useful tool for the identification of unknowns.
To determine the correct identity of the fragment structures needed for the substructural characterization of precursor ions, all fragments in the mzCloud database are predicted using advanced heuristic and a broad range of quantum chemical methods:
For the assignment of fragment structures to a precursor ion in the process of creating structurally characterized product ion spectra, it is extremely beneficial to have high-resolution spectra since the accurate m/z values of precursor and product ions greatly reduce the number of possible molecular formulas for fragment structures. Also, the determination of the structural arrangement for the elucidated molecule benefits from exact mass measurements by constraining the elemental composition of the elucidated molecule and consistently validating the calculated mass of recognized fragment structures and accurate m/z values of precursor and product ions.
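The effect of mass accuracy on the number of candidate formulas can be shown with a small brute-force enumeration. The Python sketch below is an illustration only, not how mzCloud assigns formulas: it considers only C, H, N and O, applies no chemical plausibility rules, and uses arbitrary upper bounds on the element counts. The function names and the `max_counts` parameter are hypothetical; the monoisotopic masses are standard values.

```python
from itertools import product

# Monoisotopic masses of the most abundant isotopes (in Da).
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.9949146196}


def formula_mass(c: int, h: int, n: int, o: int) -> float:
    """Monoisotopic mass of a neutral CcHhNnOo composition."""
    return (c * MONO_MASS["C"] + h * MONO_MASS["H"]
            + n * MONO_MASS["N"] + o * MONO_MASS["O"])


def candidate_formulas(target_mass: float, tol_ppm: float,
                       max_counts=(20, 40, 6, 10)):
    """All (C, H, N, O) counts whose mass lies within tol_ppm of target_mass.

    Assumption: plain exhaustive search with no valence or ratio filters.
    """
    tol = target_mass * tol_ppm * 1e-6
    ranges = [range(m + 1) for m in max_counts]
    return [f for f in product(*ranges)
            if abs(formula_mass(*f) - target_mass) <= tol]


# Usage: caffeine, C8H10N4O2, searched at a narrow and a wide tolerance.
caffeine = formula_mass(8, 10, 4, 2)
narrow = candidate_formulas(caffeine, tol_ppm=5)
wide = candidate_formulas(caffeine, tol_ppm=500)
print(len(narrow), "candidates at 5 ppm;", len(wide), "candidates at 500 ppm")
```

For example, C10H14N2O2 has the same nominal mass as caffeine and falls inside the 500 ppm window, but its monoisotopic mass differs by about 0.025 Da, so it is excluded at 5 ppm.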
Structure and substructure searches are useful features for retrieving library entries with specific structural moieties. mzCloud allows tailored substructural searches by applying various search rules. Alternatively, functional groups in query structures can be formally substituted using the substituent symbol “R” to retrieve common structural scaffolds. Because fragmentation mechanisms on rings differ significantly from those on acyclic moieties, searching for substructures that exactly match the ring membership and have the best overlap with library structures can further refine the search results. To that end, the Substructure Best Match and Substructure Match Ring Bonds options available in mzCloud yield structures that are mass spectrometrically reasonable.
In addition to neutral molecules, fragment structures, including protonated, deprotonated and adduct molecules, are also supported throughout the mzCloud databases. The integrated structure editor allows fragment structures to be drawn and searched in libraries. The peaks of reference spectra are also annotated with fragments. Isotope-labeled compounds are supported as well.
As with any professional reference database, mzCloud keeps and displays extensive supplementary information and data sources along with experimental parameters and a collection of compound identifiers.
Only high-quality data can lead to reliable and accurate results. Therefore, we take the processing of raw data very seriously. All spectra, structures and metadata are rigorously evaluated for errors, both manually and electronically, checked for consistency and correctness, filtered, and recalibrated, and compound identifiers are compared with external databases. Full-time curators with a mass spectrometric background process every single spectrum manually. Dedicated software with a number of novel algorithms is used for transforming raw data into high-quality spectral trees.
To manage the immense mzCloud data volume stored in various databases, a powerful database technology has been developed and bridged with a web-based interface.
mzCloud is capable of handling multiple databases containing various types of spectral trees managed by various users. We are continuously adding new records to mzCloud and extending the number of database types. Currently, five types of databases are available:
To speed up the growth of mzCloud, we have begun to automate the extensive curation process used to create the Reference Library. This allowed us to generate a new library of automatically curated data, the AutoProcessed Library. All data still comes from authentic standards, just as in the Reference Library, but undergoes high-throughput autoprocessing, allowing us to grow mzCloud faster than ever. All data is acquired to the high standards required for mzCloud, and the curation applied includes spectral averaging, multiple noise removal tools, and automated quality control checks on the entire process. This ensures that, even though the data has not yet undergone the extensive professional curation process, it is still high-quality authentic standard fragmentation data. Over time, the data in the AutoProcessed Library will undergo the same manual professional curation and migrate into the Reference Library. This improvement allows us to significantly speed up the growth of the mzCloud Library and to provide the best possible potential for fragment spectral identification, whether through ID match, spectral similarity, MSn tree searching, or mzLogic™.