Exact(2)
This paper describes examples of how existing datasets have been used and identifies new and emerging technical issues that could be addressed using a broader set of reference datasets as part of an International CO2 Storage Dataset Consortium.
Within the Human Microbiome Project dataset (Consortium HMP, 2012), approximately 10% of the samples contained melainabacterial 16S rRNA gene sequences, providing a rough estimate of the fraction of the American population that carries Melainabacteria.
Similar(58)
To also investigate PSIKO in a human population context, we applied it to a subset of the HapMap Phase 3 dataset (International HapMap Consortium 2010).
We also tested the Q-matrices produced by PSIKO and the three other methods under investigation on a subset of the HapMap Phase 3 project dataset (International HapMap Consortium 2010), both on their own and in combination with PLINK's sliding-window SNP pruning procedure.
To focus on editing sites inside coding regions and to avoid repetitive elements that are prone to assembly and alignment errors, we retained only those components that were found to be significantly similar (Blastx E-value < 1e-6) to the Swiss-Prot protein dataset (UniProt Consortium, 2014).
Additional file 13 shows GO terms that are overrepresented in both the Genome Consortium dataset and the Ensembl dataset (see Methods for details of these datasets).
Gene Ontology terms found to be overrepresented in both the Genome Consortium dataset and the Ensembl dataset are listed.
Among the three species, there were significant differences in the frequency of HPAA tract-containing peptides (Consortium dataset: p < 0.0001; NCBI mRNA dataset: p < 0.01).
In a survey of SNPs surrounding the ORMDL3 gene that were genotyped in the Wellcome Trust Case Control Consortium dataset, no association with AS was found (personal communication).
After filtering for variants with a possible impact at the protein level and a minor allele frequency < 0.05 in the Exome Aggregation Consortium dataset [ 35], 71 potentially pathogenic variants were selected for further analysis (Fig. 1b).
To validate the results based on this Genome Consortium dataset for the sea lamprey, we used the 'all and known proteins' sequences (n = 11,442) available at Ensembl release 72 (ftp://ftp.ensembl.org/pub/release-72/fasta/petromyzon_marinus/pep/Petromyzon_marinus.Pmarinus_7.0.72.pep.all.fa.gz).