Your English writing platform
Discover LudwigExact(4)
Clustering performance is enhanced by a system for finding of erroneous identifiers and subsequent record partitioning.
Subsequently, the resulting clusters are studied and, if appropriate, partitioned using a graph based algorithm detecting erroneous identifiers.
An example is illustrated in Figure 1. 2. When one views the identifier combinations as a graph, with edges comprising shared identifiers, the erroneous identifiers form the origins of edges.
3. It follows that potentially erroneous identifiers are only a small subset of all the identifiers in the records of interest, in that they both form the origins of edges (as in 2, above); further, that they lie within the set of identifiers having the properties in (1, above).
Similar(56)
It has previously been proposed to use a voting approach to disambiguate non-systematic identifiers when integrating multiple databases, assigning the identifier to the compound to which it was most frequently associated in the databases [29], but this approach may be biased by error propagation when one database includes an erroneous identifier from another database.
Provided unique identifiers exist, however, if one can detect records from different individuals containing a shared, erroneous identifier, then there is the opportunity to partition the clusters formed in order to drive up clustering quality.
Patients for whom linkage with population registries was not possible (e.g. due to erroneous patient identifiers) were classified as "no follow-up available" and excluded from the analyses.
We identified a number of types of mismatch, including erroneous EntrezGene identifier (21 occurrences), inclusion of nonhuman or noncoding mutations (23 and 11 occurrences, respectively), missing gold standard information (3 occurrences) and erroneous Turker judgments (10 occurrences).
Erroneous author identifier labels, tagged to a publication, can also lend themselves to a protocol for de-convolution.
In particular, bias was greatest when identifier error rates differed between hospitals, supporting other studies that have shown differential linkage error by ethnic group, exclusion of vulnerable populations due to poorly-recorded identifiers and erroneous rankings of relative hospital performance due to differing data quality between units [ 26- 32].
Campbell, D. Identifying the Identifiers.
Write better and faster with AI suggestions while staying true to your unique style.
Since I tried Ludwig back in 2017, I have been constantly using it in both editing and translation. Ever since, I suggest it to my translators at ProSciEditing.

Justyna Jupowicz-Kozak
CEO of Professional Science Editing for Scientists @ prosciediting.com