I saw your q2-repeat-rarefy qiime2 plugin and really appreciate your contribution to the microbiome community. However, I think the idea of multiple rarefaction is incorrect from a statistical point of view.

The purpose of rarefaction is to remove effects that are due to different read depths in different samples. For example, let's take a situation where we have a single biological sample and sequence it to two depths, with 10 technical repeats at each depth: 10 repeats with 10k reads and 10 repeats with 20k reads. If we rarefy all repeats to 10k reads/repeat and then look for differences between the repeats originating from 10k reads and those originating from 20k reads, we will get no significant differences, as we would expect. However, if we instead apply the repeat-rarefy procedure to the 20k-read repeats and then look for differences between the two groups, I think we may find some bacteria that differ between them.

To explain why I think this will happen, let's assume we have some rare bacteria (say 100), each at a (true) frequency of 1/20000 in the original sample. In the 10k reads/sample repeats, we expect approx. 50 of the rare bacteria to have 1 read and 50 to have 0 reads. In the 20k reads/sample repeats, we expect approximately all of the rare bacteria to have 1 read each. If we just rarefy to 10k reads/sample, we will lose approx. 50 of these rare bacteria and keep the other 50 (similar to the 10k reads/sample repeats). However, if we do repeat-rarefy, we will get approx. 0.5 read/sample for each of these 100 rare bacteria. Then (if we round up), we will get 1 read/sample for all 100 rare bacteria. Therefore, the result will differ from the 10k reads/sample repeats.

Another way to think about it is that doing an infinite number of repeat-rarefy iterations is equivalent to total-sum-scaling (i.e. infinite repeat-rarefaction to 10k reads is similar to normalizing by dividing by the original number of reads in the sample and multiplying by 10k).

Will be happy to continue the discussion. And please do not let this discourage you from continuing to contribute to the microbiome and qiime2 community!

"You can hardly see a stable result" depends on your metric and observation, although it also depends on your goal, which IMO is a stable relationship. In practice, when working with my own data, I tend to scale my richness values because people reviewing the papers focus on the numbers rather than the effect size. The actual value of alpha diversity depends on several external factors beyond rarefaction (wet lab, sequencing depth, denoising method, etc.), so at the end of the day the question should not be "what is the absolute value?" but rather "what is the relationship?" I agree with that, and I think this approach may destroy that relationship by overweighting rare organisms.
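The rare-bacteria argument in the first comment can be checked numerically. The sketch below is only an illustration under stated assumptions: it is not the q2-repeat-rarefy implementation, the helper names (`rarefy`, `repeat_rarefy`, `total_sum_scale`, `demo`) are hypothetical, repeats are assumed to be combined by averaging, and the sample is assumed to contain 100 rare taxa with 1 read each (a frequency of 1/20000, chosen so that the expected counts are 0.5 read at 10k and 1 read at 20k) plus one abundant taxon holding the remaining reads.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a vector of per-taxon counts."""
    # Expand the counts into one taxon label per read, then subsample the reads.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    out = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        out[taxon] += 1
    return out


def repeat_rarefy(counts, depth, n_repeats, rng):
    """Average `n_repeats` independent rarefactions (assumed combination rule)."""
    totals = [0] * len(counts)
    for _ in range(n_repeats):
        for taxon, n in enumerate(rarefy(counts, depth, rng)):
            totals[taxon] += n
    return [t / n_repeats for t in totals]


def total_sum_scale(counts, depth):
    """Divide by the sample's total read count and multiply by `depth`."""
    total = sum(counts)
    return [n * depth / total for n in counts]


def demo(seed=0, n_repeats=100):
    rng = random.Random(seed)
    n_rare, depth = 100, 10_000
    # Hypothetical 20k-read sample: 100 rare taxa with 1 read each,
    # and one abundant taxon with the remaining 19,900 reads.
    counts = [1] * n_rare + [20_000 - n_rare]

    single = rarefy(counts, depth, rng)
    averaged = repeat_rarefy(counts, depth, n_repeats, rng)
    scaled = total_sum_scale(counts, depth)

    return {
        # Rare taxa that survive one rarefaction to 10k reads.
        "rare_kept_single": sum(1 for n in single[:n_rare] if n > 0),
        # Rare taxa still present after averaging and rounding up.
        "rare_kept_rounded_up": sum(
            1 for x in averaged[:n_rare] if math.ceil(x) > 0
        ),
        # Mean averaged count per rare taxon.
        "mean_rare_averaged": sum(averaged[:n_rare]) / n_rare,
        # Total-sum-scaled count of one rare taxon.
        "rare_tss": scaled[0],
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value}")
```

Under these assumptions, a single rarefaction should keep roughly half of the rare taxa, while the averaged table keeps essentially all of them at about 0.5 read each, which is the same value total-sum-scaling gives. Averaging is only one plausible way to combine repeats; a different combination rule would need its own check.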