At L7 Informatics, we spend a lot of time talking about the importance of data provenance, and about how often the need for robust provenance gets overlooked in even the most sophisticated laboratory settings. The conversation always brings up some very interesting scenarios and questions:
- “Yes, of course I keep track of it all. I have a LIMS.” Sure, everyone has a LIMS (well ok, not everyone), but does the LIMS you use track everything you need to track? Just as importantly, can your LIMS withstand the barrage of information associated with more data-intensive technologies, such as next-gen sequencing (NGS)?
- What about tracking samples through data analysis? Does anyone have a LIMS that natively tracks analytical processes on their high-performance computing infrastructure?
- How about repeating an NGS experiment (including the NGS analysis), whether it’s your own or someone else’s? Do you know exactly which algorithm the bioinformatician used, or which version of the aligner? Do you even know what your bioinformaticians are doing?
- Unless everyone is keeping great records (and they probably aren’t), can the scientific community trust the results coming from any given lab?
And so on, and so on (welcome to our rabbit hole)…
So we took a step back and thought about the way that data should be maintained and managed. There are a lot of considerations to take into account when setting up a robust data provenance system. Yes, we did form a company to build a robust software platform, but we still believe that the subject as a whole does not get enough attention or discussion.
Therefore, we commissioned an eBook to discuss this topic and to bring to light many of our own internal discussions. I hope you will take a look at the content and then get back to me with your thoughts, input, and feedback on your own processes.