Big data storage and retrieval is a growing concern for the future. This paper proposes storing data by arranging the nucleotides adenine (A) and thymine (T) to represent binary 0s and 1s. Small fragments of high-molecular-weight DNA can be obtained by the chain termination method, and with shotgun sequencing we can select the fragments that contain A's and T's in the order we want. The paper demonstrates a Python script that produces an A and T sequence from an input data sequence; this output can be used during shotgun sequencing to select or discard strands, and the script can also query the DNA database when integrated with a DNA sequencing method. The data is not written onto the nucleotides; instead, the nucleotide sequence itself is modified to represent the data. The paper further illustrates how a sequence of nucleotides can be arranged as a logical table with a proper header, and how the Python script can be used to save data to and retrieve data from this table. Storing and retrieving 24 bits of data can be completed in 15-20 minutes under controlled lab conditions, a figure that includes manual and mechanical effort. The paper shows how automation and robotic arms can reduce this delay further, and how to overcome the challenges of preserving the DNA strands. This approach can lead to the development of a DNA data warehouse with virtually unlimited storage space, since DNA can be obtained at almost no cost from any living thing.
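The bit-to-nucleotide mapping described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the paper's own script: the assignment A = 0 and T = 1, the 8-bits-per-byte framing, and the function names `encode` and `decode` are all assumptions made for the example.

```python
# Minimal sketch of an A/T binary encoding.
# Assumptions (not confirmed by the source): A represents 0, T represents 1,
# and each input byte is written as 8 bits, most significant bit first.
BIT_TO_BASE = {"0": "A", "1": "T"}
BASE_TO_BIT = {"A": "0", "T": "1"}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of A/T nucleotides (hypothetical helper)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def decode(sequence: str) -> bytes:
    """Turn an A/T nucleotide string back into bytes (hypothetical helper)."""
    if len(sequence) % 8 != 0:
        raise ValueError("sequence length must be a multiple of 8")
    if set(sequence) - set(BASE_TO_BIT):
        raise ValueError("sequence may contain only A and T")
    bits = "".join(BASE_TO_BIT[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"DNA")        # 3 bytes = 24 bits, i.e. 24 nucleotides
    print(strand, len(strand))
    print(decode(strand))
```

Three bytes of input yield a 24-nucleotide strand, matching the 24-bit example mentioned in the abstract. How such a string would guide strand selection during shotgun sequencing is outside the scope of this sketch.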
"BIG DATA MANAGEMENT – DNA DATA WAREHOUSE,"
International Journal of Computer and Communication Technology: Vol. 6, Article 5.
Available at: https://www.interscience.in/ijcct/vol6/iss3/5