Data mining effects on SAN

Today’s e-business and scientific relational databases place a heavy load on SAN performance, especially when they are used for data mining. Data mining is a modern tool for acquiring demographic information on consumer buying habits. Large e-business organizations have a constant demand for customer demographic data: the data is examined within the database for buying trends and patterns that inform marketing and advertising strategy. It is also compiled and sold. Within the scientific community, data mining is used in genetics. “A wealth of genomic information in the form of publicly available databases is underutilized as a potential resource for uncovering functionally relevant markers underlying complex human traits. Given the huge amount of SNP (single nucleotide polymorphism) data available from the annotation of human genetic variation, data mining is a reasonable approach to investigating the number of SNPs that are informative for ancestry information.” (Baye et al., 2009). Population mapping accounts for a large part of this database querying, and it places a heavy demand on SAN architecture, particularly on throughput and on the choice of communications medium (such as fiber) and data transport protocols.

For a SAN to deliver the desired performance under this kind of data mining load, it needs good monitoring applications and smart storage management suites, both of which are available from the hardware vendors. One example of a data management suite is EMC’s Networker, a backup and data protection product: “…Networker backup software, you get a common platform that supports a wide range of data protection options including backup to disk, replication management, continuous data protection, and de-duplication across physical and virtual environments…” (Networker, 2009).

De-duplication is another process that is essential to data organization within storage. Many applications provide this valuable service, although it does add system overhead.
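To make the trade-off concrete, here is a minimal sketch of block-level de-duplication in Python. It is an illustration only, not how Networker or any particular product works: the fixed 4096-byte chunk size, the use of SHA-256, and the names `deduplicate` and `restore` are all assumptions made for the example. Each distinct chunk is stored once, and the hashing of every chunk is where the extra overhead comes from.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed block size in bytes, chosen for illustration


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and keep each distinct chunk once."""
    store = {}   # digest -> chunk bytes (the unique blocks actually stored)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Hashing every chunk is the extra work de-duplication adds.
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Hypothetical sample data: five blocks, only two of them distinct.
    data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE * 2
    store, recipe = deduplicate(data)
    stored = sum(len(chunk) for chunk in store.values())
    print(f"logical size: {len(data)} bytes, stored: {stored} bytes")
    assert restore(store, recipe) == data
```

In this sketch the savings depend entirely on how repetitive the data is, while the hashing cost is paid on every write. Real products typically use more elaborate schemes, such as variable-size chunking.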


Baye, T. M., Tiwari, H. K., Allison, D. B., & Go, R. C. (2009, February 14). Database mining for selection of SNP markers useful in admixture mapping. BioData Mining, 2, 1. Retrieved June 5, 2009, from Academic OneFile via Gale:

Networker. (2009). EMC. Retrieved June 5, 2009, from