About EMC Celerra NAS checkpoints

Here is a good intro on EMC Celerra NAS checkpoint technology. Checkpoints, also known as snapshots, are point-in-time images of a file system. They can be used for quick recovery if the file system is corrupted or lost.
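
To make the idea of a point-in-time image concrete, here is a minimal Python sketch of a copy-on-first-write checkpoint. It is a generic illustration only, not a description of how Celerra implements checkpoints; the FileSystem class and its method names are hypothetical.

```python
class FileSystem:
    """Toy block store with copy-on-first-write checkpoints (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live file system contents
        self.checkpoints = []        # one "saved blocks" dict per checkpoint

    def create_checkpoint(self):
        # A new checkpoint copies nothing up front; originals are saved lazily.
        saved = {}                   # block index -> content at checkpoint time
        self.checkpoints.append(saved)
        return saved

    def write(self, index, data):
        # Before the first overwrite of a block, preserve its original
        # content in every checkpoint that has not saved it yet.
        for saved in self.checkpoints:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_checkpoint(self, saved):
        # Point-in-time view: saved originals where they exist, live blocks elsewhere.
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]

    def restore(self, saved):
        # Roll the live file system back by writing the saved originals back.
        for index, block in list(saved.items()):
            self.write(index, block)


fs = FileSystem(["a", "b", "c"])
ckpt = fs.create_checkpoint()
fs.write(1, "X")                       # simulate corruption after the checkpoint
view_before_restore = fs.read_checkpoint(ckpt)
live_before_restore = list(fs.blocks)
fs.restore(ckpt)
```

The checkpoint stays small because it only holds blocks that changed after it was taken, which is why this style of snapshot is quick to create and useful for fast recovery.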

Data mining effects on SAN

Today’s e-business and scientific relational databases place a heavy load on SAN performance, especially when they are used for data mining. Data mining is a modern tool for acquiring demographic information on consumer buying habits. Large e-business organizations have a constant demand for customer demographic data: it is examined within the database for buying trends and patterns that shape marketing and advertising strategy, and it is also compiled and sold.

Within the scientific community, data mining is used in genetics. “A wealth of genomic information in the form of publicly available databases is underutilized as a potential resource for uncovering functionally relevant markers underlying complex human traits. Given the huge amount of SNP (single nucleotide polymorphism) data available from the annotation of human genetic variation, data mining is a reasonable approach to investigating the number of SNPs that are informative for ancestry information.” (Baye et al., 2009). Population mapping accounts for a large part of this database querying, and it places heavy demands on SAN architecture, especially on throughput and on the choice of communications medium (such as fiber) and data transport protocols.

For a SAN to deliver the desired performance under this kind of data mining load, it needs good monitoring applications and smart storage management suites, both of which are available from the hardware vendors. One example of a data management suite is EMC’s Networker backup software: “…Networker backup software, you get a common platform that supports a wide range of data protection options including backup to disk, replication management, continuous data protection, and de-duplication across physical and virtual environments…” (Networker, 2009).

De-duplication is another process that is essential to data organization within storage. Many applications provide this valuable service, although it does add system overhead.
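
As a rough illustration of both the benefit and the overhead, here is a minimal Python sketch of hash-based de-duplication. It is not how Networker or any particular product works; the DedupStore class, the fixed 4-byte chunk size, and the file names are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small chunk size, chosen only for the demo


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            # Hashing every chunk is the extra work de-duplication costs.
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Rebuild the file from its chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
store.write("a.txt", b"AAAABBBBAAAA")
store.write("b.txt", b"BBBBCCCC")
logical_bytes = len(store.read("a.txt")) + len(store.read("b.txt"))
```

Here the two files hold 20 logical bytes but only three unique chunks (12 bytes) are stored. The price is a hash computation and a lookup on every chunk written, which is the system overhead mentioned above.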

References:

Baye, T. M., Tiwari, H. K., Allison, D. B., & Go, R. C. (2009). Database mining for selection of SNP markers useful in admixture mapping. BioData Mining, 2, 1. Retrieved June 5, 2009 from Academic OneFile via Gale:
http://find.galegroup.com.dml.regis.edu/itx/start.do?prodId=AONE

Networker. (2009). EMC. Retrieved June 5, 2009 from http://www.emc.com/products/detail/software/networker.htm

More on Clustered NAS

Clustered NAS is gaining in popularity because it streamlines the administration of multiple NAS systems. The difference between NAS and clustered NAS is that a clustered NAS appears to the administrator as one mount point. With traditional NAS, one has to administer an additional data mover for each NAS added, which creates more workload; imagine administering 10 or more data movers. Clustered NAS is often used in very large environments where user demand is high. Staimer (2009) notes that the “entertainment music industry… uses clustered NAS quite a bit because it allows you to share workflows.” Clustered NAS is primarily targeted at unstructured storage such as documents, music files, and presentation material, in contrast to structured data such as databases and application data.

According to Staimer, clustered NAS could grow substantially within cloud computing. Clustered file systems enable modular (pay as you grow) storage growth… and scaling in storage capacity (Schultz, 2008). Clustered NAS and cloud computing “seem tailor made for each other because cloud-based services have the need for massive scaling and moderate performance while being very cost effective” (Crump, 2008). Cloud computing, being distributed by nature, fits clustered NAS very well because replication is inherent to both technologies. Traditional NAS is usually not replicated automatically (replication is an add-on), but clustered NAS is, which makes it the answer to cloud computing’s demand for dynamic storage growth.

References:

Staimer, M. (2009). Using clustered network-attached storage (NAS) to manage unstructured data. SearchsmbStorage. Retrieved June 28, 2009 from http://searchsmbstorage.techtarget.com/generic/0,295582,sid188_gci1360165,00.html

Schultz, G. (2008). Clustered NAS gaining in popularity. SearchStorage.com. Retrieved June 28, 2009 from http://searchstorage.techtarget.com/tip/0,289483,sid5_gci1301588,00.html

Crump, G. (2008). Clustered NAS in the Cloud. Information Week. Retrieved June 28, 2009 from http://www.informationweek.com/blog/main/archives/2008/10/clustered_nas_i.html;jsessionid=1CFK3DLDN20MUQSNDLOSKH0CJUNN2JVN