Today’s data centers

Being a field engineer once again after a 13-year hiatus from field service, I see a lot of new things in data centers (as well as new technology, of course). Data centers now are much more segregated, for lack of a better word: everything is in cages.

Now when I visit a data center to change a disk or service a controller, I am guided in, and the person escorting me uses biometric authentication such as a handprint or retinal scan to gain access. Most of the data centers I travel to are cloud hosting providers or DR sites, so when I go inside there are cages that segregate each customer's server equipment. Those cages are under lock and key.

And, of course you are always on camera…

Virtual machine drops iSCSI drives during vMotion

Recently, during maintenance downtime early one Sunday morning, our VMware administrator was using vMotion to migrate numerous VMs so he could patch the host environment blades on the Cisco UCS. One of the VMs was a Windows 2008 R2 server that contained over 10 iSCSI connections to targets residing on an EMC Celerra NS-40G NAS. All migrations were going well until this particular VM was moved: all of the iSCSI drives disappeared from the Windows OS. vMotion is normally transparent to the guest, so this should not have happened, but this time it had a negative effect on the VM: an unexpected glitch. These things happen from time to time. The iSCSI initiator in the Windows Server OS still registered the connections as connected and online, but Windows disk management could not see the drives.

[Screenshot: Celerra iSCSI targets]

I disconnected and then reconnected the targets in the iSCSI client, but the Windows OS still would not see the drives. I restarted both the iSCSI initiator service and the Server service in Windows, with the same result. Rescanning for storage in Windows disk management did not help, and EMC tech support ran a check on the NAS end and came up with nothing. Eventually, I went into Celerra Manager, deleted the LUN masking for the target, and re-added it. The Windows Server OS was then able to see the drives. I was hesitant to do this at first, as I was unsure what effect it would have on the drive letter assignments on the server; in the end it had no effect on them, and all drives were reestablished as they previously were.

[Screenshot: Celerra iSCSI target LUN masking]
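
For reference, the service restarts and rescan described above can be done from an elevated command prompt. This is a minimal sketch, assuming the default Windows Server 2008 R2 service names (MSiSCSI and LanmanServer):

rem Restart the Microsoft iSCSI Initiator service
net stop msiscsi && net start msiscsi
rem Restart the Server service
net stop lanmanserver && net start lanmanserver
rem Rescan for disks by piping the rescan command into diskpart
echo rescan | diskpart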

Importance of disk offset

SAN Guy at CLARiiON BLOGS has a good article that explains disk alignment. It is a very important subject for performance, and it recently came up at my data center: an EHR application is experiencing performance issues due to fragmentation (of course) and improper disk alignment on the LUN. Our DBA just ran a SQL Server 2005 best-practices analysis against the database that resides on this particular LUN, and it spit out a recommendation of a 64K offset. SAN Guy taught me a few things on this important subject.
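
For reference, here is roughly how a 64K-aligned partition would be created on a raw LUN with diskpart on Windows Server 2003 (the disk number is a placeholder; Windows Server 2008 and later align new partitions automatically):

diskpart
DISKPART> list disk
DISKPART> select disk 3
DISKPART> create partition primary align=64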

An overview of MAID

SGI, a technical computing manufacturer, produces a line of storage systems that incorporate green technologies. The COPAN line of storage products uses the MAID storage methodology, which can power down unused disks in the array while they are not in use, conserving power as a result. This is managed via internal “Power Managed RAID software” (SGI, 2010). MAID, the acronym for Massive Array of Idle Disks, is suited for environments consisting of long-term storage with write-once, read-occasionally (WORO) data. MAID can be used in virtual disk libraries, also known as EDLs (Enterprise Disk Libraries). This is energy-efficient storage. SGI's COPAN solution could be tough competition for the Centera line from EMC. As of this writing, I have not yet seen MAID technology on the archive systems from EMC.

A Massive Array of Idle Disks (MAID) consists of a large disk group of hundreds or thousands of disks configured into RAID groups. Through internal power management code, only the drives that are needed are spun up at any given time. This reduces drive wear and power consumption.

Reference:

SGI. (2010). MAID vs. Enterprise MAID. Retrieved November 2, 2010, from http://www.sgi.com/products/storage/maid/what.html

Tivoli fix for "number of unavailable volumes too high" issue

I get this error once in a while on my TSM 5.5 server:

“The number of unavailable volumes is too high Condition (3 > 2) Recommendation: Issue: Q LIBVOL F=D at a TSM command prompt for details.”

Resolution:

q vol * acc=unava lists the unavailable volumes. If you want to update their access status to read/write, issue:

upd vol volumename acc=readw for each unavailable volume, or
upd vol * acc=readw whereacc=unava for all of them

If those volumes are at an offsite location or vault, then:
upd vol xxxxx access=offsite

See ADSM.org for more info.

Thesis research: SSD vs. FC drive benchmark tests – part I

I am writing my graduate thesis on the subject of the Solid State Drive (SSD). By the way, the D stands for DRIVE, not DISK, as an SSD does not use a disk. Now, with that out of the way…

I have been benchmarking a new SSD array that I have added to my company's SAN: an EMC CLARiiON CX4-480 system running on 4Gb/s fiber. It will be 8Gb/s soon, but we are waiting for the code on the NAS (an EMC NS-40G) to catch up so it will support 8Gb/s: the NAS firmware currently supports only up to 4Gb/s. The SAN is held together with two Brocade 4900 FC switches.

About the disks that I will be testing and comparing:

Disks used: (5) EMC 70GB SSD and (5) 300GB FC disks.

SSD:

66.639GB raw capacity – FC SSD – Manufacturer: STEC – Model: ZIV2A074 CLAR72 – Serial: STM0000E9CFD – 4Gbps

FC:

268.403GB raw capacity – FC – Manufacturer: SEAGATE – Model: STE30065 CLAR300 – Serial: 3SJ09XWW – 4Gbps

  • Created RAID5 (RAID group 100) on five SSDs, model ZIV2A074, 66.639GB each.
  • Created RAID5 (RAID group 101) on five 300GB FC disks: Seagate 15K.7 ST3300657FC.
  • LUN 104 is assigned drive letter W: (disk 3) (RG 100) and named “SSD”.
  • LUN 108 is assigned drive letter X: (disk 4) (RG 101) and named “FC”.

The test server was installed and set up with one dual-port 4Gb/s HBA, running Windows Server Standard with 1GB of RAM.

SAN Management: EMC Navisphere 6.28.21.

Network: 4Gb/s fiber with two Brocade 4900B FC switches. Host HBA: Emulex LightPulse FC2243.

Host connection is via EMC PowerPath v5.2.

Test I/O is generated by Microsoft SQLIOSim, an I/O generation utility that simulates the I/O patterns found in versions of Microsoft SQL Server: SQL Server 2005, SQL Server 2000, and SQL Server 7.0. Brent Ozar, a SQL expert, has a good video on using SQLIO on his web site at brentozar.com. I have learned some things from him and am using his SQLIO tips for my benchmarking.
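
For illustration, a typical SQLIO run along those lines might look like this (the parameters and test file name are placeholders, not my final methodology): an 8KB random-read test with 8 threads, 8 outstanding I/Os per thread, running for 120 seconds and reporting latency:

sqlio -kR -t8 -s120 -frandom -o8 -b8 -LS testfile.dat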

The monitoring will be done with EMC Navisphere Analyzer and the Sun StorageTek Workload Analysis Tool (SWAT).

Here is a preliminary test on SSD vs. FC data rates using SQLIOSim to generate I/O and SWAT to record the results:

SSD:

[Chart: SSD performance survey data rate]

FC:

[Chart: FC performance survey data rate]

So far, there is not much of a difference: the Fibre Channel drives are keeping up with the SSDs. Of course, this is a preliminary test, and other tests at this time are giving similar results. I am continuing to plan my testing methodology.

Measuring disk performance

Here is a video I found on SearchDataCenter that explains how noise-induced vibration can affect disk performance:

About EMC Celerra NAS checkpoints

Here is a good intro to EMC Celerra NAS checkpoint technology. Checkpoints, also known as snapshots, are point-in-time images of a file system; they can be used for quick recovery in the event of file system corruption or loss:

Citrix or VPN?

This is a quick thought on the Citrix/VPN comparison question…
I would use a product such as Citrix, in which an end user works in a secured browser session over SSL. VPN clients are still in use for remote encrypted access to the organization, but Citrix-type solutions are becoming more popular because there is no need to install a VPN client on the remote machine; this reduces the risk of vulnerability due to a misconfiguration of the VPN software on the client end. Besides SSL, digital certificates should also be used with browser-based access to ensure the authenticity of the target site.

In addition to the Citrix-type HTTP/SSL technology, the remote devices would have encryption enabled on their storage devices to protect data that is stored or transferred. A portable encryption device, such as a handheld USB device that encrypts data and communications, would be ideal.

If only VPN were used, the VPN clients would have split tunneling disabled to prevent any communications other than the encrypted connection to the organization's intranet. If split tunneling were enabled, a vulnerability would manifest, as a second channel would be opened to the outside internet, producing an “open hole” alongside the secure encrypted channel. In addition to the VPN solution, RSA SecurID token authentication would bring another layer of security to remote access.

A brief overview of Oracle administration


ABSTRACT

The DBA is the administrator who plans, designs, implements, tests, operates, and maintains databases for an organization. The DBA wears many hats: there are numerous types of DBAs that contribute to overall database functionality. This paper will examine the most common types of database administrators and the duties of each. Additionally, we will look at database monitoring tools and the Optimal Flexible Architecture, and at how the DBA can administer security, performance tuning, and backup and recovery of the database.

Different database administrator types

The production database administrator works with other information systems administrators and specializes in the creation and management of database tables, backup and recovery, security, and performance of databases. Everyday database administration tasks are the duties of this position, including monitoring databases to ensure availability and tracking CPU, memory utilization, and system I/O, among numerous other things. Security is a large part of the DBA repertoire, of which user accounts, roles, and profiles are a large part. Tablespace creation and control file backup are important tasks the production database administrator handles to ensure quick recovery from any corruption or data loss within the database.

Interaction with the analysts (programmers) involves backup planning, as incorrectly timed backups can interfere with database operation. Files that are being written to are locked, and if a backup application tries to copy a file while it is being written, the file can become corrupted and errors can appear in the backup logs. It is best to schedule backups at a time of reduced activity.

The development DBA is usually an analyst who spends a lot of time programming and configuring applications and interfaces that connect to the database. Supporting the application development life cycle, this type of DBA often builds database environments that are not yet operational and thus have no immediate impact on the business. The DBA follows project schedules and Gantt charts frequently and works closely with management and programmers to ensure support of business applications via the database structures. This includes working with the “application team to create and maintain effective database-coupled application logic [which includes] stored procedures, triggers, and user-defined functions (UDFs)” (Mullins, 2003).

Patching is a constant in any information system, and patches must be tested within a development or test system before being integrated into production. Thorough testing of patches must therefore be undertaken to examine the effects they would have on the production system.

Skill in normalization is paramount in this position. Following the database development life cycle (DBLC), which is similar to the systems development life cycle, this position works with management and top analysts in planning the database to realize business rules and goals.

OFA Standard

Every database should follow this standard. The Optimal Flexible Architecture standard is a set of guidelines defining file naming and configuration that ensures more structured Oracle installations. It also provides standards that can improve system performance by “spreading I/O functions across separate devices by separating data from software” (Powell & McCullough-Dieter, 2007). Bottlenecks can be avoided by utilizing OFA, as it improves performance by sorting elements into distinct directories placed on separate devices. In short, OFA is a logical file placement and naming methodology that enhances system I/O.

The Oracle OFA structure is installed into the directory ORACLE_BASE. On Windows systems, it follows the convention c:\oracle\product\11.x; within the UNIX/Linux architecture, we have /app/oracle/product/11.x. The database file architecture then includes, within the base:

  • admin: initialization (ini) files and logs
  • db_1: the main Oracle database installation
  • client_1: the client installation
  • oradata/<db name>: datafiles for each database, plus control files and redo logs
  • flash_recovery_area: backup and recovery files and archive logs

Each database function has its own area, or location of operation, and “is stored separately” (Powell & McCullough-Dieter, 2007) from other types of system files.

All Oracle binaries are stored under the directory structure known as ORACLE_HOME. Each database has its own ORACLE_HOME, e.g., /app/oracle/product/11.x/db_1 (UNIX) or c:\oracle\product\11.x\db_1 (Windows). Within Oracle's OFA naming standard, there are three important file types located within the ORACLE_BASE/oradata directory (a directory sketch follows the list):

  • Control file (.ctl) – associated with only one database, this administrative file contains the database name and unique identifier (DBID), creation timestamp, data file information and redo log files, tablespace information, and RMAN backup information.
  • Redo log file (.log) – The database has two or more of these, consisting of redo entries (records) that can be used to restore recent changes to database data if needed. These redo log files are known collectively as the redo log.
  • Datafile (.dbf) – Belonging to each tablespace within a database, the datafile contains tables and indexes for the database.
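
As a rough sketch (using a hypothetical database name of mydb and the UNIX-style paths above), the oradata directory might contain:

/app/oracle/oradata/mydb/
    control01.ctl
    control02.ctl
    redo01.log
    redo02.log
    redo03.log
    system01.dbf
    sysaux01.dbf
    undotbs01.dbf
    users01.dbf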

As we have seen, Oracle is a very organized system that segregates its various elements to increase I/O and performance as well as recoverability.

Database security and auditing

Besides regular users, who can access the database with fundamental read/write permissions, there are two administrative privileges within the Oracle database: SYSDBA and SYSOPER. These are high-level privileges that allow one to start up and shut down the database and create database entities. SYSDBA is for fully empowered database administration, allowing all database permissions and access. SYSOPER also carries all administrative permissions, with the exception that it does not permit looking at user data.

System auditing is used for supervising access, updates, and deletions. Oracle has three types of auditing. Statement auditing, which requires the AUDIT SYSTEM privilege, can be used to monitor changes made to tables by a certain user account, in this form: AUDIT UPDATE TABLE BY <user name>;. Privilege auditing provides the ability to monitor the use of a system privilege, such as table creation, by any user. Object auditing can be used to monitor a certain object, for example queries against a particular table; the syntax for this is AUDIT SELECT ON <TableName>;.
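
To illustrate all three forms in one place (the user scott and table hr.employees are hypothetical):

AUDIT UPDATE TABLE BY scott;    -- statement auditing: track UPDATEs issued by scott
AUDIT CREATE TABLE;             -- privilege auditing: track use of the CREATE TABLE privilege
AUDIT SELECT ON hr.employees;   -- object auditing: track queries against one table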

Database monitoring

Monitoring of the database can be done via alerts, which notify the administrator when a threshold has been reached. Alerts can also be set to perform actions, such as running a script, when a threshold is crossed. An example of this is a script that shrinks tablespace objects whenever a usage alert has been triggered.
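
As a sketch of setting such a threshold through the DBMS_SERVER_ALERT package (the tablespace name and percentages are illustrative):

BEGIN
  DBMS_SERVER_ALERT.SET_THRESHOLD(
    metrics_id              => DBMS_SERVER_ALERT.TABLESPACE_PCT_FULL,
    warning_operator        => DBMS_SERVER_ALERT.OPERATOR_GE,
    warning_value           => '85',   -- warn at 85% full
    critical_operator       => DBMS_SERVER_ALERT.OPERATOR_GE,
    critical_value          => '97',   -- critical at 97% full
    observation_period      => 1,
    consecutive_occurrences => 1,
    instance_name           => NULL,
    object_type             => DBMS_SERVER_ALERT.OBJECT_TYPE_TABLESPACE,
    object_name             => 'USERS');
END;
/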

Another means of database monitoring is performance self-diagnostics via the Automatic Database Diagnostic Monitor (ADDM). Oracle “diagnose[s] its own performance” (Oracle, 2009) and can decide how to resolve identified anomalies. Snapshots are a part of ADDM and are used for performance comparisons against other time periods; snapshots are stored for up to 8 days before being purged to free space for newer ones.
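
The snapshot interval and retention can be adjusted via the DBMS_WORKLOAD_REPOSITORY package; a sketch matching the 8-day retention mentioned above (both values are expressed in minutes):

BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
    retention => 11520,   -- keep snapshots for 8 days (11520 minutes)
    interval  => 60);     -- take a snapshot every hour
END;
/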

Monitoring of the general database state and workload is done by the administrator via the database administration home page on the local system. CPU and memory utilization are presented, and all tools are available to shut down the database, edit users, monitor queries, configure flashback, monitor workload and SQL response time, and much more.

Managing alerts includes viewing thresholds, clearing alerts, and “setting up direct alert notification” (Oracle, 2009).

Performance tuning

Part of proactive maintenance, performance tuning is the result of good monitoring practice. Tuning the database begins with the prediction and tracking of performance problems via the monitoring tools within the database control area. The monitors are also advisors: ADDM, as previously mentioned, is an advisor as well as a monitor. Other advisors within Advisor Central in 10g Database Control are the SQL Tuning Advisor, SQL Access Advisor, Memory Advisor, MTTR Advisor, Segment Advisor, and Undo Management.

Backup and recovery

There are a few types of database backups in Oracle. A hot backup is performed while the database is in full operation, with open files. Snapshots can be used to back up the database, or smaller parts of it, one file at a time; this way, files can be restored into a running database individually or as a group. Backup tools for this include the export and import Data Pump utilities, tablespace copies, RMAN (Recovery Manager), and Oracle Enterprise Manager with Database Control (Powell & McCullough-Dieter, 2007). One effective means of continuity within an unstable environment is the standby database: a backup database, identical to the production database, that can be failed over to in seconds to provide continuous availability. This is also known as a physical standby database.
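
As a minimal example, a hot backup with RMAN (assuming the database runs in ARCHIVELOG mode) can be as simple as:

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
RMAN> LIST BACKUP SUMMARY;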

The most common tools for backup and recovery are checkpoints, redo logs, archive logs, and flashback. According to Powell and McCullough-Dieter, a combination of archive logs and redo logs provides the ability to recover from a point in time, as long as the archive logs have been retained in the database structure (2007). Flashback is flash recovery that contains flashback data for a specific time of operation. The difference between physical and flashback recovery is that physical recovery can entail restoring the entire database, whereas flashback covers recovery of specific parts, such as a table or a table's entities.
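
For example, flashing back a single table to a recent point in time might look like this (hr.employees and the 15-minute window are hypothetical; FLASHBACK TABLE requires row movement to be enabled on the table):

ALTER TABLE hr.employees ENABLE ROW MOVEMENT;
FLASHBACK TABLE hr.employees TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);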

Conclusion

Research skills are important for a successful database administrator, as the occupation includes examining anomalies and bugs, patch research, and normalization development. Performance monitoring and tuning are daily tasks for the administrative DBA, and both production and development DBAs should be well versed in the processes of database normalization, standards, monitoring, and backup and recovery tools and methods.

References

McLean, C. (2006). Database Administrators: Multitasking for Advancement. Certification Magazine, 8(1), 30-40. Retrieved from Research Starters – Business database.

Mullins, C. (2003). DBAs Need Different Skills in Development and Production. The DBA Corner. Database Trends and Applications. Retrieved February 5, 2010 from http://www.craigsmullins.com/dbta_023.htm

Powell, G., & McCullough-Dieter, C. (2007). Oracle 10g Database Administrator: Implementation & Administration. Boston: Thomson Course Technology.

Oracle. (2009). Monitoring and tuning the database. Oracle Database 2 Day DBA, 11g Release 2 (11.2). Retrieved February 13, 2010, from http://download.oracle.com/docs/cd/E11882_01/server.112/e10897/montune.htm#CACCIHAB