Troubleshooting Vertica backup on NFS

At my company we recently installed a new 3-node Vertica cluster, but when testing backup/restore procedures we encountered some issues.

The Environment

Our Vertica cluster consists of three identical physical hosts running Vertica 8.1 on CentOS 7.3 and connected through 10 Gbit/s Ethernet. Each host has two network interfaces, one for private interconnect and one for public connectivity.

The backup target directory is an NFS mount point which is shared by an external host, an EMC Data Domain server which provides redundancy and data deduplication.

The NFS share was initially mounted with all the default options.

First issue: NFS locking

The first issue we encountered was a locking error when running the backup task of Vertica’s vbr.py utility:

$ /opt/vertica/bin/vbr.py -t backup -c /home/dbadmin/vertica_backup/mybackup.ini

Error: Error locking backup location. Another vbr task is currently running: unknown.
Backup FAILED.

Backup was instead successful when the target directory was on the local file system, so we immediately suspected an NFS issue.

According to EMC Support Solution 304322 “NFS Best Practices for Data Domain and client OS”, Data Domain does not support NFS locking so the nolock keyword must be added to the NFS mount option on the clients.

Thus, I modified the /etc/fstab entry on the three Vertica nodes to include all the options recommended by the EMC note:

[dbadmin@vertica01 ~]$ tail -n1 /etc/fstab
datadomain.example.com:/data/col1/vertica /media/backup nfs hard,intr,nolock,nfsvers=3,tcp,timeo=1200,rsize=1048600,wsize=1048600,bg

After remounting the NFS directory on all three nodes, the backup was successful.

Increasing the verbosity of vbr‘s log files can help with troubleshooting. This can be accomplished by adding the --debug 3 parameter to the vbr invocation and will generate additional logging under the /tmp/vbr directory.

Second issue: SSH concurrency

After successfully completing the backup, I wanted to ensure their integrity before testing a full restore.

However, while both the listbackup and quick-check tasks were successful, the full-check task failed:

[dbadmin@vertica01 ~]$ /opt/vertica/bin/vbr.py -t full-check -c /home/dbadmin/vertica_backup/mybackup.ini
Checking backup consistency.
List all snapshots in backup location:
Snapshot name and restore point: mybackup_20170524_140413, nodes:['v_example0001', 'v_example0002', 'v_example0003'].
Error: Error accessing remote storage: failed to get remote files: ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(601) [Receiver=3.0.7]
: returncode=255
Full-check FAILED.

The solution was provided by HPE Vertica Support and consists of changing the MaxStartups configuration parameter of the SSH daemon which specifies the maximum number of concurrent unauthenticated connections allowed before dropping them:

[root@vertica01 ~]# grep MaxStartups /etc/ssh/sshd_config
MaxStartups 50
[root@vertica01 ~]# systemctl reload sshd

After this configuration change, both the full-check and restore tasks were successful.

References

Advertisements

Email notification for failed OBIEE agents

At my company we rely heavily on OBIEE agents (formerly known as iBots) to deliver reports to users via email.

Sometimes an agent job can fail, so I wanted to be notified via email when this happens.

You can configure the OBIEE scheduler to write log files by setting these two parameters in the instanceconfig.xml file which resides in the $ORACLE_INSTANCE/config/OracleBISchedulerComponent/coreapplication_obisch1 directory:

<Debug>true</Debug>
<KeepErrorLogFiles>true</KeepErrorLogFiles>

The OBIEE scheduler will now generate a .Log file for successful agent executions and an .err file for unsuccessful ones in the $ORACLE_INSTANCE/diagnostics/logs/OracleBISchedulerComponent/coreapplication_obisch1 directory.

We can now use the find command to locate all the latest .err files using a simple trick.

We must first create an empty file called sentinel.txt in the $ORACLE_INSTANCE/diagnostics/logs/OracleBISchedulerComponent/coreapplication_obisch1 directory:

$ touch $ORACLE_INSTANCE/diagnostics/logs/OracleBISchedulerComponent/coreapplication_obisch1/sentinel.txt

We will periodically touch this file to update its timestamp, so that we can use the -newer option of the find command to obtain a list of the newest error log files.

We will then deliver the content of these logs via email using a shell script which leverages the mail command provided by the mailx package:

#!/bin/bash
set -eu -o pipefail

agent_log_dir="/u01/app/obiee/instances/instance1/diagnostics/logs/OracleBISchedulerComponent/coreapplication_obisch1/"
sentinel_file="${agent_log_dir}/sentinel.txt"

new_errors() {
        find "${agent_log_dir}" -type f -name "Agent*err" -newer "${sentinel_file}"
}

for e in $(new_errors); do
        sed 's/\r//' "${e}" | mail -s "OBIEE Agent Error $(basename ${e})" yourname@example.com
done

touch "${sentinel_file}"

This script can then be scheduled via the cron daemon so that it runs periodically on the server.

Be aware that with the Debug option enabled the server will accumulate a lot of log files on its file system. You can schedule another shell script to remove log files older than N days, such as:

#!/bin/bash
set -eu -o pipefail

agent_log_dir="/u01/app/obiee/instances/instance1/diagnostics/logs/OracleBISchedulerComponent/coreapplication_obisch1/"

find "${agent_log_dir}" -type f -name "Agent*" -mtime +35 -delete

Note: this has been tested on OBIEE version 11.1.1.9.0 running on Oracle Linux Server release 6.8.

References

OBIEE docs: What Additional Scheduler Configuration Settings Affect Agents?
man find: Section 2.3.2 Comparing Timestamps
man mail: mailx – send and receive Internet mail
man find: Section 3.4 Delete Files

Reporting on ODI Load Plans Executions with OBIEE

In this post I will describe a quick-and-dirty way of building reports on the ODI Load Plans Executions using Oracle Business Intelligence (OBIEE).

Although it’s probably not considered best practice, OBIEE allows you to quickly build a report upon a physical SQL statement by navigating to New > Analysis > Create Direct Database Query.

Enter the name of your Connection Pool and then paste this code into the SQL Statement textarea (replace ODI_REPO_SCHEMA with the schema name of your ODI Repository):

select start_date,
       load_plan_name,
       context_code,
       end_date,
       substr(numtodsinterval(duration, 'SECOND'), 12, 8) as duration,
       case status
          when 'W' then '<span style="color:red">Waiting</span>'
          when 'R' then '<span style="color:red">Running</span>'
          when 'E' then '<span style="color:red">Error</span>'
          when 'D' then '<span style="color:green">Done</span>'
          when 'M' then '<span style="color:red">Warning</span>'
          when 'A' then 'Already done in a previous run'
       end status_msg
  from ODI_REPO_SCHEMA.snp_lpi_run
 where start_date >= trunc(sysdate)
 order by start_date

Once you click on the Validate SQL and Retrieve Columns button, the Result Columns will appear at the bottom of the page:

odi_obiee_report

Now you need to tell OBIEE to treat the STATUS_MSG column as HTML code in order to display the message using red or green color:

odi_obiee_format_html

Once you click on the Results tab you should see a report with this layout:

odi_obiee_result

Now you can easily schedule the delivery of this report via email through an Agent (iBot).

Note: I have tested this on ODI version 12.1.3 and OBIEE version 11.1.1.9.0.

References

Oracle Data Integrator 11g and 12c Repository Description (Oracle Support Doc ID 1903225.1).

BISQLGroupProvider: Connection pool not usable

It’s possible to configure Oracle Business Intelligence to perform user authentication using an LDAP Identity Store, such as Microsoft Active Directory, while storing group information in database tables.

The setup of this configuration for OBIEE 11.1.1.9, which is mostly performed inside Oracle WebLogic Server Administration Console, is explained in the official documentation in the section Configuring LDAP as the Authentication Provider and Storing Groups in a Database.

Since the process is quite complicated, however, it’s easy to forget something and end up with a broken configuration.

In my case, after performing the configuration and restarting the whole OBIEE stack, I was unable to list groups.

Looking at the AdminServer.log file, I found the following Java Exception being logged over and over:

<BEA-240003> <Console encountered the following error oracle.bi.security.provider.wls.authentication.dbms.SQLGroupProviderDelegateException: Error listing groups:
[...]
Caused by: java.sql.SQLException: Connection pool not usable.

After double-checking the JDBC Data Source configuration and the SQL statements of the SQLGroupProvider, I finally managed to identify the problem: I had deployed the BIDatabaseGroupDS data source to the bi_server1 server, but not to the AdminServer:

BIDatabaseGroupDS

Using English locale in Oracle tools

As a developer, whenever I have a choice I install my OS with English localization, and even go as far as using a U.S. keyboard since typing curly brackets and back-ticks with the Italian keyboard layout is not fun.

Sometimes, however, you have no such luxury, for example with a company-issued notebook.

An irritating consequence of using a non-English locale is that some programs try to be clever and automatically display their user interface in your language. Although I can see the benefit for non-advanced users, this is something I personally dislike, since it makes it more difficult to search for documentation or google error messages.

In this post I will try to document in a single place how to force the user interface to appear in English in some Oracle tools I use on a daily basis.

Oracle BI Administration Tool (OBIEE 11.1)

As documented in Modifying the Language of the User Interface for the Administration Tool, create the following environment variable:

ORACLE_BI_LANG=en

Oracle Data Integrator (ODI 12.1)

As documented in MOS Note How To Set Up The GUI Runtime Language In ODI Studio (Doc ID 1271743.1), add the following lines in the file odi/studio/bin/odi.conf:

AddVMOption -Duser.language=en
AddVMOption -Duser.region=US

Oracle Warehouse Builder (OWB 11.2)

As documented in MOS Note How To Change the GUI Language in OWB 11.2 (Doc ID 1269876.1), add the following line in the file OWB_HOME/ide/bin/ide.conf:

AddVMOption -Duser.language=en

Oracle SQL Developer

As documented in Jeff Smith‘s blog post Oracle SQL Developer Around the World, add the following line in the file sqldeveloper/bin/sqldeveloper.conf:

AddVMOption -Duser.language=en