System Administration
Friday, August 29, 2014
Thursday, November 22, 2012
Configuring an NTP server without internet access (locally)
This was the first time I had to figure out how to configure an NTP (Network Time Protocol) server without internet access... Most of the time you just point the /etc/ntp.conf file on one machine at a public NTP server, point your internal servers at that machine, and you are done. Nothing interesting there...
First, make sure the ntp package is installed on all your servers:
rpm -qa |grep ntp-4
Make sure your firewall is stopped
service iptables stop
service ip6tables stop
chkconfig iptables off
chkconfig ip6tables off
Or add the required rules to allow port 123 between your servers (NTP itself only uses UDP, so the TCP rule is optional):
iptables -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 123 -j ACCEPT
iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 123 -j ACCEPT
service iptables save
service iptables restart
Back up your current configuration file on all the servers (just in case):
cp /etc/ntp.conf /etc/ntp.conf.orig
Basically, you have to point your server at itself (its own local clock) so it has something to sync to, something like this:
vi /etc/ntp.conf
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Notice that to access its own system clock, also called the local clock, NTP uses the pseudo IP address 127.127.1.0. This address must not be confused with 127.0.0.1, which is the IP of localhost (the loopback interface).
Here you may want to restrict the IPs that are allowed to query the server, but since this assumes a local (controlled) environment with no internet access, it is not absolutely necessary.
Restart the ntpd server
/etc/init.d/ntpd restart
or
service ntpd restart
On the client side you configure as follows...
vi /etc/ntp.conf
server 12.139.41.136
Where the server IP is the IP of your NTP server
Restart the ntpd server on the clients too
/etc/init.d/ntpd restart
or
service ntpd restart
To verify your network mask, you can look at your network script:
cat /etc/sysconfig/network-scripts/ifcfg-eth0
Ensure NTP will start at boot in all the servers
chkconfig ntpd on
Synchronize your local time with the server (do it 3 times):
ntpdate -u [your ntp server IP]
Determining if the NTP is synchronized properly
ntpq -p
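In the ntpq -p listing, the peer that ntpd has currently selected is marked with a leading asterisk. A minimal sketch of pulling that peer out in a script follows; the function name and the sample listing are illustrative only, not output from a real server.

```shell
#!/bin/sh
# Print the peer that ntpd has selected for synchronization, i.e. the
# line whose first character is "*" in "ntpq -p" style output on stdin.
selected_peer() {
    awk '/^\*/ { print substr($1, 2) }'
}

# Illustrative listing; on a real machine run: ntpq -p | selected_peer
selected_peer <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l   12   64  377    0.000    0.000    0.001
EOF
```

If nothing is printed, no peer has been selected yet.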
One of the problems I found was with the stratum value. As you can see, in the configuration file we set it to 10; you can verify the current value on the NTP server with the following command:
ntpq -c rv
Now... what does that mean?
NTP increases the stratum for each level in the hierarchy: an NTP server pulling time from a "stratum 1" server advertises itself as "stratum 2" to its clients. A stratum value of "16" is reserved for unsynchronized servers, meaning that your internal NTP server believes it has no reliable time source; in other words, it is not synchronized to a higher-level stratum server.
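A minimal sketch of checking the stratum from a script, assuming the ntpq -c rv output contains a stratum=N field. The function names and the sample line are made up for illustration.

```shell
#!/bin/sh
# Read "ntpq -c rv" style output on stdin and print the stratum number.
stratum_from_rv() {
    sed -n 's/.*stratum=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Stratum 16 means "unsynchronized"; anything lower is usable by clients.
stratum_status() {
    if [ "$1" -ge 16 ]; then
        echo "unsynchronized"
    else
        echo "synchronized"
    fi
}

# Illustrative line; on a real server use: ntpq -c rv | stratum_from_rv
sample='leap_none, sync_local, stratum=11, precision=-20'
s=$(printf '%s\n' "$sample" | stratum_from_rv)
echo "stratum $s: $(stratum_status "$s")"
```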
Most of the time it takes about 15 minutes for the value to drop. While the server is at 16 you won't be able to sync the clients; once it has dropped, try again.
If you need to do some debugging, look at the output of
ntpq -c peers
for clues about possible reasons.
Thursday, November 8, 2012
Portmir using screen
For those AIX lovers who are destined to work with Linux and cry because Linux has nothing like portmir: there is a similar solution, screen. If you don't know what screen is, look at my previous post; here is how to configure it to share a session.
As root:
1. Set the screen binary setuid root.
sudo chmod u+s /usr/bin/screen
sudo chmod 755 /var/run/screen
2. Start screen
screen -S portmir
3. Verify the username with w
4. Allow multiuser access in the screen session
CTRL-A
:multiuser on
5. Grant permission to the remote user to access the session
CTRL-A
:acladd username
6. The remote user can now connect to the session using
screen -x root/portmir
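Steps 4 and 5 can probably also be done without typing into the session, because screen's -X option sends a command to a running session. A hedged sketch that only prints the commands (a dry run) follows; the session name and the user "alice" are placeholders.

```shell
#!/bin/sh
# Print the commands that would enable multiuser mode on an existing
# screen session and grant one user access to it. Nothing is executed.
share_session_cmds() {
    session=$1
    user=$2
    printf 'screen -S %s -X multiuser on\n' "$session"
    printf 'screen -S %s -X acladd %s\n' "$session" "$user"
}

# "alice" is a hypothetical remote user
share_session_cmds portmir alice
```

Pipe the output to sh as the session owner once the printed commands look right.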
Screen
Screen, as you may already know, is a tool to handle multiple sessions in one window. It is very useful for taking load off your personal laptop, for not having to reconnect to all those sessions every day, and for being able to leave processes running =D
Well, on Linux it is already installed most of the time; on AIX it never is, so install it. There is a small issue in the code that needs to be fixed before the tool builds successfully on AIX.
Download the tool from...
ftp ftp.software.ibm.com
anonymous
cd aix/freeSoftware/aixtoolbox/RPMS/ppc/wget
bin
prompt
get wget-1.9-2.aix5.1.ppc.rpm
quit
rpm -hUv wget-1.9-2.aix5.1.ppc.rpm
wget ftp://ftp.software.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/screen/screen-3.9.10-2.aix4.3.ppc.rpm
rpm -hUv screen-3.9.10-2.aix4.3.ppc.rpm
wget ftp://ftp.gnu.org/gnu/screen/screen-4.0.3.tar.gz
gunzip screen-4.0.3.tar.gz
tar -xvf screen-4.0.3.tar
cd screen-4.0.3
vi misc.c
I changed the following part in order to get over it:
,----[ misc.c - original part ]
| #else /* USESETENV */
| # if defined(linux) || defined(__convex__) || (BSD >= 199103)
| setenv(var, value, 1);
| # else
| setenv(var, value);
| # endif /* linux || convex || BSD >= 199103 */
| #endif /* USESETENV */
| }
`----
Then I used the dirty hack by adding ", 1" to the second setenv-statement directly.
,----[ misc.c - altered part ]
| #else /* USESETENV */
| # if defined(linux) || defined(__convex__) || (BSD >= 199103) || defined(__aix__)
| setenv(var, value, 1);
| # else
| setenv(var, value, 1);
| # endif /* linux || convex || BSD >= 199103 */
| #endif /* USESETENV */
| }
`----
That "solved" the error-message above.
./configure
make
make install
Now that the tool is installed, here are some useful commands. Keys are pressed after ctrl+a; entries starting with "screen" are typed at the shell.
ctrl+a        | Command prefix; press it before any of the keys below
p             | Previous window
n             | Next window
0-9           | Jump to the window with that ID
w             | List open windows
[             | Scrollback mode (ESC to finish)
d             | Detach (the word [detached] will appear)
screen -r     | Reattach. If there are many open sessions a list will appear; type "screen [-d] -r [pid.]tty.host" to resume one of them
x             | Lock the screen
:             | Enter a screen command (for example, password to set a password)
c             | Open a new terminal
A             | Rename the current window
screen -x     | Share a screened session without detaching
k             | Kill the current window
\             | Terminate the session
:multiuser on | Enable multiuser mode
Now, to have a very nice .screenrc, just copy and paste the following and add your servers:
autodetach on # Autodetach session on hangup instead of terminating screen completely
startup_message off # Turn off the splash screen
defscrollback 30000 # Use a 30000-line scrollback buffer
scrollback 30000
termcapinfo xterm ti@:te@
vbell off # turn off visual bell
caption string "%?%F%{= Bk}%? %C%A %D %d-%m-%Y %{= kB} %t%= %?%F%{= Bk}%:%{= wk}%? %n "
hardstatus alwayslastline
#hardstatus string '%{= kG}[ %{G}%H %{g}][%= %{= kw}%?%-Lw%?%{r}(%{W}%n*%f%t%?(%u)%?%{r})%{w}%?%+Lw%?%?%= %{g}][%{B} %d/%m %{W}%c %{g}]'
hardstatus string '%{= kG}[ %{R}%t %{g}]%= %{g}[%{B} %d/%m %{W}%c %{g}]'
screen -t prod-a ssh prod-a
screen -t prod-b ssh prod-b
screen -t prod-c ssh prod-c
screen -t prod-d ssh prod-d
That will create a nice screen session and launch one window per server; if you have SSH trusted keys it will connect to all of them automatically ;)
If you hit [ ctrl+a " ] you will be able to select your server from the window list.
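If the server list is long, the "screen -t ... ssh ..." lines can be generated instead of typed. A small sketch follows; the function name is made up, and it assumes the window title and the SSH host name are the same, as in the .screenrc above.

```shell
#!/bin/sh
# Print one "screen -t <host> ssh <host>" line per host name given.
screenrc_windows() {
    for host in "$@"; do
        printf 'screen -t %s ssh %s\n' "$host" "$host"
    done
}

# Append the result to your own .screenrc once it looks right
screenrc_windows prod-a prod-b prod-c prod-d
```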
A simple throughput test using SCP
In the past it was easy to run a throughput test using FTP. In modern environments FTP and Telnet are usually not allowed for security reasons, so we have to do our best with SCP, SFTP and SSH. Having a good test of your environment from all servers to all servers, and keeping that info handy, can be useful when you are experiencing low performance on the network.
Let's say you have servers A, B and C; you would run your tests as follows:
A to B
A to C
B to C
That covers pretty much all the possibilities. You can also run B to A, but since you already have A to B and they go through the same wire... what is the point? Your choice.
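For more than three servers the list of pairs gets long, so here is a minimal sketch that prints every one-way pair once. The function name is made up for illustration.

```shell
#!/bin/sh
# Print each unordered pair of the given servers once, as "X to Y".
server_pairs() {
    while [ "$#" -gt 1 ]; do
        src=$1
        shift
        for dst in "$@"; do
            echo "$src to $dst"
        done
    done
}

server_pairs A B C
```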
Step # 1: Create a large file (1 GB)
dd if=/dev/zero of=/tmp/big.file bs=1024M count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.88378 seconds, 570 MB/s
Step # 2: Use scp to transfer file
scp -v /tmp/big.file user@remote.server.com:/tmp
Step # 3: From the end of the scp output, capture only the required information and build your table
Source    Destination    MB/s    Duration
A         B              47.7    0.2 Seconds
Next time you have a problem, run the test again and see how slow you are compared with your baseline =)
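If you only note the bytes transferred and the elapsed seconds, the MB/s column can be computed rather than read off. A small sketch follows, assuming 1 MB = 1048576 bytes (scp may report its rate in a different unit); the function name and the 20-second figure are illustrative.

```shell
#!/bin/sh
# Compute throughput in MB/s from a byte count and a duration in seconds.
# Assumes 1 MB = 1048576 bytes.
throughput_mbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}

# The 1 GB test file from Step 1, with a hypothetical 20 second transfer
throughput_mbs 1073741824 20
```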
ldapsearch over SSL
Recently I ran into an issue with LDAP latency between a WAS server and an LDAP farm. With several hops in between, I needed to find out whether the connection was working and how long the requests were taking. Running tcpdump or Wireshark traces helps, but does not give you a real view of how LDAP is working, so I decided to configure ldapclient on this server and do some testing. This might not work the same in every environment, but it should be a good guide.
First install ldapclient; in my case, running on RHEL, I also needed the openldap package to be installed. Once this is completed you are able to execute the ldapsearch command...
That would be pretty much enough for a regular environment, but in my case I had to go through SSL using port 636 (secure) instead of 389 (insecure), so you have to modify the /etc/openldap/ldap.conf file and add the following lines:
HOST
PORT 636
TLS_CACERT
TLS_REQCERT demand
Easy, huh? Now, if you wonder how to get the certificate to use, run this command:
echo -n | openssl s_client -connect:636 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ldapserver.pem
Now, how can I check what the LDAP server accepts for searches? At least:
ldapsearch -x -H ldaps:// -b "o=domain.com"
and you will receive a line like this:
uniquemember: uid=########,c=us,ou=ldapserver,o=domain.com
So now you can narrow your search as follows to look for US folks:
ldapsearch -x -H ldaps:// -b "c=us,ou=ldapserver,o=domain.com"
Then you go to the WebSphere console and look for the fields that we can access.
Now you can search by mail, cn, and uid as follows:
ldapsearch -x -H ldaps:// -b "c=us,ou=ldapserver,o=domain.com" "mail=name@domain.com"
Now to check the response times use the following...
while true
do
/usr/bin/time -f "\t%e" 2>> /tmp/ldapresponse.out ldapsearch -x -H ldaps:// -b "c=us,ou=ldapserver,o=domain.com" "mail=name@domain.com" > /dev/null
done
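Once the loop has run for a while, the collected times can be summarized. A minimal sketch follows, assuming one elapsed-seconds value per line as written by the time format above; it skips any other lines (for example, messages GNU time prints when the command fails). The function name and the sample values are illustrative.

```shell
#!/bin/sh
# Summarize a file of elapsed times (one number per line, optionally
# preceded by whitespace). Non-numeric lines are ignored.
summarize_times() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ {
        v = $1 + 0
        n++; sum += v
        if (n == 1 || v < min) min = v
        if (n == 1 || v > max) max = v
    }
    END {
        if (n) printf "count=%d min=%.2f avg=%.2f max=%.2f\n", n, min, sum / n, max
    }' "$1"
}

# Illustrative data; on the real server use: summarize_times /tmp/ldapresponse.out
sample=$(mktemp)
printf '\t0.10\n\t0.30\n\t0.20\n' > "$sample"
summarize_times "$sample"
rm -f "$sample"
```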
Thursday, October 4, 2012
Using alt_disk_copy
Introduction
Most system administrators have experienced the following scenario:
- A failed ML upgrade.
- It's getting to the end of the day.
- You cannot fix it.
- It's too late to get it resolved by third-party support.
- You need to back out.
This article focuses on a typical rootvg two-disk software mirror set-up. However, alt_disk_copy is not restricted to this two-disk set-up; the same principles apply to multiple software mirroring situations.
The alt_disk utilities consist of the following commands:
- alt_disk_copy performs disk cloning.
- alt_rootvg_op performs maintenance operations on the clone rootvg.
- alt_disk_mksysb performs a mksysb copy.
The filesets required for the alt commands are:
bos.alt_disk_install.boot_images
bos.alt_disk_install.rte
bos.msg.en_US.alt_disk_install.rte
Overview information
Because the alt_disk_copy command takes a copy of the current running rootvg to another disk, be sure to have all the file systems mounted that you want cloned across. alt_disk_copy only copies the currently mounted file systems in rootvg. There is no need to stop processes to execute alt_disk_copy; however, this process can take some time, so it is best to do it at lunchtime or in the evening (remember it is taking a running copy). Once the copy has completed, you will be presented with two rootvg volume groups:
rootvg
altinst_rootvg
where altinst_rootvg is the cloned non-active/varied off rootvg. The cloned rootvg has all its logical volumes prefixed with the name 'alt'. The boot list is also changed to boot off altinst_rootvg. AIX likes to do things like this; it assumes you will want to boot off the cloned and not the real rootvg. If the system is now rebooted, when it comes back up the original rootvg will become:
old_rootvg
The original altinst_rootvg becomes:
rootvg
If you decide to reboot off the old_rootvg, when the system comes back up, the old_rootvg becomes:
rootvg
The rootvg becomes:
altinst_rootvg
Do not worry about the renaming of the original and cloned rootvg. I will demonstrate this shortly.
With a successful completion of an upgrade, the disk containing the cloned rootvg can then be destroyed using the alt_rootvg_op and mirrored back in. If the upgrade event has gone disastrously, there is no real problem--simply take a snapshot for third-party support, then boot off the good rootvg. For users to log in, it is business as normal.
When you get a response back from support on the fix, during off-line hours, simply reboot off the cloned rootvg and fix the issue. There is no need to go through the time-consuming tasks of re-applying the upgrade because you already have it on the cloned rootvg. Get the upgrade tested, and if it is all OK, destroy the cloned rootvg and mirror back in.
Do not use importvg or exportvg on the clone rootvg; use the alt commands instead.
With the cloned rootvg, you can mount the file systems by waking up the disk using alt_rootvg_op. Do whatever work is required on the cloned file systems (one would assume to fix a patch or link, or to gather information for third-party support), then put the disk back to sleep, which also unmounts the file systems.
Excluding directories when cloning
When cloning, you can exclude certain directories by creating the file /etc/exclude.rootvg. The entries should start with the ^./ characters. The '^' means to search for the string at the beginning of the line and the './' means relative to the current directory. You are advised to do this so alt_disk_copy does not misinterpret the entry, as it uses grep to search for the string. So, make sure you provide the full pathname, prefixed with '^.'; for example, to exclude the following directories:
/home/reps
/opt/installs
I could insert into the /etc/exclude.rootvg file:
^./home/reps
^./opt/installs
Make sure there are no empty lines after the last entry.
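To avoid typos in the prefix, the entries can be generated from plain absolute paths. A minimal sketch follows; the function name is made up, and redirecting the output into /etc/exclude.rootvg is left to you.

```shell
#!/bin/sh
# Turn absolute directory paths into exclude-list entries by
# prefixing each one with "^." (so /home/reps becomes ^./home/reps).
exclude_entries() {
    for dir in "$@"; do
        printf '^.%s\n' "$dir"
    done
}

exclude_entries /home/reps /opt/installs
```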
Let's get cloned!
Let's now go through a typical clone. Assume you have a software two-disk (hdisk0 and hdisk1) mirror of rootvg, and further assume that you are going to do a ML (or application upgrade, assuming it is installed in rootvg) upgrade on this system. I will demonstrate one way this can be done to clone the disk and after a successful upgrade will bring the disk back into rootvg and re-mirror. I will also demonstrate the actions you can take if the upgrade fails.
Pre-checks
Before unmirroring the rootvg, first take some time to ensure you are correctly mirrored and have no stale LV's, because if you do, the unmirrorvg will fail. Of course, you could always do a migratepv to move the missing LV's across if the unmirrorvg fails. A simple method to check that you are mirroring is to issue the command:
lsvg -l rootvg
For each row of data output, check that the output of the PPs column is double that of the LPs column.
Another method to check to see if you are mirroring is to use: lspv -l
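The PPs-versus-LPs check can be scripted. A minimal sketch follows, assuming the usual lsvg -l layout (a volume group line, a header line, then LV NAME, TYPE, LPs, PPs, ...). The sample rows are illustrative, not taken from the system in this article, and an LV such as a dump device may legitimately show up as unmirrored.

```shell
#!/bin/sh
# Read "lsvg -l rootvg" style output on stdin and print every logical
# volume whose PPs column is not exactly double its LPs column.
unmirrored_lvs() {
    awk 'NR > 2 && $4 != 2 * $3 { print $1 }'
}

# Illustrative listing; on a real system run: lsvg -l rootvg | unmirrored_lvs
unmirrored_lvs <<'EOF'
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     4       8       2    open/syncd    N/A
lg_dumplv           sysdump    8       8       1    open/syncd    N/A
EOF
```

An empty result means every logical volume in the listing has two copies.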
Next, issue the bosboot command. I personally always do this prior to either rebooting or disk operations involving rootvg; it is a good habit to have:
# bosboot -a
bosboot: Boot image is 35803 512 byte blocks.
A listing of the disks being used for this demonstration is as follows:
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    rootvg    active
Next, unmirror rootvg and take the disk that is going to be used for cloning out of rootvg. This demonstration uses hdisk1 to clone rootvg, so issue the unmirrorvg command:
# unmirrorvg rootvg hdisk1
0516-1246 rmlvcopy: If hd5 is the boot logical volume, please run 'chpv -c as root user to clear the boot record and avoid a potential boot off an old boot image that may reside on the disk from which this logical volume is moved/removed.
0516-1804 chvg: The quorum change takes effect immediately.
0516-1144 unmirrorvg: rootvg successfully unmirrored, user should perform bosboot of system to reinitialize boot records. Then, user must modify bootlist to just include: hdisk0.
Next, take hdisk1 out of rootvg in readiness for the cloning:
# reducevg rootvg hdisk1
Confirm that the disk is now not assigned to any volume groups:
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    None
Running alt_disk_copy
Now you are ready to issue the alt_disk_copy. Simply supply hdisk1 as a parameter to the command. The basic format is:
alt_disk_copy -d
To use an exclude list, the basic format is:
alt_disk_copy -e /etc/exclude.rootvg -d
The following output from the alt_disk_copy command has been truncated:
# alt_disk_copy -d hdisk1
Calling mkszfile to create new /image.data file.
Checking disk sizes.
Creating cloned rootvg volume group and associated logical volumes.
Creating logical volume alt_hd5
Creating logical volume alt_hd6
Creating logical volume alt_hd8
Creating logical volume alt_hd4
Creating logical volume alt_hd2
Creating logical volume alt_hd9var
Creating logical volume alt_hd3
Creating logical volume alt_hd1
Creating logical volume alt_hd10opt
Creating /alt_inst/ file system.
Creating /alt_inst/home file system.
Creating /alt_inst/opt file system.
Creating /alt_inst/tmp file system.
......
......
for backup and restore into the alternate file system...
Backing-up the rootvg files and restoring them to the alternate file system...
Modifying ODM on cloned disk.
Building boot image on cloned disk.
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
.....
.....
Changing logical volume names in volume group descriptor area.
Fixing LV control blocks...
Fixing file system superblocks...
Bootlist is set to the boot disk: hdisk1
At this stage, you now have a cloned rootvg called altinst_rootvg. Notice in the previous output alt_disk_copy has changed the bootlist to boot off the cloned rootvg, which is now hdisk1.
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    altinst_rootvg
This can be confirmed by issuing the bootlist command:
# bootlist -m normal -o
hdisk1 blv=hd5
At this point the ML upgrade can now be installed. After an ML upgrade you will need to reboot the system. For this demonstration, the ML upgrade will be installed on the real rootvg (that is hdisk0), so you need to change the bootlist now, because you want the system to come up with the new upgrade running.
# bootlist -m normal hdisk0
Confirm the change of the bootlist:
# bootlist -m normal -o
hdisk0 blv=hd5
Next, install the ML upgrade, then reboot. After rebooting, the system presents the following rootvg and cloned rootvg. As can be seen, no root volume group has been renamed, because we booted off the real rootvg (hdisk0):
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    altinst_rootvg
Next let's assume everything has gone OK on the upgrade and support users and the systems administrator has signed it off with no issues found. The alt_disk_copy can now be destroyed, and the disk brought back into rootvg for mirroring. Use the alt_rootvg_op command with the X parameter to destroy the cloned rootvg. The basic format is:
alt_rootvg_op -X <cloned rootvg to destroy>
# alt_rootvg_op -X altinst_rootvg
Bootlist is set to the boot disk: hdisk0
Next, extend rootvg to bring in hdisk1, and then mirror up the disk:
# extendvg -f rootvg hdisk1
# mirrorvg rootvg hdisk1
0516-1804 chvg: The quorum change takes effect immediately.
0516-1126 mirrorvg: rootvg successfully mirrored, user should perform bosboot of system to initialize boot records. Then, user must modify bootlist to include: hdisk0 hdisk1.
Change the bootlist to include both disks and run bosboot:
# bootlist -m normal -o hdisk0 hdisk1
hdisk0 blv=hd5
hdisk1
# bosboot -a
bosboot: Boot image is 35803 512 byte blocks.
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5
For this demonstration, that's it: mission accomplished. The upgrade is installed with no issues. The system is operational. That's pretty much how alt_disk_copy works if all goes OK. But what if the upgrade fails? What options do you have? Let's look at that next.
Recovery positions, please
Let's now assume you have just installed the ML upgrade and rebooted, and issues have been found with the operational running of AIX. Remember, you currently have the disks in the following state:
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    altinst_rootvg
At this point, a snapshot should be taken of the running system, in readiness for third-party support, for the call that you will undoubtedly log. Taking stock of the current situation, you have:
- rootvg: with post-upgrade issues.
- altinst_rootvg: with a good pre-upgrade copy.
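When scripting around these states, it helps to look up which disk currently carries a given volume group name. A minimal sketch follows, assuming the lspv column order shown in this article (disk, PVID, volume group, state); the function name is made up.

```shell
#!/bin/sh
# Read "lspv" style output on stdin and print the disk(s) that belong
# to the volume group named in the first argument.
disk_for_vg() {
    awk -v vg="$1" '$3 == vg { print $1 }'
}

# Listing copied from the state above; on a real system: lspv | disk_for_vg altinst_rootvg
disk_for_vg altinst_rootvg <<'EOF'
hdisk0 0041a97b0622ef7f rootvg active
hdisk1 00452f0b2b1ec84c altinst_rootvg
EOF
```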
To get back to the pre-upgrade, simply change the bootlist to boot off the (altinst_rootvg) hdisk1, then reboot. It's that simple:
# bootlist -m normal -o hdisk1
hdisk1 blv=hd5
# bootlist -m normal -o
hdisk1 blv=hd5
# shutdown -Fr
After the reboot, you will be presented with the following rootvg disks:
# lspv
hdisk0    0041a97b0622ef7f    old_rootvg
hdisk1    00452f0b2b1ec84c    rootvg    active
Next, issue a bosboot and confirm the bootlist:
# bosboot -a
bosboot: Boot image is 35803 512 byte blocks.
# bootlist -m normal -o
hdisk1 blv=hd5
The system is now back to the pre-upgrade state.
Post upgrade fixing
At a convenient time schedule that is agreed-upon with the end users, and with information provided by third-party support, you can then boot off the ML failed upgraded disk (hdisk0) and apply a fix that might solve the issue, so change the bootlist to boot off (old_rootvg) hdisk0 and reboot:
# bootlist -m normal -o hdisk0
# shutdown -Fr
After the reboot, in readiness to apply the fix, you will be presented with the following rootvg disks:
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    altinst_rootvg
Next, apply the fix (or carry out the instructions on how to fix it), and assume the system is now operational again.
After the system has been tested and signed off, bring hdisk1 back in using the commands described earlier:
# alt_rootvg_op -X altinst_rootvg
Bootlist is set to the boot disk: hdisk0
# extendvg -f rootvg hdisk1
# mirrorvg rootvg hdisk1
# bootlist -m normal -o hdisk0 hdisk1
hdisk0 blv=hd5
hdisk1
# bosboot -a
bosboot: Boot image is 35803 512 byte blocks.
# bootlist -m normal -o
hdisk0 blv=hd5
hdisk1 blv=hd5
# lspv
hdisk0    0041a97b0622ef7f    rootvg    active
hdisk1    00452f0b2b1ec84c    rootvg    active
Waking the disk up
Within a cloned rootvg environment, you can wake up the cloned rootvg to be active. All cloned file systems from the cloned rootvg will be mounted. It is quite useful because you have a good running system, but at the same time mount the file systems from the cloned rootvg for further investigation or file modification. When a cloned rootvg is woken up, it is renamed to:
altinst_rootvg
Do not issue a reboot while the cloned rootvg filesystems are still mounted, because unexpected results can occur. You can also rename a cloned rootvg, which is useful when you have more than one cloned rootvg.
Assume you have the disks in the following state:
# lspv
hdisk0    0041a97b0622ef7f    old_rootvg
hdisk1    00452f0b2b1ec84c    rootvg    active
To wake up a disk, the basic format is:
alt_rootvg_op -W -d <hdisk>
Let's now wake up old_rootvg (hdisk0):
# alt_rootvg_op -W -d hdisk0
Waking up old_rootvg volume group ...
Checking the state of the disks, you can see the old_rootvg has been renamed to altinst_rootvg and is now active.
# lspv
hdisk0    0041a97b0622ef7f    altinst_rootvg    active
hdisk1    00452f0b2b1ec84c    rootvg            active
The cloned file systems have been mounted under /alt_inst:
# df -m
Filesystem        MB blocks      Free  %Used   Iused  %Iused  Mounted on
/dev/hd4             128.00    102.31    21%    2659     11%  /
/dev/hd2            1968.00    111.64    95%   40407     58%  /usr
/dev/hd9var          112.00     77.82    31%     485      3%  /var
/dev/hd3              96.00     69.88    28%     330      3%  /tmp
/dev/hd1             208.00    118.27    44%    1987      7%  /home
/proc                     -         -      -       -       -  /proc
/dev/hd10opt        1712.00   1445.83    16%    6984      3%  /opt
/dev/alt_hd4         128.00    102.16    21%    2645     11%  /alt_inst
/dev/alt_hd1         208.00     33.64    84%    1987     21%  /alt_inst/home
/dev/alt_hd10opt    1712.00   1445.77    16%    6984      3%  /alt_inst/opt
/dev/alt_hd3          96.00     72.38    25%     335      2%  /alt_inst/tmp
/dev/alt_hd2        1968.00    100.32    95%   40407     59%  /alt_inst/usr
/dev/alt_hd9var      112.00     77.53    31%     477      3%  /alt_inst/var
At this point file modification or further investigation can be carried out on the cloned rootvg. Now you can access the cloned file systems. Once these tasks have been carried out, put the cloned rootvg to sleep and in the same operation issue a bosboot on that disk. The basic format of the command is:
alt_rootvg_op -S -t
Let's now put the altinst_rootvg to sleep:
# alt_rootvg_op -S -t hdisk0
Putting volume group altinst_rootvg to sleep ...
Building boot image on cloned disk.
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst
forced unmount of /alt_inst
Fixing LV control blocks...
Fixing file system superblocks...
The current state of the disks is now:
# lspv
hdisk0    0041a97b0622ef7f    altinst_rootvg
hdisk1    00452f0b2b1ec84c    rootvg    active
From the above demonstration, you can see the cloned rootvg name stayed the same: altinst_rootvg.
It is sometimes good to go back to the original state of the disks to save confusion, especially if you have more than one cloned disk. So rename altinst_rootvg back to old_rootvg. The basic format is:
alt_rootvg_op -v
So in this example, you would issue:
# alt_rootvg_op -v old_rootvg -d hdisk0
# lspv
hdisk0    0041a97b0622ef7f    old_rootvg
hdisk1    00452f0b2b1ec84c    rootvg    active
Of course, you could rename the cloned rootvg to something more meaningful, if so desired.
# alt_rootvg_op -v bad_rootvg -d hdisk0
# lspv
hdisk0    0041a97b0622ef7f    bad_rootvg
hdisk1    00452f0b2b1ec84c    rootvg    active
You cannot rename a cloned rootvg to altinst_rootvg; it is a reserved name.
At this point the system is either operational or not, depending on whether the fix succeeded. Either way, proceed using the commands described earlier.
If the fix worked on hdisk0 (old_rootvg), then run with the new ML version.
Confirm that the system will boot off hdisk0:
# bootlist -m normal -o hdisk0
Reboot:
# shutdown -Fr
Destroy the clone on hdisk1 (we rebooted off old_rootvg, so the volume group on hdisk1 now becomes altinst_rootvg):
# alt_rootvg_op -X altinst_rootvg
Bring hdisk1 into rootvg for mirroring:
# extendvg -f rootvg hdisk1
# mirrorvg rootvg hdisk1
# bosboot -a
# bootlist -m normal -o hdisk0 hdisk1
If the fix did not work, stay at the same ML version and fix it another day:
Confirm that the system will boot off hdisk1:
# bootlist -m normal -o hdisk1
Destroy cloned disk (old_rootvg) hdisk0:
# alt_rootvg_op -X old_rootvg
Bring hdisk0 into rootvg for mirroring:
# extendvg -f rootvg hdisk0
# mirrorvg rootvg hdisk0
# bosboot -a
# bootlist -m normal -o hdisk0 hdisk1
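The "fix did not work" sequence above needs no reboot between steps, so it lends itself to a reviewable plan. The sketch below only prints the commands in order and does not run them. The function name remirror_plan and its parameters are hypothetical; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): print, in order, the commands
# that destroy a clone and bring its disk back into rootvg as a mirror.
# Nothing is executed; review the output, then run the lines by hand.
remirror_plan() {
    boot_disk=$1        # disk holding the rootvg you are keeping
    clone_vg=$2         # volume group name of the clone to destroy
    clone_disk=$3       # disk to reclaim for the mirror copy
    final_bootlist=$4   # boot order once mirroring is set up
    printf '%s\n' \
        "bootlist -m normal -o $boot_disk" \
        "alt_rootvg_op -X $clone_vg" \
        "extendvg -f rootvg $clone_disk" \
        "mirrorvg rootvg $clone_disk" \
        "bosboot -a" \
        "bootlist -m normal -o $final_bootlist"
}
```

For the disks in this example, remirror_plan hdisk1 old_rootvg hdisk0 "hdisk0 hdisk1" reproduces the six commands shown above.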
Commands
alt_disk_copy Clones the currently running system to an alternate disk.
- To clone the running 5300-00 rootvg to hdisk3, then apply updates from /updates to bring the cloned rootvg to a 5300-01 level:
- alt_disk_copy -d hdisk3 -F 5300-01_AIX_ML -l /updates
- The bootlist would then be set to boot from hdisk3 at the next reboot.
- To clone the running rootvg to hdisk3 and hdisk4, and execute update_all on all updates from /updates:
- alt_disk_copy -d "hdisk3 hdisk4" -b update_all -l /updates
- The bootlist would then be set to boot from hdisk3 at the next reboot.
- To clone the running rootvg to hdisk1 and stop after phase 1:
- alt_disk_copy -d hdisk1 -P1
- To execute phases 2 and 3 on an existing alternate rootvg and reboot the system on successful completion:
- alt_disk_copy -d hdisk1 -P23 -r
- To clone the running system to hdisk1 and hdisk2, and to convert the file systems from JFS file systems to JFS2 file systems, run the following command:
- alt_disk_copy -B -T -d hdisk1 hdisk2
- Attention:
- Do not change the bootlist to use the cloned rootvg.
alt_disk_mksysb Installs an existing mksysb image on an alternate disk.
- To install a mksysb image on hdisk3 and hdisk4, then run a customized script (/tmp/script) to copy some user files over to the alternate rootvg file systems before reboot:
- alt_disk_mksysb -m /mksysb_images/my_mksysb -d "hdisk3 hdisk4" -s /tmp/script
- To install a mksysb image on hdisk2 and stop after phase 1:
- alt_disk_mksysb -m /mksysb_images/my_mksysb -d hdisk2 -P1
- To execute phases 2 and 3 on an existing alternate rootvg on hdisk4 and reboot the system upon successful completion:
- alt_disk_mksysb -d hdisk4 -m /mksysb_images/my_mksysb -P23 -r
- To install a mksysb image on hdisk1, and to convert the file system from a JFS file system to a JFS2 file system, run the following command:
- alt_disk_mksysb -B -T -m /mksysb_images/my_mksysb -d hdisk1
- Attention:
- Do not change the bootlist to use the cloned rootvg.
alt_rootvg_op Performs operations on existing alternate rootvg volume groups.
- To remove the original rootvg ODM database entry, after booting from the new alternate disk, enter the following command:
- alt_rootvg_op -X old_rootvg
- To clean up the current alternate disk install operation, enter the following command:
- alt_rootvg_op -X
- To determine the boot disk for a volume group with multiple physical volumes, enter the following command:
- alt_rootvg_op -q -d hdisk0
- Illustrated Example
- # lspv
- hdisk0 00006091aef8b687 old_rootvg
- hdisk1 00076443210a72ea rootvg
- hdisk2 0000875f48998649 old_rootvg
- # alt_rootvg_op -q -d hdisk0
- hdisk2
- To modify an alt_disk_install volume group name, enter the following command:
- alt_rootvg_op -v alt_disk_530 -d hdisk2
- Illustrated Example
- # lspv
- hdisk0 00006091aef8b687 rootvg
- hdisk1 00000103000d1a78 rootvg
- hdisk2 000040445043d9f3 altinst_rootvg
- hdisk3 00076443210a72ea altinst_rootvg
- hdisk4 0000875f48998649 None
- hdisk5 000005317c58000e None
- # alt_rootvg_op -v alt_disk_432 -d hdisk2
- # lspv
- hdisk0 00006091aef8b687 rootvg
- hdisk1 00000103000d1a78 rootvg
- hdisk2 000040445043d9f3 alt_disk_432
- hdisk3 00076443210a72ea alt_disk_432
- hdisk4 0000875f48998649 None
- hdisk5 000005317c58000e None
- To "wake up" an original rootvg after booting from the new alternate disk, enter the following command:
- alt_rootvg_op -W -d hdisk0
- To "put to sleep" a volume group that had experienced a "wake-up" and rebuild the boot image, enter the following command:
- alt_rootvg_op -S -t
- To update the active alternate rootvg to the latest fileset levels available in /updates and install them into the alternate root volume group, enter the following command:
- alt_rootvg_op -C -b update_all -l /updates