Compute node disk expansion resulted in mount.ocfs2: Bad magic number in superblock while opening device /dev/sda3

Just before Christmas 2017 I had the honour of expanding the compute node storage of a virtualised Exadata, or better said, OVM on Exadata. This can be done for several reasons. The most logical one, and also the reason why this customer bought the disk expansion kit, is that you want to create more virtual machines and have run out of space on /EXAVMIMAGES.

The preparation is actually pretty simple. It is documented completely in the Exadata Database Machine Maintenance Guide. And look, point 2.9.11 Expanding /EXAVMIMAGES on Management Domain after Database Server Disk Expansion looks like exactly what we need. Its second step states:

Add the disk expansion kit to the database server.
The kit consists of 4 additional hard drives to be installed in the 4 available slots. Remove the filler panels and install the drives. The drives may be installed in any order.

Even that is documented, in point 2.4 Adding Disk Expansion Kit to Database Servers. So this is an easy task, right? Yes! But … would I blog about it if there weren’t an “oops” in it? There are two small gotchas.

Below I describe how I did it. Oracle Support also verified that this approach should be fine, so here we go.

Gotcha 1

Preparation

You need to set aside some important information about how the system looks.

First of all you need to ensure that reclaimdisks.sh was run correctly after installation. As I did this installation myself, I can confirm it was, so this step can be skipped here.

Then on to the next step: physically adding the disks to the servers. This is really not that difficult, but it needs to be done with care, following the safety measures Oracle prescribes; also watch out for ESD. The drives are FRUs anyway, but in this case you know exactly when the engineer should install them.
When the disks are put in the server, the RAID starts to rebuild automatically. In our case it took around 14 hours to finish. Dbmcli puts an entry in the alert history when the rebuild is done.
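A quick way to follow this up (a rough sketch; the exact alert text differs per image version) is the dbmcli alert history:

    # check the alert history for the message that the RAID rebuild has finished
    dbmcli -e list alerthistory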

It’s also good to know what the partitions look like:

While preparing the steps, I saw that we had to recreate the partition, not based on start and end sectors but on a specified size. Personally I do not like that, so it’s good to also gather the start and end sectors using parted.

Check how parted sees the disk:
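A minimal sketch of how to gather that, assuming the dom0 system disk is /dev/sda as on these nodes:

    # print the partition table in sectors and note the start and end sector
    # of the /EXAVMIMAGES partition (/dev/sda3 here)
    parted /dev/sda unit s print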

Also copy aside the information from df -h and the list of virtual machines using xm list.
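Something along these lines, with the output copied somewhere safe off the node (the file names are just examples):

    # keep the current filesystem usage and the list of running domains aside
    df -h > /tmp/df_before_expansion.txt
    xm list > /tmp/xm_list_before_expansion.txt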

Expansion

Partition enlargement

Then all user domains can be shut down. Do this on only one node at a time, so that your databases don’t go down.
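A sketch of a clean shutdown; the domain name is hypothetical, take the real ones from the xm list output you saved earlier:

    # shut down each user domain and wait for it; only Domain-0 should remain
    xm shutdown -w exadbvm01
    xm list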

Make sure now to unmount the /EXAVMIMAGES filesystem.
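In its simplest form that is just:

    # unmount the guest image repository
    umount /EXAVMIMAGES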

It MIGHT be necessary to stop the Xen daemon and the OCFS2 services (I had one node that needed it and another that didn’t). You can do this by running:
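On my nodes that came down to something like the following; treat the service names as an assumption and check what your image level actually uses:

    # stop the Xen management daemon and the OCFS2/O2CB cluster services
    service xend stop
    service ocfs2 stop
    service o2cb stop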

After that the filesystem unmounts cleanly. This is necessary because we will remove the partition in the next step.

The warning we are aware of. It appears because we are interfering with a disk that is actually in use and we cannot unmount everything on it; the change has been made, we just cannot see it (yet). Next we need to create the new partition. In this case we follow the Oracle documentation:

So far so good. We should now be able to mount the partition without errors, although it won’t be any bigger yet. So let’s do that.
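The mount itself is a one-liner (assuming the /etc/fstab entry is still in place), and it was exactly here that the error from the title showed up:

    # remount the guest image repository
    mount /EXAVMIMAGES
    # -> mount.ocfs2: Bad magic number in superblock while opening device /dev/sda3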

And here is where the journey begins. Debugging this is not that difficult. For your information, I already had a service request open and the engineer suggested to just drop everything, restore the partition and try to recover using -r. I’m not a fan of that, because it is a risky operation and it’s always better to know why it happened. So join me in the reasoning towards the solution, which makes me feel more comfortable than just “restoring things”:

First confirm the parted output. Remember this?

It matched exactly with the Oracle documentation. So let’s dig a bit deeper and check /proc/partitions.
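Checking that is as simple as:

    # show the block devices and partition sizes the kernel currently knows about
    cat /proc/partitions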

That doesn’t match the documentation. The last number should be 3! So the partition does not start at the spot we expect, and it is exactly that part of the disk that contains very interesting information.

The first step is to get rid of the wrongly created partition:

Once the partition is gone, we can recreate it (the way we want), starting from the sector we retrieved earlier. That way we know for sure that it starts at the same spot and it should turn out fine:
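A minimal sketch with parted; the start sector below is a placeholder for the original start sector of /dev/sda3 that you noted down before touching anything (parted may still ask to confirm alignment warnings):

    # remove the wrongly created partition and recreate it at the ORIGINAL start
    # sector, letting it run to the end of the now bigger disk
    parted /dev/sda rm 3
    parted /dev/sda mkpart primary <original_start_sector>s 100%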

This way the partition looks EXACTLY the same as the output in the documentation, but … it did before as well.

That’s good! Remember that we still have to reboot the server to make the system reread the partition table, so the reboot must be done now.

When the system is back online, first verify if all went well:
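Roughly what I looked at:

    # confirm /EXAVMIMAGES is mounted again and the data is visible
    mount | grep EXAVMIMAGES
    df -h /EXAVMIMAGES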

So far so good. The filesystem mounts, but it is still not expanded. The partition should be bigger now, let’s check:
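Again with parted, in sectors:

    # the end sector of /dev/sda3 should now reflect the added drives
    parted /dev/sda unit s print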

And it matches the documentation. This means the expansion of the partition has been executed successfully. Now the filesystem has to be enlarged.

Expand filesystem

/EXAVMIMAGES is an OCFS2 filesystem. We can expand it using tunefs.ocfs2. This command should not give any output.
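A sketch of the resize, assuming the repository sits on /dev/sda3: with -S (volume size) and no explicit size, tunefs.ocfs2 grows the filesystem to the size of the underlying partition.

    # grow the OCFS2 filesystem to fill the enlarged partition
    tunefs.ocfs2 -S /dev/sda3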

Looks fine. df -h:

Yay, this is OK. Due to the reboot the user domains are already back online, but that’s fine. If they aren’t started automatically, it’s time to start them now (a sketch follows below). After they are fully booted, you can repeat the actions on the other node.
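If one of them did not come back by itself, starting it by hand looks something like this; the path and domain name are hypothetical and depend on your deployment:

    # start a user domain from its vm.cfg under the guest image repository
    xm create /EXAVMIMAGES/GuestImages/exadbvm01.example.com/vm.cfg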

 

Gotcha 2

This one worries me a bit, to be honest. This customer has a fair number of Exadatas. They stepped in at the X2-2 and are currently in the X6-2 range. They also try to keep up with patch levels and upgrade regularly, which is, in my opinion, a good thing. Recently, some of the racks were extended to an elastic configuration. I discovered that with the 12.2 image on the compute nodes there is something odd. By default, a virtualised Exadata dom0 filesystem looks like this:

The thing is that a newly imaged node (or one newly deployed with the Oracle-provided image that comes from the factory) looks different:
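A quick way to see which layout a given dom0 uses is the standard LVM tooling; this is only a sketch, but on the new image the Exadata volume group shows up, while on the old plain-partition layout it does not:

    # on an LVM-based image this lists /dev/sda2 as a PV backing the vgexadb volume group;
    # on the older layout both commands come back empty
    pvs
    vgs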

When I highlighted this to support, I had an amusing conversation. Apparently the disk layout is not (yet?) included in the patching/upgrading of these systems. So I asked them: when you want to expand a compute node that is already on LVM, which naming should you use, and what are the standards? This was not covered on the latest EIS DVD (November 2017), nor in the Oracle documentation (December 2017). The answer I got was:

From a patching perspective we don’t care about the pv names as the work is done on a much higher level. For pv names, we recommend you use the same approach as for the existing disks.

Keep in mind that by default the PV used for vgexadb is /dev/sda2. So this story is to be continued. If you read this and know the answer, please let me know.

Lessons learned

  1. Copy more information aside from the system than Oracle tells you to do and do not forget to use common sense.
  2. Use sectors instead of other mechanisms to expand partitions, or even better, use LVM.
  3. In case of doubt, open a service request to verify things, so you can be sure before continuing.

I’ve included the full output from the commands, mostly for my own reference, but in case you end up in trouble, at least you know where the default partitions in a virtualised Exadata start and end.

As always, questions or remarks? Find me on Twitter: @vanpupi
