8/03/2008

LVM2

As mentioned in the intro, a few hours of web surfing later, some LVM questions remain unanswered.

Managing a few hundred Linux (RHEL4 and RHEL5) systems in a production environment, as well as HP-UX and Solaris machines, we decided to review our strategy concerning Linux and promote it to mission-critical level.

The idea is to bring some critical central databases to the Linux platform.

Since our experience with this kind of function is mainly based on HP-UX, LVM is something we are used to working with.

We first assumed the LVM2 implementation was quite similar to the HP one, but we discovered some "unexpected features" ;-o.

The Mirror Question:

Let's assume we have a server dual-connected to 2 FC disk arrays located on 2 different sites in order to be DR compliant. This server belongs to a cluster, with a backup node on the DR site.
This machine has its 2 internal system disks on a hardware RAID controller.
It has 2 FC disks from the arrays on which we are planning to install the application (a database).
Our idea was to create several LVs mirrored across these 2 FC disks, as we would have done on HP-UX.

Indeed, LVM2 allowed us to create mirrored LVs (lvconvert -m1 OraVG/lvoldata...), but there are some constraints:
  1. The mirror metadata (log) location: core or disk.
  • Disk: OK, but it necessarily means a third disk. The problem is that, if we want to remain DR compliant, this disk has to be on both arrays, which means it has to be mirrored itself. Where do we put its metadata then? The never-ending question!
  • Core: The metadata is kept in memory, fine, but when the server reboots, the whole mirror is resynced as if it were being built for the first time. That would be OK for small filesystems, but large data filesystems would take a while to rebuild and generate significant load.
Compared to HP-UX LVM, which holds this metadata in its own structures, for what reason (a good one, I guess) did the LVM2 development team implement the mirror feature this way? (A small command-line sketch of both log variants follows after point 2 below.)

2. The Mapper Question (not very important, just for my knowledge): why does this mapper thing appear in the device names, since the /dev/vgname/lvolX devices exist? And by the way, why not address things directly that way rather than as /dev/mapper/vgname-lvolX?
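
To make the two mirror-log variants concrete, here is roughly what they look like on the command line. This is just a sketch: OraVG, lvoldata and the /dev/sdX names are placeholders, and the exact options may vary slightly between LVM2 releases.

  # mirror with an on-disk log: a third PV is needed just to hold the log
  lvcreate -L 50G -m 1 --mirrorlog disk -n lvoldata OraVG /dev/sdb /dev/sdc /dev/sdd

  # mirror with an in-memory (core) log: no third disk, but a full resync at every activation
  lvcreate -L 50G -m 1 --mirrorlog core -n lvoldata OraVG /dev/sdb /dev/sdc

  # converting an existing LV, as mentioned above
  lvconvert -m1 --mirrorlog disk OraVG/lvoldata

On the mapper naming, one observation (not an answer to the "why"): both paths end up pointing at the same device-mapper node, which can be checked with something like the following, assuming our OraVG/lvoldata volume:

  dmsetup ls                        # raw device-mapper names, e.g. OraVG-lvoldata
  ls -l /dev/OraVG/lvoldata         # symlink maintained by LVM2/udev
  ls -l /dev/mapper/OraVG-lvoldata  # the device-mapper name itself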

Anyway, we did things differently: we used mdadm to mirror the FC LUNs, then created the PV on the resulting md device, and then created the LVs.

About mdadm: there is a multipath option, but we didn't use it. DM-Multipath allowed us to use the paths in a round-robin fashion.

It would have been nice to be able to use LVM2 directly on top of DM-Multipath, without having to insert the md layer in the middle.
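
For the record, here is a rough outline of how the stack described above is assembled. The device names (mpath0, mpath1, md0) and sizes are generic placeholders, not the exact ones from our setup.

  # 1. DM-Multipath presents each FC LUN (one per array) as a single multipathed device
  multipath -ll

  # 2. mdadm builds a RAID1 mirror across the two multipath devices
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/mpath0 /dev/mapper/mpath1

  # 3. LVM2 then sits on top of the md device
  pvcreate /dev/md0
  vgcreate OraVG /dev/md0
  lvcreate -L 50G -n lvoldata OraVG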

3 comments:

Anonymous said...

Congratulations on your blog...

Unknown said...

Hi:

I was really glad to read this. We're in exactly the same stage of production as you are, and exploring the exact same issues with the exact same setup. I'm currently struggling to figure out the exact format of the lvcreate mirror command when using an on-disk log, on our test systems.

In our production systems, so far we're doing what you've done -- using mdadm mirroring.

Nevertheless, here's some other information which you may find helpful:

First, you *can* address logical volumes in a more sane and reasonable way by using the format "vg_name/lv_name". For instance, something like this:

lvdisplay vg_system/lv_root

Second, on Intel platforms, you can consider using Veritas Volume Manager to provide extremely high-quality, cluster-compliant volume management. It also (obviously) integrates well with Veritas Cluster Services. IMHO these two products are among the best in their class of anything I've seen.

The drawbacks: Veritas ships this product as binary-only kernel modules - this means that your Red Hat / SUSE / etc. support will be limited, and that you are limited to updating your kernel only to what Veritas runs on. It's also not a cheap product.

Third: IBM is working on a package of very high quality mirroring software enhancements to the stock Linux kernel. A Dr. Holgar Smolinski out of Germany is the primary author. At first glance, it appears to be on the same order of functionality as Veritas. Also, IBM is willing to sell you a Red Hat Linux support policy that covers both this package and the rest of Linux.

Drawbacks: It's not yet in the kernel (Holgar is trying to push it in, but there's no schedule for getting something like this accepted). It is still a special-order product with IBM. And the most severe drawback: it doesn't yet work with Cluster LVM on Linux. This means no active/active shared filesystem clusters, although you could set your cluster manager to specifically deactivate and activate volume groups in an active/passive style HA cluster.

Please let me know if you find out anything else.

-- Pat

bblog said...

Hello Pat,

Thanks for your post.

We gave up trying to mirror with LVM2 as there are some limitations, such as having to break the mirror if you want to live-extend the filesystem on top of your mirrored LV, plus all the points mentioned in my original post.
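
To be concrete, extending a mirrored LV would end up looking roughly like this (placeholder names and sizes, just to illustrate the limitation):

  lvconvert -m0 OraVG/lvoldata                   # break the mirror down to a linear LV
  lvextend -L +10G OraVG/lvoldata                # extend the now unmirrored LV
  lvconvert -m1 --mirrorlog disk OraVG/lvoldata  # re-mirror, triggering a full resync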

Our DM-MP -- MD -- LVM setup is pretty much finalized and is, IMHO, definitely the most "production compliant" one, though it can be viewed as tricky.

I'll post the detailed setup in the next few days.

We are also considering VxVM among other things.

For our clusters, as we are a large client of the HP MCServiceGuard clustering software on HP-UX in Europe, we have some experience with this product, which also exists for Linux.

This will be our preferred solution.

Things will be built the same way, except that mdadm will be replaced by HP XDC for the mirroring part.

HP XDC is in fact based on mdadm and has the advantage of being cluster aware.

I'll have a look at IBM plans.

Brem

PS: will your on-disk log be on a mirrored device? If so, how will you manage its own log?