Tuesday, June 17, 2014

Okay, some of them aren't that bad...

Addendum to my previous post: last week I helped out another SE, this one a Brit, who was looking to upgrade a customer from an ancient release to both a later release and a new hosting OS (Linux rather than Solaris). They wanted to first install their ancient releases on new Linux servers, make sure the database was happy, and then do the upgrade.

I resurrected copies of the ancient releases, which I'd initially thought were long gone, casualties of disk cleanup. I gzipped the tar files and passed them along. He thanked me profusely and offered chocolate. I figured he was joking and said "dark, please."

He just swung by my office and gave me two big bars of Cadbury Bournville classic dark chocolate.

Okay, then.


Monday, June 16, 2014

In My Less Exciting Life, I'm a Nerd.


This post is completely not horse related. Except in that it gives some idea why I need a horse to be kept sane. Possibly identifying company stuff is X'd out to protect the innocent. Me, in other words.

As I said, I'm a nerd. Actually, my title is "Build/Release Engineer" but I get to be the keeper of the product installation tools too. The product is fairly complex, runs on Linux and Solaris operating systems, uses an Oracle database, and interacts with bunches of different pieces of hardware.

So though build/release engineers don't typically have much customer interaction, I sometimes have to interact with the people who do. And sometimes I wonder just how they got to be called "Systems Engineers".

So… I got a question from a Systems Engineer about a customer issue: an executable was not starting up, and the error message said something about a library being 32 bit rather than 64 bit.

I asked (not unreasonably):

If you have the exact text of the error, I might be able to find it either in the readmes, cvs logs, or in my email. There were a few 32 vs 64 bit issues there for a while…

And he replied:

xxxxxxxxx main exiting: Could not open driver 'oracle' library liboradb-s.so: libclntsh.so.11.1: wrong ELF class: ELFCLASS32

It looked familiar, but as a work-around I told him to change the symbolic link for xxxxxxxx to point to the 32 bit version of the file. That worked, of course.

Since they claimed to have installed a 64 bit Oracle server and a 32 bit Oracle client (we need the 32 bit libs for some of our executables), I thought that perhaps the LD_LIBRARY_PATH needed to be reorganized to point to the 64 bit libs before the 32 bit libs. That really didn't make much sense: well-written code will search for the correct one (if it needs 64 bit it will skip over the 32 bit and continue to look), but I have actually seen that be a problem with Cognos reports. Yeah, Cognos is not a great product. :(
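
For anyone wanting to see which copies of a library a given search order would run into, and in what order, something like the following could be used. It is only a sketch: the function name find_lib_in_path is made up for illustration, it knows nothing about rpath or the ld.so cache, and it simply skips empty path entries.

```shell
# List every directory in a colon-separated search path that contains
# the named library, in search order. Defaults to $LD_LIBRARY_PATH.
# (find_lib_in_path is a hypothetical helper, not part of any product.)
find_lib_in_path() {
    lib=$1
    path=${2-${LD_LIBRARY_PATH-}}
    old_ifs=$IFS
    IFS=:
    for dir in $path; do
        IFS=$old_ifs
        # Empty entries are skipped in this sketch.
        [ -n "$dir" ] || continue
        # Print the full path of each copy found, first hit first.
        [ -e "$dir/$lib" ] && echo "$dir/$lib"
    done
    IFS=$old_ifs
}
```

Run after sourcing the environment script, as "find_lib_in_path libclntsh.so.11.1", the first line printed is the copy that a loader which stops at the first match would pick up.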

But then I looked at the error message again, and said:

WAIT.
They didn’t install Oracle 64 bit.
I bet that’s it. Their Oracle server is 32 bit.

At this point, I'm CERTAIN that is the case. But I'm just the build/release engineer.

The SE said no, they have 64 bit server. Ok, then.

This was Friday.  He said he was going home. 

So Monday morning he pings me again. They want the 64 bit xxxxxxxxx. He’d changed the LD_LIBRARY_PATH, but it was still failing to start.


I sent this (all one email, though I’m commenting in the middle):
The only thing I can think of is that the liboradb-s.so file in /prod/oracle/product/11.2.0.3/db_1/64bit/lib directory either isn’t there (it should be linked to the “11” version of the file) or the “11” version is, despite the path saying 64bit, 32 bit.

  • Now actually, the file I meant to refer to here was the libclntsh.so.11 one, not liboradb-s.so. I saw my mistake and corrected myself in my next email.
  • So, I'd said again that they had 32 bit libraries and executables in the 64 bit path. But since he was so adamant that those files were 64 bit, I also asked again about the LD_LIBRARY_PATH, and gave my reason for it, and then my reason for doubting that was really the problem.

If neither of those are the case, you can try resetting the LD_LIBRARY_PATH to look for 64 bit first where the oracle lib paths are concerned, but I don’t think that’s the problem… where we saw that issue was with Cognos 64 bit version (later than 8.1, I think we introduced it in 8.4) – for some reason once it found a library file, it would attempt to use it even though it was the wrong type (the 32 bit version was found, the 64 bit version existed, but it stopped looking after finding the 32 bit version, then barfed). Changing the order in LD_LIBRARY_PATH so that all 64 bit libraries would resolve first fixed that problem.

Did I ask you to do ldd on the two files (or did you do it, and I’ve forgotten)?

IOW, after sourcing xxxxxxx.sh
ldd xxxxxxxxx-32
ldd xxxxxxxxx-64


He came back with:
Another attempt failed:

-bash-4.1$ xxxxxx.sh start

Checking database...

Starting XXXXXXX...
ERROR - The executable /prod/xxxx/bin/xxxxxxx failed to start.

-bash-4.1$ xxxxxxxxx
xxxxxxxx main exiting: Could not open driver 'oracle' library liboradb-s.so: libclntsh.so.11.1: wrong ELF class: ELFCLASS32

Sigh.

So I sent him this:
If you do “file /prod/xxxx/bin/xxxxxxxx-64” and “file /prod/oracle/product/11.2.0.3/db_1/64bit/lib/libclntsh.so.11”, it’s definitely x86_64, right, both cases?

For mine:
xxxx:~$ file $XXXXXXX/bin/xxxxxxxx-64
/home/xxxx/BX/bin/xxxxxxx-64: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), not stripped
xxxx:~$ file /home/oracle/app/oracle/product/11.2.0/dbhome_1/lib/libclntsh.so.11.1
/home/oracle/app/oracle/product/11.2.0/dbhome_1/lib/libclntsh.so.11.1: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped

If so, then I have no idea what’s going on. That’s the only thing left that I haven’t seen to be proved to be the case.

Yes, I’m getting a bit tired of asking about the library really, really, being 64 bit…

So I get back:
Looks like:
-bash-4.1$ file /prod/oracle/product/11.2.0.3/db_1/64bit/lib/libclntsh.so
/prod/oracle/product/11.2.0.3/db_1/64bit/lib/libclntsh.so: symbolic link to `/prod/oracle/product/11.2.0.3/db_1/64bit/lib/libclntsh.so.11.1'
-bash-4.1$ file /prod/oracle/product/11.2.0.3/db_1/32bit/lib/libclntsh.so
libclntsh.so: symbolic link to `/prod/oracle/product/11.2.0.3/db_1/32bit/lib/libclntsh.so.11.1'
-bash-4.1$ file /prod/xxxx/bin/xxxxxxxxx
/prod/xxxx/bin/xxxxxxxx: symbolic link to `xxxxxxxx-64'
-bash-4.1$ file /prod/xxxx/bin/xxxxxxxx-64
/prod/xxxx/bin/xxxxxxx-64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.4.0, not stripped

And I suppose I shouldn't have been surprised at this point that I had to reply:

Follow the symlink and “file” it, please?

And eureka! He replies:

Found it:
-bash-4.1$ file /prod/oracle/product/11.2.0.3/db_1/32bit/lib/libclntsh.so.11.1
/prod/oracle/product/11.2.0.3/db_1/32bit/lib/libclntsh.so.11.1: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
-bash-4.1$ file /prod/oracle/product/11.2.0.3/db_1/64bit/lib/libclntsh.so.11.1
/prod/oracle/product/11.2.0.3/db_1/64bit/lib/libclntsh.so.11.1: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped

Client install bad?


Um. No. Not the client… But yes, THE 64BIT INSTALL ISN’T, IN FACT, 64BIT BUT IS 32 BIT.

Gah.
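
For the record, the check that finally settled it (follow the symlink, then look at what the real file is) can be sketched as one small shell function. This is an illustration only, assuming a Linux box with readlink -f and od available; the name elf_class is made up, and instead of calling "file" it reads the class byte straight out of the ELF header (byte 5: 1 means 32 bit, 2 means 64 bit).

```shell
# Report the ELF class of a file, resolving any symlinks first.
# (elf_class is a hypothetical helper name used for this sketch.)
elf_class() {
    target=$(readlink -f "$1") || return 1
    if [ ! -f "$target" ]; then
        echo "$1: not a regular file" >&2
        return 1
    fi
    # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$target" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "$target: not an ELF file" >&2
        return 1
    fi
    # The fifth byte (offset 4) is EI_CLASS.
    class=$(od -An -tu1 -j4 -N1 "$target" | tr -d ' \n')
    case "$class" in
        1) echo "$target: ELFCLASS32" ;;
        2) echo "$target: ELFCLASS64" ;;
        *) echo "$target: unknown ELF class" ;;
    esac
}
```

Pointed at the libclntsh.so link in the 64bit/lib directory above, this should print the resolved libclntsh.so.11.1 path along with ELFCLASS32, which is the whole story in one line.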