Installation
These instructions are sufficient to build and install PlasmaStMan,
making it readily available for usage by third-party applications.
While these instructions are enough to get started, care is needed when planning to use this storage manager from a Python environment. For details see Python quirks.
Dependencies
This project depends on:
Any C++14 compiler
casacore > 3.3.0 (with 64-bit table support)
arrow >= 1.0.0-SNAPSHOT with plasma support
Compiling
This is a cmake-based project,
so it can be built like any standard cmake project:
$ git clone https://gitlab.com/ska-telescope/ska-sdp-plasmastman
$ cd ska-sdp-plasmastman
$ cmake . -B build
$ cmake --build build
Some of the most relevant cmake variables used for compiling
(passed on the first cmake invocation via -Dvariable=value)
are:
CASACORE_ROOT_DIR: Root of a non-standard casacore installation, in case one is used.
Arrow_DIR: Directory containing the cmake configuration exported by Apache Arrow (usually under lib/cmake/arrow in the arrow installation area).
Plasma_DIR: Directory containing the cmake configuration exported by Apache Plasma (usually under lib/cmake/arrow in the arrow installation area).
CMAKE_CXX_COMPILER: The C++ compiler to use.
CMAKE_CXX_FLAGS: Extra C++ compilation flags.
CMAKE_BUILD_TYPE: The type of build to produce, one of Debug, Release and RelWithDebInfo.
BUILD_TESTING: Whether to build unit tests or not, defaults to ON.
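As an illustration of the -Dvariable=value form, the following Python sketch assembles the first cmake invocation from a dictionary of variables. The helper name and the installation path are hypothetical and only serve the example; the variable names are the ones listed above.

```python
import shlex


def cmake_configure_command(variables, source_dir=".", build_dir="build"):
    """Return the argv list for the first (configure) cmake invocation.

    Hypothetical helper, not part of PlasmaStMan.
    """
    argv = ["cmake", source_dir, "-B", build_dir]
    for name, value in variables.items():
        # Booleans are spelled ON/OFF on the cmake command line
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        argv.append(f"-D{name}={value}")
    return argv


if __name__ == "__main__":
    argv = cmake_configure_command({
        "CASACORE_ROOT_DIR": "/opt/casacore",  # hypothetical installation path
        "CMAKE_BUILD_TYPE": "Release",
        "BUILD_TESTING": False,
    })
    # Print a copy-pasteable shell command
    print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run() or printed, as done here, to obtain the equivalent shell command.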
Testing
A set of unit tests is included and built by default. To execute them do:
$ cmake --build build --target test
The unit tests require the plasma-store-server executable
(part of a standard C++ Arrow Plasma installation)
to be visible in the path.
If you want further control over ctest’s command line flags
you can do:
$ cmake --build build --target test -- ARGS="<ctest command line flags>"
or alternatively:
$ cd build/
$ ctest <ctest command line flags>
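Since the unit tests need plasma-store-server to be visible in the path, it can be handy to check for it before running them. The following is a minimal sketch using only the Python standard library; the function name is made up for this example.

```python
import shutil


def find_executable(name="plasma-store-server"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("plasma-store-server not found in PATH, unit tests will fail")
    else:
        print(f"plasma-store-server found at {path}")
```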
Python quirks
When using PlasmaStMan from Python,
special attention needs to be paid to how
the python-casacore and pyarrow Python packages,
if needed by your Python code, are installed
to avoid some otherwise difficult-to-debug errors.
python-casacore
TL;DR:
Don’t install the pre-built binary wheels from PyPI.
If you can, use the kernsuite repositories to install the casacore libraries and python-casacore Python package from pre-built apt packages.
If installing from kernsuite is not an option, then ensure python-casacore is built against the same casacore installation PlasmaStMan was built against.
Starting from version 3.4.0, the python-casacore package
offers pre-built binary wheels
for some major OS and Python version combinations.
These binary wheels come bundled
with a copy of the underlying casacore libraries
(libcasa_casa.so, libcasa_tables.so, etc.)
and their dependencies.
Each of these bundled libraries has
a specific SONAME and matching file name
(e.g. libcasa_tables-734048a7.so.6),
thus avoiding interference
with any system-wide installation.
On the other hand,
the plug-in mechanism used to register
third-party storage managers with casacore
involves first loading the storage manager shared library into memory,
then invoking a registration function in the library
that registers itself into a static casacore-owned registration map,
and finally checking that the registration was successful.
This usually looks like this:
+-------------+      1. dlopen()     +----------------+
| casacore.so | -------------------> | plasmastman.so |
+-------------+                      +----------------+
 ^  |  ^  |                               ^  |
 |  |  |  |   2. register_plasmastman()   |  |
 |  |  |  \-------------------------------/  |
 |  |  |                                     |
 |  |  |     3. DataMan::registerCtor()      |
 |  |  \-------------------------------------/
 |  |
 \--/ 4. check_registration() // all good :)
However, when using the binary wheels from PyPI,
and because of the difference in SONAME
between the bundled libraries
and the libraries used to compile the storage manager,
two different copies of casacore.so
are loaded into memory,
and the interaction looks like this:
+----------------------+    1. dlopen()    +----------------+  1.1 dlopen()  +-------------+
| casacore-734048a7.so | ----------------> | plasmastman.so | -------------> | casacore.so |
+----------------------+                   +----------------+                +-------------+
 ^  |               |                           ^  |                                 ^
 |  |               | 2. register_plasmastman() |  |                                 |
 |  |               \---------------------------/  |                                 |
 |  |                                              |                                 |
 |  |                                              |   3. DataMan::registerCtor()    |
 |  |                                              \---------------------------------/
 |  |
 \--/ 4. check_registration() // fails, registration cannot be found :(
In particular, the error message will look something like:
RuntimeError: Table DataManager error: Data Manager class PlasmaData is not registered
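The two interactions can be mimicked with a toy model in Python. This is only an illustration of why the lookup fails, not casacore code: the class and method names are invented, and each CasacoreCopy instance stands in for one in-memory copy of the casacore libraries with its own registration map.

```python
class CasacoreCopy:
    """Toy stand-in for one loaded copy of the casacore libraries."""

    def __init__(self, name):
        self.name = name
        self.registry = {}  # this copy's own registration map

    def register_ctor(self, class_name, ctor):
        # Step 3: the plug-in registers itself in the copy it is linked against
        self.registry[class_name] = ctor

    def load_plugin(self, plugin):
        # Steps 1 and 2: "load" the plug-in and call its registration function
        plugin.register()
        # Step 4: look the class up in *this* copy's map
        return plugin.class_name in self.registry


class Plugin:
    """Toy stand-in for plasmastman.so, linked against one casacore copy."""

    class_name = "PlasmaData"  # name taken from the error message above

    def __init__(self, linked_against):
        self.linked_against = linked_against

    def register(self):
        self.linked_against.register_ctor(self.class_name, object)


if __name__ == "__main__":
    system = CasacoreCopy("casacore.so")
    bundled = CasacoreCopy("casacore-734048a7.so")
    # Same copy loads the plug-in and receives the registration
    print(system.load_plugin(Plugin(linked_against=system)))
    # The bundled copy loads the plug-in, but the registration goes elsewhere
    print(bundled.load_plugin(Plugin(linked_against=system)))
```

In the second call the registration ends up in a map that the loading copy never consults, which corresponds to the "is not registered" error.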
This situation is specific to the binary wheels
distributed via PyPI.
To avoid this issue one must ensure
that the python-casacore package
uses the same libraries
the storage manager was compiled against.
This could be done either by installing python-casacore
from source and pointing it
to an existing casacore installation
(which itself might be installed from source or not),
or by using pre-compiled packages
that don’t incur this duplication of libraries,
like the apt packages provided by the Kernsuite project.
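When in doubt, one way to check whether two copies of the casacore libraries ended up in the same process is to inspect the files mapped into it. The sketch below assumes a Linux system, where /proc/self/maps lists the mapped files, and assumes that bundled copies follow the hashed naming shown earlier (e.g. libcasa_tables-734048a7.so.6). The function names are made up for this example.

```python
import re
from collections import defaultdict

# Matches e.g. libcasa_tables.so.6 and libcasa_tables-734048a7.so.6
_CASA_LIB = re.compile(r"(libcasa_[a-z0-9_]+?)(?:-[0-9a-f]+)?\.so[.\d]*$")


def casacore_copies(maps_text):
    """Map each casacore library name to the set of distinct files mapped."""
    copies = defaultdict(set)
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        match = _CASA_LIB.search(path)
        if match:
            copies[match.group(1)].add(path)
    return dict(copies)


def duplicated(copies):
    """Keep only the libraries that are mapped from more than one file."""
    return {name: paths for name, paths in copies.items() if len(paths) > 1}


if __name__ == "__main__":
    # Run this after the table using PlasmaStMan has been opened
    with open("/proc/self/maps") as maps:
        for name, paths in duplicated(casacore_copies(maps.read())).items():
            print(name, "is loaded from:", sorted(paths))
```

Any library reported by this check has been loaded twice, which is the condition that leads to the registration failure described in this section.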
pyarrow
TL;DR:
Pre-built binary wheels from PyPI are incompatible with pre-built Arrow apt packages provided by Apache.
You can install a different version of pyarrow alongside the pre-built Arrow apt packages, but this might break in the future.
You can install pyarrow from sources, building it against the same Arrow/Plasma installation PlasmaStMan was built against.
Apache Arrow makes available binary wheels in PyPI
for users to install the pyarrow Python package
without needing a compiler or any other external libraries.
Like in the case of python-casacore,
these binary wheels are bundled
with their own copy of the Arrow shared libraries
(libarrow.so, libplasma.so and so on).
For a given version of Arrow,
these libraries share the same SONAME
with those installed via the Arrow apt repositories.
However, the PyPI pyarrow binary wheels
are compiled using a version of gcc
that predates the introduction
of the dual ABI mechanism
(read the link for a more detailed explanation).
The effect this has is that
the arrow libraries generated by newer versions of gcc
define differently named symbols
than those generated by older versions of gcc,
and therefore they cannot be mixed freely
(e.g., linked or dynamically loaded).
This problem has been reported,
but other than acknowledging the issue
and providing some suggestions on how to proceed,
the final response was
that this use case is not officially supported
by the Arrow published artifacts.
Because of this situation,
problems occur if the Python process
loads the storage manager,
which has been compiled against
the apt-installed Arrow libraries,
after importing the PyPI-installed pyarrow.
In such cases the following situation occurs:
+-------------+
|             | 1.1 no dlopen(), library with same SONAME already loaded
|             | 1.2 check_required_symbols() // fails, symbol not found
| libarrow.so | <--------------------------\
|             |                            |
+-------------+      1. dlopen()     +----------------+
| casacore.so | -------------------> | plasmastman.so |
+-------------+                      +----------------+
In particular, the error message will look something like:
RuntimeError: Shared library plasmastman not found in CASACORE_LDPATH or (DY)LD_LIBRARY_PATH
libcasa_plasmastman.so.4: cannot open shared object file: No such file or directory
libcasa_plasmastman.so: cannot open shared object file: No such file or directory
libplasmastman.so.4: cannot open shared object file: No such file or directory
/usr/local/lib/libplasmastman.so: undefined symbol: _ZN5arrow5fieldENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt10shared_ptrINS_8DataTypeEEbS6_IKNS_16KeyValueMetadataEE
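The undefined symbol in the last line gives away the ABI mismatch: with the new libstdc++ ABI, std::string lives in the inline namespace std::__cxx11, which shows up in mangled names as __cxx11 (or as the abi tag B5cxx11). A small sketch that looks for these markers in a mangled symbol follows; the function name is made up for this example.

```python
# Markers left in Itanium-mangled names by the new libstdc++ ABI:
# "7__cxx11" is the length-prefixed std::__cxx11 namespace,
# "B5cxx11" is the [abi:cxx11] tag attached to some functions.
CXX11_ABI_MARKERS = ("7__cxx11", "B5cxx11")


def uses_cxx11_abi(mangled_symbol):
    """Return True if the mangled symbol refers to new-ABI library types."""
    return any(marker in mangled_symbol for marker in CXX11_ABI_MARKERS)


if __name__ == "__main__":
    # The symbol reported as undefined in the error message above
    missing = (
        "_ZN5arrow5fieldENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
        "St10shared_ptrINS_8DataTypeEEbS6_IKNS_16KeyValueMetadataEE"
    )
    print(uses_cxx11_abi(missing))
    # foo(std::string) compiled with the old ABI mangles std::string as "Ss"
    print(uses_cxx11_abi("_Z3fooSs"))
```

A storage manager that requires symbols carrying these markers cannot resolve them against an Arrow library built with the old ABI, which is what the error reports.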
Note that if pyarrow has not yet been imported
at the time the storage manager library is loaded
then no error occurs:
+-------------+  1. dlopen()  +----------------+  1.1 dlopen()  +-------------+
| casacore.so | ------------> | plasmastman.so | -------------> | libarrow.so |
+-------------+               +----------------+                +-------------+
                                      |                                 ^
                                      |                                 |
                                      \---------------------------------/
                                      1.2. check_required_symbols() // fine
The situation above is a bit brittle
as it depends on pyarrow not being loaded at the time.
Moreover, loading it later
might also lead to the same missing symbol error.
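Because the outcome depends on import order, a program can at least detect the risky case before the storage manager gets loaded. The following sketch only checks whether pyarrow is already present in the interpreter's module table; the function name and the idea of calling it before opening a table are suggestions, not something PlasmaStMan provides.

```python
import sys
import warnings


def pyarrow_already_imported(modules=None):
    """Return True if pyarrow has been imported in this interpreter."""
    modules = sys.modules if modules is None else modules
    return "pyarrow" in modules


def warn_if_pyarrow_loaded():
    """Emit a warning when loading the storage manager is likely to fail."""
    if pyarrow_already_imported():
        warnings.warn(
            "pyarrow is already imported; loading the storage manager "
            "may fail with an undefined symbol error"
        )


if __name__ == "__main__":
    # Call this right before opening a table that uses PlasmaStMan
    warn_if_pyarrow_loaded()
```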
A possibility, somewhat fragile,
is to install a version of pyarrow from PyPI
different from that installed via apt
so that the SONAMEs of the two libraries don’t collide.
That way, plasmastman.so
is forced into loading a different copy of the arrow library
into memory.
This results in the following:
+---------------+
|               |
|               |
| libarrow.3.so |
|               |
+---------------+  1. dlopen()  +----------------+  1.1 dlopen()  +---------------+
|  casacore.so  | ------------> | plasmastman.so | -------------> | libarrow.4.so |
+---------------+               +----------------+                +---------------+
                                        |                                 ^
                                        |                                 |
                                        \---------------------------------/
                                        2. check_required_symbols() // all good :)
This obviously results in two copies of different versions of the Arrow library loaded into memory. Although we haven’t noticed any side-effects, this might not always be the case.
The ultimate solution is of course
to avoid the problem with bundled libraries altogether
and install pyarrow from source,
compiling against the same installation of Arrow/Plasma
that PlasmaStMan was compiled against.
This results in a clean environment,
but has a higher setup cost:
+-------------+
|             | 1.1 no dlopen(), library with same SONAME already loaded
|             | 1.2 check_required_symbols() // all good :)
| libarrow.so | <--------------------------\
|             |                            |
+-------------+      1. dlopen()     +----------------+
| casacore.so | -------------------> | plasmastman.so |
+-------------+                      +----------------+