As a precursor to writing a subordinate charm to automate this, I though it would be a good idea to outline the steps to integrating the two here so others in the community can see what I’m doing.
The steps are pretty simple, actually. You can do this with juju deployed hadoop and ceph.
1. Building the cephfs hadoop plugin
From a dev machine, clone the cephfs hadoop plugin:
git clone https://github.com/ceph/cephfs-hadoop.git
cd cephfs-hadoop
Edit the hadoop.version
property in pom.xml
to the version of hadoop in the node (2.7.3 as of 2/4/2019)
Edit the maven-compiler-plugin
version in pom.xml
to 3.8.0 (fixes builds on Java versions newer than 8)
Build the thing:
mvn -Dmaven.test.skip=true package
Note: Maven will fail if JAVA_HOME
env is not set to your jdk location.
The artifact should be in target/cephfs-hadoop-0.80.6.jar
.
2. Adding ceph libraries to hadoop
Upload cephfs-hadoop-0.80.6.jar
to /usr/lib/hadoop/lib
in the hadoop charm.
From the hadoop node, install libcephfs-java
sudo apt install libcephfs-java
Copy the libcephfs.jar
to hadoop/lib
cp /usr/share/java/libcephfs.jar /usr/lib/hadoop/lib
Hadoop won’t always load the libraries, so we need to add them to hadoop’s classpath. Add the following to /etc/hadoop/conf/hadoop-env.sh
:
export HADOOP_CLASSPATH="/usr/lib/hadoop/lib/cephfs-hadoop-0.80.6.jar"
export HADOOP_CLASSPATH="/usr/lib/hadoop/lib/libcephfs.jar:$HADOOP_CLASSPATH"
3. Configure hadoop
Remove default FS prop from /etc/hadoop/conf/core-site.xml
and add the following properties:
<property>
<name>fs.ceph.impl</name>
<value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
</property>
<property>
<name>ceph.conf.file</name>
<value>/etc/ceph/ceph.conf</value>
</property>
<property>
<name>fs.default.name</name>
<value>ceph:///</value>
</property>
The rest of the information needed to connect to ceph (monitor addresses, fsid, etc) should be in /etc/ceph/ceph.conf
.
Example ceph.conf configuration:
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
mon host = 172.31.103.152:6789 172.31.103.176:6789 172.31.103.199:6789
fsid = 050fc712-25bb-11e9-8024-02b84229d652
log to syslog = false
err to syslog = false
clog to syslog = false
mon cluster log to syslog = false
debug mon = 1/5
debug osd = 1/5
mon pg warn max object skew = -1
public network =
cluster network =
public addr = 172.31.103.92
cluster addr = 172.31.103.199
[mon]
keyring = /var/lib/ceph/mon/$cluster-$id/keyring
[mds]
keyring = /var/lib/ceph/mds/$cluster-$id/keyring
4. Verify the connection
If all is well, the following will list the contents in the root of the ceph file system:
hadoop fs -ls /