
Deploying Ceph in lab 11.1 fails

I'm trying to do the ceph-deploy installation in Lab 11.1, step 4 under 'Deploy a Monitor', but it looks like the ceph-jewel repo is not valid:

[rdo-cc][INFO ] Running command: sudo yum -y install epel-release
[rdo-cc][DEBUG ] Loaded plugins: fastestmirror, priorities
[rdo-cc][DEBUG ] Determining fastest mirrors
[rdo-cc][DEBUG ] * base: mirror.fra10.de.leaseweb.net
[rdo-cc][DEBUG ] * epel: mirror.de.leaseweb.net
[rdo-cc][DEBUG ] * extras: mirror.checkdomain.de
[rdo-cc][DEBUG ] * updates: centosmirror.netcup.net
[rdo-cc][WARNIN] http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 503 - Service Unavailable
[rdo-cc][WARNIN] Trying other mirror.
[rdo-cc][WARNIN] (the same 503 error and "Trying other mirror." lines repeated nine more times)
[rdo-cc][WARNIN]
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] One of the configured repositories failed (CentOS-7 - Ceph Jewel),
[rdo-cc][WARNIN] and yum doesn't have enough cached data to continue. At this point the only
[rdo-cc][WARNIN] safe thing yum can do is fail. There are a few ways to work "fix" this:
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 1. Contact the upstream for the repository and get them to fix the problem.
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 2. Reconfigure the baseurl/etc. for the repository, to point to a working
[rdo-cc][WARNIN] upstream. This is most often useful if you are using a newer
[rdo-cc][WARNIN] distribution release than is supported by the repository (and the
[rdo-cc][WARNIN] packages for the previous distribution release still work).
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 3. Run the command with the repository temporarily disabled
[rdo-cc][WARNIN] yum --disablerepo=centos-ceph-jewel ...
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 4. Disable the repository permanently, so yum won't use it by default. Yum
[rdo-cc][WARNIN] will then just ignore the repository until you permanently enable it
[rdo-cc][WARNIN] again or use --enablerepo for temporary usage:
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] yum-config-manager --disable centos-ceph-jewel
[rdo-cc][WARNIN] or
[rdo-cc][WARNIN] subscription-manager repos --disable=centos-ceph-jewel
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 5. Configure the failing repository to be skipped, if it is unavailable.
[rdo-cc][WARNIN] Note that yum will try to contact the repo. when it runs most commands,
[rdo-cc][WARNIN] so will have to try and fail each time (and thus. yum will be be much
[rdo-cc][WARNIN] slower). If it is a very temporary problem though, this is often a nice
[rdo-cc][WARNIN] compromise:
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] yum-config-manager --save --setopt=centos-ceph-jewel.skip_if_unavailable=true
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] failure: repodata/repomd.xml from centos-ceph-jewel: [Errno 256] No more mirrors to try.
[rdo-cc][WARNIN] http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 503 - Service Unavailable
[rdo-cc][WARNIN] (the same 503 error repeated nine more times)
[rdo-cc][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install epel-release

[[email protected] ceph-cluster]$

Comments

  • Hello,

    Which step was this issue? I see you wrote 11.1, step 4, but I show step 4 as being timedatectl. It would help if you could show the command you ran in addition to the error. When I just tried to install ceph-deploy, in step 3, I also saw the HTTP 503 errors, but it worked despite them.

    Regards,

  • @serewicz said:
    Hello,

    Which step was this issue? I see you wrote 11.1, step 4, but I show step 4 as being timedatectl. It would help if you could show the command you ran in addition to the error. When I just tried to install ceph-deploy, in step 3, I also saw the HTTP 503 errors, but it worked despite them.

    Regards,

    Hi, it was step 4 of deploying a monitor. The command is:

    [[email protected] ceph-cluster]$ ceph-deploy install --release luminous \
    rdo-cc storage1 storage2 storage3

    This command fails pretty close to the beginning with the above errors, because it seems to be unable to install epel-release (I assume because epel-release is actually in the ceph-jewel repo).

    All the steps leading up to this one succeeded. The yum steps before this one gave the same ceph-jewel repo errors but still completed; step 4 of deploying a monitor, however, fails after the errors. I tried running yum -y install epel-release as a separate command and got the same error.
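    Before retrying, it can help to confirm by hand whether the repo is actually unreachable at that moment; a minimal sketch run on rdo-cc (the URL is copied from the error output, so nothing here is specific to the lab):

    ```shell
    # Check whether the failing repo metadata answers right now;
    # 503s from mirror.centos.org are often transient.
    curl -sI http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml | head -n 1

    # Clear stale yum metadata and retry only the failing package:
    sudo yum clean all
    sudo yum -y install epel-release
    ```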

  • Hello,

    Thank you. I have just tried these steps and did not have any errors. There are a few warnings, but that is typical.

    I see a few mentions of Jewel in the output in your previous post. I think there may be a typo or missing character in your start-ceph.repo file, which would explain why you are not seeing messages about Luminous instead. The most common one is typing e17 (e-seventeen) instead of el7 (e-ell-seven), which is what it should be. Could you paste your start-ceph.repo file here? I'll copy it and see if I get the same errors.

    Regards,
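    For comparison, a start-ceph.repo along these lines should pull Luminous packages. This is only a sketch: the baseurl shown is the public download.ceph.com Luminous repo and may differ from the one used in the lab, but the critical detail is the el7 tag:

    ```shell
    # Write a hypothetical start-ceph.repo for Luminous on CentOS 7.
    # Note "el7" (e-ell-seven), not "e17" (e-seventeen), in the baseurl.
    printf '%s\n' \
      '[ceph]' \
      'name=Ceph x86_64 packages' \
      'baseurl=https://download.ceph.com/rpm-luminous/el7/x86_64/' \
      'enabled=1' \
      'gpgcheck=1' \
      'gpgkey=https://download.ceph.com/keys/release.asc' \
      > start-ceph.repo

    grep -n 'el7' start-ceph.repo   # sanity-check the architecture tag
    ```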

  • @serewicz said:
    Hello,

    Thank you. I have just tried these steps and did not have any errors. There are a few warnings, but that is typical.

    I see a few mentions of Jewel in the output in your previous post. I think there may be a typo or missing character in your start-ceph.repo file, which would explain why you are not seeing messages about Luminous instead. The most common one is typing e17 (e-seventeen) instead of el7 (e-ell-seven), which is what it should be. Could you paste your start-ceph.repo file here? I'll copy it and see if I get the same errors.

    Regards,

    I logged back into the cluster, and since some time had passed, it had been reset. So I went through the steps again just like yesterday, but this time I did not get any errors (not even the initial repo errors from before).

    However, I have now reached step 1 of "Deploy OSD nodes for the cluster" and have a couple of problems.

    The command 'ceph-deploy osd create --data /dev/xvdb storage1' fails with a message about /dev/xvdb not existing.

    I logged into storage1 and ran lvmdiskscan, and I see that the devices are /dev/vda and /dev/vdb, so I assume /dev/vdb is the correct one since it is 30G.
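    As a cross-check, lsblk gives a quick view of whole disks and their sizes on a storage node (run on storage1; device names vary with the virtualization type, e.g. vdX for virtio disks, xvdX for Xen):

    ```shell
    # List whole disks (no partitions) with name, size, and type;
    # the spare 30G device is the one to hand to ceph-deploy.
    lsblk -d -o NAME,SIZE,TYPE
    ```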

    I tried rerunning the command as:

    ceph-deploy osd create --data /dev/vdb storage1

    This time it gets further but:

    [[email protected] ceph-cluster]$ ceph-deploy osd create --data /dev/vdb storage1
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf
    [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create --data /dev/vdb storage1
    [ceph_deploy.cli][INFO ] ceph-deploy options:
    [ceph_deploy.cli][INFO ] verbose : False
    [ceph_deploy.cli][INFO ] bluestore : None
    [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fddb117a128>
    [ceph_deploy.cli][INFO ] cluster : ceph
    [ceph_deploy.cli][INFO ] fs_type : xfs
    [ceph_deploy.cli][INFO ] block_wal : None
    [ceph_deploy.cli][INFO ] default_release : False
    [ceph_deploy.cli][INFO ] username : None
    [ceph_deploy.cli][INFO ] journal : None
    [ceph_deploy.cli][INFO ] subcommand : create
    [ceph_deploy.cli][INFO ] host : storage1
    [ceph_deploy.cli][INFO ] filestore : None
    [ceph_deploy.cli][INFO ] func :
    [ceph_deploy.cli][INFO ] ceph_conf : None
    [ceph_deploy.cli][INFO ] zap_disk : False
    [ceph_deploy.cli][INFO ] data : /dev/vdb
    [ceph_deploy.cli][INFO ] block_db : None
    [ceph_deploy.cli][INFO ] dmcrypt : False
    [ceph_deploy.cli][INFO ] overwrite_conf : False
    [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
    [ceph_deploy.cli][INFO ] quiet : False
    [ceph_deploy.cli][INFO ] debug : False
    [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/vdb
    [storage1][DEBUG ] connection detected need for sudo
    [storage1][DEBUG ] connected to host: storage1
    [storage1][DEBUG ] detect platform information from remote host
    [storage1][DEBUG ] detect machine type
    [storage1][DEBUG ] find the location of an executable
    [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.5.1804 Core
    [ceph_deploy.osd][DEBUG ] Deploying osd to storage1
    [storage1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
    [storage1][DEBUG ] find the location of an executable
    [storage1][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    [storage1][WARNIN] No data was received after 300 seconds, disconnecting...
    [storage1][INFO ] checking OSD status...
    [storage1][DEBUG ] find the location of an executable
    [storage1][INFO ] Running command: sudo /bin/ceph --cluster=ceph osd stat --format=json

    [storage1][WARNIN] No data was received after 300 seconds, disconnecting...
    [ceph_deploy.osd][DEBUG ] Host storage1 is now ready for osd use.

    But the actual OSD is not ready:

    [[email protected] ceph-cluster]$ ceph -s
      cluster:
        id:     4165bd5f-f38d-4c6b-b1e3-287f800435b8
        health: HEALTH_OK

      services:
        mon: 1 daemons, quorum rdo-cc
        mgr: rdo-cc(active)
        osd: 0 osds: 0 up, 0 in

      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0B
        usage:   0B used, 0B / 0B avail
        pgs:

    I tried the same thing on storage2 and got the same timeouts and the same end result.

  • I tried running the command:

    sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb

    directly on storage1 to see what happens:

    [[email protected] ~]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    Running command: /bin/ceph-authtool --gen-print-key
    Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5682c7a-ada1-48a9-bfb2-d2ab4a46aba0
    stderr: 2018-09-21 18:41:51.464896 7fda7cbf6700 0 monclient(hunting): authenticate timed out after 300
    stderr: 2018-09-21 18:41:51.465000 7fda7cbf6700 0 librados: client.bootstrap-osd authentication error (110) Connection timed out
    stderr: [errno 110] error connecting to the cluster
    --> RuntimeError: Unable to create a new OSD id

    Not sure if this is actually related to the problem running the ceph-deploy command from rdo-cc or not.

  • @MichaelVonderbecke said:
    I tried running the command:

    sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb

    directly on storage1 to see what happens:

    [[email protected] ~]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    Running command: /bin/ceph-authtool --gen-print-key
    Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5682c7a-ada1-48a9-bfb2-d2ab4a46aba0
    stderr: 2018-09-21 18:41:51.464896 7fda7cbf6700 0 monclient(hunting): authenticate timed out after 300
    stderr: 2018-09-21 18:41:51.465000 7fda7cbf6700 0 librados: client.bootstrap-osd authentication error (110) Connection timed out
    stderr: [errno 110] error connecting to the cluster
    --> RuntimeError: Unable to create a new OSD id

    Not sure if this is actually related to the problem running the ceph-deploy command from rdo-cc or not.

    The timeout is caused by the iptables rules on rdo-cc.
    Quick fix: on rdo-cc, run sudo iptables -F.
    Long fix: create iptables rules that allow the Ceph traffic through.

    Thanks!
    Vipinsagar
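    For the longer fix, something along these lines should open only the ports Ceph needs instead of flushing everything. The numbers are Ceph's defaults (monitor on 6789/tcp, OSD/MGR daemons on 6800-7300/tcp), so treat this as a sketch to adapt rather than a drop-in rule set:

    ```shell
    # Allow Ceph monitor traffic (clients and OSDs must reach the mon)
    sudo iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    # Allow the default OSD/MGR daemon port range
    sudo iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT
    ```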

  • @vipinsagar said:

    @MichaelVonderbecke said:
    I tried running the command:

    sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb

    directly on storage1 to see what happens:

    [[email protected] ~]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    Running command: /bin/ceph-authtool --gen-print-key
    Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5682c7a-ada1-48a9-bfb2-d2ab4a46aba0
    stderr: 2018-09-21 18:41:51.464896 7fda7cbf6700 0 monclient(hunting): authenticate timed out after 300
    stderr: 2018-09-21 18:41:51.465000 7fda7cbf6700 0 librados: client.bootstrap-osd authentication error (110) Connection timed out
    stderr: [errno 110] error connecting to the cluster
    --> RuntimeError: Unable to create a new OSD id

    Not sure if this is actually related to the problem running the ceph-deploy command from rdo-cc or not.

    The timeout is caused by the iptables rules on rdo-cc.
    Quick fix: on rdo-cc, run sudo iptables -F.
    Long fix: create iptables rules that allow the Ceph traffic through.

    Thanks!
    Vipinsagar

    This did fix the problem, although I found it strange because sudo iptables -L showed no rules in any chain, so I'm not sure why sudo iptables -F would have actually fixed anything :)

  • It could be that the default policy of a chain was somehow changed, or that rules were sitting in a table that plain iptables -L (which only shows the filter table) does not display. Note that iptables -F only flushes rules and leaves chain policies alone, so a changed policy would need to be reset with iptables -P. I'll continue to investigate the issue.
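    When iptables -L looks empty but iptables -F still changes behavior, dumping every table explicitly can show what -L was hiding; a quick sketch:

    ```shell
    sudo iptables -S               # filter table: rules plus chain policies
    sudo iptables -t nat -S        # tables that plain iptables -L omits
    sudo iptables -t mangle -S
    ```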

    Thanks for posting the fix!

    Regards,
