
Deploying Ceph in lab 11.1 fails

I'm trying to do the ceph-deploy installation in Lab 11.1, step 4 under 'Deploy a Monitor', but it looks like the ceph-jewel repo is not valid:

[rdo-cc][INFO ] Running command: sudo yum -y install epel-release
[rdo-cc][DEBUG ] Loaded plugins: fastestmirror, priorities
[rdo-cc][DEBUG ] Determining fastest mirrors
[rdo-cc][DEBUG ] * base: mirror.fra10.de.leaseweb.net
[rdo-cc][DEBUG ] * epel: mirror.de.leaseweb.net
[rdo-cc][DEBUG ] * extras: mirror.checkdomain.de
[rdo-cc][DEBUG ] * updates: centosmirror.netcup.net
[rdo-cc][WARNIN] http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 503 - Service Unavailable
[rdo-cc][WARNIN] Trying other mirror.
[rdo-cc][WARNIN] (the same 503 error and "Trying other mirror." lines repeated nine more times)
[rdo-cc][WARNIN]
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] One of the configured repositories failed (CentOS-7 - Ceph Jewel),
[rdo-cc][WARNIN] and yum doesn't have enough cached data to continue. At this point the only
[rdo-cc][WARNIN] safe thing yum can do is fail. There are a few ways to work "fix" this:
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 1. Contact the upstream for the repository and get them to fix the problem.
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 2. Reconfigure the baseurl/etc. for the repository, to point to a working
[rdo-cc][WARNIN] upstream. This is most often useful if you are using a newer
[rdo-cc][WARNIN] distribution release than is supported by the repository (and the
[rdo-cc][WARNIN] packages for the previous distribution release still work).
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 3. Run the command with the repository temporarily disabled
[rdo-cc][WARNIN] yum --disablerepo=centos-ceph-jewel ...
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 4. Disable the repository permanently, so yum won't use it by default. Yum
[rdo-cc][WARNIN] will then just ignore the repository until you permanently enable it
[rdo-cc][WARNIN] again or use --enablerepo for temporary usage:
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] yum-config-manager --disable centos-ceph-jewel
[rdo-cc][WARNIN] or
[rdo-cc][WARNIN] subscription-manager repos --disable=centos-ceph-jewel
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] 5. Configure the failing repository to be skipped, if it is unavailable.
[rdo-cc][WARNIN] Note that yum will try to contact the repo. when it runs most commands,
[rdo-cc][WARNIN] so will have to try and fail each time (and thus. yum will be be much
[rdo-cc][WARNIN] slower). If it is a very temporary problem though, this is often a nice
[rdo-cc][WARNIN] compromise:
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] yum-config-manager --save --setopt=centos-ceph-jewel.skip_if_unavailable=true
[rdo-cc][WARNIN]
[rdo-cc][WARNIN] failure: repodata/repomd.xml from centos-ceph-jewel: [Errno 256] No more mirrors to try.
[rdo-cc][WARNIN] http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 503 - Service Unavailable
[rdo-cc][WARNIN] (the same 503 error repeated nine more times)
[rdo-cc][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y install epel-release

[[email protected] ceph-cluster]$

Comments

  • Hello,

    Which step was this issue? I see you wrote 11.1, step 4, but I show step 4 as being timedatectl. It would help if you could show the command you ran in addition to the error. When I just tried to install ceph-deploy, in step 3, I also saw the HTTP 503 errors, but it worked despite them.

    Regards,

  • @serewicz said:
    Hello,

    Which step was this issue? I see you wrote 11.1, step 4, but I show step 4 as being timedatectl. It would help if you could show the command you ran in addition to the error. When I just tried to install ceph-deploy, in step 3, I also saw the HTTP 503 errors, but it worked despite them.

    Regards,

    Hi, it was step 4 of deploying a monitor. The command is:

    [[email protected] ceph-cluster]$ ceph-deploy install --release luminous \
    rdo-cc storage1 storage2 storage3

    This command fails pretty close to the beginning with the above errors, because it seems to be unable to install epel-release (I assume because epel-release is actually in the ceph-jewel repo).

    All the steps leading up to this one succeeded. The yum steps before this one gave the same ceph-jewel repo errors but still completed; step 4 of deploying a monitor, however, fails after the errors. I tried running yum -y install epel-release as a separate command and got the same error.
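    Before retrying, it can help to confirm by hand whether the repo is actually unreachable at that moment; a minimal sketch run on rdo-cc (the URL is copied from the error output, so nothing here is specific to the lab):

    ```shell
    # Check whether the failing repo metadata answers right now;
    # 503s from mirror.centos.org are often transient.
    curl -sI http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml | head -n 1

    # Clear stale yum metadata and retry only the failing package:
    sudo yum clean all
    sudo yum -y install epel-release
    ```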

  • Hello,

    Thank you. I have just tried these steps and did not have any errors. There are a few warnings, but that is typical.

    I see a few mentions of Jewel in the output in your previous post. I think there may be a typo or missing character in your start-ceph.repo file, which would explain why you are not seeing messages about Luminous instead. The most common one is typing e17 (e-seventeen) instead of el7 (e-ell-seven), which is what it should be. Could you paste your start-ceph.repo file here? I'll copy it and see if I get the same errors.

    Regards,
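    For comparison, a start-ceph.repo along these lines should pull Luminous packages. This is only a sketch: the baseurl shown is the public download.ceph.com Luminous repo and may differ from the one used in the lab, but the critical detail is the el7 tag:

    ```shell
    # Write a hypothetical start-ceph.repo for Luminous on CentOS 7.
    # Note "el7" (e-ell-seven), not "e17" (e-seventeen), in the baseurl.
    printf '%s\n' \
      '[ceph]' \
      'name=Ceph x86_64 packages' \
      'baseurl=https://download.ceph.com/rpm-luminous/el7/x86_64/' \
      'enabled=1' \
      'gpgcheck=1' \
      'gpgkey=https://download.ceph.com/keys/release.asc' \
      > start-ceph.repo

    grep -n 'el7' start-ceph.repo   # sanity-check the architecture tag
    ```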

  • @serewicz said:
    Hello,

    Thank you. I have just tried these steps and did not have any errors. There are a few warnings, but that is typical.

    I see a few mentions of Jewel in the output in your previous post. I think there may be a typo or missing character in your start-ceph.repo file, which would explain why you are not seeing messages about Luminous instead. The most common one is typing e17 (e-seventeen) instead of el7 (e-ell-seven), which is what it should be. Could you paste your start-ceph.repo file here? I'll copy it and see if I get the same errors.

    Regards,

    I logged back into the cluster, and since some time had passed, it had been reset. So I went through the steps again just like yesterday, but this time I did not get any errors (not even the initial repo errors from before).

    However, I have now reached step 1 of "Deploy OSD nodes for the cluster" and have a couple of problems.

    The command 'ceph-deploy osd create --data /dev/xvdb storage1' fails with a message about /dev/xvdb not existing.

    I logged into storage1 and ran lvmdiskscan, and I see that the devices are /dev/vda and /dev/vdb, so I assume /dev/vdb is the correct one since it is 30G.
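    As a cross-check, lsblk gives a quick view of whole disks and their sizes on a storage node (run on storage1; device names vary with the virtualization type, e.g. vdX for virtio disks, xvdX for Xen):

    ```shell
    # List whole disks (no partitions) with name, size, and type;
    # the spare 30G device is the one to hand to ceph-deploy.
    lsblk -d -o NAME,SIZE,TYPE
    ```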

    I tried rerunning the command as:

    ceph-deploy osd create --data /dev/vdb storage1

    This time it gets further but:

    [[email protected] ceph-cluster]$ ceph-deploy osd create --data /dev/vdb storage1
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/.cephdeploy.conf
    [ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create --data /dev/vdb storage1
    [ceph_deploy.cli][INFO ] ceph-deploy options:
    [ceph_deploy.cli][INFO ] verbose : False
    [ceph_deploy.cli][INFO ] bluestore : None
    [ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fddb117a128>
    [ceph_deploy.cli][INFO ] cluster : ceph
    [ceph_deploy.cli][INFO ] fs_type : xfs
    [ceph_deploy.cli][INFO ] block_wal : None
    [ceph_deploy.cli][INFO ] default_release : False
    [ceph_deploy.cli][INFO ] username : None
    [ceph_deploy.cli][INFO ] journal : None
    [ceph_deploy.cli][INFO ] subcommand : create
    [ceph_deploy.cli][INFO ] host : storage1
    [ceph_deploy.cli][INFO ] filestore : None
    [ceph_deploy.cli][INFO ] func :
    [ceph_deploy.cli][INFO ] ceph_conf : None
    [ceph_deploy.cli][INFO ] zap_disk : False
    [ceph_deploy.cli][INFO ] data : /dev/vdb
    [ceph_deploy.cli][INFO ] block_db : None
    [ceph_deploy.cli][INFO ] dmcrypt : False
    [ceph_deploy.cli][INFO ] overwrite_conf : False
    [ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
    [ceph_deploy.cli][INFO ] quiet : False
    [ceph_deploy.cli][INFO ] debug : False
    [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/vdb
    [storage1][DEBUG ] connection detected need for sudo
    [storage1][DEBUG ] connected to host: storage1
    [storage1][DEBUG ] detect platform information from remote host
    [storage1][DEBUG ] detect machine type
    [storage1][DEBUG ] find the location of an executable
    [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.5.1804 Core
    [ceph_deploy.osd][DEBUG ] Deploying osd to storage1
    [storage1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
    [storage1][DEBUG ] find the location of an executable
    [storage1][INFO ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    [storage1][WARNIN] No data was received after 300 seconds, disconnecting...
    [storage1][INFO ] checking OSD status...
    [storage1][DEBUG ] find the location of an executable
    [storage1][INFO ] Running command: sudo /bin/ceph --cluster=ceph osd stat --format=json

    [storage1][WARNIN] No data was received after 300 seconds, disconnecting...
    [ceph_deploy.osd][DEBUG ] Host storage1 is now ready for osd use.

    But the actual OSD is not ready:

    [[email protected] ceph-cluster]$ ceph -s
      cluster:
        id:     4165bd5f-f38d-4c6b-b1e3-287f800435b8
        health: HEALTH_OK

      services:
        mon: 1 daemons, quorum rdo-cc
        mgr: rdo-cc(active)
        osd: 0 osds: 0 up, 0 in

      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0B
        usage:   0B used, 0B / 0B avail
        pgs:

    I tried the same thing on storage2 and got the same timeouts and the same end result.

  • I tried running the command:

    sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb

    directly on storage1 to see what happens:

    [[email protected] ~]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    Running command: /bin/ceph-authtool --gen-print-key
    Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5682c7a-ada1-48a9-bfb2-d2ab4a46aba0
    stderr: 2018-09-21 18:41:51.464896 7fda7cbf6700 0 monclient(hunting): authenticate timed out after 300
    stderr: 2018-09-21 18:41:51.465000 7fda7cbf6700 0 librados: client.bootstrap-osd authentication error (110) Connection timed out
    stderr: [errno 110] error connecting to the cluster
    --> RuntimeError: Unable to create a new OSD id

    Not sure if this is actually related to the problem running the ceph-deploy command from rdo-cc or not.

  • @MichaelVonderbecke said:
    I tried running the command:

    sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb

    directly on storage1 to see what happens:

    [[email protected] ~]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    Running command: /bin/ceph-authtool --gen-print-key
    Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5682c7a-ada1-48a9-bfb2-d2ab4a46aba0
    stderr: 2018-09-21 18:41:51.464896 7fda7cbf6700 0 monclient(hunting): authenticate timed out after 300
    stderr: 2018-09-21 18:41:51.465000 7fda7cbf6700 0 librados: client.bootstrap-osd authentication error (110) Connection timed out
    stderr: [errno 110] error connecting to the cluster
    --> RuntimeError: Unable to create a new OSD id

    Not sure if this is actually related to the problem running the ceph-deploy command from rdo-cc or not.

    The timeout is caused by the iptables rules on rdo-cc.
    Quick fix: on rdo-cc, run sudo iptables -F.
    Long fix: create iptables rules that allow the Ceph traffic through.

    Thanks!
    Vipinsagar
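    For the longer fix, something along these lines should open only the ports Ceph needs instead of flushing everything. The numbers are Ceph's defaults (monitor on 6789/tcp, OSD/MGR daemons on 6800-7300/tcp), so treat this as a sketch to adapt rather than a drop-in rule set:

    ```shell
    # Allow Ceph monitor traffic (clients and OSDs must reach the mon)
    sudo iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    # Allow the default OSD/MGR daemon port range
    sudo iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT
    ```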

  • @vipinsagar said:

    @MichaelVonderbecke said:
    I tried running the command:

    sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb

    directly on storage1 to see what happens:

    [[email protected] ~]$ sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/vdb
    Running command: /bin/ceph-authtool --gen-print-key
    Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new e5682c7a-ada1-48a9-bfb2-d2ab4a46aba0
    stderr: 2018-09-21 18:41:51.464896 7fda7cbf6700 0 monclient(hunting): authenticate timed out after 300
    stderr: 2018-09-21 18:41:51.465000 7fda7cbf6700 0 librados: client.bootstrap-osd authentication error (110) Connection timed out
    stderr: [errno 110] error connecting to the cluster
    --> RuntimeError: Unable to create a new OSD id

    Not sure if this is actually related to the problem running the ceph-deploy command from rdo-cc or not.

    The timeout is caused by the iptables rules on rdo-cc.
    Quick fix: on rdo-cc, run sudo iptables -F.
    Long fix: create iptables rules that allow the Ceph traffic through.

    Thanks!
    Vipinsagar

    This did fix the problem, although I found it strange because sudo iptables -L showed no rules in any chain, so I'm not sure why sudo iptables -F would have actually fixed anything :)

  • It could be that the default policy of a chain was somehow changed, or that rules were sitting in a table that plain iptables -L (which only shows the filter table) does not display. Note that iptables -F only flushes rules and leaves chain policies alone, so a changed policy would need to be reset with iptables -P. I'll continue to investigate the issue.
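    When iptables -L looks empty but iptables -F still changes behavior, dumping every table explicitly can show what -L was hiding; a quick sketch:

    ```shell
    sudo iptables -S               # filter table: rules plus chain policies
    sudo iptables -t nat -S        # tables that plain iptables -L omits
    sudo iptables -t mangle -S
    ```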

    Thanks for posting the fix!

    Regards,
