No TieBreaker added in case of its absence #361

Open
duckhawk opened this issue Jun 19, 2023 · 3 comments


@duckhawk

If there are not enough online nodes in the cluster, no TieBreaker is created for resources with a replica count of 2.

If the TieBreaker is missing in this case (or was not created for any other reason), Linstor will not create one later.

At the same time, the advice output shows that there is a problem:
linstor r advice: Resource has 2 replicas but no tie-breaker, could lead to split brain

@ghernadi
Contributor

ghernadi commented Jul 20, 2023

Hello,
can you please elaborate a bit more on this issue? I could not reproduce it with v1.23.0:

Logs
linstor n c bravo
linstor n c charlie
linstor sp c lvm bravo lvmpool scratch
linstor sp c lvm charlie lvmpool scratch
linstor rd c rsc
linstor vd c rsc 1G
linstor r c bravo charlie rsc -s lvmpool
linstor --no-utf8 --no-color r l -a
+-------------------------------------------------------------------------------------+
| ResourceName | Node    | Port | Usage  | Conns |        State | CreatedOn           |
|=====================================================================================|
| rsc          | bravo   | 7000 | Unused | Ok    |     UpToDate | 2023-07-20 08:11:46 |
| rsc          | charlie | 7000 | Unused | Ok    | Inconsistent | 2023-07-20 08:11:46 |
+-------------------------------------------------------------------------------------+

linstor --no-utf8 --no-color n c delta
SUCCESS:
Description:
    New node 'delta' registered.
Details:
    Node 'delta' UUID is: 90543b6f-6cba-4763-b773-0366f7c6c936
SUCCESS:
Description:
    Node 'delta' authenticated
Details:
    Supported storage providers: [diskless, lvm, lvm_thin, zfs, zfs_thin, file, file_thin, remote_spdk, openflex_target, ebs_init, ebs_target]
    Supported resource layers  : [drbd, luks, nvme, writecache, cache, bcache, openflex, storage]
    Unsupported storage providers:
        SPDK: IO exception occured when running 'rpc.py spdk_get_version': Cannot run program "rpc.py": error=2, No such file or directory
        EXOS: IO exception occured when running 'lsscsi --version': Cannot run program "lsscsi": error=2, No such file or directory
              '/bin/bash -c cat /sys/class/sas_phy/*/sas_address' returned with exit code 1
              '/bin/bash -c cat /sys/class/sas_device/end_device-*/sas_address' returned with exit code 1
SUCCESS:
    Successfully set property key(s): StorPoolName
INFO:
    Tie breaker resource 'rsc' created on DfltDisklessStorPool
INFO:
    Resource-definition property 'DrbdOptions/Resource/quorum' updated from 'off' to 'majority' by auto-quorum
INFO:
    Resource-definition property 'DrbdOptions/Resource/on-no-quorum' updated from 'off' to 'io-error' by auto-quorum
SUCCESS:
    Created resource 'rsc' on 'delta'
SUCCESS:
    Added peer(s) 'delta' to resource 'rsc' on 'bravo'
SUCCESS:
    Added peer(s) 'delta' to resource 'rsc' on 'charlie'
SUCCESS:
Description:
    Resource 'rsc' on 'delta' ready
Details:
    Node: delta

linstor --no-utf8 --no-color r l -a
+-------------------------------------------------------------------------------------------+
| ResourceName | Node    | Port | Usage  | Conns |              State | CreatedOn           |
|===========================================================================================|
| rsc          | bravo   | 7000 | Unused | Ok    |           UpToDate | 2023-07-20 08:11:46 |
| rsc          | charlie | 7000 | Unused | Ok    | SyncTarget(37.36%) | 2023-07-20 08:11:46 |
| rsc          | delta   | 7000 | Unused | Ok    |         TieBreaker | 2023-07-20 08:11:51 |
+-------------------------------------------------------------------------------------------+

Please note the

INFO:
    Tie breaker resource 'rsc' created on DfltDisklessStorPool

during linstor n c delta.
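
To spot affected resources quickly, the resource listing can be filtered for resources that have two rows and no TieBreaker row. This is only a rough sketch: the function name missing_tiebreaker is made up for this example, and it assumes the table layout shown in the logs above (ResourceName in the first column, State in the sixth).

```shell
#!/usr/bin/env bash
# Reads the ASCII table printed by `linstor --no-utf8 --no-color r l -a` on stdin.
# Prints every resource name that has exactly two rows and no row in state TieBreaker.
# Assumption: columns are laid out as in the tables in this thread.
missing_tiebreaker() {
  awk -F'|' '
    # Data rows start with "|"; skip the header row and the "|=====|" separator.
    /^\|/ && $2 !~ /ResourceName/ && $2 !~ /^=+/ {
      name = $2; state = $7
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", state)
      rows[name]++
      if (state == "TieBreaker") tb[name] = 1
    }
    END {
      for (n in rows)
        if (rows[n] == 2 && !(n in tb)) print n
    }
  ' | sort
}
```

Possible usage: linstor --no-utf8 --no-color r l -a | missing_tiebreaker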

You might also want to run linstor resource-definition list-properties <resource_name> and check whether DrbdOptions/auto-add-quorum-tiebreaker is set to False, since Linstor disables the auto-tiebreaker if someone actively deletes the tiebreaker (this could also be done by a plugin):

linstor --no-utf8 --no-color r d delta rsc
INFO:
    Disabling auto-tiebreaker on resource-definition 'rsc' as tiebreaker resource was manually deleted
SUCCESS:
.....

linstor --no-utf8 --no-color rd lp rsc
+-----------------------------------------------------------+
| Key                                    | Value            |
|===========================================================|
| DrbdOptions/Resource/quorum            | off              |
| DrbdOptions/auto-add-quorum-tiebreaker | False            |
| DrbdOptions/auto-verify-alg            | crct10dif-pclmul |
| DrbdPrimarySetOn                       | BRAVO            |
+-----------------------------------------------------------+
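
The property check can be scripted by parsing that table. The sketch below is an illustration, not part of Linstor: the function name auto_tiebreaker_disabled is hypothetical, and it assumes the two-column Key/Value layout shown above.

```shell
#!/usr/bin/env bash
# Reads the table printed by `linstor --no-utf8 --no-color rd lp <resource>` on stdin.
# Exit status 0: DrbdOptions/auto-add-quorum-tiebreaker is listed with the value False.
# Exit status 1: the property is absent or has any other value.
auto_tiebreaker_disabled() {
  awk -F'|' '
    /^\|/ {
      key = $2; val = $3
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "DrbdOptions/auto-add-quorum-tiebreaker" && val == "False") found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}
```

Possible usage: linstor --no-utf8 --no-color rd lp rsc | auto_tiebreaker_disabled && echo "auto-tiebreaker disabled on rsc"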

If you can reproduce your issue, please add the needed steps as well as the version of the Linstor controller.

@Ulrar

Ulrar commented Nov 1, 2023

Hi,
I don't know if this is the same issue as the OP's, but I am also missing TieBreakers right now, using piraeus-operator. Here is how I got here:

  • Installed one node only, call it A
  • Added a second node, call it B.
  • Bumped the placement count to 2, which worked as expected
  • Added a third node, call it C. It was used to automatically create TieBreakers as expected
  • Evacuated node B, which moved everything to C as expected
  • Restored B

Now, whatever I do, I can't get Linstor to create TieBreakers on B. I tried toggling auto-add-quorum-tiebreaker on the controller off and then on again, but that didn't fix it. That property does not exist on the resources themselves, however; maybe that's the issue? Presumably evacuating one node in a 3-node cluster counts as actively deleting it?

Is there a way to get Linstor to re-evaluate and create the missing tiebreakers now, without needing to re-create the resources?
linstor controller 1.25.0; GIT-hash: ac6be8b59c99ae4157b4368df646cf530444d70f

@TheSiman

Hi,
I've been playing around with Linstor on xcp-ng (XOSTOR) and I've run into this while testing node failures/replacement. I've hit several states in which this happens. The cases differ slightly, but in all of them tiebreakers are missing.

case 1:
Steps:

  • node evacuate

Result:

  • DrbdOptions/auto-add-quorum-tiebreaker = False (on resource definition)
  • lost tiebreakers don't get recreated

case 2:
Steps:

  • node lost

Result:

  • DrbdOptions/auto-add-quorum-tiebreaker = False (on resource definition)
  • lost tiebreakers don't get recreated

case 3 (XOSTOR specific):
Steps:

  • Remove node using "removeHost" function of linstor-manager plugin

Result:

  • DrbdOptions/auto-add-quorum-tiebreaker = True (on resource definition)
  • lost tiebreakers don't get recreated

I have noticed that if I re-enable automatic tiebreakers on a resource definition, new tiebreakers are created (even if the DRBD option for auto tiebreakers was never set to False on that resource definition). I use this workaround now:

linstor resource-definition list -p | grep -o "xcp-volume\S*" | sort -u | xargs -I {} linstor resource-definition set-property {} DrbdOptions/auto-add-quorum-tiebreaker True (this example is xcp-ng specific, but I think the way it works is not)

systemctl restart linstor-controller (this is needed, because tiebreakers get created in an "Unknown" state, after controller restart they are fine)
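
The same workaround could be wrapped in a small function that prints the commands before anything is changed. This is a sketch under assumptions: the function name reenable_auto_tiebreaker and the DRY_RUN variable are invented for this example, and only the set-property call quoted above is used.

```shell
#!/usr/bin/env bash
# Reads resource-definition names on stdin (one per line), drops duplicates and
# empty lines, and sets DrbdOptions/auto-add-quorum-tiebreaker to True on each.
# DRY_RUN is a hypothetical switch for this sketch: with the default of 1 the
# commands are only printed; DRY_RUN=0 actually runs them.
reenable_auto_tiebreaker() {
  sort -u | while IFS= read -r rd; do
    [ -n "$rd" ] || continue
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "linstor resource-definition set-property $rd DrbdOptions/auto-add-quorum-tiebreaker True"
    else
      # </dev/null keeps the client from consuming the remaining names on stdin.
      linstor resource-definition set-property "$rd" DrbdOptions/auto-add-quorum-tiebreaker True </dev/null
    fi
  done
}
```

Possible usage: linstor resource-definition list -p | grep -o "xcp-volume\S*" | reenable_auto_tiebreaker to preview, then the same pipeline with DRY_RUN=0 in front of the function name to apply it. The \S escape in the grep pattern relies on GNU grep, and the controller restart described above would still be needed afterwards.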

While I think this workaround is, in a way, a method to re-evaluate the tiebreakers, it's likely not viable for many use cases. I hope it can still help uncover the root cause, or serve as a stopgap for someone with a similar use case.

Linstor version: linstor controller 1.26.1; GIT-hash: 12746ac9c6e7882807972c3df56e9a89eccad4e5
