Add SparseVectorStats #108793

Draft
wants to merge 11 commits into
base: main
from

Member

@kderusso kderusso commented May 17, 2024

Relates to #98275

Adds statistics on the number of sparse_vector fields in an index or cluster.

Because we can't pull dimensionality from the Lucene index, this relies on the index's sparse_vector mappings to identify which documents to count fields for.
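To illustrate the mapping-driven approach described above, here is a minimal Python sketch, not the Java code in this PR. It walks a mapping's `properties` and collects the paths of fields typed `sparse_vector`. The helper name `sparse_vector_field_names` is hypothetical, and the handling of nested object fields through a nested `properties` key is an assumption about the mapping layout.

```python
from typing import Any, Dict, List


def sparse_vector_field_names(properties: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of all fields mapped as sparse_vector (hypothetical helper)."""
    names: List[str] = []
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if definition.get("type") == "sparse_vector":
            names.append(path)
        # Assumption: object fields keep their sub-fields under a nested "properties" key.
        names.extend(
            sparse_vector_field_names(definition.get("properties", {}), prefix=f"{path}.")
        )
    return names


# Mapping copied from my-index-1 in the test script.
mapping = {
    "properties": {
        "sparse_field1": {"type": "sparse_vector"},
        "sparse_field2": {"type": "sparse_vector"},
        "dense_field": {"type": "dense_vector"},
        "nonsparse_field": {"type": "keyword"},
    }
}

fields = sparse_vector_field_names(mapping["properties"])
print(fields)  # ['sparse_field1', 'sparse_field2']
```

Only the mapped sparse_vector field names are needed for counting; the dense_vector and keyword fields are ignored.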

Here is an example script to test:

PUT my-index-1
{
  "mappings": {
    "properties": {
      "sparse_field1": {
        "type": "sparse_vector"
      },
      "sparse_field2": {
        "type": "sparse_vector"
      },
      "dense_field": {
        "type": "dense_vector"
      },
      "nonsparse_field": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}

// my-index-2 has 3 sparse_vector fields
PUT my-index-2
{
  "mappings": {
    "properties": {
      "sparse_field1": {
        "type": "sparse_vector"
      },
      "sparse_field2": {
        "type": "sparse_vector"
      },
      "sparse_field3": {
        "type": "sparse_vector"
      },
      "nonsparse_field": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}

PUT my-index-1/_doc/1
{
  "sparse_field1": { "a": 1, "b": 2 },
  "sparse_field2": { "c": 3, "d": 4, "e": 5},
  "dense_field": [1, 2, 3]
}

PUT my-index-1/_doc/2
{
  "sparse_field1": { "f": 1, "g": 2, "h": 3, "i": 4},
  "sparse_field2": { "j": 5, "k": 6, "l": 7},
  "dense_field": [2, 3, 4]
}

PUT my-index-2/_doc/1
{
  "sparse_field1": { "m": 1, "n": 2 },
  "sparse_field2": { "o": 3, "p": 4, "q": 5},
  "sparse_field3": { "a": 1, "b": 2 },
  "nonsparse_field": "cupcakes"
}

PUT my-index-2/_doc/2
{
  "nonsparse_field": "eclairs"
}

POST my-index-1/_refresh

POST my-index-2/_refresh

// _all returns 7 sparse_vector fields, which are then broken down by index
GET _stats/sparse_vector

// correctly returns 4 sparse_vector fields
GET my-index-1/_stats/sparse_vector

// correctly returns 3 sparse_vector fields
GET my-index-2/_stats/sparse_vector

GET /_nodes/stats

GET /_cat/shards?h=i,dvc,svc
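As a sanity check on the expected counts in the comments above (4, 3, and 7), the following Python sketch reproduces the arithmetic from the example documents. It is only an illustration, assuming the stat counts one value per sparse_vector field present in each document; the names `SPARSE_FIELDS`, `DOCS`, and `count_sparse_values` are hypothetical and do not come from this PR.

```python
# Sparse-vector field names per index, copied from the mappings in the script above.
SPARSE_FIELDS = {
    "my-index-1": {"sparse_field1", "sparse_field2"},
    "my-index-2": {"sparse_field1", "sparse_field2", "sparse_field3"},
}

# Documents copied from the script above.
DOCS = {
    "my-index-1": [
        {
            "sparse_field1": {"a": 1, "b": 2},
            "sparse_field2": {"c": 3, "d": 4, "e": 5},
            "dense_field": [1, 2, 3],
        },
        {
            "sparse_field1": {"f": 1, "g": 2, "h": 3, "i": 4},
            "sparse_field2": {"j": 5, "k": 6, "l": 7},
            "dense_field": [2, 3, 4],
        },
    ],
    "my-index-2": [
        {
            "sparse_field1": {"m": 1, "n": 2},
            "sparse_field2": {"o": 3, "p": 4, "q": 5},
            "sparse_field3": {"a": 1, "b": 2},
            "nonsparse_field": "cupcakes",
        },
        {"nonsparse_field": "eclairs"},
    ],
}


def count_sparse_values(index: str) -> int:
    """Count one value per sparse_vector field present in each document (assumed semantics)."""
    return sum(
        1 for doc in DOCS[index] for field in doc if field in SPARSE_FIELDS[index]
    )


per_index = {name: count_sparse_values(name) for name in DOCS}
total = sum(per_index.values())
print(per_index, total)  # {'my-index-1': 4, 'my-index-2': 3} 7
```

The second document in my-index-2 has no sparse_vector fields, so it contributes nothing to the count.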


Documentation preview:

++++
<titleabbrev>cat nodes</titleabbrev>
++++

[IMPORTANT]
====
cat APIs are only intended for human consumption using the command line or {kib}
console. They are _not_ intended for use by applications. For application
Member

I think the docs refactoring could belong in a separate PR, labeled with "docs", so it's easier to review.

@kderusso kderusso force-pushed the kderusso/sparse-vector-stats branch from eadbf60 to 254a1d6 Compare May 24, 2024 17:41
@@ -149,9 +155,6 @@ Node uptime, such as `17.3m`.
`completion.size`, `cs`, `completionSize`::
Size of completion, such as `0b`.

`dense_vector.value_count`, `dvc`, `denseVectorCount`::
Member Author

Note: @jimczi this does not work, so I removed it from the documentation. It can be added back if we decide to fix support for _cat/nodes.

@kderusso kderusso changed the title WIP: Add SparseVectorStats Add SparseVectorStats May 29, 2024
@kderusso kderusso force-pushed the kderusso/sparse-vector-stats branch 2 times, most recently from 89df8dd to 1158acc Compare May 29, 2024 21:04
@kderusso kderusso force-pushed the kderusso/sparse-vector-stats branch from 1158acc to eff9791 Compare May 30, 2024 12:43
4 participants