[specification] ISD003 upload-charm-docs: Contents Index and Navigation Table

Abstract

The upload-charm-docs GitHub action currently auto-generates the navigation table based on file hierarchy and alphabetical order of file names. However, the order of items on the navigation table that gives the best documentation flow is not always alphabetical. Additionally, users need to include items in the navigation table that are not discourse posts, and to add hidden items so that they can be linked to.

Rationale

This change adds support for users to customise the ordering of items on the navigation table to improve the documentation reader experience. This means that, for example, the getting started guide could be at the top whilst the technical reference would be listed later. Users can still choose not to customise the ordering and retain the default alphabetical ordering. It also adds support for including links to pages other than discourse posts in the navigation table, such as a link to the charm actions. Finally, this change adds support for items that should not appear on the navigation table but should be linkable from elsewhere.

Specification

Overview

The index.md file will have an optional # Contents section which will be interpreted as a contents index to other files within the documentation. This will be converted to the navigation table stored on discourse. *.md files not in the list will be added in alphabetical order after any listed items for that folder. Items on the list that are commented out will be interpreted as unlisted items and will be added to the discourse navigation table without a level. After writing the navigation table to discourse, the equivalent contents section will be added back to the repository as a PR.

Contents Index

The following is an example of what the # Contents section would look like and the resulting navigation table on discourse. index.md:

# Contents

1. [Reference](reference)
  1. [Actions](https://charmhub.io/nginx-ingress-integrator/actions)
  2. [Configurations](https://charmhub.io/nginx-ingress-integrator/configure)
  3. [Integrations](reference/integrations.md)

On discourse:

# Navigation

| level | path | navlink |
| --- | --- | --- |
| 1 | reference | [Reference]() |
| 2 | charmhub-io-nginx-ingress-integrator-actions | [Actions](https://charmhub.io/nginx-ingress-integrator/actions) |
| 2 | charmhub-io-nginx-ingress-integrator-configure | [Configurations](https://charmhub.io/nginx-ingress-integrator/configure) |
| 2 | reference-integrations | [Integrations](/t/nginx-ingress-integrator-docs-reference-integrations/7756) |

There would be a few valid permutations of markdown lists, for example:

# Contents

1. [Reference](reference)
  a. [Actions](https://charmhub.io/nginx-ingress-integrator/actions)
  b. [Configurations](https://charmhub.io/nginx-ingress-integrator/configure)
  c. [Integrations](reference/integrations.md)

# Contents

* [Reference](reference)
  * [Actions](https://charmhub.io/nginx-ingress-integrator/actions)
  * [Configurations](https://charmhub.io/nginx-ingress-integrator/configure)
  * [Integrations](reference/integrations.md)

# Contents

- [Reference](reference)
  - [Actions](https://charmhub.io/nginx-ingress-integrator/actions)
  - [Configurations](https://charmhub.io/nginx-ingress-integrator/configure)
  - [Integrations](reference/integrations.md)

# Contents

- [Reference](reference)
  1. [Actions](https://charmhub.io/nginx-ingress-integrator/actions)
  2. [Configurations](https://charmhub.io/nginx-ingress-integrator/configure)
  3. [Integrations](reference/integrations.md)

A link can be one of three things:

  • A local link to a directory (e.g., [Tutorials](tutorials) which links to the tutorials directory)
  • A local link to a file (e.g., [Getting Started](tutorials/getting-started.md) which links to the tutorials/getting-started.md file)
  • An external link (e.g., [Configurations](https://charmhub.io/nginx-ingress-integrator/configure))

Items may be wrapped in <!-- ... -->, which marks the line as commented out. The wrapper is ignored when parsing the line.

*.md files and directories in docs that are not listed in the contents index will be added in alphabetical order after any items that are listed. This is to ensure backwards compatibility.

References are checked for validity. A link to a file or directory that does not exist will result in an error. A link to an external reference that results in a non-2XX HTTP response to a HEAD request will result in an error.
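The validity checks described above could be sketched as follows. This is a minimal illustration rather than the action's implementation: the names check_reference, head_status and get_status are hypothetical, and treating anything that starts with http:// or https:// as external is an assumption. Note that urllib follows redirects, so the status seen here is that of the final response.

```python
import pathlib
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises for 4XX/5XX responses, the status code is still available.
        return exc.code


def check_reference(reference: str, docs_path: pathlib.Path, get_status=head_status) -> None:
    """Raise ValueError if the reference is not valid.

    get_status is injectable (hypothetical parameter) so that the check can be
    exercised without network access.
    """
    if reference.startswith(("http://", "https://")):
        status = get_status(reference)
        if not 200 <= status < 300:
            raise ValueError(f"{reference} returned HTTP {status}")
        return
    # Local references must resolve to an existing file or directory.
    if not (docs_path / reference).exists():
        raise ValueError(f"{reference} does not exist in {docs_path}")
```

For example, check_reference("reference/integrations.md", pathlib.Path("docs")) would raise if that file is missing from the docs directory.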

Translation to Discourse

  • The list hierarchy indicates the level on the navigation table. This is checked against the file structure and results in an error or warning to the user if it does not match.
  • Files and directories don’t have to be listed. If they are not listed, they are injected in alphabetical order at the appropriate location after any listed items (for backwards compatibility and ease of use).
  • Commented out lines will appear on the navigation table without a level.
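As a rough sketch of the translation, the following turns an item into a navigation table row. The NavigationItem type and both function names are hypothetical, and the way the path column is derived (drop the scheme and the .md suffix, then replace runs of other characters with a dash) is an assumption inferred from the example table above, not a documented rule.

```python
import re
import typing


class NavigationItem(typing.NamedTuple):
    """Hypothetical container for one contents index item."""

    hierarchy: int  # number of parent items, 0 at the root of the list
    title: str
    reference: str  # the link value from the contents index
    navlink: str  # the link to show on discourse, empty for a directory
    hidden: bool  # True for commented out items


def table_path(reference: str) -> str:
    """Derive the path column from a reference (assumed rule, see lead-in)."""
    without_scheme = re.sub(r"^https?://", "", reference)
    without_suffix = re.sub(r"\.md$", "", without_scheme)
    return re.sub(r"[^a-zA-Z0-9]+", "-", without_suffix).strip("-").lower()


def table_row(item: NavigationItem) -> str:
    """Render one row, leaving the level empty for hidden items."""
    level = "" if item.hidden else str(item.hierarchy + 1)
    return f"| {level} | {table_path(item.reference)} | [{item.title}]({item.navlink}) |"
```

With this sketch, a root directory item titled Reference renders as `| 1 | reference | [Reference]() |`, and a commented out item renders with an empty level column.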

Migration

The migration algorithm will be adapted to generate the contents index:

  • Items without a level will be interpreted to be within the most recently encountered group with a level.
  • Groups without a level will be ignored. This is because it is not clear what the purpose of a group without a level is and it is not possible to know whether any following items are intended to be within that group or a different group since the path is not guaranteed to include the group.
  • This migration algorithm can be adapted in the future to look at the path to guess whether items should be within a group. This is not expected to be a frequent requirement (migration is only done once per project) and it is easy for users to correct any misinterpretations after the migration is complete.

Potentially Useful Markdown Libraries

A key requirement is to be able to interpret hierarchical markdown lists. None of the assessed markdown libraries seem to support this. The Python-Markdown library was checked but doesn’t help with interpreting the hierarchy of list items:

>>> print(test)
# Contents

1. [Reference](reference)
  a. [Actions](https://charmhub.io/nginx-ingress-integrator/actions)
  b. [Configurations](https://charmhub.io/nginx-ingress-integrator/configure)
  c. [Integrations](reference/integrations.md)
>>> markdown.convert(test)
'<h1>Contents</h1>\n<ol>\n<li><a href="reference">Reference</a>\n  a. <a href="https://charmhub.io/nginx-ingress-integrator/actions">Actions</a>\n  b. <a href="https://charmhub.io/nginx-ingress-integrator/configure">Configurations</a>\n  c. <a href="reference/integrations.md">Integrations</a></li>\n</ol>'

https://github.com/miyuchina/mistletoe produces:

# Contents

1. one
  1. one one
  2. one two
    1. one two one
    2. one two two
  3. one three
2. two
3. three
{
    "type": "Document",
    "footnotes": {},
    "children": [
        {"type": "Heading", "level": 1, "children": [{"type": "RawText", "content": "Contents"}]},
        {
            "type": "List",
            "children": [
                {
                    "type": "ListItem",
                    "leader": "1.",
                    "prepend": 3,
                    "children": [
                        {"type": "Paragraph", "children": [{"type": "RawText", "content": "one"}]}
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "1.",
                    "prepend": 5,
                    "children": [
                        {
                            "type": "Paragraph",
                            "children": [{"type": "RawText", "content": "one one"}],
                        }
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "2.",
                    "prepend": 5,
                    "children": [
                        {
                            "type": "Paragraph",
                            "children": [{"type": "RawText", "content": "one two"}],
                        }
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "1.",
                    "prepend": 7,
                    "children": [
                        {
                            "type": "Paragraph",
                            "children": [{"type": "RawText", "content": "one two one"}],
                        }
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "2.",
                    "prepend": 7,
                    "children": [
                        {
                            "type": "Paragraph",
                            "children": [{"type": "RawText", "content": "one two two"}],
                        }
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "3.",
                    "prepend": 5,
                    "children": [
                        {
                            "type": "Paragraph",
                            "children": [{"type": "RawText", "content": "one three"}],
                        }
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "2.",
                    "prepend": 3,
                    "children": [
                        {"type": "Paragraph", "children": [{"type": "RawText", "content": "two"}]}
                    ],
                    "loose": False,
                },
                {
                    "type": "ListItem",
                    "leader": "3.",
                    "prepend": 3,
                    "children": [
                        {
                            "type": "Paragraph",
                            "children": [{"type": "RawText", "content": "three"}],
                        }
                    ],
                    "loose": False,
                },
            ],
            "loose": False,
            "start": 1,
        },
    ],
}

This is not very helpful because all the items appear as siblings in a single list. Using markdown_it (markdown-it-py) produces:

print(node.pretty(indent=2, show_text=True))
<root>
  <heading>
    <inline>
      <text>
        Contents
  <ordered_list>
    <list_item>
      <paragraph>
        <inline>
          <link href='reference'>
            <text>
              Reference
    <list_item>
      <paragraph>
        <inline>
          <link href='https://charmhub.io/nginx-ingress-integrator/actions'>
            <text>
              Actions
    <list_item>
      <paragraph>
        <inline>
          <link href='https://charmhub.io/nginx-ingress-integrator/configure'>
            <text>
              Configurations
    <list_item>
      <paragraph>
        <inline>
          <link href='reference/integrations.md'>
            <text>
              Integrations

Contents Index Interpretation Algorithm

The required information from each list item is (1) the hierarchy, (2) the title of the link and (3) the value of the link. The hierarchy is needed to validate that the structure of the list matches the structure of the directory. Any list item line is generalised to: <comment>?<whitespace><leader><reference> where:

  • <comment> optionally indicates that the line is wrapped with <!-- ... -->
  • <whitespace> indicates the hierarchy by the number of characters. Only spaces are supported, any other whitespace here will raise an exception
  • <leader> can be ((\d+\.)|([a-zA-Z]\.)|(\*)|(-))\s* which covers the numbered, lettered, * and - leaders used in the examples above
  • <reference> is a markdown link (local to a file or directory, or a remote link)
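A minimal sketch of parsing one line of this form is shown below. ParsedLine and parse_line are hypothetical names. Two assumptions are made: the leader pattern accepts multi-digit numbers and single letters so that the list permutations shown earlier parse, and for a commented out line the indentation is whatever whitespace remains once the <!-- and --> markers are removed.

```python
import re
import typing


class ParsedLine(typing.NamedTuple):
    """Hypothetical result of parsing a single contents index line."""

    hidden: bool
    whitespace_count: int
    title: str
    value: str


_ITEM = re.compile(
    r"^(?P<whitespace>\s*)"
    r"(?P<leader>(\d+\.)|([a-zA-Z]\.)|(\*)|(-))\s*"
    r"\[(?P<title>[^\]]*)\]\((?P<value>[^)]*)\)\s*$"
)


def parse_line(line: str) -> ParsedLine:
    """Parse a list item line, raising ValueError if it is malformed."""
    hidden = False
    content = line.rstrip()
    if content.startswith("<!--") and content.endswith("-->"):
        # The wrapper only marks the item as hidden, it is ignored otherwise.
        hidden = True
        content = content[len("<!--"):-len("-->")]
    match = _ITEM.match(content)
    if match is None:
        raise ValueError(f"not a contents index item: {line!r}")
    whitespace = match.group("whitespace")
    if whitespace.strip(" "):
        # Something other than spaces (for example a tab) was used to indent.
        raise ValueError(f"only spaces are supported for indentation: {line!r}")
    return ParsedLine(hidden, len(whitespace), match.group("title"), match.group("value"))
```

The whitespace count is returned as-is. Turning it into a hierarchy needs the context of the surrounding lines, so that step is left to the caller.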

There would be 3 kinds of lines: file, link and directory.

  • directory:
    • Indicated by the reference pointing to a local directory (the reference must resolve to an actual directory)
    • Can be followed by any of the other items with a higher, equal or lower hierarchy
    • If it is followed by an item nested beneath it, keep track of the reference for inclusion in nested checks
  • link:
    • Indicated by the reference pointing to a remote host (an HTTP request to that link must return a response that is not 4XX, which is hopefully 2XX or 3XX although it could be 5XX)
    • Can only be followed by items at the same or lower hierarchy
  • file:
    • Indicated by the reference pointing to a local file (the reference must resolve to an actual file)
    • Can only be followed by items at the same or lower hierarchy

This is a classic recursive problem. The following data structure would keep track of items:

import typing


class ListItem(typing.NamedTuple):
    """Represents an item in the contents index list.

    Attrs:
        hierarchy: The number of parent items to the root of the list.
        reference_title: The name of the reference
        reference_value: The link to the referenced item
        rank: The number of preceding elements in the list at any hierarchy
        hidden: Whether the item should not be displayed on the navigation table
    """

    hierarchy: int
    reference_title: str
    reference_value: str
    rank: int
    hidden: bool

Proposed algorithm:

  1. The base case is that there is no new item: raise StopIteration.
  2. The non-recursive case: for a given aggregate_directory, hierarchy (defaults to 0) and whitespace_expectation (defaults to the number of whitespace characters at the root), the current item has the same whitespace count as the expectation and is a file or link. Yield the item.
  3. An error case: the current item is a file or link at a higher whitespace count than the expectation and the previous item is not a directory. Raise an exception.
  4. Another non-recursive case: the current item is a file or link at a higher whitespace count than the expectation and the previous item is a directory. Yield the item and update whitespace_expectation with the whitespace character count of the item.
  5. The recursive case to go one level deeper: the current item is a directory at the same hierarchy. Yield the item and make a recursive call, incrementing the hierarchy, appending the directory to aggregate_directory and passing in the current whitespace_expectation.
  6. The recursive case to go one level higher: for any item where the whitespace count is less than the whitespace_expectation, do not yield the item and return, keeping the item to be processed.
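The same rules can also be expressed iteratively with a stack of whitespace counts, which may be easier to follow than the recursive formulation. The sketch below is that alternative, not the recursive algorithm itself. IndexLine and assign_hierarchy are hypothetical names, and whether an item is a directory is assumed to have been resolved beforehand.

```python
import typing


class IndexLine(typing.NamedTuple):
    """Hypothetical pre-parsed line of the contents index."""

    whitespace_count: int
    is_directory: bool
    title: str


def assign_hierarchy(lines: typing.Iterable[IndexLine]) -> list:
    """Return (hierarchy, title) pairs, raising ValueError on bad nesting."""
    result = []
    indents = []  # whitespace count of each currently open nesting level
    previous = None
    for line in lines:
        if not indents:
            # The first item defines the whitespace expectation at the root.
            indents.append(line.whitespace_count)
        elif line.whitespace_count > indents[-1]:
            # Going deeper is only allowed directly after a directory.
            if previous is None or not previous.is_directory:
                raise ValueError(f"{line.title!r} is nested under a non-directory item")
            indents.append(line.whitespace_count)
        else:
            # Going back up: close levels until the indentation matches.
            while indents and line.whitespace_count < indents[-1]:
                indents.pop()
            if not indents or indents[-1] != line.whitespace_count:
                raise ValueError(f"{line.title!r} does not match any enclosing level")
        result.append((len(indents) - 1, line.title))
        previous = line
    return result
```

The stack plays the role of whitespace_expectation at each level of recursion: pushing corresponds to going one level deeper and popping corresponds to returning to the caller.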

Matching Disk and Contents Index Items

The other piece is how to combine the index list with the disk list. A tree data structure might help here, although I have been thinking about it and can’t think of how it could help in O(N) time. The following algorithm should be O(N).

We need a few data structures:

  1. List of all the PathInfo sorted by alphabetical_rank
  2. Iterable of ListItem sorted by their rank in index.md
  3. A dictionary that maps the local_path of each PathInfo to whether the item has been yielded
  4. A dictionary that maps the local_path to the respective PathInfo
  5. A dictionary that maps each directory in the PathInfo list to its index in that list

The algorithm to get the PathInfo in the correct order is:

  1. Iterate through the ListItem and yield the PathInfo using the local_path to PathInfo lookup. That lookup can only fail for remote links, which will also need to be yielded.
  2. Each time a PathInfo is yielded, mark it off in the dictionary that tracks whether each local_path has been yielded.
  3. Whenever a decrease in hierarchy of ListItem is detected, use the current ListItem directory to look up the starting index in the directory to PathInfo index dictionary. Sequentially yield the items that have not yet been yielded whilst the PathInfo still has the directory at the start of its path, mark them as yielded, then continue yielding items based on ListItem.
  4. Yield all the remaining PathInfo that have not yet been yielded.
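The steps above could be sketched roughly as follows. This is a simplified illustration under several assumptions: DiskItem stands in for PathInfo (which is not defined in this post), IndexItem stands in for ListItem, local paths use / as the separator, and a list of strings is returned instead of yielding PathInfo objects. The sketch also flushes any directories that are still open when the index ends, and it sorts the disk items itself, so it is not a strict O(N) implementation.

```python
import typing


class DiskItem(typing.NamedTuple):
    """Hypothetical stand-in for PathInfo."""

    local_path: str
    is_dir: bool


class IndexItem(typing.NamedTuple):
    """Hypothetical stand-in for ListItem."""

    hierarchy: int
    reference_value: str


def merge(index_items, disk_items) -> list:
    """Order listed items first, then unlisted disk items alphabetically."""
    # Sorting by path parts keeps the contents of a directory contiguous.
    ordered = sorted(disk_items, key=lambda item: item.local_path.split("/"))
    position = {item.local_path: i for i, item in enumerate(ordered)}
    yielded = set()
    open_dirs = []  # directories currently being listed, outermost first
    result = []

    def flush(directory: str) -> None:
        """Append the not yet yielded items below directory."""
        prefix = directory + "/"
        for i in range(position[directory] + 1, len(ordered)):
            path = ordered[i].local_path
            if not path.startswith(prefix):
                break
            if path not in yielded:
                yielded.add(path)
                result.append(path)

    for index_item in index_items:
        # A decrease in hierarchy closes the directories that were left.
        while len(open_dirs) > index_item.hierarchy:
            flush(open_dirs.pop())
        value = index_item.reference_value
        result.append(value)  # remote links are not on disk but still listed
        if value in position:
            yielded.add(value)
            if ordered[position[value]].is_dir:
                open_dirs.append(value)
    while open_dirs:
        flush(open_dirs.pop())
    # Finally, everything on disk that was never mentioned in the index.
    result.extend(item.local_path for item in ordered if item.local_path not in yielded)
    return result
```

In this sketch, an unlisted file inside a listed directory ends up directly after the listed items of that directory, and unlisted top level items end up at the very end.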

Execution Plan

  1. The contents index for items that only point to local files and directories
  2. Commented out items end up on discourse without a level
  3. Items on the contents index can point at other resources
  4. Migration to generate the contents index
  5. Action to create a PR for any items not on the contents index
  6. Checks that external references return a 2XX response on a HEAD request

Further Information

Link to the GitHub issue: https://github.com/canonical/upload-charm-docs/issues/26
