Lexer handling newlines incorrectly in some cases #3167

Closed
Synthetic-Dev opened this issue Jan 18, 2024 · 1 comment · Fixed by #3264

Synthetic-Dev commented Jan 18, 2024

Marked version:
11.1.1

Describe the bug
The lexer seems to leave newlines at the end of some tokens' raw values instead of tokenizing them separately.

To Reproduce
Input (hr):

console.log(lexer.lex("---------------------------------\n\nhi"))

Output (hr):

[
    {
        "type": "hr",
        "raw": "---------------------------------\n\n"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

Input (blockquote):

console.log(lexer.lex("> blockquote\n\nhi"))

Output (blockquote):

[
    {
        "type": "blockquote",
        "raw": "> blockquote\n\n",
        "tokens": [
            {
                "type": "paragraph",
                "raw": "blockquote",
                "text": "blockquote",
                "tokens": [
                    {
                        "type": "text",
                        "raw": "blockquote",
                        "text": "blockquote"
                    }
                ]
            }
        ],
        "text": "blockquote"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

In both of these examples you can see that the 2 newlines are kept inside the raw of the preceding token (hr or blockquote) and are not tokenized separately by the lexer.
This is with gfm: true and breaks: true.
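
To make the symptom easier to spot in larger outputs, here is a minimal sketch that scans a token array for raw values ending in newlines. `findTrailingNewlines` is a hypothetical helper written for this issue, not part of marked, and the sample data is copied from the hr output above with nested tokens omitted.

```javascript
// Hypothetical helper (not part of marked): list top-level tokens whose
// `raw` ends with one or more newline characters.
function findTrailingNewlines(tokens) {
  return tokens
    .map((token, index) => {
      const match = /\n+$/.exec(token.raw);
      return match ? { index, type: token.type, trailing: match[0] } : null;
    })
    .filter((entry) => entry !== null);
}

// Token shapes copied from the hr output reported above (nested tokens omitted).
const hrOutput = [
  { type: "hr", raw: "---------------------------------\n\n" },
  { type: "paragraph", raw: "hi", text: "hi" },
];

// Reports the hr token at index 0 as carrying the two trailing newlines.
console.log(findTrailingNewlines(hrOutput));
```

The same check should work on the blockquote output, since it only looks at each token's type and raw.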

Expected behavior
For hr input:

[
    {
        "type": "hr",
        "raw": "---------------------------------"
    },
    {
        "type": "space",
        "raw": "\n\n"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

For blockquote input:

[
    {
        "type": "blockquote",
        "raw": "> blockquote",
        "tokens": [
            {
                "type": "paragraph",
                "raw": "blockquote",
                "text": "blockquote",
                "tokens": [
                    {
                        "type": "text",
                        "raw": "blockquote",
                        "text": "blockquote"
                    }
                ]
            }
        ],
        "text": "blockquote"
    },
    {
        "type": "br",
        "raw": "\n"
    },
    {
        "type": "paragraph",
        "raw": "hi",
        "text": "hi",
        "tokens": [
            {
                "type": "text",
                "raw": "hi",
                "text": "hi"
            }
        ]
    }
]

UziTech (Member) commented Jan 19, 2024

The space token is only emitted in places where it is needed. For example, if two paragraphs are next to each other they become one paragraph token unless there is a blank line (space token) between them.

If you want to create a PR to add space tokens after each block token, that would be fine, but I think it will be a breaking change.
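
Until such a change lands, one possible workaround is to post-process the token array on the caller's side. The sketch below is an assumption about how that could look, not how marked or any PR implements it: `splitTrailingNewlines` is a hypothetical helper that moves trailing newlines out of each top-level token's raw into a separate space token. Note that it always emits a space token holding the full run of newlines, which differs from the br token shown in the expected blockquote output above.

```javascript
// Hypothetical post-processing step (not part of marked): move trailing
// newlines out of each top-level token's `raw` into a separate space token.
function splitTrailingNewlines(tokens) {
  const result = [];
  for (const token of tokens) {
    const match = /\n+$/.exec(token.raw);
    // Keep the token unchanged if it has no trailing newlines,
    // is already a space token, or consists only of newlines.
    if (!match || token.type === "space" || match.index === 0) {
      result.push(token);
      continue;
    }
    // Shallow copy with the newlines trimmed, then the newlines as a space token.
    result.push({ ...token, raw: token.raw.slice(0, match.index) });
    result.push({ type: "space", raw: match[0] });
  }
  return result;
}

// Token shapes copied from the blockquote output in this issue (nested tokens omitted).
const tokens = [
  { type: "blockquote", raw: "> blockquote\n\n", text: "blockquote" },
  { type: "paragraph", raw: "hi", text: "hi" },
];

const normalized = splitTrailingNewlines(tokens);
console.log(normalized.map((t) => t.type).join(", ")); // blockquote, space, paragraph
```

Because the newlines are only moved and never dropped, concatenating the raw values of the result still gives back the original input. The helper only touches top-level tokens, so nested tokens would need the same treatment if that matters for your use case.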
