Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many packages does not distinguish halfwidth and fullwidth characters #310

Open
4 tasks done
simnalamburt opened this issue Mar 12, 2024 · 19 comments
Open
4 tasks done
Labels
🤞 phase/open Post is being triaged manually

Comments

@simnalamburt
Copy link

Initial checklist

Affected packages and versions

remark-lint-table-pipe-alignment@3.1.3

Link to runnable example

No response

Steps to reproduce

Open below with browser:

<!doctype html>
<script type="module">
import {reporter} from 'https://esm.sh/vfile-reporter@8.1.0?bundle'
import {remark} from 'https://esm.sh/remark@15.0.1?bundle'
import remarkGfm from 'https://esm.sh/remark-gfm@4.0.0?bundle'
import remarkLint from 'https://esm.sh/remark-lint@9.1.2?build'
import remarkLintTablePipeAlignment from 'https://esm.sh/remark-lint-table-pipe-alignment@3.1.3?bundle'

const file = await remark()
  .use(remarkGfm)
  .use(remarkLint)
  .use(remarkLintTablePipeAlignment)
  .process(`
False positive
--------
Below should be OK but linter will report it as error.

| Alpha | 한글 |
|-------|------|
| a     | b    |

False negative
--------
Below should be error but linter will report it as OK.

| Alpha | 한글 |
|-------|------|
| a     | b  |
`)

document.body.innerHTML = `<pre>${reporter(file)}</pre>`
</script>

Expected behavior

The first one should be fine and the second one should be an error.

16:16-16:17 warning Misaligned table fence table-pipe-alignment remark-lint

⚠ 1 warning

Actual behavior

However, the first one gives an error and the second one does not.

8:16-8:17 warning Misaligned table fence table-pipe-alignment remark-lint

⚠ 1 warning

Runtime

Node v17, Node v16, Node v14, Node v12, Deno, Electron, Other (please specify in steps to reproduce)

Package manager

npm 8, npm 7, npm 6, yarn 2, yarn 1, pnpm, Other (please specify in steps to reproduce)

OS

Windows, Linux, macOS, Other (please specify in steps to reproduce)

Build and bundle tools

Webpack, Rollup, esbuild, Parcel, Create React App, Gatsby, Next.js, Remix, Docusaurus, Snowpack, Vite, Other (please specify in steps to reproduce)

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Mar 12, 2024
@wooorm
Copy link
Member

wooorm commented Mar 13, 2024

Hey!

This rule measures characters.
The size of displayed characters is different in different places. Characters will be different in a terminal, different here on GitHub, different in your editor, different in my editor.

There isn’t a good solution. I recommend turning this rule off instead. See https://github.com/remarkjs/remark-lint/tree/main/packages/remark-lint-table-pipe-alignment#fix for more info

@simnalamburt
Copy link
Author

simnalamburt commented Mar 13, 2024

Majority of programmers codes with fixed sized font and width of CJK characters in such situation are well defined and documented in here: https://unicode.org/Public/UNIDATA/EastAsianWidth.txt

We at least need to create a new option to respect the size of full width character. Something like "fixed width font mode"

@wooorm
Copy link
Member

wooorm commented Mar 13, 2024

Have you tested your ideas in different code editors?

@simnalamburt
Copy link
Author

simnalamburt commented Mar 13, 2024

I'm Korean. Yes I have been used CJK letters more than 20 years in many editors including vi, nvim, Vim, nano, micro, ed, terminal, emacs, VSCode, atom, zed, notepad.exe, notepad++, Visual Studio, sublime. In these editors, CJK letters will be displayed as fullwidth character (which have double width of halfwidth character) unless you explicitly change your font into something other than fixed width font.

@simnalamburt
Copy link
Author

simnalamburt commented Mar 13, 2024

Just in case you might not familiar with CJK: https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms

image

image

@wooorm
Copy link
Member

wooorm commented Mar 13, 2024

I’m looking at your code example here on GH and they’re both displayed “misaligned”

@simnalamburt
Copy link
Author

I don't think it's a good idea to ignore this issue to satisfy GitHub's misrepresentation of CJK characters.

@simnalamburt
Copy link
Author

I at least want to solve this problem by creating a linter option that respects the size of full-width characters. Since it's an option, it shouldn't affect existing behavior. I strongly believe that this should be the default behavior, not an option to opt-in, and this is what any CJK developer would have done. But if you disagree, I'd be happy to see it fixed as an option. What do you think?

@wooorm
Copy link
Member

wooorm commented Mar 13, 2024

GH is using the web. CSS. A monospaced font. Many editors use the web/css/monospace.

My point is that all tools are going to display it differently.
What is the point in having an option to this rule which will look fine for some users and broken for other users.

That's what the rule currently does: broken for some, good for some. So I suggest not using this rule.

You say that all tools behave one way. I show GH doing it differently. I know editors display emoji differently. Can you provide screenshots for different tools displaying your examples correctly?

@simnalamburt
Copy link
Author

This it it

image

Emoji containing version if you need:

image

@simnalamburt
Copy link
Author

"remark-lint-table-pipe-alignment" have been existed for a long time because it is useful to some users even if it's fine for some non-CJK users and broken for CJK users.

And so the new fullwidth-respecting option will be the same. It will be pretty much useful for CJK users.

@wooorm
Copy link
Member

wooorm commented Mar 13, 2024

Everyone uses emoji. Not just Koreans.

different tools

Can you please show different editors? And label which editors you show?

@simnalamburt
Copy link
Author

neovim:
image

iTerm2:
image

VSCode:
image

zed:
image

I'll upload windows examples tomorrow since I cannot access my windows devices for right now.

@xnuk
Copy link

xnuk commented Mar 13, 2024

GH is using the web. CSS. A monospaced font. Many editors use the web/css/monospace.

And many editors try to vertically align a CJK character with 2 latin alphabets, regardless of font configs. Terminal-based editors already do this - every Unicode-aware terminal emulators try to treat a CJK character width as 2 latin chars by default.

Ace, a web-based editor (which is used by Rust Playground), tries to vertically align intentionally. They measures font size and manually put twice size of it, like: <span style="width: 14.3984px;" class="ace_cjk">한</span>.

image

GitHub's align isn't right because, in my computer, they don't use a monospaced font, but they use two fonts: 'Liberation Mono' and 'Noto Sans Mono CJK KR' (because Liberation Mono doesn't have any CJK chars). Using only 'Noto Sans Mono CJK KR' mitigates the problem:

image

Using popular alternative fonts like Neo둥근모 or D2Coding just fits right:

Neo둥근모 D2Coding
image image

@xnuk
Copy link

xnuk commented Mar 14, 2024

Yes, character width can be different by editors, and by fonts. But the point is, you should Unicode-aware if you do "align" something, and 'treat CJK character width as same as two latin alphabets' is a good practice and sensible one. wcwidth-like functions can help this.

simnalamburt added a commit to simnalamburt/remark-lint that referenced this issue Mar 19, 2024
@simnalamburt
Copy link
Author

I've created a patch that addresses this issue as follows: 1e659c2

What do you think? (I didn't know how to make it an opt-in option, so I included it in the default behavior for now)

simnalamburt added a commit to simnalamburt/remark-lint that referenced this issue Mar 19, 2024
simnalamburt added a commit to portone-io/developers.portone.io that referenced this issue Mar 19, 2024
simnalamburt added a commit to portone-io/developers.portone.io that referenced this issue Mar 19, 2024
simnalamburt added a commit to portone-io/developers.portone.io that referenced this issue Mar 19, 2024
simnalamburt added a commit to portone-io/developers.portone.io that referenced this issue Mar 19, 2024
simnalamburt added a commit to simnalamburt/remark-lint that referenced this issue Mar 19, 2024
@simnalamburt
Copy link
Author

Similar issue found at 'remark-lint-table-cell-padding'. While code below is compact enough, remark-lint-table-cell-padding reports linter errors.

image

Plaintext:

|Enum Value  |Description   |
|------------|--------------|
|ORDER       |주문 정산     |
|ORDER_CANCEL|주문 취소 정산|
|MANUAL      |수동 정산     |

@simnalamburt
Copy link
Author

simnalamburt commented Mar 19, 2024

Similar issue found at 'remark-lint-maximum-line-length'. While code below is obviously crossed 80-width column, remark-lint-maximum-line-length does not report error.

image

Plaintext:

Mercury mercury mercury mercury mercury mercury mercury mercury mercury mercury
mercury.

한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~ 한국어~

Most formatters like Prettier, rustfmt, ... properly handles fullwidth characters in this situations:

image

@simnalamburt simnalamburt changed the title 'remark-lint-table-pipe-alignment' does not distinguish halfwidth and fullwidth characters Many packages does not distinguish halfwidth and fullwidth characters Mar 19, 2024
@wooorm
Copy link
Member

wooorm commented Apr 2, 2024

GitHub's align isn't right because, in my computer, they don't use a monospaced font, but they use two fonts: 'Liberation Mono' and 'Noto Sans Mono CJK KR' (because Liberation Mono doesn't have any CJK chars). Using only 'Noto Sans Mono CJK KR' mitigates the problem:

Every tool (that is reasonably good) does that, your editor too.
There are many Unicode characters. Fonts don’t have glyphs for all Unicode characters.
So they use a “font stack”. A preferred font renders a glyph, if not found, then the next font is used, and so forth.

Using popular alternative fonts like Neo둥근모 or D2Coding just fits right:

Right, so this comes back to my original point. How things show, depends on the final user’s computer, and what that user chooses.
Of course, CJK users have configured their editor to display CJK well.
Then, some other user comes in, who uses GitHub, and there the alignment is different.

Different users will see things differently. It is not known to remark-lint how things will display.

But the point is, you should Unicode-aware if you do "align" something, and 'treat CJK character width as same as two latin alphabets' is a good practice and sensible one. wcwidth-like functions can help this.

That is a strong statements while the topic remains vague.

What is “Unicode-aware”? Please explain or link to a specification or algorithm.
What about Emoji?
What about ANSI colors?
Control characters?
What do you want to do about ambiguous characters?
Which Unicode version do you want to go with?

Why should remark-lint align differently than GitHub?

wcwidth is indeed interesting. There is also https://www.npmjs.com/package/wcwidth, which is a JavaScript version, but not maintained. I have also used string-width in the past.

We do support configurable functions in remark-gfm: https://github.com/remarkjs/remark-gfm#example-stringlength.
I am open to such functionality of course.
But it needs to work well.

And, could there be a good default that works for every user?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually
Development

No branches or pull requests

3 participants