Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib): Add Uri module #1970

Merged
merged 6 commits into from Mar 2, 2024
Merged

Conversation

alex-snezhko
Copy link
Member

@alex-snezhko alex-snezhko commented Jan 16, 2024

URI parsing and utility module, mostly conforming to RFC 3986.

module Main

include "uri"

let uri = Result.unwrap(Uri.parse("https://grain-lang.org"))
print(Uri.scheme(uri)) // https
print(Uri.host(uri)) // grain-lang.org

Uri.encode("a:b") // a%3Ab

Copy link
Member

@spotandjake spotandjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great.

Also looks like the docs need to be regenerated.

stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
@marcusroberts
Copy link
Member

marcusroberts commented Jan 28, 2024

Alex, this looks fantastic and really useful. I also learnt a lot about what are valid URLs (I swore you could only include? once!). My only comment before I press approve is that pretty much every language calls urlEncode what you call percentageEncode, and I wondered if we should standardize on that?

(although wikipedia says it's officially percent encode... https://en.wikipedia.org/wiki/Percent-encoding)

Copy link
Member

@ospencer ospencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What made you return an abstract record and have access methods rather than just return the record?

@alex-snezhko
Copy link
Member Author

@ospencer This is following the pattern I've seen used in OCaml libraries for encapsulating data/providing a uniform interface for simple & computed data. In this particular instance not much is really being encapsulated but I think having these access methods still provides a degree of uniformity. Kind of like the idea behind properties in C# or getters in Java, where the interface to a data type can remain disconnected from the implementation.

I guess just providing the record directly here would be simpler though and would make the API smaller. I'm not heavily set in either direction so I'm open to arguments in either direction

@ospencer
Copy link
Member

I don't mind it, I was just curious! I looked at the API first and it made me think some kind of computation was happening since I already called this parse function but then I'm still asking about specific pieces. I feel like in one, isolated case, I would just want the record, but we can think about it in the context of similar libraries we want to have.

@alex-snezhko
Copy link
Member Author

Yeah that makes sense. I can change it to a provided record

@ospencer
Copy link
Member

I'd wait to hear what the stdlib experts say :P

@alex-snezhko
Copy link
Member Author

@marcusroberts good point, I also just did a survey of a handful of languages and they all seem to prefer the terminology of "URL encoding". Though RFC 3986 uses the terminology of "percent encoding" so I guess it's a decision on whether we want to use the official terminology or the common lingo 🤷

@phated
Copy link
Member

phated commented Jan 29, 2024

it made me think some kind of computation was happening

I think I agree here. Usually if you are calling a function on some abstract type, it'd be due to delayed processing. For example, a Rust library I was using was storing the full cookie key=value string as &'a str and then storing the key and value range as integers. When you want to get the key, you'd call cookie.key() and cookie.value() which would do the processing. Since you are immediately doing processing in parse, it's probably good to just return the record.

The caveat to the above is returning records with functions attached; instead, I would say should be functions that take the record (such as our Queue, Map, etc data structures).

@spotandjake
Copy link
Member

@marcusroberts good point, I also just did a survey of a handful of languages and they all seem to prefer the terminology of "URL encoding". Though RFC 3986 uses the terminology of "percent encoding" so I guess it's a decision on whether we want to use the official terminology or the common lingo 🤷

I think we should use the common terminology but keep our docs searchable by including the percent encoding wording somewhere in the description.

@alex-snezhko
Copy link
Member Author

@spotandjake @marcusroberts What about just encode and decode? The fact that it's "URL" encoding should be implicit since it's in this module

@alex-snezhko alex-snezhko force-pushed the stdlib-uri branch 2 times, most recently from 4cadfc8 to 8eaff8d Compare February 26, 2024 00:02
Copy link
Member

@marcusroberts marcusroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing the name - it looks great to me now!

Copy link
Member

@ospencer ospencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on changing all of the places we say percentEncode/percentDecode to just encode/decode?

stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
return true
}

// Lowercase all non-percent-encoded alphabetical characters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make issues for String.lowercaseAscii and String.uppercaseAscii and todos that reference it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean here? This function uses the char toAsciiLowercase function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of exploding and iterating the Char version, it'd be handy to just call lowercase on the string. Unless I didn't read this hard enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd still need some custom logic to only lowercase the characters that aren't percent-encode triples so not sure if there'd be a super huge advantage to using lowercase on the whole string

stdlib/uri.gr Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
Copy link
Member

@ospencer ospencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phenomenal work on this! I learned a ton about URIs just from reading your implementation. Excited to use this soon.

@phated
Copy link
Member

phated commented Mar 2, 2024

Looks like this needs docs regenerated

stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
stdlib/uri.gr Outdated Show resolved Hide resolved
Copy link
Member

@phated phated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I did those as individuals. I always thought "surely this will be the last one" until the actual last one.

@phated phated changed the title feat(stdlib): URI module feat(stdlib): Add Uri module Mar 2, 2024
@phated phated enabled auto-merge March 2, 2024 23:01
@phated phated added this pull request to the merge queue Mar 2, 2024
Merged via the queue into grain-lang:main with commit 5cf726e Mar 2, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants