Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Printing and parsing of header timestamps #270

Open
yawaramin opened this issue Apr 29, 2023 · 7 comments
Open

Printing and parsing of header timestamps #270

yawaramin opened this issue Apr 29, 2023 · 7 comments

Comments

@yawaramin
Copy link
Contributor

Headers need to use a well-defined timestamp format: https://www.rfc-editor.org/rfc/rfc2616#section-3.3

E.g. Wed, 21 Oct 2015 07:28:00 GMT

Dream can already print to this format:

let expires =

It would be good to provide support in Dream itself for printing and parsing this format as other headers also need to use it, e.g.

Maybe an API like:

val timestamp : ?epoch_s:float -> unit -> string
(** [timestamp ?epoch_s] () is a string representation of [epoch_s] (epoch
    seconds) if provided, else the current timestamp. *)

val of_timestamp : string -> (float, string) result
(** [of_timestamp str] is the time in epoch seconds parsed from the [str]
    representation, or an error message if parse failed. *)
@aantron
Copy link
Owner

aantron commented May 3, 2023

For the optional argument in val timestamp, it may be clearer to separate this into a timestamp function and a now function, or or simply point users to where they can get the current timestamp in the stdlib.

of_timestamp should be from_timestamp, to be consistent with the rest of Dream naming. Should the return type be a result? What would we put into the string in case of error?

What's the current way of doing this in OCaml? I assume that there is a library that handles dates and times in this format, or multiple libraries.

@yawaramin
Copy link
Contributor Author

Tbh I only looked at the Ptime library as it's currently already being used by Dream and I was hesitant to add yet more libraries to the dependency cone. I will paste below what I implemented in my code. About the function naming and splitting, that makes sense. Agreed.

About the result type–tbh I am good with any approach to handle parse failure–I just went with result as a first suggestion because I thought it might be acceptable to most people. But I am OK with using an option or even throwing an Invalid_argument exception, after all Dream is already catching and logging exceptions. For the error message it would probably just be the input string itself.

let of_timestamp str =
  let opt_date_time = try
    Scanf.sscanf str "%c%c%c%c%c%02d %s %04d %02d:%02d:%02d GMT"
    @@ fun _ _ _ _ _ day month year hour minute second ->
    let month = match month with
      | "Jan" -> 1
      | "Feb" -> 2
      | "Mar" -> 3
      | "Apr" -> 4
      | "May" -> 5
      | "Jun" -> 6
      | "Jul" -> 7
      | "Aug" -> 8
      | "Sep" -> 9
      | "Oct" -> 10
      | "Nov" -> 11
      | "Dec" -> 12
      | _ -> invalid_arg month
    in
    Some ((year, month, day), ((hour, minute, second), 0))
  with _ -> None
  in
  Ptime.of_date_time
  |> Option.bind opt_date_time
  |> Option.map Ptime.to_float_s

let timestamp ?s () =
  let ptime = Option.get @@ Ptime.of_float_s @@ match s with
    | Some s -> s
    | None -> Unix.time ()
  in
  let ((year, month, day), ((hour, minute, second), _)) = Ptime.to_date_time ptime in
  let weekday = match Ptime.weekday ptime with
    | `Sun -> "Sun"
    | `Mon -> "Mon"
    | `Tue -> "Tue"
    | `Wed -> "Wed"
    | `Thu -> "Thu"
    | `Fri -> "Fri"
    | `Sat -> "Sat"
  in
  let month = match month with
    | 1 -> "Jan"
    | 2 -> "Feb"
    | 3 -> "Mar"
    | 4 -> "Apr"
    | 5 -> "May"
    | 6 -> "Jun"
    | 7 -> "Jul"
    | 8 -> "Aug"
    | 9 -> "Sep"
    | 10 -> "Oct"
    | 11 -> "Nov"
    | 12 -> "Dec"
    | _ -> invalid_arg @@ string_of_int month
  in
  Printf.sprintf "%s, %02d %s %04d %02d:%02d:%02d GMT" weekday day month year hour minute second

@aantron
Copy link
Owner

aantron commented May 10, 2023

Thanks!

An HTTP date-time parser has to parse three formats (as also per the link you posted, thanks!). I looked through opam packages for existing such parsers and found HTTP Date and Timedesc. Both of these can parse all the formats required by HTTP, and output the format preferred by HTTP.

(For the future: I also looked at Calendar and Ptime itself, but neither library offers any support for HTTP specifically).

You should be able to use either one of these immediately:

let () =
  "Sun, 06 Nov 1994 08:49:37 GMT"
  |> Http_date.decode
  |> Http_date.encode
  |> print_endline

let () =
  "Sun, 06 Nov 1994 08:49:37 GMT"
  |> Timedesc.of_http
  |> Result.get_ok
  |> Timedesc.to_http
  |> print_endline

...and we can recommend either one.

However, neither one uses floats directly for time values. Dream does use floats wherever possible, and avoids "mystery types" that are not clearly necessary and could be confusing. I think we would want to choose one of these and wrap it so that we get float.

Also, as a note for the future, HTTP Date currently depends on Menhir and Timedesc currently depends on Angstrom. Both of these seem too heavy for the simplistic input formats they have to parse, but both Menhir and Angstrom are currently in the dependency cone of Dream anyway, so it doesn't matter for now.

@aantron
Copy link
Owner

aantron commented May 10, 2023

To pick between HTTP Date and Timedesc, I would probably look at the intersection of the dependency cone of each with the current one of Dream, using opam list --required-by --recursive timedesc, etc., and just pick the one with the least new packages out of the intersection. Also worth noting is that HTTP Date has bikallem/http-date#2, but the workaround is given in the issue.

@yawaramin
Copy link
Contributor Author

yawaramin commented May 11, 2023

I am wondering if we need a new library for this? What do you think of just rolling a parser? This seems to be working fine for me:

(* Stdlib.int_of_string accepts hex/octal, oops *)
let int_of_string str =
  try Scanf.sscanf str "%d" Fun.id with Scanf.Scan_failure _ -> invalid_arg str

let month_num = function
  | "Jan" -> 1
  | "Feb" -> 2
  | "Mar" -> 3
  | "Apr" -> 4
  | "May" -> 5
  | "Jun" -> 6
  | "Jul" -> 7
  | "Aug" -> 8
  | "Sep" -> 9
  | "Oct" -> 10
  | "Nov" -> 11
  | "Dec" -> 12
  | month -> invalid_arg month

let date yyyy mmm dd =
  int_of_string yyyy, month_num mmm, int_of_string dd

let time hhmmss = match String.split_on_char ':' hhmmss with
  | [hh; mm; ss] ->
    (int_of_string hh, int_of_string mm, int_of_string ss), 0
  | _ ->
    invalid_arg hhmmss

let from_timestamp str =
  let tup = match String.split_on_char ' ' str with
    (* Sun Nov  6 08:49:37 1994 ; ANSI C's asctime() format *)
    | [_; mmm; ""; d; hhmmss; yyyy] ->
      (date yyyy mmm d, time hhmmss)

    (* Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123 *)
    | [_; dd; mmm; yyyy; hhmmss; _] ->
      (date yyyy mmm dd, time hhmmss)

    (* Sun Nov 16 08:49:37 1994 ; ANSI C's asctime() format *)
    | [_; mmm; dd; hhmmss; yyyy] ->
      (date yyyy mmm dd, time hhmmss)

    (* Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036 *)
    | [_; dd_mmm_yy; hhmmss; _] ->
      begin match String.split_on_char '-' dd_mmm_yy with
      | [dd; mmm; yy] ->
        let yy, mm, dd = date yy mmm dd in
        let yy = if yy < 100 then yy + 1900 else yy in (* Y2K problem*)
        ((yy, mm, dd), time hhmmss)
      | _ ->
        invalid_arg dd_mmm_yy
      end
    | _ ->
      invalid_arg str
  in
  match Ptime.of_date_time tup with
  | Some ptime -> Ptime.to_float_s ptime
  | None -> invalid_arg str

I use the String.split_on_char function to tokenize and parse things all the time, it works pretty nicely. We just ignore the received day of week, which works out if we are being liberal in what we accept like the RFCs seem to want.

@aantron
Copy link
Owner

aantron commented May 11, 2023

We should be able to hand-write non-allocating simple parsers for this, if we go that route. These are very simple formats. But the easiest way to immediately implement it is to depend on an existing library. We can change the choice later if there is some good reason to do so, especially since the library chosen will be an implementation detail, since we will be wrapping the call to it anyway.

@aantron
Copy link
Owner

aantron commented May 11, 2023

Also want to note for myself: depending on HTML Date requires a --dev-repo pin of odate in order to develop Dream's docs because of conflicting constraints on Menhir with Soupault's transitive dependencies. Not relevant for actual users of Dream, but something that I should put into CONTRIBUTING.md if we choose HTML Date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants