Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an MSSQL driver #12

Open
brendanlong opened this issue Apr 7, 2018 · 19 comments
Open

Add an MSSQL driver #12

brendanlong opened this issue Apr 7, 2018 · 19 comments
Assignees

Comments

@brendanlong
Copy link
Contributor

Just creating this to document a work-in progress driver using the FreeTDS module. Unfortunately FreeTDS doesn't seem to have an async interface so we'll have to just run everything in background threads.

Also, FreeTDS requires that everything run in the same background thread, which I'm not sure the current Preemptive module does?

WIP here: https://github.com/brendanlong/ocaml-caqti/commits/mssql

@brendanlong
Copy link
Contributor Author

brendanlong commented Apr 7, 2018

There's still a lot to do:

  • Implement call
  • Handle Dblib exceptions correctly
  • Ensure we don't use the same connection on multiple threads at the same time (something like Async.Sequencer, or just run everything on one thread)
  • Handle cases like multiple rows being returned when we only expect one
  • Implement decoding
  • Handle conversions for non-perfect type matches (i.e. we should convert int to int64 if necessary)
  • Handle the horrible hacks needed to make datetimes work right. In an internal library at my company we handle this by doing "SELECT CAST('2017-02-02' AS DATETIME) AS x" and checking if the returned month is 1 or 2 (this has to be done at runtime since FreeTDS inexplicably gives you no way to determine how this option was set)
  • Handle parameters. In our internal library we parse the SQL and encode them since ocaml-freetds doesn't support parameterized queries.
  • Update tests. For example, SQL Server temporary tables have to start with # apparently.

@hcarty
Copy link

hcarty commented Apr 7, 2018

One approach I've used when a C binding needs to have all operations run in the same thread under Lwt_preemptive is to create a long-lived thread which does the initialization, work and cleanup. Requests are handled by passing tasks and results to/from this thread.

You could also look at Mwt which hasn't been released formally but could be if there is a use for it.

@brendanlong
Copy link
Contributor Author

brendanlong commented Apr 7, 2018

Hmm I'm having trouble getting the driver to load. If I do it the same way as the other drivers I get errors like:

Failed to load driver for mssql://sa/master: error loading shared library: ./../lib-driver/caqti_driver_mssql.cmxs: undefined symbol: ocaml_freetds_dbsqlexec

https://circleci.com/gh/brendanlong/ocaml-caqti/26

It seems like jbuilder has a different way of linking dynamic libraries which looks like this:

(executable
 ((name caqti_driver_mssql)
  (public_name caqti-driver-mssql)
  (modes (shared_object))
  (modules (Caqti_driver_mssql))
  (libraries (caqti freetds))))

but if I do that, I get:

Failed to load driver for mssql://localhost: The driver for mssql did not register itself after apparently loading.

@brendanlong
Copy link
Contributor Author

With debugging:

(cd _build/default/tests && ./test_sql_lwt.exe)
[DEBUG] Caqti_dynload: recorded_predicates = native
[DEBUG] Caqti_dynload: recorded_packages.core = bigarray caqti ptime re re.posix result sexplib sexplib.0 stringext unix uri
[DEBUG] Caqti_dynload: recorded_packages.load = 
[DEBUG] Caqti_dynload: requested package: caqti-driver-sqlite3
[DEBUG] Fl_dynload: not loading: result
[DEBUG] Fl_dynload: not loading: ptime
[DEBUG] Fl_dynload: not loading: re
[DEBUG] Fl_dynload: not loading: re.posix
[DEBUG] Fl_dynload: not loading: unix
[DEBUG] Fl_dynload: not loading: bigarray
[DEBUG] Fl_dynload: not loading: sexplib.0
[DEBUG] Fl_dynload: not loading: sexplib
[DEBUG] Fl_dynload: not loading: stringext
[DEBUG] Fl_dynload: not loading: uri
[DEBUG] Fl_dynload: not loading: caqti
[DEBUG] Fl_dynload: about to load: sqlite3
[DEBUG] Fl_dynload: files="sqlite3.cmxs"
[DEBUG] Fl_dynload: loading "sqlite3.cmxs"
[DEBUG] Fl_dynload: about to load: caqti-driver-sqlite3
[DEBUG] Fl_dynload: files="caqti_driver_sqlite3.cmxs"
[DEBUG] Fl_dynload: loading "caqti_driver_sqlite3.cmxs"
[DEBUG] Caqti_dynload: requested package: caqti-driver-mssql
[DEBUG] Fl_dynload: not loading: result
[DEBUG] Fl_dynload: not loading: ptime
[DEBUG] Fl_dynload: not loading: re
[DEBUG] Fl_dynload: not loading: re.posix
[DEBUG] Fl_dynload: not loading: unix
[DEBUG] Fl_dynload: not loading: bigarray
[DEBUG] Fl_dynload: not loading: sexplib.0
[DEBUG] Fl_dynload: not loading: sexplib
[DEBUG] Fl_dynload: not loading: stringext
[DEBUG] Fl_dynload: not loading: uri
[DEBUG] Fl_dynload: not loading: caqti
[DEBUG] Fl_dynload: about to load: freetds
[DEBUG] Fl_dynload: files=""
[DEBUG] Fl_dynload: about to load: caqti-driver-mssql
[DEBUG] Fl_dynload: files="caqti_driver_mssql.cmxs"
[DEBUG] Fl_dynload: loading "caqti_driver_mssql.cmxs"
Failed to load driver for <mssql://localhost>: error loading shared library: ./../lib-driver/caqti_driver_mssql.cmxs: undefined symbol: ocaml_freetds_dbsqlexec

I wonder if the FreeTDS module is missing something?

@paurkedal paurkedal self-assigned this Apr 8, 2018
@paurkedal
Copy link
Owner

[DEBUG] Fl_dynload: about to load: freetds
[DEBUG] Fl_dynload: files=""

It looks like freetds/META is missing plugin(byte) and plugin(native) stanzas, cf sqlite3/META.

@paurkedal
Copy link
Owner

Thanks for doing this!

Do you have a reference to the documentation which require everything to run in one thread?

For the month-related issue, I suggest running a query on when establishing an new connection, and caching the result. See set_utc_req for the MariaDB driver.

@paurkedal
Copy link
Owner

paurkedal commented Apr 8, 2018

Two request:

  • Please make sure the code compiles with the --dev option. I always use it during development, and occasionally jbuilder runtest --dev --workspace jbuilder-workspace.dev, though I can do the latter myself before I merge if you don't have all switches locally.
  • Please use even indentation for non-pattern lines by adding an extra column below patterns where | is indented to an odd column.

@paurkedal
Copy link
Owner

Handle parameters. In our internal library we parse the SQL and encode them since ocaml-freetds doesn't support parameterized queries.

This is a tricky one, and the main reason I haven't added an ODBC driver. The issue with ODBC is there there are multiple databases behind it having different quoting conventions, and a mismatch of quoting algorithms comes with a high risk of causing an SQL injection vulnerability. Preferably we should rely on algorithms implemented by vendor. If the vendor has precise documentation of something which can be used across versions, that would suffice too.

@rgrinberg
Copy link

@brendanlong the buggy META file that @paurkedal pointed out suggests that you should switch FreeTDS to dune. Writing META by hand isn't wise - it's best to let dune do it for you.

@brendanlong
Copy link
Contributor Author

brendanlong commented Apr 8, 2018

Ok, I'll open a pull request with freetds. Thanks for helping me find out where the problem was!

@paurkedal Sorry about the semi-messy code right now. ocp-indent is doing weird things to the code and I didn't feel like manually fixing formatting until I was done (since ocp-indent would just break it again every time I save..). I'll definitely clean it up to match your coding style before opening a real pull request.

Maybe I imagined the single-thread thing? The FAQ says FreeTDS isn't threadsafe but it looks like it doesn't particularly care which thread you run each function on, as long as you never do multiple at the same time.

@paurkedal
Copy link
Owner

From the FreeTDS doc:

Different threads may all use separate connections without interfering with each other. Threads may not share a DBPROCESS or CS_CONNECTION without controlling access via a mutex.

The anti-thesis of the second statement suggests to me we should be good, since access to the connection will be serialized though the main thread. That is, presuming Lwt/Async communication between threads include the needed memory barriers, but how could it not? See also the second MT case for sqlite3 for comparison, where I assumed sequential access from different worker threads would be fine.

@brendanlong
Copy link
Contributor Author

Yeah my remaining concern is that we still need to serialize usage if we're not using the same thread every time. For example, if someone runs this:

Deferred.both (Db.exec "SELECT 1", Db.exec "SELECT 1")

We don't want both requests to be executed concurrently on different threads using the same connection.

@paurkedal
Copy link
Owner

That's not safe for other databases either, since results are attached to the DB connection. To do queries in parallel, one can pull multiple connections from a pool:

Deferred.both (Db_pool.use (fun (module Db) -> Db.exec "SELECT 1", ...))

I better take a note about it in the documentation.

@brendanlong
Copy link
Contributor Author

I wonder if there's a way to make the API handle that case properly without being too complicated. It seems really bad that a straightforward use of the Async API will segfault your program. :\

@paurkedal
Copy link
Owner

Yes, at call and the convenience API in Caqti_connection_sig.S could have a lock or at least detect concurrent usage. That would not give concurrency, but at least avoid segfault. I think I prefer raising an error over locking, since the intention of the user was probably to parallelise.

@paurkedal paurkedal assigned brendanlong and unassigned paurkedal Apr 15, 2018
@paurkedal
Copy link
Owner

I had a look at the issue with concurrent usage. I had already added a check in Caqti_connect.Make.Connection_of_base.use, though it needed an adjustment to it race-proof for preemptive threading.

P.S. I didn't mean to take this issue myself.

@brendanlong
Copy link
Contributor Author

I got the changes I needed into ocaml-freetds but now I'm not quite sure how to approach the parameters, since I'm pretty sure FreeTDS doesn't support them. I can relatively easily fake it by looking for $ parameters in the query string, properly encoding any query parameters, and then formatting them in, but I'm not sure if I should do that in the MSSQL driver or try to make a more generic version that does it for any driver with ~parameter_style:None. I guess maybe I should start with the more specific version and generalize from there if it works well..

@paurkedal
Copy link
Owner

paurkedal commented Apr 15, 2018

Good. And yes, the parameters will be tricky, and I think it's more than the need to rewrite the query string:

My main concern is to deliver the strings in a way that is guaranteed secure for any database and version supported by FreeTDS. E.g. In standard SQL ' would be escaped as '', but what if the DB also supports backslash escapes? An attacker could send a string \'; DROP DATABASE blah;, which would be escaped as \''; DROP DATABASE blah; where the second quote ends the quoted string and leaves the remainder for arbitrary execution. On the other hand, if we escape ' as \' and the database does not support backslash escaping, the attacker could send '; DROP DATABASE blah;. One could assume that a) at least standard SQL escape is supported, that b) backslashes, if supported, would act as expected, and that 3) there are no other ways to cancel these escapes. Then escaping ' as '' and \ as \\ would only lead to incorrect strings if backslash escapes are not supported.

It may be simpler if the database is known to be MSSQL. I am not familiar with it myself, but if it has some generic escape mechanism like BASE64 or hexadecimal, that should be a safe option.

(Added: Consider DROP DATABASE blah a placeholder for more useful payloads; it probably won't even work in most environments due to access restrictions, and would be quickly discovered. Some RDBMs may also prohibit use of ; in query strings, though that would not prevent things like ', user_password, ' in a select list or ' || 1 || '' = ' in the RHS string of an equality in a where clause to extend the scope.)

@paurkedal
Copy link
Owner

Once we decide a suitable way of passing strings, the FreeTDS case should not be fundamentally different from the other drivers. Note that queries strings are delivered in parsed form (Caqti_request.query), so all drivers will construct the final query string themselves. This allows using either parameter convention across all RDBMs. In particular both the Sqlite3 and MariaDB drivers reshuffles parameters. The MSSQL driver will paste the literal values directly into the final query in reshuffled order instead of binding them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants