Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process Request for routing rules #325

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

willmostly
Copy link
Contributor

Description

Extract information from the HttpServeletRequest into a class more convenient for usage in routing logic.

Additional context and related issues

Routing rules are authored in the MVEL language used by the jeasy rules engine. MVEL has limited functionality in comparison to standard Java, and cannot use libraries not included with the JDK. This PR passes an additional processedRequest object to the rules engine with getters for the user, query type, and SQL details such as referenced tables and schemas.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( x) Release notes are required, with the following suggested text:

* Process request to facilitate rule writing

@cla-bot cla-bot bot added the cla-signed label Apr 24, 2024
gateway-ha/pom.xml Show resolved Hide resolved
return new String(preparedStatement, UTF_8);
}

private void getTables(Node node, Set<QualifiedName> tables)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a unit test for this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is tested by TestRoutingGroupSelector.testRequestProcessorQueryDetails. For the SQL

SELECT x.*, y.*, z.* FROM catx.schemx.tblx a, schemy.tbly y, tblz z

with default catalog and schema of cat_default and schem_"default, The matched rule checks

processedRequest.tablesContains("cat_default.\"schem_\\\"default\".tblz")
  && processedRequest.tablesContains("cat_default.schemy.tbly")
  && processedRequest.tablesContains("catx.schemx.tblx")
  && processedRequest.getSchemas().contains("schemy")
  && processedRequest.getCatalogs().contains("catx")

What would you like to see in a unit test?

Comment on lines 457 to 484
public String getDefaultCatalog()
{
return defaultCatalog;
}

public Set<String> getCatalogs()
{
return catalogs;
}

public Set<String> getSchemas()
{
return schemas;
}

public Set<String> getCatalogSchemas()
{
return catalogSchemas;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are intended for use in routing rules, but are unused in the Trino Gateway code. I added more cases to routing_rules_processed_request.yml to cover getDefaultCatalog() and getCatalogSchemas() and provide usage examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr , any feedback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add @SuppressWarnings("unused") annotation to unused methods.

Copy link
Member

@vishalya vishalya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have my first round of comments and questions in.

@vishalya
Copy link
Member

Are you handling the DDL statements like create and drop table? They are useful in case of memory catalog.

@willmostly willmostly force-pushed the will/process_request branch 4 times, most recently from ba2f218 to 73c5ae6 Compare May 22, 2024 21:11
@willmostly
Copy link
Contributor Author

@vishalya DDL statements are handled, please check provideTableExtractionQueries for the full list of statement types. LMK if I missed any.

@willmostly willmostly requested a review from vishalya June 6, 2024 13:31
Copy link
Member

@vishalya vishalya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are getting there.

@@ -38,6 +38,8 @@ public class HaGatewayConfiguration
private OAuth2GatewayCookieConfiguration oauth2GatewayCookieConfiguration = new OAuth2GatewayCookieConfiguration();
private GatewayCookieConfiguration gatewayCookieConfiguration = new GatewayCookieConfiguration();

private RequestAnalyzerConfig processedRequestConfig = new RequestAnalyzerConfig();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the variable name and the accessor methods as per the class.

*/
package io.trino.gateway.ha.config;

public class RequestAnalyzerConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you adding the document for the configuration?


public TrinoRequestUser(HttpServletRequest request, RequestAnalyzerConfig config)
{
isGetUserFromOAuth = config.isGetUserFromOauth();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify how does this flow work, the backend trino can have multiple authentication mechanisms. so are you setting the flag for an individual request? Also gateway and trino can in theory have different IDPs.


private String getUserInfo(String token)
{
OkHttpClient client = new OkHttpClient();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if the connection is going to happen for each processed request, if so do we need any further optimization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call out. I've added a cache with a 10 minute ttl to avoid bumping into rate limits

import jakarta.servlet.http.HttpServletRequest;
import okhttp3.Call;
import okhttp3.HttpUrl;
import okhttp3.OkHttpClient;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Airlift HTTP client instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants