Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose individual apis #177

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andrewstucki
Copy link

@andrewstucki andrewstucki commented Aug 6, 2020

I believe this is one of the major asks in #172, the other being something like an elasticsearch.NewTransport function that you could pass to each of these if you so desired. Basically rather than having one giant API object, we also expose the ability to decompose the client into individual APIs so someone can use just what they want and avoid the penalty of instantiating every client.

Edit:

I went ahead and just exposed a method for creating an *estransport.Client with the same configuration structure as the top-level elasticsearch.NewClient and named it elasticsearch.NewTransportClient.

This will basically allow people to do what's being requested, something along the lines of:

tp, _ := elasticsearch.NewTransportClient(elasticsearch.Config{})
bulk := esapi.NewBulk(tp)

// use bulk api client here

Also, this has the added benefit of, with a few changes being a potential way to slim down the esutil.BulkIndexer implementation, since really, the only thing that the BulkIndexer needs is an *estransport.Client rather than the whole wrapped API.

@andrewstucki andrewstucki requested a review from karmi August 6, 2020 01:52
@andrewstucki
Copy link
Author

Just as a quick demonstration of what the original issue was reporting, taking one of the examples:

package main

import (
	"log"

	"github.com/elastic/go-elasticsearch/v8"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}

	res, err := es.Info()
	if err != nil {
		log.Fatalf("Error getting the response: %s\n", err)
	}
	defer res.Body.Close()

	log.Print(res.String())
}

and replacing it with the equivalent

package main

import (
	"log"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esapi"
)

func main() {
	ts, err := elasticsearch.NewTransportClient(elasticsearch.Config{})
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}
	info := esapi.NewInfo(ts)

	res, err := info()
	if err != nil {
		log.Fatalf("Error getting the response: %s\n", err)
	}
	defer res.Body.Close()

	log.Print(res.String())
}

gives me an approximately 4MB binary difference on my mac when compiled with:

bash-3.2$ CGO_ENABLED=0 go build -ldflags '-s'
bash-3.2$ ls -la full-client info-only
-rwxr-xr-x  1 andrew.stucki  staff  11946532 Aug  6 00:25 full-client
-rwxr-xr-x  1 andrew.stucki  staff   7735528 Aug  6 00:29 info-only

Alternatively you could do the same with transport.Perform calls directly, but you lose out on the ergonomics of the convenience wrappers to these APIs.

@karmi
Copy link
Contributor

karmi commented Aug 6, 2020

Thanks for looking into this, @andrewstucki! I'm out of office until the end of the month, so it might take some time to fully process the changes, merge them, etc. We can have a conversation here in the meantime, though. Couple of comments:

  • We need to separate changes to the internal packages (generator) and the other packages. Also, I'll need to double-check the behaviour on the main branch and the 7.x and 6.x branches.

  • I see that you've moved the logic for handling values from environment, defaults, etc into the elasticsearch.NewTransportClient(). I'm not 100% sure about the naming for that... Another approach would be to have something like a "build config" method (not sure about the naming there at all)?

  • Maybe we could leverage estransport.New() instead of elasticsearch.NewTransportClient() instead of adding a method to the main elasticsearch package? I do understand that we want to properly handle environment variables, defaults, etc.

  • Just so I have the full picture, the resulting binary is slimmer because the compiler is able to add only the specific APIs, that is the "Info" API, in your example?

(...) with a few changes being a potential way to slim down the esutil.BulkIndexer implementation, since really, the only thing that the BulkIndexer needs is an *estransport.Client rather than the whole wrapped API

I definitely agree that trimming down the size for the bulk indexer is important. We'll need to tweak it a little bit, as you suggest, though. One way I have suggested elsewhere is to change it from accepting *elasticsearch.Client directly to accepting an interface. Since it builds the esapi.BulkRequest{} and passes the configured client to the Do() method, that should work rather well.

req := esapi.BulkRequest{
Index: w.bi.config.Index,
Body: w.buf,
Pipeline: w.bi.config.Pipeline,
Refresh: w.bi.config.Refresh,
Routing: w.bi.config.Routing,
Source: w.bi.config.Source,
SourceExcludes: w.bi.config.SourceExcludes,
SourceIncludes: w.bi.config.SourceIncludes,
Timeout: w.bi.config.Timeout,
WaitForActiveShards: w.bi.config.WaitForActiveShards,
Pretty: w.bi.config.Pretty,
Human: w.bi.config.Human,
ErrorTrace: w.bi.config.ErrorTrace,
FilterPath: w.bi.config.FilterPath,
Header: w.bi.config.Header,
}
res, err := req.Do(ctx, w.bi.config.Client)

@andrewstucki
Copy link
Author

  • We need to separate changes to the internal packages (generator) and the other packages. Also, I'll need to double-check the behaviour on the main branch and the 7.x and 6.x branches.

That should be fine. You need me to do only the generator + regenerated API functions/structs here and then I can open up a separate PR for the client changes?

  • I see that you've moved the logic for handling values from environment, defaults, etc into the elasticsearch.NewTransportClient(). I'm not 100% sure about the naming for that... Another approach would be to have something like a "build config" method (not sure about the naming there at all)?

Yeah, that name needs some work. Regarding just building the config itself v. an underlying client, the other aspect to that method is copying over all of the fields from an elasticsearch.Config to the estransport.Config, which is a bit tedious to do manually. That's why I went with directly returning an estransport.Client--not sure if you have any other ideas, but I would imagine there could be a larger refactor to make this a bit more slick without doing the (slightly abstraction breaking thing) of making an elasticsearch package function directly initialize and return and estransport package struct.

  • Just so I have the full picture, the resulting binary is slimmer because the compiler is able to add only the specific APIs, that is the "Info" API, in your example?

Yep, it's basically that the compiler is able to prune a whole bunch of code that's never used. When someone initializes a esapi.API v. a single API, the compiler has to keep all of the fields of the esapi.API struct around, in the case of the single API it can actually prune out that struct and all of the other APIs that aren't used.

I definitely agree that trimming down the size for the bulk indexer is important. We'll need to tweak it a little bit, as you suggest, though. One way I have suggested elsewhere is to change it from accepting *elasticsearch.Client directly to accepting an interface. Since it builds the esapi.BulkRequest{} and passes the configured client to the Do() method, that should work rather well.

Yeah, I think that once there's a simple way of doing a fairly straightforward initialization of an estransport.Client we should be able to change the *elasticsearch.Client field to use the esapi.Transport interface that the Do method takes. Since the transport Client itself implements that interface, everything should just work whether you pass an *elasticsearch.Client (for the discovery, additional APIs etc.) or an *estransport.Client.

@karmi
Copy link
Contributor

karmi commented Aug 8, 2020

Hello! With regards to separating the changes, it's perfectly fine to just rebase the commits, and force push to this branch/PR. Thinking about it, perhaps it's better to leave out the changes to the APIs from this PR, since I tend to commit those in a fairly uniform manner, and always isolated from other changes (see https://github.com/elastic/go-elasticsearch/commits/master/esapi).

I'll need to think more about the whole "simplified client" / "just transport client" aspect, but I think you've nailed the neccessary changes to the generator and the API constructors.

Totally agreed about the change to the bulk indexer, I always envisioned making the Client field accept an interface and not *elasticsearch.Client, but because it was a large undertaking, I wanted to make smaller steps.

Copy link
Member

@axw axw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good change overall, just one question about node discovery.

Comment on lines +106 to +108
if cfg.DiscoverNodesOnStart {
go client.DiscoverNodes()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for starting discovery in NewClient and not NewTransportClient?

It seems surprising that cfg.DiscoverNodesOnStart would be ignored when calling NewTransportClient, and probably not a desired change in behaviour for users who are only trying to slim down the API surface area.

Maybe we should add a DiscoverNodesOnStart flag to estransport.Config, and start the goroutine there?

@karmi
Copy link
Contributor

karmi commented Sep 8, 2020

I finally had some time to look into the issue and think about the problem today. While I'm definitely not opposed to exposing the constructors of the individual APIs, I think the current package structure allows for "almost" achieving the stated goal of reducing the build size.

It occured to me that if the esapi and estransport packages would be used in isolation, it is entirely possible to use only a single API, and the estransport.Client instead of elasticsearch.Client. I've rewritted the example you're using with the "Info" API as follows:

package main

import (
	"context"
	"log"
	"net/url"

	"github.com/elastic/go-elasticsearch/v8/esapi"
	"github.com/elastic/go-elasticsearch/v8/estransport"
)

func main() {
	clusterURL, _ := url.Parse("http://localhost:9200")
	tp, err := estransport.New(
		estransport.Config{
			URLs: []*url.URL{clusterURL},
		},
	)
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}

	req := esapi.InfoRequest{}

	res, err := req.Do(context.Background(), tp)
	if err != nil {
		log.Fatalf("Error getting the response: %s\n", err)
	}
	defer res.Body.Close()

	log.Print(res.String())
}

When I check the size, I'm getting this output:

$ ls -lah *
-rwxr-xr-x  1 karmi  staff    12M Sep  8 16:44 test-full
-rwxr-xr-x  1 karmi  staff   7.4M Sep  8 16:44 test-info-only

It needs to be noted that estransport.New() and elasticsearch.NewClient() are not fully functionally equivalent, ie. the discovery options and the CloudID support are missing. But it's possible to call DiscoverNodes() manually, and the CloudID support can be added easily.

Not sure which way to go now. Does anybody has any thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants