Expose individual apis #177

andrewstucki · 2020-08-06T01:52:08Z

I believe this is one of the major asks in #172, the other being something like an elasticsearch.NewTransport function that you could pass to each of these if you so desired. Basically rather than having one giant API object, we also expose the ability to decompose the client into individual APIs so someone can use just what they want and avoid the penalty of instantiating every client.

Edit:

I went ahead and just exposed a method for creating an *estransport.Client with the same configuration structure as the top-level elasticsearch.NewClient and named it elasticsearch.NewTransportClient.

This will basically allow people to do what's being requested, something along the lines of:

tp, _ := elasticsearch.NewTransportClient(elasticsearch.Config{})
bulk := esapi.NewBulk(tp)

// use bulk api client here

Also, this has the added benefit of, with a few changes being a potential way to slim down the esutil.BulkIndexer implementation, since really, the only thing that the BulkIndexer needs is an *estransport.Client rather than the whole wrapped API.

…ration object as the regular client

andrewstucki · 2020-08-06T04:36:29Z

Just as a quick demonstration of what the original issue was reporting, taking one of the examples:

package main

import (
	"log"

	"github.com/elastic/go-elasticsearch/v8"
)

func main() {
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}

	res, err := es.Info()
	if err != nil {
		log.Fatalf("Error getting the response: %s\n", err)
	}
	defer res.Body.Close()

	log.Print(res.String())
}

and replacing it with the equivalent

package main

import (
	"log"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esapi"
)

func main() {
	ts, err := elasticsearch.NewTransportClient(elasticsearch.Config{})
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}
	info := esapi.NewInfo(ts)

	res, err := info()
	if err != nil {
		log.Fatalf("Error getting the response: %s\n", err)
	}
	defer res.Body.Close()

	log.Print(res.String())
}

gives me an approximately 4MB binary difference on my mac when compiled with:

bash-3.2$ CGO_ENABLED=0 go build -ldflags '-s'

bash-3.2$ ls -la full-client info-only
-rwxr-xr-x  1 andrew.stucki  staff  11946532 Aug  6 00:25 full-client
-rwxr-xr-x  1 andrew.stucki  staff   7735528 Aug  6 00:29 info-only

Alternatively you could do the same with transport.Perform calls directly, but you lose out on the ergonomics of the convenience wrappers to these APIs.

karmi · 2020-08-06T07:03:44Z

Thanks for looking into this, @andrewstucki! I'm out of office until the end of the month, so it might take some time to fully process the changes, merge them, etc. We can have a conversation here in the meantime, though. Couple of comments:

We need to separate changes to the internal packages (generator) and the other packages. Also, I'll need to double-check the behaviour on the main branch and the 7.x and 6.x branches.
I see that you've moved the logic for handling values from environment, defaults, etc into the elasticsearch.NewTransportClient(). I'm not 100% sure about the naming for that... Another approach would be to have something like a "build config" method (not sure about the naming there at all)?
Maybe we could leverage estransport.New() instead of elasticsearch.NewTransportClient() instead of adding a method to the main elasticsearch package? I do understand that we want to properly handle environment variables, defaults, etc.
Just so I have the full picture, the resulting binary is slimmer because the compiler is able to add only the specific APIs, that is the "Info" API, in your example?

(...) with a few changes being a potential way to slim down the esutil.BulkIndexer implementation, since really, the only thing that the BulkIndexer needs is an *estransport.Client rather than the whole wrapped API

I definitely agree that trimming down the size for the bulk indexer is important. We'll need to tweak it a little bit, as you suggest, though. One way I have suggested elsewhere is to change it from accepting *elasticsearch.Client directly to accepting an interface. Since it builds the esapi.BulkRequest{} and passes the configured client to the Do() method, that should work rather well.

go-elasticsearch/esutil/bulk_indexer.go

Lines 477 to 496 in 527225d

    
           req := esapi.BulkRequest{ 
        
           	Index: w.bi.config.Index, 
        
           	Body:  w.buf, 
        
           	Pipeline:            w.bi.config.Pipeline, 
        
           	Refresh:             w.bi.config.Refresh, 
        
           	Routing:             w.bi.config.Routing, 
        
           	Source:              w.bi.config.Source, 
        
           	SourceExcludes:      w.bi.config.SourceExcludes, 
        
           	SourceIncludes:      w.bi.config.SourceIncludes, 
        
           	Timeout:             w.bi.config.Timeout, 
        
           	WaitForActiveShards: w.bi.config.WaitForActiveShards, 
        
           	Pretty:     w.bi.config.Pretty, 
        
           	Human:      w.bi.config.Human, 
        
           	ErrorTrace: w.bi.config.ErrorTrace, 
        
           	FilterPath: w.bi.config.FilterPath, 
        
           	Header:     w.bi.config.Header, 
        
           } 
        
           res, err := req.Do(ctx, w.bi.config.Client)

andrewstucki · 2020-08-06T11:29:30Z

We need to separate changes to the internal packages (generator) and the other packages. Also, I'll need to double-check the behaviour on the main branch and the 7.x and 6.x branches.

That should be fine. You need me to do only the generator + regenerated API functions/structs here and then I can open up a separate PR for the client changes?

I see that you've moved the logic for handling values from environment, defaults, etc into the elasticsearch.NewTransportClient(). I'm not 100% sure about the naming for that... Another approach would be to have something like a "build config" method (not sure about the naming there at all)?

Yeah, that name needs some work. Regarding just building the config itself v. an underlying client, the other aspect to that method is copying over all of the fields from an elasticsearch.Config to the estransport.Config, which is a bit tedious to do manually. That's why I went with directly returning an estransport.Client--not sure if you have any other ideas, but I would imagine there could be a larger refactor to make this a bit more slick without doing the (slightly abstraction breaking thing) of making an elasticsearch package function directly initialize and return and estransport package struct.

Just so I have the full picture, the resulting binary is slimmer because the compiler is able to add only the specific APIs, that is the "Info" API, in your example?

Yep, it's basically that the compiler is able to prune a whole bunch of code that's never used. When someone initializes a esapi.API v. a single API, the compiler has to keep all of the fields of the esapi.API struct around, in the case of the single API it can actually prune out that struct and all of the other APIs that aren't used.

I definitely agree that trimming down the size for the bulk indexer is important. We'll need to tweak it a little bit, as you suggest, though. One way I have suggested elsewhere is to change it from accepting *elasticsearch.Client directly to accepting an interface. Since it builds the esapi.BulkRequest{} and passes the configured client to the Do() method, that should work rather well.

Yeah, I think that once there's a simple way of doing a fairly straightforward initialization of an estransport.Client we should be able to change the *elasticsearch.Client field to use the esapi.Transport interface that the Do method takes. Since the transport Client itself implements that interface, everything should just work whether you pass an *elasticsearch.Client (for the discovery, additional APIs etc.) or an *estransport.Client.

karmi · 2020-08-08T07:26:03Z

Hello! With regards to separating the changes, it's perfectly fine to just rebase the commits, and force push to this branch/PR. Thinking about it, perhaps it's better to leave out the changes to the APIs from this PR, since I tend to commit those in a fairly uniform manner, and always isolated from other changes (see https://github.com/elastic/go-elasticsearch/commits/master/esapi).

I'll need to think more about the whole "simplified client" / "just transport client" aspect, but I think you've nailed the neccessary changes to the generator and the API constructors.

Totally agreed about the change to the bulk indexer, I always envisioned making the Client field accept an interface and not *elasticsearch.Client, but because it was a large undertaking, I wanted to make smaller steps.

axw

Looks like a good change overall, just one question about node discovery.

axw · 2020-09-01T05:01:03Z

elasticsearch.go

+	if cfg.DiscoverNodesOnStart {
+		go client.DiscoverNodes()
+	}


What's the reason for starting discovery in NewClient and not NewTransportClient?

It seems surprising that cfg.DiscoverNodesOnStart would be ignored when calling NewTransportClient, and probably not a desired change in behaviour for users who are only trying to slim down the API surface area.

Maybe we should add a DiscoverNodesOnStart flag to estransport.Config, and start the goroutine there?

karmi · 2020-09-08T14:59:38Z

I finally had some time to look into the issue and think about the problem today. While I'm definitely not opposed to exposing the constructors of the individual APIs, I think the current package structure allows for "almost" achieving the stated goal of reducing the build size.

It occured to me that if the esapi and estransport packages would be used in isolation, it is entirely possible to use only a single API, and the estransport.Client instead of elasticsearch.Client. I've rewritted the example you're using with the "Info" API as follows:

package main

import (
	"context"
	"log"
	"net/url"

	"github.com/elastic/go-elasticsearch/v8/esapi"
	"github.com/elastic/go-elasticsearch/v8/estransport"
)

func main() {
	clusterURL, _ := url.Parse("http://localhost:9200")
	tp, err := estransport.New(
		estransport.Config{
			URLs: []*url.URL{clusterURL},
		},
	)
	if err != nil {
		log.Fatalf("Error creating the client: %s\n", err)
	}

	req := esapi.InfoRequest{}

	res, err := req.Do(context.Background(), tp)
	if err != nil {
		log.Fatalf("Error getting the response: %s\n", err)
	}
	defer res.Body.Close()

	log.Print(res.String())
}

When I check the size, I'm getting this output:

$ ls -lah *
-rwxr-xr-x  1 karmi  staff    12M Sep  8 16:44 test-full
-rwxr-xr-x  1 karmi  staff   7.4M Sep  8 16:44 test-info-only

It needs to be noted that estransport.New() and elasticsearch.NewClient() are not fully functionally equivalent, ie. the discovery options and the CloudID support are missing. But it's possible to call DiscoverNodes() manually, and the CloudID support can be added easily.

Not sure which way to go now. Does anybody has any thoughts?

expose individual apis

7dd9b73

andrewstucki requested a review from karmi August 6, 2020 01:52

andrewstucki added 3 commits August 5, 2020 21:53

revert go.mod change

bedb0f4

expose the rest of the apis

a25a121

expose a method for creating a transport client with the same configu…

055996c

…ration object as the regular client

axw reviewed Sep 1, 2020

View reviewed changes

karmi mentioned this pull request Sep 8, 2020

API object exposes a huge surface. #172

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose individual apis #177

Expose individual apis #177

andrewstucki commented Aug 6, 2020 •

edited

andrewstucki commented Aug 6, 2020

karmi commented Aug 6, 2020

andrewstucki commented Aug 6, 2020

karmi commented Aug 8, 2020

axw left a comment

axw Sep 1, 2020

karmi commented Sep 8, 2020

Expose individual apis #177

Are you sure you want to change the base?

Expose individual apis #177

Conversation

andrewstucki commented Aug 6, 2020 • edited

andrewstucki commented Aug 6, 2020

karmi commented Aug 6, 2020

andrewstucki commented Aug 6, 2020

karmi commented Aug 8, 2020

axw left a comment

Choose a reason for hiding this comment

axw Sep 1, 2020

Choose a reason for hiding this comment

karmi commented Sep 8, 2020

andrewstucki commented Aug 6, 2020 •

edited