Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't analyze Canillita #29

Open
elbrujohalcon opened this issue Apr 3, 2017 · 15 comments
Open

Can't analyze Canillita #29

elbrujohalcon opened this issue Apr 3, 2017 · 15 comments

Comments

@elbrujohalcon
Copy link

When I try to analyze inaka/canillita, I found an error.
These are the steps I follow:

$ git clone https://github.com/inaka/canillita.git
$ cd canillita
$ make rel
$ _rel/canillita/bin/canillita console

…then, in another shell…

$ ./erlangpl --node canillita@127.0.0.1 --cookie canillita -vvv

…where I get the following output…

DEBUG: Plain args: ["./erlangpl","--node","canillita@127.0.0.1","--cookie",
                    "canillita","-vvv"]
DEBUG: Args: [{node,"canillita@127.0.0.1"},
              {cookie,"canillita"},
              {verbose,3},
              "./erlangpl"]
DEBUG: Loading: epl/priv/.gitignore
DEBUG: Loading: epl/priv/htdocs/asset-manifest.json
DEBUG: Loading: epl/priv/htdocs/favicon.ico
DEBUG: Loading: epl/priv/htdocs/index.html
DEBUG: Loading: epl/priv/htdocs/static/css/main.b813327a.css
DEBUG: Loading: epl/priv/htdocs/static/css/main.b813327a.css.map
DEBUG: Loading: epl/priv/htdocs/static/js/main.1ce390f5.js
DEBUG: Loading: epl/priv/htdocs/static/media/fontawesome-webfont.674f50d2.eot
DEBUG: Loading: epl/priv/htdocs/static/media/fontawesome-webfont.912ec66d.svg
DEBUG: Loading: epl/priv/htdocs/static/media/fontawesome-webfont.af7ae505.woff2
DEBUG: Loading: epl/priv/htdocs/static/media/fontawesome-webfont.b06871f2.ttf
DEBUG: Loading: epl/priv/htdocs/static/media/fontawesome-webfont.fee66e71.woff
DEBUG: Loading: epl/priv/htdocs/static/media/glyphicons-halflings-regular.448c34a5.woff2
DEBUG: Loading: epl/priv/htdocs/static/media/glyphicons-halflings-regular.89889688.svg
DEBUG: Loading: epl/priv/htdocs/static/media/glyphicons-halflings-regular.e18bbf61.ttf
DEBUG: Loading: epl/priv/htdocs/static/media/glyphicons-halflings-regular.f4769f9b.eot
DEBUG: Loading: epl/priv/htdocs/static/media/glyphicons-halflings-regular.fa277232.woff
DEBUG: Loading: epl/priv/htdocs/static/media/header_copy.4bb83c59.png
INFO:  Plugins []
INFO:  Started plugins: [{epl_dashboard_EPL,{ok,<0.90.0>}},
                         {epl_st_EPL,{ok,<0.91.0>}},
                         {epl_version_EPL,{ok,<0.92.0>}},
                         {epl_traffic_EPL,{ok,<0.93.0>}}]
Visit http://localhost:8000/

but once I open my browser at localhost:8000, I get the following error report in that console:

ERROR: timed out while collecting data from node

=ERROR REPORT==== 3-Apr-2017::10:53:07 ===
** Generic server epl_st terminating
** Last message in was {data,
                           {'canillita@127.0.0.1',{1491,227582,456213}},
                           [{process_count,135},
                            {memory_total,37707288},
                            {spawn,[]},
                            {exit,[]},
                            {send,
                                [{{#Port<10020.7107>,<10020.254.0>},1,0},
                                 {{<10020.254.0>,<10020.259.0>},1,0}]},
                            {send_self,[]},
                            {'receive',
                                [{#Port<10020.7107>,1,12},
                                 {<10020.158.0>,5,0},
                                 {<10020.254.0>,1,3}]},
                            {trace,[]}]}
** When Server state == {state,[]}
** Reason for termination ==
** {{timeout,{gen_server,call,
                         [epl_tracer,
                          {command,#Fun<supervisor.which_children.1>,
                                   [<10020.184.0>]}]}},
    [{gen_server,call,2,[{file,"gen_server.erl"},{line,204}]},
     {epl_st,command,2,[{file,"src/epl_st.erl"},{line,120}]},
     {epl_st,generate_sup_tree,1,[{file,"src/epl_st.erl"},{line,104}]},
     {epl_st,'-handle_info/2-fun-0-',2,[{file,"src/epl_st.erl"},{line,76}]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
     {epl_st,handle_info,2,[{file,"src/epl_st.erl"},{line,75}]},
     {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,601}]},
     {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,667}]}]}

=INFO REPORT==== 3-Apr-2017::10:53:07 ===
    application: epl_st
    exited: shutdown
    type: temporary
ERROR: timed out while collecting data from node
ERROR: timed out while collecting data from node
ERROR: timed out while collecting data from node
ERROR: timed out while collecting data from node
ERROR: timed out while collecting data from node
@arkgil
Copy link
Contributor

arkgil commented Apr 3, 2017

Does canillita start a lot of workers under some supervisor? That's my first guess, because one of the plugins timed out while querying tracer process for children of one of the supervisors.

@ruanpienaar
Copy link

ruanpienaar commented Apr 3, 2017

Or is that machine under heavy load, by not responding within the default of 5000ms ?
From the crash it's doesn't look insanely busy. mmm

@elbrujohalcon
Copy link
Author

Neither of those.

(canillita@127.0.0.1)4> [begin timer:sleep(5000), length(processes()) end || _ <- lists:seq(1, 10)].
[132,132,132,132,132,132,132,132,132,132]

(no processes created for about a minute)

And the machine is my computer… super quiet and peaceful, I promise :)

@elbrujohalcon
Copy link
Author

you can certainly try it in your computers as well… canillita is open-source

@ruanpienaar
Copy link

Well, i see there's missing tests! i might volunteer later ;)

@arkgil
Copy link
Contributor

arkgil commented Apr 3, 2017

It looks like supervisor:which_children is a gen_server:call with timeout set to infinity under the hood. The only question is: why doesn't it return?

@arkgil
Copy link
Contributor

arkgil commented Apr 3, 2017

I suspect this line might be a problem. Canillita doesn't start its top level supervisor, it returns its own PID in start/2 application callback, so call to application_master:get_child/1 returns application master itself. Application master expects only certain messages (https://github.com/erlang/otp/blob/1526eaead833b3bdcd3555a12e2af62c359e7868/lib/kernel/src/application_master.erl#L347) and discards any message it doesn't understand. Since we're waiting infinitely, there is a timeout on the side of gen_server:call.

In this case I'd recommend adding dumb top supervisor to canillita, if that wouldn't interfere with its design.

You can build erlangpl without failing plugin though. Clone the repo and follow these steps:

  • make ui
  • make rebar
  • make
  • rm -rf deps/epl_st
  • uncomment this line
  • ./bootstrap

This should build erlangpl without plugin which causes the crash. Let me know if you encounter any more problems 🙂

@arkgil
Copy link
Contributor

arkgil commented Apr 3, 2017

We're working on the plugin system which would allow builidng erlangpl with only selected plugins a breeze, so such cases will be much easier to troubleshoot.

@elbrujohalcon
Copy link
Author

Well, @arkgil … almost any app built with sumo_rest (like Canillita) will not have a supervision tree. They just don't need one.

I understand that returning self() might be incorrect. What would be the idiomatic OTP-ish way to implement an app that doesn't require a supervision tree, at all?

@arkgil
Copy link
Contributor

arkgil commented Apr 3, 2017

I wouldn't call myself an authority when it comes to OTP, but in such cases I always start dumb dangling supervisor. Of course we could check if returned PID is not the same as application master's PID, but even Erlang Manual mentions that start/2 should return top supervisor's PID. Also all standard OTP applications follow this behaviour.

I'm trying to run observer with canillita to se how it handles such cases.

@elbrujohalcon
Copy link
Author

Good points, @arkgil. I'm looking for the right way to solve this issue with my colleagues. I agree that returning {ok, self()} is not the way to go.

@arkgil
Copy link
Contributor

arkgil commented Apr 3, 2017

Setting up dummy supervisor isn't the most OTPish way too. In the end, why would we need one if there is nothing to supervise?

@ruanpienaar
Copy link

Would it help in anyway to be a bit more rebust?
i've seen people do the {ok,self()} thing more than once.

@arkgil
Copy link
Contributor

arkgil commented Apr 4, 2017

We might check if pid returned by application_master:get_child/1 is not the same as application master's pid. But it's only one of the cases. As a placeholder somebody could start a process which loops infinitely and discards all messages, which will result in the same crash as reported in this issue.

I'd be very happy to fix this, so if you have any idea, please let us know here 🙂

@michalslaski
Copy link
Member

Sure, any suggestions?

On the other hand I'm glad we dscovered this issue in Canillita. It manifests in obscure way, but indicates lack of OTP compliance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants