Crash on load from within protobuf in precompiled tensorflow 1.1 #26

Open
admsyn opened this issue Jun 14, 2017 · 3 comments

Comments

@admsyn
Contributor

admsyn commented Jun 14, 2017

Hey Memo! This is with a fresh Ubuntu 17.04 and OF master (2c7b719).

fl@mallet:~/workspace/openFrameworks/addons/ofxMSATensorFlow/example-pix2pix/bin$ gdb -q ./example-pix2pix
Reading symbols from ./example-pix2pix...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/fl/workspace/openFrameworks/addons/ofxMSATensorFlow/example-pix2pix/bin/example-pix2pix
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x00007ffff113404d in google::protobuf::FileOptions::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*) ()
   from /home/fl/workspace/openFrameworks/addons/ofxMSATensorFlow/libs/tensorflow/lib/linux64/libtensorflow_cc.so
(gdb)
  • I get this from example-basic as well
  • The build config is the "default" from running make
  • libtensorflow_cc.so is from lib_TF1.1_linux64_OPT_CUDA8.0_CUDNN5.1_2017_05_17.tar.gz
  • I also have tensorflow-gpu, the magenta repo, and fast-style-transfer installed and working fine separately (virtual envs, not used in the context of OF or ofxMSATensorFlow)

Full linux version:

Linux mallet 4.10.0-22-generic #24-Ubuntu SMP Mon May 22 17:43:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

More on this story as it develops..

EDIT: This does not happen with lib_TF1.0_linux64_NOPT_CUDA8.0_CUDNN5.1_2017_02_22.tar.gz, which works

@admsyn admsyn changed the title Crash on load from within protobuf lib in precompiled tensorflow Crash on load from within protobuf in precompiled tensorflow 1.1 Jun 14, 2017
@memo
Owner

memo commented Jun 14, 2017

Hey, I'm still on 16.04 so I don't know if that is related in any way.

Interesting that there's no issue with the older lib, AND that one is a NOPT build (with debug info). I just realised that I didn't provide a NOPT build for TF1.1. I'll try to do this, but I'm travelling these days, so it could be difficult. Was the crash during a debug build or a release build? Could you try a release build? Previously I was encountering crashes (segmentation faults) when running a debug build of an app against the release build of the lib. However, this hasn't been an issue for me lately.

@admsyn
Contributor Author

admsyn commented Jun 15, 2017

This comment implies it could be an SSE support issue, and given that the non-optimized version is the one that works on my setup it seems reasonable that it has something to do with that..

I'll continue digging into it.

This processor is an old-ish i7-3820. Here's my grep sse < /proc/cpuinfo :

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
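
One way to make the SSE/AVX hypothesis concrete is to compare the flags line against the instruction sets an optimized build might rely on. The thread does not say which instruction sets the OPT library was actually compiled with, so the `required` list below is purely an assumption for illustration, and the function names are hypothetical. A minimal Python sketch:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required):
    """List the required features (in order) that the CPU does not report."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


# Abbreviated copy of the flags line posted above (only the SIMD-related entries).
SAMPLE = "flags : fpu mmx fxsr sse sse2 pni ssse3 sse4_1 sse4_2 popcnt aes xsave avx"

# ASSUMPTION: a guess at what an optimized build might need, not a known fact
# about lib_TF1.1_linux64_OPT_CUDA8.0_CUDNN5.1_2017_05_17.tar.gz.
required = ["sse4_1", "sse4_2", "avx", "avx2", "fma"]

print(missing_features(SAMPLE, required))  # ['avx2', 'fma']
```

On a real machine the same function can be fed the contents of /proc/cpuinfo. If the library was built with any instruction set that shows up as missing, a SIGILL at load time would be the expected symptom.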

@somewacko

(Haven't used this project, but chiming in from a TF thread)

@admsyn Are you dynamically linking another version of protobuf, either in your program or possibly in another library you're using? One incredibly sneaky thing about TF's C++ API is that its headers pull in all of its core internal code, including TF's protobuf objects, which makes your program implicitly dependent on protobuf. Because of this, if you dynamically load a version of protobuf that differs from the one TF was compiled with, you get mysterious crashes like these, since there is a mismatch between the protobuf your program is using and the protobuf that is statically linked inside the TF library. It's possible that this is causing the crash you're experiencing.

I've run into this with a different project, and the workaround seems to be one of:

  1. Make sure you are using the same version of protobuf everywhere, including downstream dependencies (can be difficult/impossible for complex projects)
  2. Use a script in TensorFlow's source code that converts your protobuf source so that the namespace is named proto3 instead of proto.
  3. Use the C API, which provides an actual interface layer that separates your code from TF's internals (not applicable for this project though)

The core problem is with how TF's C++ API is designed, so there's no real solution for this, but it is something to be aware of when integrating TF in other C++ projects.
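
A quick first check for the mismatch described above is to see whether the executable also depends on a separate shared protobuf library. The sketch below parses `ldd` output for that. The function name and the sample `ldd` lines are made up for illustration and are not taken from the reporter's machine.

```python
import os


def protobuf_libs(ldd_output):
    """Return the protobuf shared libraries named in `ldd` output."""
    found = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the soname (or a path for some entries).
        soname = os.path.basename(fields[0])
        if soname.startswith(("libprotobuf", "libprotoc")):
            found.append(soname)
    return found


# HYPOTHETICAL sample output, invented for this example.
SAMPLE_LDD = """\
    linux-vdso.so.1 (0x00007ffd0b3f2000)
    libtensorflow_cc.so => /opt/example/libtensorflow_cc.so (0x00007f1c00000000)
    libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f1bff000000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1bfe000000)
"""

print(protobuf_libs(SAMPLE_LDD))  # ['libprotobuf.so.10']
```

To run it against a real binary, the text can come from `subprocess.run(["ldd", path], capture_output=True, text=True).stdout`. A non-empty result only shows that a second, dynamically linked protobuf is in play. It does not prove a version mismatch, and a protobuf that is statically linked inside libtensorflow_cc.so will not appear in `ldd` output at all.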
