[cpp] How to extract only part of files when creating database #16237

Closed
lianxv-primer opened this issue Apr 17, 2024 · 11 comments
Labels
question Further information is requested

@lianxv-primer

Our project involves many third-party libraries, but what I care about is the code we write ourselves. So I want to extract only part of the files when creating the database, to reduce the time cost and the size of the database.

My database creation command looks like this:
codeql database create C:\test\codeql-database --source-root "E:\test-project-code\src" --language=cpp --command="call build_win_codeql.bat" --threads=0 --verbose --overwrite --mode=clear --min-disk-free=100000

I read in the docs that the behavior of extractors can be customized through extractor configuration, but the C/C++ extractor does not seem to have any options:
"extractor_options" : { }

How can I optimize the database creation command?

@lianxv-primer lianxv-primer added the question Further information is requested label Apr 17, 2024
@jketema
Contributor

jketema commented Apr 17, 2024

Hi @lianxv-primer,

I would strongly recommend against doing anything like this, as some security problems may depend on the ability to analyse dataflow through some of your third-party libraries. This will no longer be possible when those libraries are not present in the database.

If you really do want to do this, and your build system supports incrementally rebuilding the source code, then you could attempt the following (note that we in no way support this):

  1. Build your code as you do normally
  2. Delete all object files, precompiled headers, libraries, and executables that relate to your own source code (keep the ones that relate to third-party libraries)
  3. Run codeql database create such that the supplied command only rebuilds the deleted files.
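Step 2 is the error-prone part. Below is a minimal Python sketch of it, assuming the build outputs of your own code live in directories separate from the third-party ones. The helper name `delete_own_artifacts`, its `dry_run` flag, and the list of file extensions are hypothetical choices for illustration, not part of CodeQL. Run it with `dry_run=True` first to see what would be removed.

```python
from pathlib import Path

# Assumption: these suffixes cover the "object files, precompiled headers,
# libraries, and executables" from step 2 for MSVC and GCC/Clang builds.
ARTIFACT_SUFFIXES = {".obj", ".o", ".pch", ".lib", ".a", ".exe", ".dll", ".so"}


def delete_own_artifacts(own_build_dirs, dry_run=False):
    """Hypothetical helper: delete build artifacts of your own code only.

    Pass ONLY the build-output directories of your own code. Third-party
    output directories must be left out so their artifacts survive and are
    not rebuilt (and therefore not extracted) in step 3.
    Returns the sorted list of paths that were (or would be) deleted.
    """
    # Collect first (deduplicated, in case roots overlap), then delete,
    # so the directory walk is not disturbed by the deletions.
    candidates = sorted({
        str(path)
        for root in own_build_dirs
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in ARTIFACT_SUFFIXES
    })
    if not dry_run:
        for path in candidates:
            Path(path).unlink()
    return candidates


if __name__ == "__main__":
    # Hypothetical path: replace with your own build-output directories.
    for p in delete_own_artifacts([r"E:\test-project-code\build\own"], dry_run=True):
        print("would delete:", p)
```

After this, step 3 is the usual `codeql database create` command. In principle only your own targets are then out of date, so only they get recompiled and traced; whether that holds depends entirely on your build system's incremental behavior.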

@lianxv-primer
Author

OK, this sounds like a solution if I have to do this.
When I create databases, I get many errors like this:

[2024-04-17 00:06:20] Importing 96f07e7c3376ac5f473f4bab.trap (trace_log.cc.11991bd7_0.trap.tar.br) for no link target (6974499 of 7859950)
[2024-04-17 00:06:20] [ERROR] dataset import> 9fafff7e4bf067c0c66b0ca7.trap (connection.cc.a0da4be2_0.trap.tar.br) for no link target, 38: com.semmle.util.exception.CatastrophicError: ID 94380083 is already mapped to 19293495
com.semmle.inmemory.util.DiskIdStore.append(DiskIdStore.java:63)
com.semmle.inmemory.util.NonSequentialDiskPool.insert(NonSequentialDiskPool.java:105)
com.semmle.inmemory.util.NonSequentialDiskPool.insertBucket(NonSequentialDiskPool.java:47)
com.semmle.inmemory.util.DiskPool.getIdWithFreshness(DiskPool.java:171)
com.semmle.inmemory.util.NonSequentialDiskPool.getIdWithFreshness(NonSequentialDiskPool.java:14)
com.semmle.inmemory.populate.IdPool.lookupWithFreshness(IdPool.java:66)
com.semmle.inmemory.trap.SynchronizedIdAllocator.getElementIdAndFreshness(SynchronizedIdAllocator.java:23)
com.semmle.inmemory.trap.FreshIdAllocator.getElementId(FreshIdAllocator.java:22)
com.semmle.inmemory.trap.TRAPReader$MetaStringBuilder.getElementId(TRAPReader.java:1016)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1090)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1085)
com.semmle.inmemory.trap.TRAPReader.scanLabelKey(TRAPReader.java:829)
com.semmle.inmemory.trap.TRAPReader.scanLabelValue(TRAPReader.java:801)
com.semmle.inmemory.trap.TRAPReader.scanTuplesAndLabels(TRAPReader.java:505)
com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:414)
com.semmle.inmemory.trap.ImportTasksProcessor.process(ImportTasksProcessor.java:234)
com.semmle.inmemory.trap.ImportTasksProcessor.lambda$importTrap$1(ImportTasksProcessor.java:154)
com.semmle.util.concurrent.FutureUtils.lambda$mapAsync_$8(FutureUtils.java:161)
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)
at (failed to read line: asked for -1 bytes which seems wrong!)

Is something wrong?

@jketema
Contributor

jketema commented Apr 17, 2024

This even happens when you delete the database directory, before doing the third step?

@lianxv-primer
Author

lianxv-primer commented Apr 17, 2024

No, I rebuilt the whole project directly.
The full build trace looks like this:

...
...
[2024-04-16 11:42:00] [build-stdout] MSBuild version 17.9.8+b34f75857 for .NET Framework
[2024-04-16 11:42:00] [build-stdout] Build started 4/16/2024 11:42:00 AM.
...
[2024-04-16 21:04:28] [build-stdout] 4134 Warning(s)
[2024-04-16 21:04:28] [build-stdout] 0 Error(s)
[2024-04-16 21:04:28] [build-stdout] Time Elapsed 09:22:26.82
[2024-04-16 21:04:28] Plumbing command codeql database trace-command completed.
[2024-04-16 21:04:28] [PROGRESS] database create> Finalizing database at C:\test\devops-codeql-database.
[2024-04-16 21:04:28] Running plumbing command: codeql database finalize --threads=0 --mode=clear --min-disk-free=10000 --no-db-cluster -- C:\test\devops-codeql-database
[2024-04-16 21:04:28] Using pre-finalize script C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd.
[2024-04-16 21:04:28] [PROGRESS] database finalize> Running pre-finalize script C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd in C:\devops\p-6ac70a2931f74eb2a2452fb8f52372e1\src.
[2024-04-16 21:04:28] Running plumbing command: codeql database trace-command --working-dir=C:\devops\p-6ac70a2931f74eb2a2452fb8f52372e1\src --no-tracing --threads=0 -- C:\test\devops-codeql-database C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd
[2024-04-16 21:04:28] [PROGRESS] database trace-command> Running command in C:\devops\p-6ac70a2931f74eb2a2452fb8f52372e1\src: [C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd]
[2024-04-16 21:04:28] Plumbing command codeql database trace-command completed.
[2024-04-16 21:04:28] [PROGRESS] database finalize> Running TRAP import for CodeQL database at C:\test\devops-codeql-database...
[2024-04-16 21:04:28] Running plumbing command: codeql dataset import --dbscheme=C:\codeql-home\codeql\cpp\semmlecode.cpp.dbscheme --threads=0 -- C:\test\devops-codeql-database\db-cpp C:\test\devops-codeql-database\trap\cpp
[2024-04-16 21:04:28] Clearing disk cache since the version file C:\test\devops-codeql-database\db-cpp\default\cache\version does not exist
[2024-04-16 21:04:29] Tuple pool not found. Clearing relations with cached strings
[2024-04-16 21:04:29] Trimming disk cache at C:\test\devops-codeql-database\db-cpp\default\cache in mode clear.
[2024-04-16 21:04:29] Sequence stamp origin is -6195435911585769745
[2024-04-16 21:04:29] Pausing evaluation to hard-clear memory at sequence stamp o+0
[2024-04-16 21:04:29] Unpausing evaluation
[2024-04-16 21:04:29] Pausing evaluation to quickly trim disk at sequence stamp o+1
[2024-04-16 21:04:29] Unpausing evaluation
[2024-04-16 21:04:29] Pausing evaluation to zealously trim disk at sequence stamp o+2
[2024-04-16 21:04:29] Unpausing evaluation
[2024-04-16 21:04:29] Trimming completed (12ms): Purged everything.
[2024-04-16 21:04:29] Scanning for files in C:\test\devops-codeql-database\trap\cpp
[2024-04-16 21:05:01] Found 18594 files on disk containing 7859950 TRAP files (239.02 GiB)
[2024-04-16 21:05:01] [PROGRESS] dataset import> Grouping TRAP files by link target
[2024-04-16 21:11:37] [PROGRESS] dataset import> Grouping unlinked TRAP files together
[2024-04-16 21:12:11] [PROGRESS] dataset import> Scanning TRAP files
...
[2024-04-16 21:55:32] Scanning trace_log.cc.11991bd7.trap (trace_log.cc.11991bd7_0.trap.tar.br) (6891731 of 7859950)
...
[2024-04-17 00:06:20] Importing 96f07e7c3376ac5f473f4bab.trap (trace_log.cc.11991bd7_0.trap.tar.br) for no link target (6974499 of 7859950)
[2024-04-17 00:06:20] [ERROR] dataset import> 9fafff7e4bf067c0c66b0ca7.trap (connection.cc.a0da4be2_0.trap.tar.br) for no link target, 38: com.semmle.util.exception.CatastrophicError: ID 94380083 is already mapped to 19293495
com.semmle.inmemory.util.DiskIdStore.append(DiskIdStore.java:63)
com.semmle.inmemory.util.NonSequentialDiskPool.insert(NonSequentialDiskPool.java:105)
com.semmle.inmemory.util.NonSequentialDiskPool.insertBucket(NonSequentialDiskPool.java:47)
com.semmle.inmemory.util.DiskPool.getIdWithFreshness(DiskPool.java:171)
com.semmle.inmemory.util.NonSequentialDiskPool.getIdWithFreshness(NonSequentialDiskPool.java:14)
com.semmle.inmemory.populate.IdPool.lookupWithFreshness(IdPool.java:66)
com.semmle.inmemory.trap.SynchronizedIdAllocator.getElementIdAndFreshness(SynchronizedIdAllocator.java:23)
com.semmle.inmemory.trap.FreshIdAllocator.getElementId(FreshIdAllocator.java:22)
com.semmle.inmemory.trap.TRAPReader$MetaStringBuilder.getElementId(TRAPReader.java:1016)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1090)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1085)
com.semmle.inmemory.trap.TRAPReader.scanLabelKey(TRAPReader.java:829)
com.semmle.inmemory.trap.TRAPReader.scanLabelValue(TRAPReader.java:801)
com.semmle.inmemory.trap.TRAPReader.scanTuplesAndLabels(TRAPReader.java:505)
com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:414)
com.semmle.inmemory.trap.ImportTasksProcessor.process(ImportTasksProcessor.java:234)
com.semmle.inmemory.trap.ImportTasksProcessor.lambda$importTrap$1(ImportTasksProcessor.java:154)
com.semmle.util.concurrent.FutureUtils.lambda$mapAsync_$8(FutureUtils.java:161)
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)
at (failed to read line: asked for -1 bytes which seems wrong!)
...
...

@jketema
Contributor

jketema commented Apr 17, 2024

I don't understand the answer. Did you delete the database directory immediately before running codeql database create? If not, could you try that? I'd like to be sure the problems are not due to stale data in the database directory.
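Deleting the database directory right before the run rules out stale TRAP or cache data from earlier attempts. Below is a minimal Python sketch of that check, assuming the paths and build command from the original report (placeholders to replace with your own). `prepare_fresh_database` is a hypothetical helper; `--source-root`, `--language`, `--command`, and `--overwrite` are the `codeql database create` options already used in the report.

```python
import shutil
from pathlib import Path


def prepare_fresh_database(db_dir, source_root, build_command):
    """Hypothetical helper: remove a stale database dir, return codeql argv.

    The flags mirror the command from the issue; the argument values are
    placeholders.
    """
    db_path = Path(db_dir)
    if db_path.exists():
        # Remove leftovers (TRAP files, caches) from earlier runs.
        shutil.rmtree(db_path)
    return [
        "codeql", "database", "create", str(db_path),
        f"--source-root={source_root}",
        "--language=cpp",
        f"--command={build_command}",
        "--overwrite",  # kept to match the original command
    ]


if __name__ == "__main__":
    argv = prepare_fresh_database(r"C:\test\codeql-database",
                                  r"E:\test-project-code\src",
                                  "call build_win_codeql.bat")
    print(argv)
```

The sketch only prints the command so it can be inspected first; passing the returned list to `subprocess.run(argv, check=True)` would actually run it.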

@lianxv-primer
Author

Yes, I deleted the database directory before running codeql database create, and I also used the --overwrite option.

@jketema
Contributor

jketema commented Apr 17, 2024

Thanks for confirming. This means the approach I suggested apparently doesn't work in your case. Note that, as the approach is not supported, I cannot do much more for you here.

@lianxv-primer
Author

I haven't used the suggested method yet. The log above is from my full build yesterday.

@jketema
Contributor

jketema commented Apr 17, 2024

Apologies. I misunderstood in that case. Could you open a new issue for that, so we can discuss that separately?

@lianxv-primer
Author

lianxv-primer commented Apr 17, 2024

Sure! New issue: #16239

@jketema
Contributor

jketema commented Apr 17, 2024

Thanks!
