Parallelizing PhyML bootstrapping #149

sevance · 2021-07-23T17:55:16Z

Hi all! Thanks so much for PhyML. It has been super helpful in my microbiome analyses.

I have found that when I set the bootstrap value to >1, the program seems to build one tree at a time instead of running in parallel. Am I leaving out an argument that would let it use more threads, or something similar? My run on 23 genes from 130 species with a bootstrap value of 100 took six days. The run on all 97 genes is now on day 18. I was hoping to also run PhyML with bootstrap = 1000, but that job would certainly exceed our cluster's time limit. Any advice on how to decrease the run time of the tool would be very helpful!

sevance · 2021-07-29T17:59:59Z

In case anyone else runs across this problem, I did find this in the PhyML manual:

1. Copy the phyml binary file into your working directory (I downloaded PhyML onto our university cluster, so mine was located in /tools/miniconda3/envs/mafft/bin/phyml).
2. Install or load MPI:
       conda install -c conda-forge mpi
3. Run PhyML using mpirun. The -np flag designates the number of cores; -b is the bootstrap value:
       mpirun -np 10 ./phyml -d aa -b 100 --leave_duplicates -i inputfile

However, when I run this with any number of cores (even 1) specified via -np, the job fails with:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 16
slots that were requested by the application:

  ./phyml

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
---------

This happens even when I use the --oversubscribe flag. I am specifying the same number of CPUs in my job submission, so it's not clear to me why PhyML is unhappy. My command is below (ssub is a wrapper for the sbatch job scheduler; -n is the job name, -t the time, -c the number of CPUs):

ssub -n phy_16s_bin_tree -t 72 -c 16 "mpirun -np 1 ./phyml -d aa -b 100 --leave_duplicates --oversubscribe -i 16S_bin_tree_mafft_output.fas"
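
A note on the command above: mpirun only interprets options that appear before the executable name, and everything after ./phyml is handed to PhyML as its own arguments. As written, --oversubscribe would therefore not reach Open MPI. Below is a minimal dry-run sketch of the reordered command. It assumes the ./phyml binary was built with MPI support (the thread does not confirm this) and that the scheduler grants the requested slots. The variable names NPROCS, BOOTSTRAPS, INPUT and PHYML_BIN are illustrative only.

```shell
#!/usr/bin/env bash
# Dry-run sketch: build the mpirun command line and print it, without running it.
# Assumption (not confirmed in the thread): ./phyml is an MPI-enabled build.

NPROCS=10             # number of MPI processes passed to mpirun -np
BOOTSTRAPS=100        # PhyML -b value (number of bootstrap replicates)
INPUT="inputfile"     # alignment file, as in the manual example quoted above
PHYML_BIN="./phyml"   # path to the PhyML binary in the working directory

build_cmd() {
    # mpirun options (-np, --oversubscribe) go BEFORE the executable;
    # anything placed after the executable is passed to PhyML instead.
    echo "mpirun -np ${NPROCS} --oversubscribe ${PHYML_BIN} -d aa -b ${BOOTSTRAPS} --leave_duplicates -i ${INPUT}"
}

# Print the command so it can be inspected before it is submitted.
build_cmd
```

The script only prints the command, which can then be pasted into the job submission. Separately, under Slurm, Open MPI generally derives its slot count from the allocated tasks rather than from CPUs per task, so a request for one task with 16 CPUs may expose only a single slot. Whether that applies here depends on how ssub maps -c onto sbatch options, which is worth checking against the cluster documentation.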

liamxg · 2024-04-08T00:15:48Z

Dear @sevance,
No model is specified in your command.
