Parallelizing PhyML bootstrapping #149

Open
sevance opened this issue Jul 23, 2021 · 2 comments

sevance commented Jul 23, 2021

Hi all! Thanks so much for PhyML - it has been super helpful in my microbiome analyses.

I have found that when I set the bootstrap value to >1, the program seems to build one tree at a time instead of in parallel. Am I leaving out an argument that would let it use more threads, or something similar? My run on 23 genes from 130 species with a bootstrap value of 100 took six days. The run on all 97 genes is now on day 18. I was hoping to also run PhyML with bootstrap = 1000, but the job will certainly exceed our cluster's time limit. If you have any advice on how to decrease the run time of the tool, that would be very helpful!


sevance commented Jul 29, 2021

In case anyone else runs across this problem, I did find this in the PhyML manual.

  1. Copy the phyml binary file into your working directory (I downloaded PhyML onto our university cluster, so mine was located in /tools/miniconda3/envs/mafft/bin/phyml)
  2. Install or load MPI
    conda install -c conda-forge mpi
  3. Run PhyML using mpirun. The -np flag sets the number of processes (cores) and -b is the bootstrap value
    mpirun -np 10 ./phyml -d aa -b 100 --leave_duplicates -i inputfile

However, when I run this specifying any number of cores (even 1) using -np, the job fails with:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 16
slots that were requested by the application:

  ./phyml

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process.  The number of slots available are defined by the
environment in which Open MPI processes are run:

  1. Hostfile, via "slots=N" clauses (N defaults to number of
     processor cores if not provided)
  2. The --host command line parameter, via a ":N" suffix on the
     hostname (N defaults to 1 if not provided)
  3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
  4. If none of a hostfile, the --host command line parameter, or an
     RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
---------

This happens even if I use the --oversubscribe flag. I am specifying the same number of CPUs in my job submission, so it's not clear to me why PhyML is unhappy. My command is below (ssub is a wrapper for the sbatch job scheduler; -n is the job name, -t the time limit, -c the number of CPUs):

ssub -n phy_16s_bin_tree -t 72 -c 16 "mpirun -np 1 ./phyml -d aa -b 100 --leave_duplicates --oversubscribe -i 16S_bin_tree_mafft_output.fas"
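
One thing worth checking in the command above: mpirun reads its own options only when they appear before the program name, and passes everything after the program name to that program. As written, --oversubscribe comes after ./phyml, so it is handed to phyml rather than to mpirun. The sketch below prints a reordered command line for inspection instead of running it. The function name phyml_mpi_cmd and its three arguments are illustrative, not part of PhyML or Open MPI. Whether the reordering resolves the slot error is not confirmed in this thread, and the sketch assumes a phyml binary that was built with MPI support.

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command line with the launcher
# options (--oversubscribe, -np) placed BEFORE the program name.
# Arguments: number of processes, bootstrap value, input alignment file.
phyml_mpi_cmd() {
    np=$1
    boot=$2
    input=$3
    # Everything left of ./phyml is read by mpirun;
    # everything right of it is passed to phyml.
    printf 'mpirun --oversubscribe -np %s ./phyml -d aa -b %s --leave_duplicates -i %s\n' \
        "$np" "$boot" "$input"
}

# Print the command so it can be checked before it goes into a job submission.
phyml_mpi_cmd 4 100 16S_bin_tree_mafft_output.fas
```

The printed line can then be pasted into the quoted part of the ssub call once it looks right.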


liamxg commented Apr 8, 2024

Dear @sevance,
No model is specified in your command.
