I downloaded the BindingMOAD dataset and extracted it into the "/data/BindingMOAD_2020_ab_processed_biounit" directory.
I ran "datasets/esm_embedding_preparation.py", but it failed with "NotADirectoryError: [Errno 20] Not a directory: '/mnt/data/sjb/chem/DiffDock/DiffDock-main//data/BindingMOAD_2020_ab_processed_biounit/pdb_protein/._1xq6_1_protein.pdb/ _1xq6_1_protein.pdb_protein.pdb'".
I changed the default settings to use MOAD and found that, after the MOAD archive was decompressed, some file names started with '.'. I then modified line 73 of "esm_embedding_preparation.py" from "names = [n[:6] for n in names]" to "names = [n[:6] for n in names if not n.startswith('.')]", and the file-not-found error is no longer reported.
However, new problems appeared. Here is the full error output:
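The line-73 filter can be pulled out into a small helper so it is easy to check in isolation. `complex_ids_from_listing` is a hypothetical name, not a function in DiffDock. The guess that the `._` entries are macOS AppleDouble metadata files (commonly left behind when an archive is created on macOS) is an assumption based only on the file name in the error.

```python
from typing import Iterable, List


def complex_ids_from_listing(names: Iterable[str]) -> List[str]:
    """Hypothetical helper mirroring the modified line 73:
    names = [n[:6] for n in names if not n.startswith('.')]
    """
    # Entries such as '._1xq6_1_protein.pdb' are assumed to be macOS
    # metadata files rather than real structures, so anything starting
    # with '.' is skipped before taking the 6-character complex ID.
    return [n[:6] for n in sorted(names) if not n.startswith('.')]
```

It would be called as `complex_ids_from_listing(os.listdir(protein_dir))`, where `protein_dir` is the MOAD `pdb_protein` directory. Deleting the `._*` files after extraction would have the same effect without touching the script.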
(diffdock) sjb@amax:/mnt/data/sjb/chem/DiffDock/DiffDock-main$ python datasets/esm_embedding_preparation.py
  0%|          | 4/54984 [00:00<23:40, 38.69it/s]
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/StructureBuilder.py:127: PDBConstructionWarning: WARNING: Residue (' ', 5, ' ') redefined at line 2353.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/StructureBuilder.py:149: PDBConstructionWarning: WARNING: Residue (' ', 5, ' ','SER') already defined with the same name at line 2353.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom N defined twice in residue <Residue SER het= resseq=5 icode= > at line 2353.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom CA defined twice in residue <Residue SER het= resseq=5 icode= > at line 2355.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom C defined twice in residue <Residue SER het= resseq=5 icode= > at line 2357.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom O defined twice in residue <Residue SER het= resseq=5 icode= > at line 2359.
Exception ignored.
Traceback (most recent call last):
  File "/mnt/data/sjb/chem/DiffDock/DiffDock-main/datasets/esm_embedding_preparation.py", line 88, in <module>
    l = get_structure_from_file(rec_path)
  File "/mnt/data/sjb/chem/DiffDock/DiffDock-main/datasets/esm_embedding_preparation.py", line 28, in get_structure_from_file
    structure = biopython_parser.get_structure('random_id', file_path)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 92, in get_structure
    self._parse(lines)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 115, in _parse
    self.trailer = self._parse_coordinates(coords_trailer)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 240, in _parse_coordinates
    structure_builder.init_residue(
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/StructureBuilder.py", line 177, in init_residue
    self.chain.add(self.residue)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/Entity.py", line 215, in add
    raise PDBConstructionException("%s defined twice" % entity_id)
TypeError: not all arguments converted during string formatting
Dear author, how should this be resolved? If you can answer, thank you very much for your help.
This seems to be an issue with the biopython package. I checked Bio/PDB/Entity.py in biopython. The line has been changed from
    raise PDBConstructionException("%s defined twice" % entity_id)
to
    if self.has_id(entity_id):
        raise PDBConstructionException(f"{entity_id} defined twice")
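A standalone illustration of why the old line raises `TypeError`, assuming (as in Biopython) that a residue ID is a 3-tuple of (hetero flag, sequence number, insertion code), like the `(' ', 5, ' ')` in the warnings. `old_message` and `new_message` are hypothetical stand-ins for the two versions of the `raise` line, not Biopython code.

```python
# A residue ID of the form (hetero flag, sequence number, insertion code).
entity_id = (" ", 5, " ")


def old_message(entity_id):
    # With '%', a tuple on the right is treated as the whole argument list,
    # so a 3-tuple against a single '%s' raises TypeError.
    return "%s defined twice" % entity_id


def new_message(entity_id):
    # An f-string formats the tuple as a single value.
    return f"{entity_id} defined twice"


try:
    old_message(entity_id)
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: not all arguments converted during string formatting

print(new_message(entity_id))  # (' ', 5, ' ') defined twice
```

So under 1.76 the `TypeError` comes from building the error message, not from the structure itself. The duplicated residue that triggers it is still in the file. With the f-string version the parser likely gets a proper `PDBConstructionException`, which its permissive mode can turn into a warning instead of crashing.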
I upgraded Biopython from 1.76 to 1.83. The script now runs, but the MOAD dataset still produces a large number of "defined twice" warnings, which does not happen with the PDBbind dataset. Why is this?
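The thread does not establish why MOAD differs from PDBbind. One way to narrow it down is to check whether the MOAD biounit files really contain repeated atom records, for example because several copies in a biological assembly reuse the same chain ID (this is only a guess). The sketch below is a hypothetical helper, `duplicate_atom_keys`, that reads the fixed PDB columns directly. It assumes standard column layout, and its duplicate key only roughly corresponds to Biopython's "Atom ... defined twice" check.

```python
from collections import Counter


def duplicate_atom_keys(pdb_lines):
    """Return {key: count} for ATOM/HETATM keys seen more than once per model.

    Key = (model, chain, resSeq, iCode, atom name, altLoc), taken from the
    fixed PDB columns. Hypothetical diagnostic helper, not part of DiffDock.
    """
    counts = Counter()
    model = 0  # stays 0 for files without MODEL records
    for raw in pdb_lines:
        line = raw.rstrip("\n").ljust(80)  # pad short lines so slicing is safe
        record = line[:6].strip()
        if record == "MODEL":
            model += 1
        elif record in ("ATOM", "HETATM"):
            key = (
                model,
                line[21],             # chain ID (column 22)
                line[22:26].strip(),  # residue sequence number (columns 23-26)
                line[26],             # insertion code (column 27)
                line[12:16].strip(),  # atom name (columns 13-16)
                line[16],             # alternate location indicator (column 17)
            )
            counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Passing an open file works, since it iterates over lines: `with open(path) as fh: print(duplicate_atom_keys(fh))`. If the duplicates turn out to be expected, `PDBParser(QUIET=True)` silences the warnings. However, Biopython still drops the repeated atoms, as the "Some atoms or residues may be missing" message in the log says.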
My error
Generate the ESM2 embeddings for the proteins: