How to replicate results? #224

Open
123sjbsjb opened this issue May 7, 2024 · 1 comment

Comments

@123sjbsjb

My error

Generate the ESM2 embeddings for the proteins:

python datasets/esm_embedding_preparation.py

I downloaded the BindingMOAD dataset and extracted it into the "/data/BindingMOAD_2020_ab_processed_biounit" directory.
I then ran "datasets/esm_embedding_preparation.py", but it failed with "NotADirectoryError: [Errno 20] Not a directory: '/mnt/data/sjb/chem/DiffDock/DiffDock-main//data/BindingMOAD_2020_ab_processed_biounit/pdb_protein/._1xq6_1_protein.pdb/ _1xq6_1_protein.pdb_protein.pdb'".
I modified the default code to use MOAD and found that, after the MOAD archive was decompressed, some file names start with '.'. I changed line 73 of "esm_embedding_preparation.py" from "names = [n[:6] for n in names]" to "names = [n[:6] for n in names if not n.startswith('.')]", and the NotADirectoryError is no longer reported.
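
For anyone hitting the same NotADirectoryError: names beginning with "._" are typically AppleDouble metadata files that macOS adds when an archive is created, so skipping them should be harmless (this is an assumption about the archive, not something confirmed in this thread). A minimal sketch of the filter, using a hypothetical helper name list_complex_names and a throwaway temporary directory in place of the real data folder:

```python
import os
import tempfile


def list_complex_names(data_dir, prefix_len=6):
    """Return name prefixes for entries in data_dir, skipping hidden files.

    Hypothetical helper: mirrors the one-line filter quoted above.
    """
    names = sorted(os.listdir(data_dir))
    return [n[:prefix_len] for n in names if not n.startswith(".")]


# Demo on a temporary directory with one real file and one "._" sidecar file.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("1xq6_1_protein.pdb", "._1xq6_1_protein.pdb"):
        open(os.path.join(tmp, fname), "w").close()
    demo_names = list_complex_names(tmp)

print(demo_names)
```

Only the entry that does not start with '.' survives, so the hidden sidecar file is never turned into a path.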

However, a new problem appeared. The full output is below:

(diffdock) sjb@amax:/mnt/data/sjb/chem/DiffDock/DiffDock-main$ python datasets/esm_embedding_preparation.py
0%|          | 4/54984 [00:00<23:40, 38.69it/s]
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/StructureBuilder.py:127: PDBConstructionWarning: WARNING: Residue (' ', 5, ' ') redefined at line 2353.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/StructureBuilder.py:149: PDBConstructionWarning: WARNING: Residue (' ', 5, ' ','SER') already defined with the same name at line  2353.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom N defined twice in residue <Residue SER het=  resseq=5 icode= > at line 2353.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom CA defined twice in residue <Residue SER het=  resseq=5 icode= > at line 2355.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom C defined twice in residue <Residue SER het=  resseq=5 icode= > at line 2357.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  warnings.warn(
/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py:340: PDBConstructionWarning: PDBConstructionException: Atom O defined twice in residue <Residue SER het=  resseq=5 icode= > at line 2359.
Exception ignored.
Traceback (most recent call last):
  File "/mnt/data/sjb/chem/DiffDock/DiffDock-main/datasets/esm_embedding_preparation.py", line 88, in <module>
    l = get_structure_from_file(rec_path)
  File "/mnt/data/sjb/chem/DiffDock/DiffDock-main/datasets/esm_embedding_preparation.py", line 28, in get_structure_from_file
    structure = biopython_parser.get_structure('random_id', file_path)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 92, in get_structure
    self._parse(lines)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 115, in _parse
    self.trailer = self._parse_coordinates(coords_trailer)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 240, in _parse_coordinates
    structure_builder.init_residue(
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/StructureBuilder.py", line 177, in init_residue
    self.chain.add(self.residue)
  File "/home/sjb/.conda/envs/diffdock/lib/python3.9/site-packages/Bio/PDB/Entity.py", line 215, in add
    raise PDBConstructionException("%s defined twice" % entity_id)
TypeError: not all arguments converted during string formatting

Dear authors, how should this be resolved? Thank you very much for any help.
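
A note on the final TypeError in the traceback: Biopython residue ids are tuples such as (' ', 5, ' '), as the warnings above show. When a bare tuple is passed to the % operator, Python treats each element as a separate argument, so a format string with a single %s fails before the intended PDBConstructionException can be raised. A small standalone sketch (plain Python, no Biopython needed; the tuple value is copied from the warnings and the variable names are illustrative):

```python
# Residue id in the shape Biopython uses: (hetero flag, sequence number, insertion code).
entity_id = (" ", 5, " ")

# A bare tuple on the right of % is unpacked into three arguments for one %s.
try:
    message = "%s defined twice" % entity_id
except TypeError as exc:
    error_text = str(exc)

# Two ways to format the whole tuple as a single value.
safe_percent = "%s defined twice" % (entity_id,)
safe_fstring = f"{entity_id} defined twice"

print(error_text)
print(safe_percent)
print(safe_fstring)
```

Wrapping the id in a one-element tuple or using an f-string formats the id as a whole, so the "defined twice" message can be built without error.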

@123sjbsjb
Author

123sjbsjb commented May 8, 2024

This seems to be an issue with the biopython package. I checked "Bio/PDB/Entity.py" in the biopython source.
The code has been changed from

raise PDBConstructionException("%s defined twice" % entity_id)

to

if self.has_id(entity_id):
    raise PDBConstructionException(f"{entity_id} defined twice")

I upgraded biopython from 1.76 to 1.83. The script now runs, but the MOAD dataset still produces a large number of "{entity_id} defined twice" warnings. This is completely different from the PDBbind dataset. Why is this?
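
One way to check which files are affected, independent of Biopython, is to scan for ATOM/HETATM records that repeat the same chain, residue, and atom name within one model. The sketch below is only an illustration under stated assumptions and is not part of DiffDock: find_duplicate_atoms and atom_line are hypothetical helpers, and the scan relies only on the fixed-column layout of the PDB format. A possible cause in biounit files, not confirmed in this thread, is several copies of a chain being written under the same chain ID.

```python
from collections import Counter


def atom_line(serial, name, res_name, chain, res_seq):
    """Build a minimal fixed-column ATOM record for the demo (hypothetical helper)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


def find_duplicate_atoms(pdb_lines):
    """Return atom keys that occur more than once within a single model."""
    counts = Counter()
    model = 0
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            model += 1  # atoms in different models are not duplicates of each other
        elif record in ("ATOM", "HETATM"):
            line = line.ljust(27)  # guard against short lines
            key = (
                model,
                line[21],             # chain ID
                line[22:26].strip(),  # residue sequence number
                line[26],             # insertion code
                line[17:20].strip(),  # residue name
                line[12:16].strip(),  # atom name
                line[16],             # alternate location indicator
            )
            counts[key] += 1
    return [key for key, n in counts.items() if n > 1]


# Demo: SER 5 in chain A appears twice, GLY 6 only once.
demo_lines = [
    atom_line(1, "N", "SER", "A", 5),
    atom_line(2, "CA", "SER", "A", 5),
    atom_line(3, "N", "SER", "A", 5),
    atom_line(4, "CA", "SER", "A", 5),
    atom_line(5, "N", "GLY", "A", 6),
]
duplicates = find_duplicate_atoms(demo_lines)
print(duplicates)
```

To try it on a real file, pass the open file object (for example `find_duplicate_atoms(open(path))`) and compare the counts between a MOAD file and a PDBbind file.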
