How to reproduce the behaviour

```python
import spacy

spacy.prefer_gpu()
nlp = spacy.load(
    "en_core_web_trf",
    disable=["tagger", "ner", "lemmatizer", "textcat"],
)

node = """Some really long string, 3000 characters"""
# simulating 96 pretty long docs
nodes = [node * 25] * 96
```
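As a sanity check on the simulated workload (using a hypothetical 3,000-character placeholder, since the real string is elided above), each document works out to about 75,000 characters:

```python
# Hypothetical stand-in for the elided 3,000-character string above.
node = "x" * 3000

# Simulating 96 pretty long docs, as in the snippet above.
nodes = [node * 25] * 96

print(len(nodes))     # 96 docs
print(len(nodes[0]))  # 75000 characters per doc
```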
Then run each of the lines below separately and time it:
```python
# 1 min 7.5 s
[list(doc.sents) for doc in nlp.pipe(nodes, batch_size=96)]

# 1 min 7.3 s
[list(doc.sents) for doc in nlp.pipe(nodes, batch_size=32)]

# 1 min 8.2 s
[list(doc.sents) for doc in nlp.pipe(nodes, batch_size=1)]
```
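The timings above came from running each line interactively; the same comparison can be scripted with `time.perf_counter`. This is only a sketch: `process` is a hypothetical stand-in for `nlp.pipe`, so the numbers it prints are meaningless until the real pipeline call is swapped in:

```python
import time

def process(texts, batch_size):
    """Hypothetical stand-in for nlp.pipe(texts, batch_size=...)."""
    return (t.upper() for t in texts)  # dummy per-doc work

texts = ["Some really long string"] * 96

for batch_size in (96, 32, 1):
    start = time.perf_counter()
    results = list(process(texts, batch_size=batch_size))
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f} s for {len(results)} docs")
```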
Running the same thing with `en_core_web_lg` results in substantial gains from batching: the largest batch size runs in roughly a quarter of the time of `batch_size=1`.
Expected Behavior

My understanding from the documentation and this issue is that we should expect significant gains from batching, as observed with `en_core_web_lg`. However, `en_core_web_trf` does not yield significant gains from batching.

I'm wondering whether this is a bug, or whether we simply should not expect improved performance from batching in a transformer + parser pipeline. Thanks for this awesome package, and in advance for your help!
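One way to think about this (my own toy cost model, not spaCy internals): batching mainly amortizes per-batch overhead, so if per-document compute dominates — as it plausibly does for long documents run through a transformer — batch size barely moves the total. With made-up numbers for illustration:

```python
import math

def total_time(n_items, batch_size, per_batch_overhead, per_item_cost):
    """Toy model: total = (number of batches) * overhead + per-item compute."""
    n_batches = math.ceil(n_items / batch_size)
    return n_batches * per_batch_overhead + n_items * per_item_cost

# Overhead-dominated regime (cheap per-doc work): batching helps a lot.
fast = [total_time(96, b, per_batch_overhead=1.0, per_item_cost=0.5) for b in (1, 96)]
print(fast)  # [144.0, 49.0] -> roughly 3x faster at batch_size=96

# Compute-dominated regime (expensive per-doc work): batching barely matters.
slow = [total_time(96, b, per_batch_overhead=1.0, per_item_cost=50.0) for b in (1, 96)]
print(slow)  # [4896.0, 4801.0] -> ~2% difference
```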
Your Environment
Using a single RTX A6000
python -m spacy info --markdown:
Info about spaCy