Extract text from Word textboxes [proposed label: enhancement] #688

Open
Mrodent opened this issue Mar 16, 2024 · 3 comments

Comments

@Mrodent

Mrodent commented Mar 16, 2024

As part of testing for my project, I ran read_docx on a test .docx file that contains various elements, including a textbox.
Examining the resulting Value::Object, I can't find the text from my textbox anywhere.
On the crates.io page, at the bottom under "Features", "Textbox" is left unticked.
Does this mean that parsing currently ignores all textboxes?

And yet, when I uncompress the .docx file, in document.xml there it is, near the end:

<v:textbox style="mso-fit-shape-to-text:t"><w:txbxContent><w:p w:rsidR="0094123E" w:rsidRPr="00DF617B" w:rsidRDefault="0094123E" w:rsidP="0094123E"><w:pPr><w:ind w:left="0" w:firstLine="0"/></w:pPr><w:r w:rsidRPr="00DF617B"><w:t>Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</w:t></w:r></w:p><w:p w:rsidR="0094123E" w:rsidRDefault="0094123E"/></w:txbxContent></v:textbox>

Am I right that textboxes are currently omitted?

If so, is there a reason why textbox content is not included in the parsing? It's slightly irksome, because it means I'll have to cobble together my own code to parse document.xml.
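
For anyone in the same position, here is a minimal sketch of that workaround in Rust, using only the standard library. It assumes document.xml has already been unzipped from the .docx and read into a string. The function names `extract_textbox_text` and `collect_runs` are hypothetical helpers and are not part of any crate's API. It relies on naive string scanning rather than a real XML parser, so it does not decode entities such as `&amp;` and assumes the `<w:txbxContent>` tags carry no attributes.

```rust
/// Collect the text of each non-empty paragraph found inside
/// `<w:txbxContent>` elements of an already-unzipped document.xml.
/// Hypothetical helper: naive string scanning, not a full XML parser.
fn extract_textbox_text(xml: &str) -> Vec<String> {
    const OPEN: &str = "<w:txbxContent>";
    const CLOSE: &str = "</w:txbxContent>";
    let mut paragraphs = Vec::new();
    let mut rest = xml;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        let Some(end) = after_open.find(CLOSE) else { break };
        let body = &after_open[..end];
        // Treat each closing paragraph tag as a paragraph boundary.
        for para in body.split("</w:p>") {
            let text = collect_runs(para);
            if !text.is_empty() {
                paragraphs.push(text);
            }
        }
        rest = &after_open[end + CLOSE.len()..];
    }
    paragraphs
}

/// Concatenate the contents of every `<w:t>` element in one paragraph.
fn collect_runs(para: &str) -> String {
    let mut out = String::new();
    let mut rest = para;
    while let Some(start) = rest.find("<w:t") {
        let tag = &rest[start..];
        // Accept `<w:t>` and `<w:t attr=...>`, but skip look-alikes
        // such as `<w:tab/>`, `<w:tbl>` or a self-closing `<w:t/>`.
        let next = tag.as_bytes().get(4).copied();
        if next != Some(b'>') && next != Some(b' ') {
            rest = &tag[4..];
            continue;
        }
        let Some(gt) = tag.find('>') else { break };
        let content = &tag[gt + 1..];
        let Some(close) = content.find("</w:t>") else { break };
        out.push_str(&content[..close]);
        rest = &content[close + "</w:t>".len()..];
    }
    out
}
```

Applied to the fragment quoted above, this should return a single entry holding the "Excepteur sint ..." sentence, since the second, empty paragraph is skipped. A proper fix inside the library would presumably use its existing XML reader rather than string searches like these.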

@bokuweb
Owner

bokuweb commented Mar 18, 2024

@Mrodent Thanks for your report. Could you please provide the .docx file?

@Mrodent
Author

Mrodent commented Mar 27, 2024

Here's a small .docx file with a text box. On my setup the text in the text box is just ignored when I parse.
test_file_2.docx

... but if you uncompress you'll find what I've included in my previous post.

By the way, I have only Word 2007 installed ... this may make a difference to something.

@Mrodent
Author

Mrodent commented May 17, 2024

Edited the title in the hope that you might find time to give this some thought. Omitting text from text-boxes seems a bit of an oversight, which could seemingly be corrected fairly easily...

Mrodent changed the title from "Clarification about Word textboxes?" to "Extract text from Word textboxes [proposed label: enhancement]" on May 17, 2024