Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doesn't strip the BOM from paypal csv downloads #78

Open
hollyhudson opened this issue Jun 17, 2019 · 3 comments
Open

Doesn't strip the BOM from paypal csv downloads #78

hollyhudson opened this issue Jun 17, 2019 · 3 comments

Comments

@hollyhudson
Copy link

I'm seeing this in both the pip-downloaded version and when I install from a clone of the github repo. Here's what the error looks like:

geode❀ ~/Nextcloud/ledger/personal/unprocessed🐟
ledger-autosync \
		-a 'Assets:Liquid:PayPal' \
		-l main.ledger \
		/Users/me/Nextcloud/ledger/personal/unprocessed/bom-test.csv \
Traceback (most recent call last):
  File "/usr/local/bin/ledger-autosync", line 11, in <module>
    load_entry_point('ledger-autosync==1.0.1', 'console_scripts', 'ledger-autosync')()
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/cli.py", line 353, in run
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/cli.py", line 187, in import_csv
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/sync.py", line 203, in parse_file
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/converter.py", line 524, in make_converter
Exception: Cannot determine CSV type
geode❀ ~/Nextcloud/ledger/personal/unprocessed🐟

And here's what a friend added to my Makefile to fix it:

BOM := $(shell bash -c "echo -e '\xEF\xBB\xBF'")
$(INPUTDIR)/%-nobom.csv: $(INPUTDIR)/%.csv
    sed '1s/^$(BOM)//' < $< \
    | grep -v '"Order",".*",".*","","","0.00"' \
    > $@

(Sorry, that's a bit messy, the extra stuff is in there for completion, so there's context if anyone wants to duplicate it.) The sed statement is the important part -- it searches for the BOM on line one of the file and replaces it with nothing. INPUTDIR is the path to my downloaded, unprocessed files, and we're making a cleaned up paypal-nobom.csv file that is dependent on the downloaded paypal.csv file. The grep line is to deal with another problem where ledger-autosync would choke whenever it encountered one of the numerous lines with no value in the amount field (why, paypal??), so we've added that as part of the BOM-removal process.

@egh
Copy link
Owner

egh commented Jun 18, 2019

Thanks for this! It's supposed to strip the BOM, see https://github.com/egh/ledger-autosync/blob/master/ledgerautosync/sync.py#L186. Maybe the BOM is appearing somewhere besides the absolute beginning of the file, or perhaps my code is wrong? Can you send me a 2 line paypal.csv with the data anonymized so I can replicate? Thanks.

@colindean
Copy link
Contributor

For a one-off or alternative method, running dos2unix on the PayPal export suffices to remove the BOM.

@torfjelde
Copy link

A bit late to the party, but this breaks on Python 3.10 because csv.DictReader expects f to be opened as text while the f.read(3) == codecs.BOM_UTF8 comparison only works if the file is opened as bytes. This means that the BOM is not stripped, while csv.reader (which is used in csv.DictReader) seems to read the f in bytes form (it's implemented in C, so haven't looked at the code of this), which then causes the BOM to be included in the fieldnames 😕

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants