Add diff functions #90

Open
mhkeller opened this issue Dec 23, 2019 · 3 comments
@mhkeller
Owner

mhkeller commented Dec 23, 2019

Sometimes when I'm doing some data cleaning, I want to know the diff between my current attempt and a previous one. It would be useful to have an output function that diffs the two results.

For example, let's say you have a script that outputs this file

key,value
hello,1
hi,2

and then you make some changes and now it outputs this data

key,value
hello,1
hi,2
hey,3

This function would write out the diff between the two.
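
As a rough illustration of the comparison step, here is a minimal sketch that diffs the two example outputs once they are parsed into arrays of row objects. `diffRows` is a hypothetical helper, not an existing function in the library, and it assumes that rows with the same columns in the same order count as equal.

```javascript
// Hypothetical helper (not part of the library): compare two arrays of row
// objects and report which rows were added or removed.
function diffRows (previous, current) {
  // Serialize each row so rows can be compared by value.
  // Assumption: both datasets list their columns in the same order.
  const serialize = row => JSON.stringify(row)
  const prevSet = new Set(previous.map(serialize))
  const currSet = new Set(current.map(serialize))
  return {
    added: current.filter(row => !prevSet.has(serialize(row))),
    removed: previous.filter(row => !currSet.has(serialize(row)))
  }
}

// The two outputs from the example above, parsed into rows.
const previous = [
  { key: 'hello', value: '1' },
  { key: 'hi', value: '2' }
]
const current = [
  { key: 'hello', value: '1' },
  { key: 'hi', value: '2' },
  { key: 'hey', value: '3' }
]

const result = diffRows(previous, current)
console.log(result)
// result.added holds the single row { key: 'hey', value: '3' }
// result.removed is empty
```

This only detects whole rows that appear or disappear; a changed value in an existing row shows up as one removal plus one addition.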

@mhkeller mhkeller self-assigned this Dec 23, 2019
@mhkeller
Owner Author

mhkeller commented Dec 23, 2019

It could be its own function that compares the data against the existing file:

io.writeDiffSync('path/to/file.csv', 'path/to/diff.diff', data)

If 'path/to/diff.diff' already exists, it could make a new, sequentially named file. Maybe that's configured by its own option in case you want to overwrite it instead:

io.writeDiffSync('path/to/file.csv', 'path/to/diff.diff', data, { overwrite: false })

Or the third argument could be the out path, and it would be optional. If it isn't provided, the diff is output to the console:

// to file
io.writeDiffSync('path/to/file.csv', data, 'path/to/diff.diff',  { overwrite: false })
// to console
io.writeDiffSync('path/to/file.csv', data,  { overwrite: false })

It gets a little confusing since the options object is already optional.
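
One way the two optional trailing arguments could be told apart is by type: a string is an out path, an object is options. The sketch below is only an illustration of that idea. `resolveDiffArgs` is a hypothetical helper, and the `overwrite: false` default is an assumption rather than anything decided in this issue.

```javascript
// Hypothetical argument resolution for the proposed writeDiffSync signature.
function resolveDiffArgs (filePath, data, outPath, options) {
  // If the third argument is an object, treat it as the options
  // and assume no out path was given.
  if (outPath !== null && typeof outPath === 'object') {
    options = outPath
    outPath = undefined
  }
  return {
    filePath,
    data,
    outPath, // undefined means "print the diff to the console"
    // Assumed default; the issue does not specify one.
    options: Object.assign({ overwrite: false }, options)
  }
}

// to file
const toFile = resolveDiffArgs('path/to/file.csv', [], 'path/to/diff.diff', { overwrite: true })
// to console
const toConsole = resolveDiffArgs('path/to/file.csv', [], { overwrite: true })

console.log(toFile.outPath) // path/to/diff.diff
console.log(toConsole.outPath) // undefined
```

The trade-off is that the meaning of the third argument depends on its type, which is the source of the confusion mentioned above.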

@dhalford

dhalford commented Jan 8, 2020

I think an actual diff file (a text file with HEAD >>> . <<<< markers) wouldn't be all that useful in the csv context (you can't open it in Excel, for one), but an output like diff.csv that keeps rows intact would be great.

Of course, you would need three sets of data (modified, added, removed), and those would be difficult to pull into a single .csv file unless you introduce custom row headers to separate out the content.
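
As one possible alternative to custom row headers, the three sets could share a single .csv file by tagging each row with an extra column. This is only a sketch: `toDiffCsv` and the `_change` column name are assumptions made for illustration, and the code does no quoting or escaping of values.

```javascript
// Hypothetical: flatten three sets of rows into one CSV string with a
// leading `_change` column. Values are not quoted or escaped, so this
// only works for values without commas, quotes, or newlines.
function toDiffCsv (columns, sets) {
  const lines = [['_change'].concat(columns).join(',')]
  for (const change of ['added', 'removed', 'modified']) {
    for (const row of sets[change] || []) {
      lines.push([change].concat(columns.map(c => row[c])).join(','))
    }
  }
  return lines.join('\n')
}

const csv = toDiffCsv(['key', 'value'], {
  added: [{ key: 'hey', value: '3' }],
  removed: [],
  modified: []
})
console.log(csv)
// _change,key,value
// added,hey,3
```

A file like this keeps rows intact and still opens in Excel, at the cost of one extra column that is not part of the data.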

As an aside, I've always dreamed of the equivalent of Excel sheets within a single .csv file, which could be achieved by having some kind of row divider to separate out each sheet (it could be used here as well).

@dhalford

dhalford commented Jan 8, 2020

One more thing that could prove difficult is how you identify rows. In the key,value example above, key is an id you can use, but what if the data is something like:

number,foo
1,hello
2,hi
3,hola

number,foo
1,hello
8,hi
9,hola

Was the operation that the last two rows were modified? Or were two rows removed and two completely separate rows added?
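
The ambiguity can be made concrete with a small sketch: the same two datasets produce different answers depending on which column is treated as the row identifier. `diffByKey` is a hypothetical helper written for this illustration, and it assumes the chosen key column has unique values.

```javascript
// Hypothetical keyed diff: rows are matched by the value in `keyColumn`.
// Assumption: values in the key column are unique within each dataset.
function diffByKey (previous, current, keyColumn) {
  const prevByKey = new Map(previous.map(row => [row[keyColumn], row]))
  const currByKey = new Map(current.map(row => [row[keyColumn], row]))
  const added = []
  const removed = []
  const modified = []
  for (const [key, row] of currByKey) {
    if (!prevByKey.has(key)) {
      added.push(row)
    } else if (JSON.stringify(prevByKey.get(key)) !== JSON.stringify(row)) {
      modified.push({ before: prevByKey.get(key), after: row })
    }
  }
  for (const [key, row] of prevByKey) {
    if (!currByKey.has(key)) removed.push(row)
  }
  return { added, removed, modified }
}

const before = [
  { number: '1', foo: 'hello' },
  { number: '2', foo: 'hi' },
  { number: '3', foo: 'hola' }
]
const after = [
  { number: '1', foo: 'hello' },
  { number: '8', foo: 'hi' },
  { number: '9', foo: 'hola' }
]

// Keyed on `number`: two rows removed (2, 3) and two rows added (8, 9).
const byNumber = diffByKey(before, after, 'number')
// Keyed on `foo`: nothing added or removed, two rows modified.
const byFoo = diffByKey(before, after, 'foo')

console.log(byNumber.added.length, byNumber.removed.length, byNumber.modified.length) // 2 2 0
console.log(byFoo.added.length, byFoo.removed.length, byFoo.modified.length) // 0 0 2
```

Both readings are valid, so a diff function would probably need the caller to say which column identifies a row.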
