Update the companies dataset #509

Open
luizfzs opened this issue Dec 31, 2019 · 2 comments

Comments

@luizfzs
Contributor

luizfzs commented Dec 31, 2019

What is the problem?
The companies dataset seems somewhat old (from 2016-09-03).

How can this be addressed?
The idea is to fetch the public CNPJ data from http://receita.economia.gov.br/orientacao/tributaria/cadastros/cadastro-nacional-de-pessoas-juridicas-cnpj/dados-publicos-cnpj, parse it, and use it as the source of information. The processed data can then replace '2016-09-03-companies.xz' with a newer dataset.

This could run as the first step before Rosie starts the analysis, so that the companies dataset is up to date at every run.
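
As a rough illustration of the "update the file" step, here is a minimal sketch that writes already-parsed rows to an xz-compressed CSV named after the existing '2016-09-03-companies.xz' pattern. Everything here is an assumption for illustration: the helper names (`dataset_filename`, `write_companies`, `read_companies`) and the column names are hypothetical, the file is assumed to be a CSV inside the xz archive, and fetching and parsing the Receita Federal files is left out.

```python
import csv
import lzma
from datetime import date


def dataset_filename(day: date) -> str:
    """Build a name following the existing '2016-09-03-companies.xz' pattern."""
    return f"{day:%Y-%m-%d}-companies.xz"


def write_companies(rows, fieldnames, path):
    """Write already-parsed company rows (dicts) to an xz-compressed CSV."""
    with lzma.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_companies(path):
    """Read the compressed CSV back as a list of dicts (all values are strings)."""
    with lzma.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A run before Rosie's analysis could call `write_companies(rows, columns, dataset_filename(date.today()))`, where `rows` and `columns` come from whatever parser ends up handling the source files.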

Who could help with this issue?
Anyone can help with it.

Labels
[data collection]

@cuducos
Collaborator

cuducos commented Dec 31, 2019

Great! I'm working on that on okfn-brasil/serenata-toolbox#218 BTW

Anyone, feel free to jump in and pick it up from where I left off ;)

@cuducos
Collaborator

cuducos commented Jan 15, 2020

The new dataset from okfn-brasil/serenata-toolbox#218 has a slightly different set of columns. This helps us draw a roadmap for this issue. My suggestion is:

  • Analyze the current Company model and the CSV from okfn-brasil/serenata-toolbox#218 ("Add company dataset abstraction") to identify which fields need to be changed, deleted from, or added to the model
  • Consider storing activities in a JSONField (instead of in a related table)
  • Update the company related tests to describe the new intended scenario
  • Iterate over Python tests to adapt the codebase to the new scenario
  • Test and adapt the Elm (layers) interface to the new scenario

Surely the last two points are easier said than done, but I think this roadmap might be helpful.
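
The first two roadmap points can be sketched roughly as follows. This is only an illustration under stated assumptions: `diff_columns` and `fold_activities` are hypothetical helpers, and the field names (`cnpj`, `code`, `description`) are placeholders, not the actual columns of the Company model or of the new CSV.

```python
import json
from collections import defaultdict


def diff_columns(model_fields, csv_columns):
    """Compare the model's field names with the new CSV header."""
    model, new = set(model_fields), set(csv_columns)
    return {
        "add_to_model": sorted(new - model),
        "remove_from_model": sorted(model - new),
        "unchanged": sorted(model & new),
    }


def fold_activities(activity_rows):
    """Group flat activity rows by company into one JSON string per company.

    Each row is assumed to be a dict with 'cnpj', 'code' and 'description'
    keys (hypothetical names), roughly what a related table would hold.
    """
    grouped = defaultdict(list)
    for row in activity_rows:
        grouped[row["cnpj"]].append(
            {"code": row["code"], "description": row["description"]}
        )
    return {
        cnpj: json.dumps(items, ensure_ascii=False)
        for cnpj, items in grouped.items()
    }
```

A plain set difference cannot tell a renamed field from a removed one plus an added one, so the "changed" fields would still need a manual pass over the output.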


2 participants