[demo] Track accuracy over time. #92

Open
juharris opened this issue Jun 17, 2020 · 8 comments
Labels
demo (Relates to the code to demonstrate the framework), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@juharris
Contributor

juharris commented Jun 17, 2020

In database? On blockchain?

We could track the accuracy in the database (managed by the server). This is okay, but it's centralized, so it would be good to add some proof to that database.
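One possible way to add some proof to a centralized table, assuming the goal is tamper evidence (showing rows were not edited after the fact) rather than proving the evaluation itself, is a hash chain: each accuracy record is hashed together with the previous record's hash, so publishing only the latest hash (for example, on-chain) commits to the whole history. The Python sketch below is illustrative and not from the project (whose server is demo/server.js); `record_hash`, `build_chain`, `verify_chain`, and the record fields are hypothetical.

```python
import hashlib
import json


def record_hash(prev_hash: str, record: dict) -> str:
    """Hash one accuracy record together with the previous record's hash."""
    # Canonical JSON (sorted keys, no extra spaces) so equal records always hash equally.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records: list[dict]) -> list[str]:
    """Return the chained hash for each record, starting from an all-zero genesis hash."""
    hashes = []
    prev = "0" * 64
    for record in records:
        prev = record_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records: list[dict], hashes: list[str]) -> bool:
    """Recompute the chain from the stored rows and compare it with the stored hashes."""
    return len(records) == len(hashes) and build_chain(records) == hashes
```

Editing any earlier row changes every later hash, so `verify_chain` fails against the published value. This only shows that the log was not rewritten; it says nothing about whether each accuracy was computed correctly.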

@juharris added the enhancement, good first issue, and demo labels on Jun 17, 2020
@hkaur008
Contributor

Hi @juharris,
I have knowledge of databases and blockchain, and I would like to contribute to this project.
Could you please explain the issue in more detail and where to start?

@juharris
Contributor Author

Hey, thanks for reaching out! As people add data and train a model, the model's accuracy on some test set will change, and I would like to track how that accuracy changes over time. I think there's a lot to be done for this issue, but we can break it down into steps. Ideally, for the highest transparency, we would compute the test set evaluation on-chain, but that would be very expensive and arguably wasteful. So what steps can we take towards transparency?

I think as a start, you can store the accuracy and timestamp in a table that you can set up in demo/server.js. Maybe you can also store a hash of the test set data that was used and some other metadata about the test set? I think that's a decent start, and once that is done, you can get an idea of other ways to store test set metrics. You can also look into zero-knowledge proofs or use hashes to prove that the right computation was done to perform the evaluation.
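A minimal sketch of the suggested first step, in Python rather than the demo's JavaScript: hash the test set that was used and keep that hash next to the accuracy and timestamp. The canonical JSON encoding, the SHA-256 choice, and the names `hash_test_set`, `accuracy_record`, and `test_set_size` are assumptions for illustration.

```python
import hashlib
import json
import time


def hash_test_set(samples: list[list[float]], labels: list[int]) -> str:
    """SHA-256 of a compact JSON encoding of the test set (samples and labels, in order)."""
    payload = json.dumps({"samples": samples, "labels": labels}, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def accuracy_record(accuracy: float, samples, labels, timestamp=None) -> dict:
    """Build one row: accuracy, timestamp, and metadata identifying the test set used."""
    return {
        "accuracy": accuracy,
        # Unix seconds; pass a timestamp explicitly to make the row reproducible.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "test_set_hash": hash_test_set(samples, labels),
        "test_set_size": len(labels),
    }
```

Because samples are hashed in order, reordering the test set produces a different hash; sorting the samples first would make the hash order-independent if that is preferred.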

@hkaur008
Contributor

hkaur008 commented Apr 25, 2021

I think for the time being I just need to maintain a table with the accuracy, timestamp, test set hash, and other metadata, and then improve it with hashing to prove that the changes were made, or look into zero-knowledge proofs as well.
Could you please assign this issue to me?

@hkaur008
Contributor

hkaur008 commented May 3, 2021

Whenever a new training sample is added (i.e., the dataset of a particular model changes), the accuracy of the model changes. The changed accuracy then needs to be recorded for that model, with a timestamp, in an SQLite table. I think every model has the same data table, but every model will have a different accuracy on the same dataset. Do I need to maintain a separate table for every model to track each model's accuracy over time? Please correct me if I am wrong.

@juharris
Contributor Author

juharris commented May 3, 2021

Using a new table for each model will be hard to manage, so they should all use the same table. You can use a column with a dataset name to help keep track of which dataset the model was tested against.
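A sketch of one shared table, assuming Python's built-in `sqlite3` module stands in for the demo server's database (the real server lives in demo/server.js and is JavaScript). The table name `accuracy_history`, the column names, and the helper functions are hypothetical.

```python
import sqlite3

# One table for all models; model_id and dataset_name tell the rows apart.
SCHEMA = """
CREATE TABLE IF NOT EXISTS accuracy_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER NOT NULL,
    dataset_name TEXT NOT NULL,
    accuracy REAL NOT NULL,
    timestamp INTEGER NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_accuracy(conn: sqlite3.Connection, model_id: int, dataset_name: str,
                    accuracy: float, timestamp: int) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO accuracy_history (model_id, dataset_name, accuracy, timestamp) "
            "VALUES (?, ?, ?, ?)",
            (model_id, dataset_name, accuracy, timestamp),
        )


def accuracy_history(conn: sqlite3.Connection, model_id: int, dataset_name: str):
    """(timestamp, accuracy) pairs for one model on one dataset, oldest first."""
    return conn.execute(
        "SELECT timestamp, accuracy FROM accuracy_history "
        "WHERE model_id = ? AND dataset_name = ? ORDER BY timestamp",
        (model_id, dataset_name),
    ).fetchall()
```

With rows for several models in the same table, `accuracy_history(conn, model_id, dataset_name)` returns only that model's `(timestamp, accuracy)` pairs for that dataset.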

@hkaur008
Contributor

hkaur008 commented May 4, 2021

So we can create a table with the following columns: transaction_hash, model ID, accuracy, and timestamp.
Since the model already stores its metadata, and transaction_hash is the key to locating which particular data we are talking about, I only need to store the accuracy and timestamp.

We have the following APIs:
// Health
// Get all models.
// Get model with specific ID.
// Insert a new model.
// DATA MANAGEMENT
// Insert a training sample.
// Get original training data.

When should I call the function that checks accuracy, and from which API?
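The thread does not settle when the evaluation should run, but one reading of the question above is a hook that runs after the "Insert a training sample" endpoint stores a sample and the model has been updated. The Python sketch below assumes the server is notified at that point and can call the model's `predict`; `evaluate_accuracy`, `on_training_sample_added`, and the in-memory `history` list (a stand-in for the database table) are hypothetical.

```python
import time
from typing import Callable, Sequence


def evaluate_accuracy(predict: Callable[[Sequence[float]], int], samples, labels) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    if not labels:
        raise ValueError("test set is empty")
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(labels)


def on_training_sample_added(model_id, predict, test_samples, test_labels,
                             history: list, timestamp=None) -> float:
    """Hypothetical hook: call after a training sample is stored and the model updated.

    Re-evaluates the model on the test set and appends the result to `history`.
    """
    accuracy = evaluate_accuracy(predict, test_samples, test_labels)
    history.append({
        "model_id": model_id,
        "accuracy": accuracy,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    })
    return accuracy
```

Evaluating on every insert could be expensive for a large test set; evaluating once every N samples or on a timer is an alternative design choice.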

@juharris
Contributor Author

juharris commented May 5, 2021

Thanks for the update! I don't think a transaction hash is appropriate for the location of data, but I'm not sure what we should use. You can just make a "data_location" column, and we can figure out what to put in it later. It might vary: data might be on-chain, at a URL, etc.
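Since the format of `data_location` is still undecided, one option is a nullable `TEXT` column that can be added to an existing table without touching old rows. A Python/`sqlite3` sketch, assuming a table named `accuracy_history` (hypothetical, as is `add_data_location_column`); the real server is demo/server.js.

```python
import sqlite3


def add_data_location_column(conn: sqlite3.Connection) -> None:
    """Add a nullable, free-form data_location column if it is not there yet."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(accuracy_history)")]
    if "data_location" not in columns:
        with conn:
            # No NOT NULL constraint: existing rows get NULL, and the format
            # (URL, on-chain reference, ...) can be decided later.
            conn.execute("ALTER TABLE accuracy_history ADD COLUMN data_location TEXT")
```

Existing rows read back `NULL` for `data_location`, and the check against `PRAGMA table_info` makes the migration safe to run more than once.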

I believe I answered the question about the functions in the PR: you should make 2 new functions.

@hkaur008
Contributor

What is left in this issue?
