Squashed commit of the following:
commit a196e66
Author: greenfish <greenfish77@gmail.com>
Date:   Sat Mar 26 00:18:13 2022 +0900

    1.1 version update

commit 12830b2
Author: greenfish <greenfish77@gmail.com>
Date:   Sat Mar 26 00:16:35 2022 +0900

    Squashed commit of the following:

    commit 159d347
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 26 00:10:32 2022 +0900

        dev

    commit 3b5670c
    Author: greenfish <greenfish77@gmail.com>
    Date:   Thu Mar 24 23:00:55 2022 +0900

        dev

    commit bd12401
    Author: greenfish <greenfish77@gmail.com>
    Date:   Mon Mar 21 23:10:47 2022 +0900

        dev

    commit ee1b86b
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sun Mar 20 16:07:42 2022 +0900

        dev

    commit c2276d3
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sun Mar 20 01:46:15 2022 +0900

        dev

    commit c3977bb
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sun Mar 20 01:17:31 2022 +0900

        dev

    commit 677d973
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 19 23:48:03 2022 +0900

        dev

    commit 417ae82
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 19 12:05:58 2022 +0900

        dev

    commit c162c5a
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 19 10:51:58 2022 +0900

        dev

    commit e90a8bc
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 19 01:41:38 2022 +0900

        dev

    commit 0256dbe
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 19 00:16:57 2022 +0900

        dev

    commit e914b84
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sun Mar 13 00:11:05 2022 +0900

        remove_chunk done.

    commit 7ec57fe
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 12 18:05:16 2022 +0900

        dev

    commit fe11979
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 12 15:32:02 2022 +0900

        dev

    commit 53209ab
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 12 13:29:32 2022 +0900

        add instance.actual

    commit be15a01
    Author: greenfish <greenfish77@gmail.com>
    Date:   Sat Mar 12 01:46:14 2022 +0900

        add remove chunk.

    commit 040919d
    Author: greenfish <greenfish77@gmail.com>
    Date:   Fri Mar 11 23:56:44 2022 +0900

        LR, develop_util

        text file LR mode.
        add develop_util.hpp.

    commit 4af25fe
    Author: greenfish <greenfish77@gmail.com>
    Date:   Thu Mar 10 20:18:56 2022 +0900

        add limit_instance_size.

        develop limit_instance_size.
greenfish77 committed Mar 25, 2022
1 parent 77cc200 commit 8e29ad7
Showing 21 changed files with 774 additions and 82 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -64,7 +64,7 @@ cmake_minimum_required(VERSION 3.6)
# settings
##########
# project.
project(gaenari VERSION 1.0.0.1)
project(gaenari VERSION 1.1)

#################################
# auto git ignore build directory
20 changes: 19 additions & 1 deletion README.md
@@ -62,7 +62,7 @@ therefore, in this case, it adapts to new data through continuous `incremental lea

![design](./doc/img/design.gif)

a single `decision tree` and `dataset` are implemented in `ganari`.
a single `decision tree` and `dataset` are implemented in `gaenari`.
`supul` implements the public supul methods that can be called externally.
database and model processing for incremental learning are key.

@@ -665,4 +665,22 @@ see the comments in the code for detail.
|||errmsg|
|property||set_property|
|||get_property|
|||save|
|||reload|
|test||verify|
### property
the property.txt file in the project directory is the configuration file.
call set_property() or modify it yourself. see the comments in property.txt for detail.
|name|change possible|type|default|desc|
|-|:-:|-|-|-|
|ver||str||library version|
|db.type||str|`none`|support `sqlite`|
|db.tablename.prefix||str||set prefix table name|
|model.weak_treenode_condition.accuracy|O|double|0.8|see comment|
|model.weak_treenode_condition.total_count|O|int|5|see comment|
|limit.chunk.use|O|bool|true|see comment|
|limit.chunk.instance_lower_bound|O|int|1000000|see comment|
|limit.chunk.instance_upper_bound|O|int|2000000|see comment|
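for orientation, a minimal sketch of what these entries might look like in property.txt, assuming a plain key=value layout; the real file ships with explanatory comments, so treat the values below only as the defaults from the table above.

```
# illustrative sketch only (key=value layout assumed); see property.txt for the authoritative comments.
db.type=sqlite
db.tablename.prefix=
model.weak_treenode_condition.accuracy=0.8
model.weak_treenode_condition.total_count=5
limit.chunk.use=true
limit.chunk.instance_lower_bound=1000000
limit.chunk.instance_upper_bound=2000000
```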
6 changes: 3 additions & 3 deletions TODO.md
@@ -15,16 +15,16 @@ TODO
- [ ] cache synchronization.
- [ ] event log.
- [ ] avoid hard-coded train options.
- [ ] reducing the number of instances in the DB (or limit size).
- [ ] reducing the size of model in the DB (or limit size).
- [ ] support other databases(postgresql, ...).
- [ ] tree model to json.

### In Progress

### Done ✓
##### 1.0.0.0
##### 1.0
- [X] first release.

##### 1.1
- [X] reducing the number of instances in the DB (or limit size).
### abandoned
- [ ] ~~nothing~~
15 changes: 15 additions & 0 deletions include/gaenari/gaenari/common/misc.hpp
@@ -388,6 +388,21 @@ inline const std::string& version(void) {
return v;
}

// the order of insert_order_map is not checked.
// a simple (key, value) pair matching check.
template<typename t1, typename t2>
inline bool compare_map_content(_in const t1& m1, _in const t2& m2) {
if (m1.size() != m2.size()) return false;
for (const auto& it: m1) {
const auto& k = it.first;
const auto& v = it.second;
const auto& f = m2.find(k);
if (f == m2.end()) return false;
if (v != f->second) return false;
}
return true;
}

} // namespace common
} // namespace gaenari

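a brief usage sketch of the new compare_map_content helper (not part of the commit): it treats two map-like containers as equal when they hold the same (key, value) pairs, regardless of container type or iteration order. the gaenari::common qualification follows the closing namespace comments above; including the gaenari header is assumed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
// assumes the gaenari header providing gaenari::common::compare_map_content is included.

void compare_map_content_example() {
	std::map<std::string, int> a{{"x", 1}, {"y", 2}};
	std::unordered_map<std::string, int> b{{"y", 2}, {"x", 1}};
	std::unordered_map<std::string, int> c{{"x", 1}, {"y", 3}};

	// same (key, value) pairs, different container types and iteration order -> equal.
	assert(gaenari::common::compare_map_content(a, b));

	// value mismatch for "y" -> not equal.
	assert(not gaenari::common::compare_map_content(a, c));
}
```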
2 changes: 1 addition & 1 deletion include/gaenari/gaenari/common/property.hpp
@@ -359,7 +359,7 @@ struct prop {

bool reload(void) {
std::string p = this->_path;
return read(p);
return read(p, true);
}

bool all_keys_existed(_in const std::vector<std::string>& names) const {
32 changes: 32 additions & 0 deletions include/gaenari/supul/common/util.hpp
@@ -516,6 +516,38 @@ inline void verify_confusion_matrix(_in const std::vector<type::map_variant>& a,
// passed.
}

inline bool is_version_update(_in const std::string& cur, _in const std::string& next) {
std::vector<std::string> cur_items;
std::vector<std::string> next_items;

// parse.
gaenari::dataset::csv_reader::parse_delim(cur, '.', cur_items);
gaenari::dataset::csv_reader::parse_delim(next,'.', next_items);

// set four items.
if (cur_items.size() == 3) cur_items.emplace_back("0");
if (next_items.size() == 3) next_items.emplace_back("0");

// valid check.
if ((cur_items.size() != 4) or (next_items.size() != 4)) THROW_SUPUL_ERROR2("invalid version, %0 or %1.", cur, next);
for (size_t i=0; i<4; i++) {
if (cur_items[i].empty() or next_items[i].empty()) THROW_SUPUL_ERROR2("invalid version, %0 or %1.", cur, next);
if (not (std::all_of(cur_items[i].begin(), cur_items[i].end(), ::isdigit) and
std::all_of(next_items[i].begin(),next_items[i].end(),::isdigit))) {
THROW_SUPUL_ERROR2("invalid version, %0 or %1.", cur, next);
}
}

// compare.
for (size_t i=0; i<4; i++) {
auto c = std::atoi(cur_items[i].c_str());
auto n = std::atoi(next_items[i].c_str());
if (c < n) return true;
if (c > n) return false;
}
return false;
}

} // common
} // supul

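a small sketch of the is_version_update semantics (illustrative only): versions are compared component by component, a three-part version is padded with a trailing "0", and anything other than three or four numeric components throws. the supul::common:: qualification below is an assumption taken from the closing namespace comments above.

```cpp
#include <cassert>
// assumes the supul header providing is_version_update is included;
// the supul::common:: qualification is an assumption.

void is_version_update_example() {
	using supul::common::is_version_update;

	// "1.0.1" is padded to "1.0.1.0" before the comparison.
	assert(is_version_update("1.0.0.1", "1.0.1"));       // upgrade   -> true
	assert(not is_version_update("1.0.1", "1.0.0.1"));   // downgrade -> false
	assert(not is_version_update("1.0.0.1", "1.0.0.1")); // equal     -> false

	// is_version_update("1.1", "1.0.0.1") would throw: only two components.
}
```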
57 changes: 56 additions & 1 deletion include/gaenari/supul/db/base.hpp
@@ -173,8 +173,24 @@ class base {
// UPDATE chunk SET updated=?, initial_correct_count=?, total_count=?, initial_accuracy=? WHERE id=?
virtual void update_chunk(_in int64_t chunk_id, _in bool updated, _in int64_t initial_correct_count, _in int64_t total_count, _in double initial_accuracy) = 0;

// update chunk total_count.
// UPDATE chunk SET total_count=? WHERE id=?
virtual void update_chunk_total_count(_in int64_t chunk_id, _in int64_t total_count) = 0;

// get chunk list.
// SELECT * FROM chunk ORDER BY datetime ASC
virtual void get_chunk_list(_in callback_query cb) = 0;

// get chunk_updated.
// SELECT updated FROM chunk WHERE id=?
virtual bool get_chunk_updated(_in int64_t chunk_id) = 0;

// update leaf_info.
// UPDATE leaf_info SET correct_count = correct_count + ?, total_count = total_count + ?, accuracy = (CAST(correct_count AS REAL) + ?) / (total_count + ?) WHERE id = ?
// UPDATE leaf_info
// SET correct_count=correct_count+?,
// total_count=total_count+?,
// accuracy=(CASE WHEN total_count+?=0 THEN 0.0 ELSE (CAST(correct_count AS REAL)+?)/(total_count+?) END)
// WHERE id=?
virtual void update_leaf_info(_in int64_t leaf_info_id, _in int64_t increment_correct_count, _in int64_t increment_total_count) = 0;

// get_root_ref_treenode_id.
@@ -246,6 +262,45 @@ class base {
// SELECT ref_generation_id FROM treenode WHERE id=?
virtual int64_t get_generation_id_by_treenode_id(_in int64_t treenode_id) = 0;

// get leaf_info by chunk_id.
// SELECT instance."%y" as "instance.actual",
// instance_info.weak_count as "instance_info.weak_count",
// leaf_info.*
// FROM instance_info
// INNER JOIN instance ON instance_info.id = instance.id
// INNER JOIN treenode ON instance_info.ref_leaf_treenode_id = treenode.id
// INNER JOIN leaf_info ON treenode.ref_leaf_info_id = leaf_info.id
// WHERE instance_info.ref_chunk_id = ?
virtual void get_leaf_info_by_chunk_id(_in int64_t chunk_id, _in callback_query cb) = 0;

// get chunk total_count by chunk_id.
// SELECT total_count FROM chunk WHERE id=?
virtual int64_t get_total_count_by_chunk_id(_in int64_t chunk_id) = 0;

// delete instance_by_chunk_id.
// DELETE FROM instance
// WHERE id IN(
// SELECT instance_info.id
// FROM instance_info
// INNER JOIN chunk ON instance_info.ref_chunk_id = chunk.id
// WHERE chunk.id = ?
// )
virtual void delete_instance_by_chunk_id(_in int64_t chunk_id) = 0;

// delete instance_info_by_chunk_id.
// DELETE FROM instance_info
// WHERE id IN(
// SELECT instance_info.id
// FROM instance_info
// INNER JOIN chunk ON instance_info.ref_chunk_id = chunk.id
// WHERE chunk.id = ?
// )
virtual void delete_instance_info_by_chunk_id(_in int64_t chunk_id) = 0;

// delete chunk_by_id.
// DELETE FROM chunk WHERE id=?
virtual void delete_chunk_by_id(_in int64_t chunk_id) = 0;

// get global_row_count.
// SELECT COUNT(*) FROM global
virtual int64_t get_global_row_count(void) = 0;
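the new primitives above are the building blocks for dropping a whole chunk. a heavily hedged sketch of one plausible call order follows; it is not the library's actual remove-chunk implementation, and the db::base& qualification and transaction handling are assumptions. the instance and instance_info deletes have to come before the chunk delete, because both DELETE statements locate their rows by joining through the chunk id.

```cpp
#include <cstdint>
// illustrative only: NOT the library's remove-chunk implementation.
// `database` is assumed to be a concrete db::base implementation (e.g. the sqlite backend),
// and in real code the three deletes would run inside one transaction.
void drop_chunk_sketch(db::base& database, int64_t chunk_id) {
	// instance and instance_info rows are found via their chunk reference,
	// so they must be removed before the chunk row itself disappears.
	database.delete_instance_by_chunk_id(chunk_id);
	database.delete_instance_info_by_chunk_id(chunk_id);
	database.delete_chunk_by_id(chunk_id);
}
```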
6 changes: 5 additions & 1 deletion include/gaenari/supul/db/sqlite/impl/sqlite.engine.hpp
@@ -201,6 +201,7 @@ inline void sqlite_t::execute(_in stmt stmt_type, _in const type::vector_variant
}
sqlite3_reset(stmt.stmt);
} catch(...) {
exceptions::catch_all();
error = true;
sqlite3_reset(stmt.stmt);
}
@@ -358,7 +359,10 @@ inline auto sqlite_t::execute(_in stmt stmt_type, _in const type::vector_variant
inline sqlite3_stmt* sqlite_t::get_stmt(_in const std::string& sql) {
sqlite3_stmt* stmt = nullptr;
int rc = sqlite3_prepare(this->db, sql.c_str(), -1, &stmt, nullptr);
if (rc != SQLITE_OK) THROW_SUPUL_DB_ERROR("fail to sqlite3_prepare.", error_desc(rc));
if (rc != SQLITE_OK) {
gaenari::logger::error("sql error: {0}", {sql});
THROW_SUPUL_DB_ERROR("fail to sqlite3_prepare.", error_desc(rc));
}
return stmt;
}

122 changes: 120 additions & 2 deletions include/gaenari/supul/db/sqlite/impl/sqlite.impl.hpp
@@ -256,8 +256,30 @@ inline void sqlite_t::build_stmt_pool(void) {
stmt_info{get_stmt(sql),
{}}});

// update chunk_total_count.
sql = schema.get_sql("UPDATE ${chunk} SET total_count=? WHERE id=?");
stmt_pool.insert({ stmt::update_chunk_total_count,
stmt_info{get_stmt(sql),
{}}});

// get chunk_list.
sql = schema.get_sql("SELECT * FROM ${chunk} ORDER BY datetime ASC");
stmt_pool.insert({ stmt::get_chunk_list,
stmt_info{get_stmt(sql),
schema.fields_include(type::table::chunk, {})}});

// get chunk_updated.
sql = schema.get_sql("SELECT updated FROM ${chunk} WHERE id=?");
stmt_pool.insert({ stmt::get_chunk_updated,
stmt_info{get_stmt(sql),
type::fields{{"updated", type::field_type::TINYINT}}}});

// update leaf_info.
sql = schema.get_sql("UPDATE ${leaf_info} SET correct_count=correct_count+?, total_count=total_count+?, accuracy=(CAST(correct_count AS REAL)+?)/(total_count+?) WHERE id=?");
sql = schema.get_sql("UPDATE ${leaf_info} "
"SET correct_count=correct_count+?, "
"total_count=total_count+?, "
"accuracy=(CASE WHEN total_count+?=0 THEN 0.0 ELSE (CAST(correct_count AS REAL)+?)/(total_count+?) END) "
"WHERE id=?");
stmt_pool.insert({ stmt::update_leaf_info,
stmt_info{get_stmt(sql),
{}}});
@@ -343,6 +365,63 @@ inline void sqlite_t::build_stmt_pool(void) {
stmt_info{get_stmt(sql),
schema.fields_include(type::table::treenode, {"ref_generation_id"})}});

// get leaf_info_by_chunk_id.
auto& _leaf_info_table = schema.get_table_info(type::table::leaf_info);
_names = common::get_names(_leaf_info_table.fields, {}, false, "${leaf_info}", true, ""); // ${leaf_info}."f1" as "f1", ...
sql = schema.get_sql("SELECT "
"${instance}.\"%y%\" AS \"instance.actual\", "
"${instance_info}.weak_count AS \"instance_info.weak_count\", "
+ _names + " "
"FROM ${instance_info} "
"INNER JOIN ${instance} ON ${instance_info}.id = ${instance}.id "
"INNER JOIN ${treenode} ON ${instance_info}.ref_leaf_treenode_id = ${treenode}.id "
"INNER JOIN ${leaf_info} ON ${treenode}.ref_leaf_info_id = ${leaf_info}.id "
"WHERE ${instance_info}.ref_chunk_id=?");
stmt_pool.insert({ stmt::get_leaf_info_by_chunk_id,
stmt_info{get_stmt(sql),
schema.fields_merge({
type::fields{
{"instance.actual", type::field_type::INTEGER},
{"instance_info.weak_count", schema.field_type(type::table::instance_info, "weak_count")}
},
schema.fields_include(type::table::leaf_info, {})})}});

// get total_count_by_chunk_id.
sql = schema.get_sql("SELECT total_count FROM ${chunk} WHERE id=?");
stmt_pool.insert({ stmt::get_total_count_by_chunk_id,
stmt_info{get_stmt(sql),
schema.fields_include(type::table::leaf_info, {"total_count"})}});

// delete instance_by_chunk_id.
sql = schema.get_sql("DELETE FROM ${instance} "
"WHERE id IN("
"SELECT ${instance_info}.id "
"FROM ${instance_info} "
"INNER JOIN ${chunk} ON ${instance_info}.ref_chunk_id = ${chunk}.id "
"WHERE ${chunk}.id=?"
")");
stmt_pool.insert({ stmt::delete_instance_by_chunk_id,
stmt_info{get_stmt(sql),
{}}});

// delete instance_info_by_chunk_id.
sql = schema.get_sql("DELETE FROM ${instance_info} "
"WHERE id IN("
"SELECT ${instance_info}.id "
"FROM ${instance_info} "
"INNER JOIN ${chunk} ON ${instance_info}.ref_chunk_id = ${chunk}.id "
"WHERE ${chunk}.id=?"
")");
stmt_pool.insert({ stmt::delete_instance_info_by_chunk_id,
stmt_info{get_stmt(sql),
{}}});

// delete chunk_by_id.
sql = schema.get_sql("DELETE FROM ${chunk} WHERE id=?");
stmt_pool.insert({ stmt::delete_chunk_by_id,
stmt_info{get_stmt(sql),
{}}});

// get global_row_count.
sql = schema.get_sql("SELECT COUNT(*) FROM ${global}");
stmt_pool.insert({ stmt::get_global_row_count,
@@ -641,8 +720,23 @@ inline void sqlite_t::update_chunk(_in int64_t chunk_id, _in bool updated, _in i
if (not result.empty()) THROW_SUPUL_INTERNAL_ERROR0;
}

inline void sqlite_t::update_chunk_total_count(_in int64_t chunk_id, _in int64_t total_count) {
auto result = execute(stmt::update_chunk_total_count, {total_count, chunk_id}, true);
if (not result.empty()) THROW_SUPUL_INTERNAL_ERROR0;
}

inline void sqlite_t::get_chunk_list(_in callback_query cb) {
execute(stmt::get_chunk_list, {}, cb);
}

inline bool sqlite_t::get_chunk_updated(_in int64_t chunk_id) {
auto result = execute(stmt::get_chunk_updated, {chunk_id}, true);
if (common::get_variant_int64(result, "updated") == 1) return true;
return false;
}

inline void sqlite_t::update_leaf_info(_in int64_t leaf_info_id, _in int64_t increment_correct_count, _in int64_t increment_total_count) {
auto result = execute(stmt::update_leaf_info, {increment_correct_count, increment_total_count, increment_correct_count, increment_total_count, leaf_info_id}, true);
auto result = execute(stmt::update_leaf_info, {increment_correct_count, increment_total_count, increment_total_count, increment_correct_count, increment_total_count, leaf_info_id}, true);
if (not result.empty()) THROW_SUPUL_INTERNAL_ERROR0;
}

@@ -697,6 +791,30 @@ inline int64_t sqlite_t::get_generation_id_by_treenode_id(_in int64_t treenode_i
return common::get_variant_int64(result, "ref_generation_id");
}

inline void sqlite_t::get_leaf_info_by_chunk_id(_in int64_t chunk_id, _in callback_query cb) {
execute(stmt::get_leaf_info_by_chunk_id, {chunk_id}, cb);
}

inline int64_t sqlite_t::get_total_count_by_chunk_id(_in int64_t chunk_id) {
auto result = execute(stmt::get_total_count_by_chunk_id, {chunk_id}, true);
return common::get_variant_int64(result, "total_count");
}

inline void sqlite_t::delete_instance_by_chunk_id(_in int64_t chunk_id) {
auto result = execute(stmt::delete_instance_by_chunk_id, {chunk_id}, true);
if (not result.empty()) THROW_SUPUL_INTERNAL_ERROR0;
}

inline void sqlite_t::delete_instance_info_by_chunk_id(_in int64_t chunk_id) {
auto result = execute(stmt::delete_instance_info_by_chunk_id, {chunk_id}, true);
if (not result.empty()) THROW_SUPUL_INTERNAL_ERROR0;
}

inline void sqlite_t::delete_chunk_by_id(_in int64_t chunk_id) {
auto result = execute(stmt::delete_chunk_by_id, {chunk_id}, true);
if (not result.empty()) THROW_SUPUL_INTERNAL_ERROR0;
}

inline int64_t sqlite_t::get_global_row_count(void) {
auto result = execute(stmt::get_global_row_count, {}, true);
return common::get_variant_int64(result, "COUNT(*)");
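the essential fix in this file is the rewritten update_leaf_info statement: the accuracy expression is now wrapped in a CASE WHEN guard against a zero denominator, and the execute() call binds one extra total_count increment to feed that guard. as plain C++, the update the SQL performs is roughly the following sketch (illustrative only, not library code):

```cpp
#include <cstdint>

// rough C++ equivalent of the new update_leaf_info SQL: accumulate the counters,
// then recompute accuracy, returning 0.0 instead of dividing by zero.
struct leaf_info_counters {
	int64_t correct_count = 0;
	int64_t total_count   = 0;
	double  accuracy      = 0.0;
};

inline void update_leaf_info_sketch(leaf_info_counters& row, int64_t inc_correct, int64_t inc_total) {
	row.correct_count += inc_correct;
	row.total_count   += inc_total;
	row.accuracy = (row.total_count == 0)
	             ? 0.0
	             : static_cast<double>(row.correct_count) / static_cast<double>(row.total_count);
}
```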
8 changes: 8 additions & 0 deletions include/gaenari/supul/db/sqlite/sqlite.hpp
@@ -45,6 +45,9 @@ class sqlite_t: public base {
virtual auto get_treenode(_in int64_t parent_treenode_id) -> std::vector<type::treenode_db>;
virtual void update_instance_info(_in int64_t instance_id, _in int64_t ref_leaf_treenode_id, _in bool correct);
virtual void update_chunk(_in int64_t chunk_id, _in bool updated, _in int64_t initial_correct_count, _in int64_t total_count, _in double initial_accuracy);
virtual void update_chunk_total_count(_in int64_t chunk_id, _in int64_t total_count);
virtual void get_chunk_list(_in callback_query cb);
virtual bool get_chunk_updated(_in int64_t chunk_id);
virtual void update_leaf_info(_in int64_t leaf_info_id, _in int64_t increment_correct_count, _in int64_t increment_total_count);
virtual auto get_weak_treenode(_in double leaf_node_accuracy_upperbound, _in int64_t leaf_node_total_count_lowerbound) -> std::vector<int64_t>;
virtual void update_leaf_info_by_go_to_generation_id(_in int64_t generation_id, _in double leaf_node_accuracy_upperbound, _in int64_t leaf_node_total_count_lowerbound);
@@ -55,6 +58,11 @@ class sqlite_t: public base {
virtual int64_t copy_rule(_in int64_t src_rule_id);
virtual void update_rule_value_integer(_in int64_t rule_id, _in int64_t value_integer);
virtual int64_t get_generation_id_by_treenode_id(_in int64_t treenode_id);
virtual void get_leaf_info_by_chunk_id(_in int64_t chunk_id, _in callback_query cb);
virtual int64_t get_total_count_by_chunk_id(_in int64_t chunk_id);
virtual void delete_instance_by_chunk_id(_in int64_t chunk_id);
virtual void delete_instance_info_by_chunk_id(_in int64_t chunk_id);
virtual void delete_chunk_by_id(_in int64_t chunk_id);
virtual int64_t get_global_row_count(void);
virtual void add_global_one_row(void);
virtual auto get_global(void) -> type::map_variant;