Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Talk indexing into a Talk::Index record. #540

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

kaspth
Copy link
Contributor

@kaspth kaspth commented Dec 22, 2024

This does take some extra work since Active Record can't detect rowid is the primary key.

But with some awareness around that, it's pretty smooth sailing and we can entirely remove a lot of the custom SQL with standard Active Record.

end

after_create_commit :create_in_index
after_update_commit :update_in_index
after_destroy_commit :remove_from_index
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get this for free now via dependent: :destroy!

@kaspth
Copy link
Contributor Author

kaspth commented Dec 22, 2024

I don't get what's going on in the tests.

In development the query looks correct and finds the talk:

rubyvideo(dev)> Talk.ft_search "Hotwire Cookbook"
  Talk Load (2.3ms)  SELECT "talks".* FROM "talks" INNER JOIN "talks_search_index" ON "talks_search_index"."rowid" = "talks"."id" WHERE (talks_search_index match 'Hotwire* Cookbook*') /* loading for pp */ LIMIT 11 /*application='Rubyvideo'*/

But in tests the query looks different and doesn't find the talk:

SELECT "talks".* FROM "talks" INNER JOIN "talks_search_index" ON "talks_search_index"."rowid" = "talks"."id" WHERE (talks_search_index match ?)  [[nil, "Hotwire* Cookbook*"]]

The [[nil, "Hotwire* Cookbook*"]] structure seems to indicate a bind parameter and potentially a prepared statement, but we don't seem to have that turned on/off anywhere. Maybe it's a Rails default somehow?

Swapping to where("#{table_name} match :query", query:) also doesn't work. Hm, I'm confused.

@kaspth
Copy link
Contributor Author

kaspth commented Dec 22, 2024

Ok, if I put prepared_statements: false in database.yml, I get the same query as in development:

SELECT "talks".* FROM "talks" INNER JOIN "talks_search_index" ON "talks_search_index"."rowid" = "talks"."id" WHERE (talks_search_index match 'Hotwire* Cookbook*')

But it still doesn't find the book.

@kaspth
Copy link
Contributor Author

kaspth commented Dec 23, 2024

Ah, something seems to be up when we try to insert the record, hm. It's missing the rowid here.

INSERT INTO "talks_search_index" ("title", "summary", "speaker_names") VALUES ('yo', '', 'Yauhen Karatkou') RETURNING "rowid" /*application='Rubyvideo'*/

@kaspth
Copy link
Contributor Author

kaspth commented Dec 23, 2024

Ok, wow, maybe this is more trouble than it's worth. We have to drop back into raw SQL to get Active Record to put in the rowid when inserting.

@adrienpoly
Copy link
Owner

I like how it all feels really ActiveRecord.

@kaspth
Copy link
Contributor Author

kaspth commented Jan 4, 2025

I'm hoping to get back to this, because I'd like to wrap it up.

This does take some extra work since Active Record can't detect `rowid` is the primary key.

But with some awareness around that, it's pretty smooth sailing and we can entirely remove a lot of the custom SQL with standard Active Record.
@kaspth kaspth force-pushed the talk-index-virtual-table-model branch from e702e0c to 62dd179 Compare January 19, 2025 23:45
end

def reindex
update! id: talk.id, title: talk.title, summary: talk.summary, speaker_names: talk.speaker_names
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're taking advantage of Active Record automatically generating an update or insert statement based on new_record?.

Internally, update! calls create_or_update which then calls new_record? ? _create_record(&block) : _update_record(&block).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means we now take advantage of Active Record skipping needless update queries when there's no change to the attributes.

extend ActiveSupport::Concern

included do
self.ignored_columns = [table_name.to_sym, :rank] # Rails parses our virtual table with these extra non-attributes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A SQLite fts5 table like this is built on a rowid primary key and several columns to index. Active Record now understands these types of virtual tables but it doesn't recognize the rowid at all, not even as a column. (It's probably because the column isn't spelled out anywhere in the schema statements so Active Record has no column to parse there).

So this module encapsulates all the work we need to have Active Record recognize the rowid column as a column and as the primary key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, and then Active Record also parses the table name and something called rank as valid columns, so we need to ignore those since they're internal to the fts5 implementation.


private

def attributes_for_create(attribute_names)
Copy link
Contributor Author

@kaspth kaspth Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to override attributes_for_update, because the rowid is included in the query_constraints_hash method, that Active Record calls in _update_record or wherever it was. The details are a little fuzzy to me now because of all the code I had to sift through.

def select_snippet(column, offset, tag: "mark", omission: "…", limit: 32)
select("snippet(talks_search_index, #{offset}, '<#{tag}>', '</#{tag}>', '#{omission}', #{limit}) AS #{column}_snippet")
def reindex_all
Talk::Index.delete_all
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrienpoly reading this again I'm confused as to why we need to delete the existing ones beforehand? Do you remember?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no I don't remember. Maybe I did that initially to just make sure we get something clean knowing that our data is verylimieted so doing a full rebuild is not really costly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrienpoly ah got it. Yeah, I think with this change we could let Active Record decide and spare some needless queries in case there's no update needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Talk::Index.delete_all

@kaspth
Copy link
Contributor Author

kaspth commented Jan 20, 2025

@adrienpoly @marcoroth think this is ready now. The ActiveRecord::SQLite::Index module I added should make this easier to port to the Speaker's index table.

kaspth added a commit to kaspth/rubyvideo that referenced this pull request Jan 21, 2025
@adrienpoly
Copy link
Owner

Thanks I will try it on my db and play a bit with it

kaspth added a commit to kaspth/rubyvideo that referenced this pull request Jan 22, 2025
Copy link
Owner

@adrienpoly adrienpoly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaspth I tested it locally and it works great. I pushed a small extra test as I was playing with it.

Overall I like how we are getting closer to the ActiveRecord native API. It is clearly code that I wouldn't have been able to write. At first I was a bit concerned about maintenance in the future. but now that it is here it seems fairly easy to extend it to other models so I am less concerned about maintenance.

Looking at ActiveRecord::SQLite::Index code I do see a few hacky things with AR I hope this won't bite us down the road

@marcoroth WDYT ?

Thanks again Kasper for this

test/models/talk_test.rb Outdated Show resolved Hide resolved
@kaspth
Copy link
Contributor Author

kaspth commented Jan 23, 2025

@kaspth I tested it locally and it works great. I pushed a small extra test as I was playing with it.

Perfect!

Overall I like how we are getting closer to the ActiveRecord native API. It is clearly code that I wouldn't have been able to write. At first I was a bit concerned about maintenance in the future. but now that it is here it seems fairly easy to extend it to other models so I am less concerned about maintenance.

Yeah, I've definitely been worried about this too. I think overall, the savings we're getting from using more standard Active Record and the hoops we have to jump through to get that are less concerning than the raw SQL hoops we were doing before.

And since that hoop logic is condensed in ActiveRecord::SQLite::Index it can be shared across models. Or even upstreamed into Rails. I've been wondering if @fractaledmind has thoughts on that part.

I'm also planning on doing a write up about how this works for future reference. I'm quite proud of having figured this out really.

Looking at ActiveRecord::SQLite::Index code I do see a few hacky things with AR I hope this won't bite us down the road

For sure! I think attributes_for_create is the hackiest part and that seems pretty stable to me, although it is technically private API.

@marcoroth
Copy link
Collaborator

Looks good to me!

@fractaledmind
Copy link

I'll give this a thorough review tomorrow. Interested to dive in tho

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants