HOSVD decomposition: high-dimensional factorization with a central (core) tensor
The central tensor is hard to interpret.
Assign a learning rate of 0 to unobserved entries, i.e., only update on observed entries (sketched below)
Generalize to n-dimensional tensors
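A minimal numpy sketch of the masked-update idea (my illustration, not the paper's code); the same masking carries over to higher-order tensors:

```python
import numpy as np

def masked_mf(R, mask, k=8, lr=0.01, reg=0.1, epochs=50):
    """SGD matrix factorization that only updates on observed entries;
    a zero learning rate on unobserved cells is equivalent to skipping them."""
    n, m = R.shape
    U = np.random.randn(n, k) * 0.1
    V = np.random.randn(m, k) * 0.1
    rows, cols = np.nonzero(mask)          # indices of observed entries only
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            u_old = U[i].copy()
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_old - reg * V[j])
    return U, V
```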
The same latent matrix is shared by an entity across different relationships
Common latent space with locality added
Easy-to-follow optimization process
An extension of 2.
Standard tensor factorization ==> 3 latent matrices
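Assuming this is CP-style factorization, each tensor entry is reconstructed from the 3 latent matrices; a tiny sketch under that assumption:

```python
import numpy as np

def cp_entry(U, V, W, i, j, k):
    """CP-style reconstruction of one tensor entry from 3 latent matrices:
    X[i, j, k] ~= sum_f U[i, f] * V[j, f] * W[k, f]."""
    return np.sum(U[i] * V[j] * W[k])
```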
Use a stacked denoising autoencoder to get the latent matrices for users and items
Claims to handle the sparsity problem in tensor factorization by using the autoencoder to reduce dimensionality
- CNNs tend to be biased toward neighboring interactions
- RNNs tend to be biased toward sequential dependencies
- Low-order & high-order feature interactions should both be incorporated
Builds a deep neural net to implement the FM structure; the embedding layers capture the factorization part (sketched below)
End-to-end network without pretraining the latent vectors. Prone to overfitting; dropout is used to prevent it
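A minimal PyTorch sketch of the FM half (the standard second-order identity; layer shapes and names are mine):

```python
import torch.nn as nn

class FMInteraction(nn.Module):
    """Second-order FM term computed from shared field embeddings, via
    sum_{i<j} <v_i, v_j> = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), summed over dims."""
    def forward(self, emb):
        # emb: (batch, n_fields, dim) -- in DeepFM the same embeddings also feed the deep part
        square_of_sum = emb.sum(dim=1).pow(2)   # (batch, dim)
        sum_of_square = emb.pow(2).sum(dim=1)   # (batch, dim)
        return 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)
```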
AUGRU: GRU with an attentional update gate, used for interest evolution
Creates an "attention weight" based on the last behavior and the current behavior, and uses it to evolve the GRU hidden state (see the sketch below)
Able to handle user historical behavior, the item list, the user profile, and context at the same time
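A sketch of the AUGRU cell as I understand it (the attention weight scales the update gate; naming is mine):

```python
import torch
import torch.nn as nn

class AUGRUCell(nn.Module):
    """GRU cell whose update gate is scaled by an attention weight."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 2 * hidden_size)  # update + reset
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h, att):
        # att: (batch, 1) attention weight for this behavior w.r.t. the target
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=-1))).chunk(2, dim=-1)
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=-1)))
        z = att * z                                   # the "attentional update gate"
        return (1 - z) * h + z * h_tilde
```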
BERT for recommendation system, very similar to BERT in language modeling
Claims to be a state-of-the-art recommendation algorithm
Learned (trained) positional embeddings
Deep learning for Sequential Recommendation. Capable of recommending items in different categories.
2 step recommendation: candidate generation + ranking
Recommendation as classification: learn the user embedding by training a classifier over items (see the sketch after this list)
- average the history embeddings
- feed the timestamp of the video (product) as a feature to add time sensitivity to the model
- control the influence of dominating users by restricting the number of samples taken from each user
- aware of the user's consumption sequence
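A minimal sketch of the candidate-generation tower (sizes and the single "age" feature are illustrative, not the paper's exact setup):

```python
import torch
import torch.nn as nn

class CandidateGenerator(nn.Module):
    """Average the watch-history embeddings, mix in a time feature, and train
    the result as an extreme multiclass classifier over the item vocabulary."""
    def __init__(self, n_items, dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.output = nn.Linear(dim, n_items)          # softmax over all items

    def forward(self, watch_history, example_age):
        # watch_history: (batch, seq_len) item ids; example_age: (batch, 1)
        hist = self.item_emb(watch_history).mean(dim=1)           # average the embeddings
        user = self.mlp(torch.cat([hist, example_age], dim=-1))   # time-sensitive user vector
        return self.output(user)     # at real scale you'd use sampled softmax instead
```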
A new way to incorporate new data: use a NN (CNN) to model the update from W_{t-1} to W_{t}
However, the transfer model itself is hard to design and optimize. It also does not account for second-order time dependencies.
https://arxiv.org/ftp/arxiv/papers/2005/2005.14026.pdf
Uses normalized Discounted Cumulative Gain (nDCG) as the evaluation metric for ranking (sketched below)
Pairwise loss designed for multiple items recommended to the same user
Recommendation quality is evaluated via per-user item ranking, assuming each user needs a recommendation
Closed-form solution
Idea for cold-start testing: randomly select half of the users (items) to act as new users (items) and test model performance in the cold-start setting
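For reference, a small nDCG@k implementation (my own, not from the paper):

```python
import numpy as np

def ndcg_at_k(ranked_relevances, k):
    """nDCG@k for one user; `ranked_relevances` is graded relevance in ranked order."""
    rel = np.asarray(ranked_relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = (rel * discounts).sum()
    ideal = np.sort(np.asarray(ranked_relevances, dtype=float))[::-1][:k]
    idcg = (ideal * discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0
```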
Carefully designed reward function for recommendation; generative adversarial training to mimic the reward function
User history as state; a weighted or truncated M-step history can also be used
Cascading Q-learning (introduced by this paper): a small modification of Q-learning
Neural-net structure for the Q value
Adding a GAN to estimate the reward allows a more flexible reward function for each user
- current GNNs cannot deal with users' history & interests
- current neighbor sampling may sample irrelevant information while leaving out important nodes
- GNN-based methods do not take the mutual influence between target user behaviors and the item into consideration during information aggregation
Graph Connect: find entity connections that the user's historical behavior & the target have in common
Graph Prune: prune the entities that do not connect different items
Attention Layer: weigh each item differently based on how related it is to the target item
Translation-based model, which exploits implicit preferences
Joint modeling item recommendation and KG completion
Use TransH for KG completion: find embeddings such that e_h + r ≈ e_t in the projected space (sketched below)
Model global preference vectors
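A numpy sketch of the TransH projection and distance score (names are mine):

```python
import numpy as np

def transh_score(h, t, w_r, d_r):
    """Project h and t onto the hyperplane with (unit) normal w_r, then score
    how well h_perp + d_r ~= t_perp holds (smaller distance = more plausible)."""
    w = w_r / np.linalg.norm(w_r)
    h_perp = h - (h @ w) * w
    t_perp = t - (t @ w) * w
    return -np.linalg.norm(h_perp + d_r - t_perp)
```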
Within-basket item recommendation
Use different aggregators for each type of entity -- another way would be to introduce a knowledge graph
Add a d-dim embedding to each entity
Use a graph convolutional NN to train the embeddings of users, items, and baskets, then use the embeddings to produce a score for each item in the target user's basket.
Embedding aggregation over l layers can be viewed as l steps of a CNN; more steps capture more global information. In the end, the outputs of all l layers are concatenated and used to predict the target.
Back and forth aggregation
L-layer aggregation to gather messages from neighbor nodes (see the sketch below)
Make predictions based on learned user & item embedding
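A minimal numpy sketch of the l-layer aggregate-and-concatenate idea (the paper's aggregators are type-specific; this uses a single mean aggregator for illustration):

```python
import numpy as np

def propagate_and_concat(A, X, n_layers=3):
    """Mean-neighbor aggregation for n_layers steps; each extra step pulls in
    one more hop, so deeper layers capture more global information."""
    A_norm = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # row-normalized adjacency
    outs, H = [X], X
    for _ in range(n_layers):
        H = A_norm @ H
        outs.append(H)
    return np.concatenate(outs, axis=1)    # final embedding = concat of all layer outputs
```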
Personally, this looks more like a pure GNN than an FM to me; I did not see any explicit FM component in it
Cross-domain here looks like multi-task learning to me, only with some shared weights. Nothing new or interesting
- Negative sampling in each epoch adds variance to the training data, which reduces the model's overfitting.
- KGAT: Knowledge Graph Attention Network for Recommendation (A good paper for graph Recommendation)
- Adding a knowledge graph brings additional high-order relations, which increase the data & model size and sparsity dramatically
- Relationships contribute unequally to the task
Use TransR for embedding triplets: W_1 e_h + e_r ≈ W_2 e_t, optimized with a pairwise ranking loss (see the sketch after these notes)
Neighbor information propagation (aggregation)
Attention mechanism applied in the propagation step
Inner product of the embedding representations as the prediction score
Alternately optimize the two loss functions
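A numpy sketch of the two pieces as I read them: the TransR-style triplet energy and the attention-weighted neighbor aggregation (function names and the exact attention form are my reconstruction):

```python
import numpy as np

def transr_energy(e_h, e_r, e_t, W1, W2):
    """Triplet energy for the note's objective W1 e_h + e_r ~= W2 e_t (lower is better)."""
    return np.linalg.norm(W1 @ e_h + e_r - W2 @ e_t)

def attentive_aggregate(e_h, neighbors):
    """Weigh each neighbor (e_t, e_r, W_r) by an attention score computed in the
    projected relation space, then softmax and sum -- unequal relation contributions."""
    scores = np.array([(W_r @ e_t) @ np.tanh(W_r @ e_h + e_r)
                       for e_t, e_r, W_r in neighbors])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over the neighborhood
    return sum(w * e_t for w, (e_t, _, _) in zip(weights, neighbors))
```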
- Social relationships are considered in item recommendation
- entity relationships have varying strengths; the method models the heterogeneous strengths of social relations mathematically
Item aggregation & user aggregation are done separately
Opinion embedding -> embed scores/ratings (as a relation); concatenate the relation embedding with the item embedding and apply an MLP to the concatenated vector
Use attention in aggregation: the attention score is computed by concatenating the user embedding with the processed item embedding (opinion embedding included) and applying an MLP to the concatenated vector (see the sketch below)
Use the aggregated user vector in social-relation aggregation (also called the latent factor), with the same attention mechanism
Combine the social aggregation and user aggregation, and iteratively update the user's latent vector
Use the concatenation of the user & item latent vectors to predict the connection between user and item
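A PyTorch sketch of that attention pattern (MLP on the concatenated vectors, softmax, weighted sum; module and argument names are mine):

```python
import torch
import torch.nn as nn

class AttentiveAggregator(nn.Module):
    """Score each (item + opinion) vector against the user embedding with an MLP,
    softmax the scores, and return the attention-weighted sum."""
    def __init__(self, dim):
        super().__init__()
        self.att_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, user_emb, item_opinion_embs):
        # user_emb: (dim,); item_opinion_embs: (n_neighbors, dim)
        u = user_emb.expand(item_opinion_embs.size(0), -1)
        scores = self.att_mlp(torch.cat([item_opinion_embs, u], dim=-1))  # (n, 1)
        weights = torch.softmax(scores, dim=0)
        return (weights * item_opinion_embs).sum(dim=0)
```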
- Randomly initialize embeddings -- no one-hot encoding
- Use RMSprop rather than vanilla SGD
Recommendation system for news, focusing on 4 metrics in result evaluation: diversity, coverage, serendipity, and dynamism
Diversity: inverse of the similarity among recommended items, based on feature similarity
Coverage: the proportion of recommended items over all items
Serendipity: inverse of the similarity between recommended items and the user's history
Dynamism: the recency of each recommended item's timestamp
Item similarity is useful in the evaluation (sketched below). Actually, most of these metrics could be added as loss terms in the recommendation algorithm, which the paper does not fully discuss
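A numpy sketch of how these four metrics could be computed from item feature vectors and timestamps (the exact formulas are my reading of the definitions above):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def diversity(rec_feats):
    """1 minus the mean pairwise similarity of recommended items."""
    n = len(rec_feats)
    sims = [cosine(rec_feats[i], rec_feats[j]) for i in range(n) for j in range(i + 1, n)]
    return 1 - np.mean(sims)

def coverage(recommended_ids, catalog_size):
    """Fraction of the catalog that gets recommended."""
    return len(set(recommended_ids)) / catalog_size

def serendipity(rec_feats, hist_feats):
    """1 minus the mean similarity between recommendations and the user's history."""
    return 1 - np.mean([cosine(r, h) for r in rec_feats for h in hist_feats])

def dynamism(rec_timestamps, now):
    """Mean age of the recommended items; smaller means more recent/dynamic."""
    return np.mean([now - t for t in rec_timestamps])
```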