Bayesian negative sampling (BNS) is a theoretically optimal negative sampling algorithm that runs in linear time.


Bayesian Negative Sampling

Required Packages

  • numpy: required for the Matrix Factorization (MF) implementation (run main_MF.py);
  • pytorch: required for the Light Graph Convolution Network (LightGCN) implementation (run main_lightGCN.py).

About the BNS Framework:

We summarize the Bayesian negative sampling algorithm as follows: randomly select a small candidate set $\mathcal{M}_u$ from the set of negative instances; then, for each negative instance $l \in \mathcal{M}_u$, do:

(i) Computing prior probability

Assuming the popularity of item $l$ follows $pop_l \sim B(N, P_{fn}(l))$, we compute the prior probability as: $$P_{fn}(l) = \frac{pop_l}{N}$$ This step needs $\mathcal{O}(1)$.
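A minimal numpy sketch of this step (the array `pop` and the count `N` are illustrative names, not identifiers from the released code):

```python
import numpy as np

def prior_fn(pop: np.ndarray, N: int) -> np.ndarray:
    """Prior probability of each item being a false negative.

    pop[l] is the observed popularity (interaction count) of item l,
    modeled as pop_l ~ B(N, P_fn(l)); the plug-in estimate is
    P_fn(l) = pop_l / N.
    """
    return pop / N
```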

(ii) Computing cumulative distribution function (likelihood)

$$F_n(\hat{x}_l) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i \leq \hat{x}_l\}}$$

This step needs $\mathcal{O}(|\mathcal{I}|)$.

The empirical distribution function $F_n(\cdot)$ converges to the common cumulative distribution function $F(\cdot)$ almost surely by the strong law of large numbers. The Glivenko–Cantelli theorem (1933) strengthens this result by proving uniform convergence of $F_n(\cdot)$ to $F(\cdot)$. This property makes it possible to compute $F(\cdot)$ even when its explicit expression is unknown. Given the observation $\hat{x}_l$, $F(\hat{x}_l)$ assigns a probabilistic prediction, valued in $[0,1]$, of $l$ being a false negative (i.e., an unobserved positive).
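A direct numpy translation of the empirical distribution function above (a sketch; `scores` stands for the model's predicted scores of all items for user $u$):

```python
import numpy as np

def empirical_cdf(scores: np.ndarray, x_hat_l: float) -> float:
    """F_n(x_hat_l) = (1/n) * #{i : scores[i] <= x_hat_l}.

    One evaluation scans all n = |I| scores, hence the O(|I|) cost
    noted above.
    """
    return float(np.mean(scores <= x_hat_l))
```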

(iii) Computing negative signal (posterior probability)

$$\mathtt{unbias}(l) = \frac{[1 - F(\hat{x}_l)]\,[1 - P_{fn}(l)]}{1 - F(\hat{x}_l) - P_{fn}(l) + 2F(\hat{x}_l)P_{fn}(l)}$$

Note that $\mathtt{unbias}(l)$ is an unbiased estimator of the probability that $l$ is a true negative (see Lemma 3.1). This step needs $\mathcal{O}(1)$.
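In code, the posterior reads as follows (a sketch; `F_l` and `P_fn_l` are the quantities computed in steps (ii) and (i), respectively):

```python
import numpy as np

def unbias(F_l: np.ndarray, P_fn_l: np.ndarray) -> np.ndarray:
    """Posterior probability that candidate l is a true negative (Lemma 3.1)."""
    num = (1.0 - F_l) * (1.0 - P_fn_l)
    den = 1.0 - F_l - P_fn_l + 2.0 * F_l * P_fn_l
    return num / den
```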

(iv) Performing Bayesian negative sampling

$$j = \mathop{\arg\min}\limits_{l \in \mathcal{M}_u} \mathtt{info}(l) \cdot [1 - (1+\lambda)\,\mathtt{unbias}(l)]$$

If the sampler $h^*$ minimizes the conditional sampling risk $R(l \mid i)$, then the empirical sampling risk is also minimized (see Theorem 3.1). Thus, the Bayesian optimal sampling rule essentially samples the instance with the minimal conditional sampling risk. This step needs $\mathcal{O}(1)$.
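Putting steps (i)–(iv) together, a hedged end-to-end sketch of one sampling call (assumptions: `info` is a caller-supplied weighting function, `lam` is $\lambda$, and the broadcasting-based CDF is one possible $\mathcal{O}(|\mathcal{M}_u| \cdot |\mathcal{I}|)$ realization; see the released main_MF.py / main_lightGCN.py for the actual implementation):

```python
import numpy as np

def bayesian_negative_sampling(cand_scores, cand_pop, all_scores, N, info, lam=0.0):
    """Return the index (within the candidate set M_u) of the sampled negative j.

    cand_scores : x_hat_l for each candidate l in M_u
    cand_pop    : popularity count pop_l for each candidate
    all_scores  : scores of all items for user u (for the empirical CDF)
    """
    P_fn = cand_pop / N                                              # (i)   prior
    F = (all_scores[None, :] <= cand_scores[:, None]).mean(axis=1)   # (ii)  F_n
    tn = ((1 - F) * (1 - P_fn)) / (1 - F - P_fn + 2 * F * P_fn)      # (iii) unbias
    risk = info(cand_scores) * (1 - (1 + lam) * tn)                  # (iv)  risk
    return int(np.argmin(risk))
```

Here `info` is deliberately left abstract; the paper defines the information measure used to weight informative (hard) negatives.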

For more details about BNS (Bin Liu & Bang Wang, 2022), see our paper or poster at https://doi.org/10.48550/arXiv.2204.06520 .

Distribution Analysis

The key is the class-conditional density. On the basis of an order-relation analysis of the negatives' scores, we derive the class-conditional densities of true negatives and false negatives, and provide an affirmative answer, from a Bayesian viewpoint, to the question of distinguishing true negatives from false negatives.

(Figures: real distribution vs. theoretical distribution of the true/false negatives' scores.)

We exhibit the distribution morphology of false negatives and true negatives derived from the ordinal relation, for different types of $f(x)$: the Gaussian distribution $x \sim \mathcal{N}(\mu,\sigma)$ (symmetric), the Student's t-distribution $x \sim t(n)$ (symmetric), and the Gamma distribution $x \sim Ga(\alpha,\lambda)$ (asymmetric). As can be seen, during the training process the real distribution of true/false negatives gradually exhibits the same structure as the theoretical distribution. Although the probability density function $f(x)$ changes during training, this separated structure is sufficient for distinguishing true negative instances from false negative instances.

Dataset Description:

MovieLens100K; MovieLens1M; Yahoo!-R3.

Our Bayesian negative sampling algorithm does not depend on a particular dataset. Simply split the dataset you are interested in into a training set and a test set, replace the train.csv and test.csv files, and you can run Bayesian negative sampling easily. You just need to make sure that you have chosen appropriate prior information for modeling the prior probability $P_{tn}(\cdot)$ or $P_{fn}(\cdot)$, as sketched below.
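For instance, a popularity-based prior can be derived from the replaced train.csv with a few lines of pandas (the two-column user/item layout is an assumption about your CSV; adjust to your schema):

```python
import pandas as pd

# Assumed layout: one (user, item) interaction per row, no header.
train = pd.read_csv("train.csv", names=["user", "item"])
N = train["user"].nunique()          # number of users (Binomial trials)
pop = train["item"].value_counts()   # pop_l: interaction count per item l
P_fn = pop / N                       # prior P_fn(l) = pop_l / N
```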
