A Python implementation of Locality Sensitive Hashing.

Overview

pyLSHash


A fast Python implementation of locality sensitive hashing.

I have been using https://github.com/kayzhu/LSHash, but it has not been updated since 2013,
so I maintain this version myself.

Highlights

  • Fast hash calculation for large amounts of high-dimensional data through the use of numpy arrays.
  • Built-in support for persistence through Redis.
  • Support for multiple hash indexes.
  • Built-in support for common distance/objective functions for ranking outputs.

Installation

pyLSHash depends on the following libraries:

  • numpy
  • redis (if persistence through Redis is needed)
  • bitarray (if Hamming distance is used as the distance function)

To install:

$ pip install pyLSHash

Quickstart

To create 6-bit hashes for input data of 8 dimensions:

from pyLSHash import LSHash

lsh = LSHash(6, 8)
lsh.index([1, 2, 3, 4, 5, 6, 7, 8])
lsh.index([2, 3, 4, 5, 6, 7, 8, 9])
lsh.index([10, 12, 99, 1, 5, 31, 2, 3])
lsh.query([1, 2, 3, 4, 5, 6, 7, 7])

[((1, 2, 3, 4, 5, 6, 7, 8), 1.0), ((2, 3, 4, 5, 6, 7, 8, 9), 11)]
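The two distances in the sample output (1.0 and 11) are consistent with the squared Euclidean distance between the query and each returned point. That is an inference from the numbers shown here, not a documented guarantee. The helper name `squared_euclidean` is purely illustrative and is not part of pyLSHash. A quick check in plain Python:

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences (illustrative helper, not part of pyLSHash)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1, 2, 3, 4, 5, 6, 7, 7]

# Distance from the query to each of the two points returned above.
print(squared_euclidean(query, [1, 2, 3, 4, 5, 6, 7, 8]))  # 1
print(squared_euclidean(query, [2, 3, 4, 5, 6, 7, 8, 9]))  # 11
```

Locality sensitive hashing is approximate: only points that share a hash bucket with the query are candidates, so the exact set of results may differ from the output shown.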

Main Interface

To initialize an LSHash instance:

LSHash(hash_size, input_dim, num_hashtables=1, storage=None)

parameters:

  • hash_size: The length of the resulting binary hash.
  • input_dim: The dimension of the input vector.
  • num_hashtables = 1: (optional) The number of hash tables used for multiple lookups.
  • storage = None: (optional) The name of the storage backend used for the index. Options include "redis".
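To make hash_size, input_dim, and num_hashtables more concrete, here is a minimal sketch of one common LSH scheme, random hyperplane projection. Whether pyLSHash uses exactly this scheme internally is an assumption. The names `make_planes` and `binary_hash` are hypothetical and exist only for this illustration, and the sketch uses the standard library rather than numpy to stay self-contained.

```python
import random


def make_planes(hash_size, input_dim, seed=0):
    """Draw hash_size random hyperplanes, each a vector of input_dim Gaussian weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(hash_size)]


def binary_hash(planes, point):
    """One bit per hyperplane: '1' if the point lies on its positive side, else '0'."""
    bits = []
    for plane in planes:
        dot = sum(w * x for w, x in zip(plane, point))
        bits.append("1" if dot > 0 else "0")
    return "".join(bits)


hash_size, input_dim, num_hashtables = 6, 8, 3

# One independent set of hyperplanes per hash table.
tables = [make_planes(hash_size, input_dim, seed=i) for i in range(num_hashtables)]

point = [1, 2, 3, 4, 5, 6, 7, 8]
keys = [binary_hash(planes, point) for planes in tables]
print(keys)  # one hash_size-character bit string per table
```

In this sketch, a larger hash_size gives finer buckets (fewer, closer candidates), while more hash tables give a point more chances to collide with its true neighbours.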

To index a data point of a given LSHash instance, e.g., lsh:

lsh.index(input_point, extra_data=None)

parameters:

  • input_point: The input data point, an array or tuple of numbers of length input_dim.
  • extra_data = None: (optional) Extra data to be added along with the input_point.

To query a data point against a given LSHash instance, e.g., lsh:

lsh.query(query_point, num_results=None, distance_func="euclidean")

parameters:

  • query_point: The query data point, an array or tuple of numbers of length input_dim.
  • num_results = None: (optional) The number of query results to return in ranked order. By default, all results are returned.
  • distance_func = "euclidean": (optional) The distance function used to rank the candidates. By default, the Euclidean distance function is used.
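
The ranking step described by num_results and distance_func can be sketched as follows. This is a toy model of the behaviour described above, not the library's code: `rank_candidates` is a hypothetical helper, and it accepts a callable for distance_func, whereas lsh.query takes the name of a distance function as a string.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(query_point, candidates, num_results=None, distance_func=euclidean):
    """Score every candidate, sort by ascending distance, and optionally truncate."""
    scored = [(c, distance_func(query_point, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored if num_results is None else scored[:num_results]


# Candidates would normally be the points sharing a hash bucket with the query.
candidates = [(2, 3, 4, 5, 6, 7, 8, 9), (1, 2, 3, 4, 5, 6, 7, 8)]
top = rank_candidates([1, 2, 3, 4, 5, 6, 7, 7], candidates, num_results=1)
print(top)  # [((1, 2, 3, 4, 5, 6, 7, 8), 1.0)]
```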
Owner
郭飞