💫 SpaCy wrapper for ConceptNet 💫

Overview

concepCy

concepCy is a spaCy wrapper for ConceptNet, a freely available semantic network designed to help computers understand the meaning of words.

concepCy allows you to query ConceptNet.io to extract word meanings directly from the resource itself.

Install

You can install concepCy via pip:

pip install concepcy

Alternatively, you can clone the repository and install it with poetry:

git clone https://github.com/JulesBelveze/concepcy.git
cd concepcy
poetry install

Getting Started

To get started, you need to install one of the pre-trained spaCy models available here.

In ConceptNet, words are represented as Node objects and relations between words as Edge objects.
The Node object contains the following attributes:

  • id: where you can look up all the information about that word
  • label: which may be a more complete phrase such as "an example" instead of just the word "example" that appears in the URI.
  • language: code for what language the label is in
  • term: a link to the most general version of this term. In many cases this is just the same URI.

The Edge object features the following attributes:

  • start: starting Node
  • end: ending Node
  • relation: name of the relation for those two nodes
  • text: the original text the relation was extracted from, for the part of ConceptNet's data that comes from text
  • weight: how believable the information is
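
As a rough illustration of the two structures described above, the sketch below models them with plain dataclasses. These classes are stand-ins written for this example only and are not concepCy's own Node and Edge implementations; the field names simply mirror the attributes listed above, and the sample values are taken from the "Simple start" output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Illustrative stand-in for a ConceptNet node (not concepCy's own class)."""
    id: str        # URI where the word can be looked up, e.g. "/c/en/company"
    label: str     # human-readable label, possibly a fuller phrase
    language: str  # language code of the label, e.g. "en"
    term: str      # URI of the most general version of the term


@dataclass(frozen=True)
class Edge:
    """Illustrative stand-in for a ConceptNet edge (not concepCy's own class)."""
    start: Node     # starting node
    end: Node       # ending node
    relation: str   # name of the relation between the two nodes
    text: str       # source text of the relation, when available
    weight: float   # how believable the information is


company = Node("/c/en/company", "company", "en", "/c/en/company")
business = Node("/c/en/business", "business", "en", "/c/en/business")
edge = Edge(
    start=company,
    end=business,
    relation="RelatedTo",
    text="[[company]] is related to [[business]]",
    weight=6.424017434596516,
)

print(f"{edge.start.label} --{edge.relation}--> {edge.end.label} (weight {edge.weight:.2f})")
# company --RelatedTo--> business (weight 6.42)
```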

Simple start

In this case we will simply be interested in the RelatedTo relations between words.

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("concepcy")

doc = nlp("WHO is a lovely company")

# Access all the "RelatedTo" relations from the Doc
print("--- All the 'RelatedTo' relations from the Doc ---")
for word, relations in doc._.relatedto.items():
    print(f"Word: '{word}'\n{relations}")

# Access the "RelatedTo" relations word by word
print("--- The 'RelatedTo' relations word by word ---")
for token in doc:
    print(f"Word: '{token}'\n{token._.relatedto}\n")

Output:

--- All the 'RelatedTo' relations from the Doc ---
Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]

--- The 'RelatedTo' relations word by word ---
Word: 'WHO'
[]

Word: 'is'
[]

Word: 'a'
[]

Word: 'lovely'
[]

Word: 'company'
[{'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/business', 'type': 'Node', 'label': 'business', 'language': 'en', 'term': '/c/en/business'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[business]]', 'weight': 6.424017434596516}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/corporation', 'type': 'Node', 'label': 'corporation', 'language': 'en', 'term': '/c/en/corporation'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[corporation]]', 'weight': 4.432155231938521}, {'start': {'id': '/c/en/company', 'type': 'Node', 'label': 'company', 'language': 'en', 'term': '/c/en/company'}, 'end': {'id': '/c/en/organization', 'type': 'Node', 'label': 'organization', 'language': 'en', 'term': '/c/en/organization'}, 'relation': 'RelatedTo', 'text': '[[company]] is related to [[organization]]', 'weight': 4.259107887809371}]
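
The dictionaries printed above can be post-processed with ordinary Python. The following is a minimal sketch, assuming edges in exactly the dictionary form shown in the output; `related_labels` and its `min_weight` parameter are hypothetical names introduced for this example and are not part of concepCy. The sample data is copied from the output above so that the snippet runs without spaCy or network access.

```python
from typing import Dict, List


def make_edge(end_label: str, weight: float) -> Dict:
    """Build an edge dictionary shaped like the ones printed above (sample data only)."""
    return {
        "start": {"id": "/c/en/company", "type": "Node", "label": "company",
                  "language": "en", "term": "/c/en/company"},
        "end": {"id": f"/c/en/{end_label}", "type": "Node", "label": end_label,
                "language": "en", "term": f"/c/en/{end_label}"},
        "relation": "RelatedTo",
        "text": f"[[company]] is related to [[{end_label}]]",
        "weight": weight,
    }


# The three 'RelatedTo' edges shown for the word 'company'.
edges: List[Dict] = [
    make_edge("business", 6.424017434596516),
    make_edge("corporation", 4.432155231938521),
    make_edge("organization", 4.259107887809371),
]


def related_labels(edges: List[Dict], min_weight: float = 0.0) -> List[str]:
    """Return the end-node labels of edges at or above min_weight, heaviest first."""
    kept = [e for e in edges if e["weight"] >= min_weight]
    kept.sort(key=lambda e: e["weight"], reverse=True)
    return [e["end"]["label"] for e in kept]


print(related_labels(edges, min_weight=4.3))
# ['business', 'corporation']
```

In a real pipeline the same helper could presumably be applied to `token._.relatedto`, as long as the component returns dictionaries as in the output above.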

Custom configuration

You can customize the concepCy component by overriding the default values of its config. The two parameters of interest are:

  • relations_of_interest: List[str]: ConceptNet currently supports 34 word relations, and some of them might not be needed for your use case. To keep only the ones you need, pass a list of the relations you want (see all relations available here). Each relation then becomes an extension.
  • filter_edge_fct: Callable[[Edge], bool]: ConceptNet is a crowd-sourced resource, so some information is more reliable than other information. To keep only reliable relations, you can pass a function that takes an Edge as input and returns a boolean indicating whether to filter that edge or not.

import spacy
import concepcy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "concepcy",
    config={
        "relations_of_interest": ["MotivatedByGoal", "CapableOf"],
        "filter_edge_weight": 3.0,
        "filter_missing_text": True,
        "as_dict": False
    }
)
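
Since each relation becomes an extension, it helps to know which attribute name to look for. The traceback quoted in the comments further down shows the component registering extensions under `relation.lower()`, which matches the `relatedto` attribute used in the first example. The sketch below only reproduces that naming rule; `extension_names` is a hypothetical helper written for this example, not a concepCy function.

```python
from typing import List


def extension_names(relations_of_interest: List[str]) -> List[str]:
    """Map ConceptNet relation names to lowercased attribute names (hypothetical helper)."""
    return [relation.lower() for relation in relations_of_interest]


print(extension_names(["MotivatedByGoal", "CapableOf"]))
# ['motivatedbygoal', 'capableof']
```

With the configuration above, one would therefore expect to read the results from `doc._.motivatedbygoal` and `token._.capableof`, by analogy with `doc._.relatedto`.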

Documentation 📚

📄 The whole documentation along with design decisions and examples can be found here

🎮 A simple demo on how to use concepCy can be found here

Comments
  • Using another language (Dutch)

    Hello,

    I run into empty results when I try to use Concepcy with Dutch. Should it be done differently? This code does work if I switch to an English sentence and use en_core_web_sm. SpaCy version: 3.4.1. Python version: 3.8.0. Thank you.

    Code:

    import spacy
    import concepcy
    
    nlp = spacy.load('nl_core_news_sm')
    # Using default concepCy configuration
    nlp.add_pipe('concepcy')
    print(concepcy)
    doc = nlp('Terwijl drugshandelaars de straten van Antwerpen onveilig maken, vindt hun handelswaar vlotjes de weg naar Vlamingen uit alle lagen van de bevolking.')
    
    # Access all the 'RelatedTo' relations from the Doc
    for word, relations in doc._.relatedto.items():
        print(f'Word: {word} {relations}')
    
    # Access the 'RelatedTo' relations word by word
    for token in doc:
        print(f'Word: {token} {token._.relatedto}')
    

    Output:

    Word: Terwijl []
    Word: drugshandelaars []
    Word: de []
    Word: straten []
    Word: van []
    Word: Antwerpen []
    Word: onveilig []
    Word: maken []
    Word: , []
    Word: vindt []
    Word: hun []
    Word: handelswaar []
    Word: vlotjes []
    Word: de []
    Word: weg []
    Word: naar []
    Word: Vlamingen []
    Word: uit []
    Word: alle []
    Word: lagen []
    Word: van []
    Word: de []
    Word: bevolking []
    Word: . []
    
    opened by edloginova 6
  • Error trying to add ConcepCy to spacy pipeline

    Code:

    import spacy
    import concepcy
    
    nlp = spacy.load('en_core_web_sm')
    # Using default concepCy configuration
    nlp.add_pipe('concepcy')
    

    Error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [12], in <cell line: 6>()
          4 nlp = spacy.load('en_core_web_sm')
          5 # Using default concepCy configuration
    ----> 6 nlp.add_pipe('concepcy')
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\spacy\language.py:795, in Language.add_pipe(self, factory_name, name, before, after, first, last, source, config, raw_config, validate)
        787     if not self.has_factory(factory_name):
        788         err = Errors.E002.format(
        789             name=factory_name,
        790             opts=", ".join(self.factory_names),
       (...)
        793             lang_code=self.lang,
        794         )
    --> 795     pipe_component = self.create_pipe(
        796         factory_name,
        797         name=name,
        798         config=config,
        799         raw_config=raw_config,
        800         validate=validate,
        801     )
        802 pipe_index = self._get_pipe_index(before, after, first, last)
        803 self._pipe_meta[name] = self.get_factory_meta(factory_name)
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\spacy\language.py:674, in Language.create_pipe(self, factory_name, name, config, raw_config, validate)
        671 cfg = {factory_name: config}
        672 # We're calling the internal _fill here to avoid constructing the
        673 # registered functions twice
    --> 674 resolved = registry.resolve(cfg, validate=validate)
        675 filled = registry.fill({"cfg": cfg[factory_name]}, validate=validate)["cfg"]
        676 filled = Config(filled)
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\thinc\config.py:747, in registry.resolve(cls, config, schema, overrides, validate)
        738 @classmethod
        739 def resolve(
        740     cls,
       (...)
        745     validate: bool = True,
        746 ) -> Dict[str, Any]:
    --> 747     resolved, _ = cls._make(
        748         config, schema=schema, overrides=overrides, validate=validate, resolve=True
        749     )
        750     return resolved
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\thinc\config.py:796, in registry._make(cls, config, schema, overrides, resolve, validate)
        794 if not is_interpolated:
        795     config = Config(orig_config).interpolate()
    --> 796 filled, _, resolved = cls._fill(
        797     config, schema, validate=validate, overrides=overrides, resolve=resolve
        798 )
        799 filled = Config(filled, section_order=section_order)
        800 # Check that overrides didn't include invalid properties not in config
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\thinc\config.py:868, in registry._fill(cls, config, schema, validate, resolve, parent, overrides)
        865     getter = cls.get(reg_name, func_name)
        866     # We don't want to try/except this and raise our own error
        867     # here, because we want the traceback if the function fails.
    --> 868     getter_result = getter(*args, **kwargs)
        869 else:
        870     # We're not resolving and calling the function, so replace
        871     # the getter_result with a Promise class
        872     getter_result = Promise(
        873         registry=reg_name, name=func_name, args=args, kwargs=kwargs
        874     )
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\concepcy\__init__.py:70, in ConcepCyComponent.__init__(self, nlp, name, url, relations_of_interest, as_dict, filter_edge_weight, filter_missing_text)
         67 self.parser = ConceptnetParser(relations_of_interest, as_dict, filter_edge_fct)
         69 for relation in relations_of_interest:
    ---> 70     Doc.set_extension(relation.lower(), default=defaultdict(list))
         71     Token.set_extension(relation.lower(), default=[])
    
    File ~\anaconda3\envs\NewsAggregator\lib\site-packages\spacy\tokens\doc.pyx:141, in spacy.tokens.doc.Doc.set_extension()
    
    ValueError: [E090] Extension 'relatedto' already exists on Doc. To overwrite the existing extension, set `force=True` on `Doc.set_extension`.
    opened by SulavKhadka 4
  • Readme example breaks.

    I tried running the example on the readme file locally.

    import spacy
    import concepcy
    
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("concepcy")
    
    doc = nlp("WHO is a lovely company")
    
    # Access all the "RelatedTo" relations from the Doc
    print("--- All the 'RelatedTo' relations from the Doc ---")
    for word, relations in doc._.relatedto.keys():
        print(f"Word: '{word}'\n{relations}")
    
    # Access the "RelatedTo" relations word by word
    print("--- The 'RelatedTo' relations word by word ---")
    for token in doc:
        print(f"Word: '{token}'\n{token._.relatedto}\n")
    

    This led to this error:

    /home/vincent/Development/prodigy-demos/venv/lib/python3.8/site-packages/spacy/util.py:865: UserWarning: [W095] Model 'en_core_web_sm' (3.3.0) was trained with spaCy v3.3 and may not be 100% compatible with the current version (3.4.1). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
      warnings.warn(warn_msg)
    --- All the 'RelatedTo' relations from the Doc ---
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [1], in <cell line: 11>()
          9 # Access all the "RelatedTo" relations from the Doc
         10 print("--- All the 'RelatedTo' relations from the Doc ---")
    ---> 11 for word, relations in doc._.relatedto.keys():
         12     print(f"Word: '{word}'\n{relations}")
         14 # Access the "RelatedTo" relations word by word
    
    ValueError: too many values to unpack (expected 2)
    

    I think this would be a nice test case.

    opened by koaning 3
  • fix: force set extension

    This PR simply adds the force=True parameter when setting extensions on both Doc and Token objects. That way, adding the same extension multiple times no longer causes an error.

    opened by JulesBelveze 1
  • Reduce latency

    As currently implemented, the component makes an HTTP request for every eligible token, which causes a lot of overhead.

    As an example, processing the following sentence takes ~12sec:

    """The Jan. 6 select committee’s latest public hearing went inside the White House to detail then-President Donald Trump’s hourslong refusal to call for an end to the Capitol riot. The hearing marked the final scheduled presentation of the committee’s initial findings from its investigation of the Jan. 6, 2021, insurrection until September."""
    
    opened by JulesBelveze 1
  • Error when passing custom `filter_edge_fct`

    The object passed to the config parameter of the .add_pipe method needs to be JSON-serializable.

    Example of error:

    nlp.add_pipe(
         "concepcy",
         config={
             "relations_of_interest": ["Causes"],
             "filter_edge_fct": lambda x: x.weight < 0.0
         }
    )
    
    ValueError: [E961] Found non-serializable Python object in config. Configs should only include values that can be serialized to JSON. If you need to pass models or other objects to your component, use a reference to a registered function or initialize the object in your component.
    
    {'relations_of_interest': ['Causes'], 'filter_edge_fct': <function <lambda> at 0x107922ca0>}
    
    bug 
    opened by JulesBelveze 0
  • chore: set one extension per relation

    This PR aims at setting one extension per relation type instead of storing everything under concepts. It makes it easier to access and retrieve wanted relations.

    opened by JulesBelveze 0
Releases(0.1.0)
Owner
Jules Belveze
AI craftsman | NLP | MLOps