Awesome NLP with Ruby
Curated List: Practical Natural Language Processing done in Ruby
Thank you arbox & contributors
Composable Operations is a tool set for creating operations and assembling several of them into operation pipelines.
Ruby wrapper for Apache Spark
Simplifying Kafka for Ruby apps
Ruby: parallel processing made simple and fast
Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
Ruby bindings to the OpenNLP Java toolkit.
Ruby bindings to the Stanford Core NLP tools (English, French, German).
Natural language processing framework for Ruby.
Wrapper for basic NLP tools
JRuby tools wrapper for Apache OpenNLP
An SDK for AlchemyAPI using Ruby. Please note that this legacy AlchemyAPI SDK is no longer supported by IBM. Please use the Watson SDKs: https://github.com/watson-developer-cloud?utf8=✓&query=sdk
Ruby library for Wit.ai
Ruby based API for the project Wortschatz Leipzig.
Official Ruby client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Ruby apps.
Google Cloud Client Library for Ruby
A simple tokenizer in Ruby for NLP tasks.
A multilingual tokenizer to split a string into tokens
Natural language processing algorithms implemented in pure Ruby with minimal dependencies
Simple and customizable text tokenization gem.
Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
Ruby port of the NLTK Punkt sentence segmentation algorithm
Accurate Bayesian sentence tokenizer in Ruby.
A fast and accurate rule-based sentence segmentation tool for Ruby.
Expose libstemmer_c to Ruby
Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing
Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy
Lexical Statistics: Counting Types and Tokens
Your Word Counter Gem
A word counter for String and Hash in Ruby
A Ruby natural language processor.
Filtering Stop Words
Project for filtering stopwords
Phrasal Level Processing
N-Gram generator in Ruby - http://en.wikipedia.org/wiki/N-gram
Break words and phrases into ngrams.
A flexible and general-purpose ngrams library written in Ruby. Raingrams supports ngram sizes greater than 1, text/non-text grams, multiple parsing styles and open/closed vocabulary models.
An Earley parser written in Ruby
Syntax tree generator made with Ruby and RMagick
Approximate String Matching library
Calculates edit distance using Damerau-Levenshtein algorithm
Fast Ruby FFI string edit distance algorithms
Fast string edit distance computation, using the Damerau-Levenshtein algorithm.
Term Frequency - Inverse Document Frequency in Ruby
Ruby gem to calculate the similarity between texts using tf*idf
A simple and extensible sentiment analysis gem
Spelling and Error Correction
Ruby wrapper for correcting spelling and grammar mistakes based on the context of complete sentences.
Ruby bindings to Hunspell 1.2.x with iconv support
Ruby FFI bindings for Hunspell.
Ruby wrapper for the famous spell checker library hunspell.
Alignment functions for corpus linguistics
REST client for Google APIs
Ruby client for the Microsoft Translator API
Translations with speech synthesis in your terminal as a Ruby gem
Ruby NLP library
Sentiment analysis for the German language
Numbers, Dates, and Time Parsing
Chronic is a pure Ruby natural language date parser.
A natural language parser for validating complex date ranges
A simple Ruby natural language parser for elapsed time
A dirt simple library for parsing and formatting human readable dates
Nickel extracts date, time, and message information from naturally worded text.
Natural language parser for recurring events
Parse numbers in natural language from strings (e.g. forty two).
Named Entity Recognition
Named entity recognition with Stanford NER and Ruby
Ruby binding for the Stanford POS Tagger and Named Entity Recognizer
Ruby wrapper for ‘espeak’ and ‘lame’ with sugar on top to create Text-To-Speech mp3 files.
A Ruby gem for text-to-speech using the Google Translate service.
A Ruby library for consuming the AT&T Speech API for speech to text.
Ruby speech recognition with Pocketsphinx
Dialog Agents, Assistants, and Chatbots
A straightforward Ruby-based Twitter Bot Framework, using OAuth to authenticate.
ChatOps for Ruby.
A pure Ruby interface to the WordNet database
A Ruby interface to the WordNet® Lexical Database.
Machine Learning Libraries
Ruby language bindings for LIBSVM
Machine Learning & Data Mining with JRuby
ID3-based implementation of the ML Decision Tree algorithm
A Ruby interface to the Timbl machine-learning library
A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.
A Ruby wrapper for Latent Dirichlet Allocation (LDA).
This is the Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification and other large linear classifications)
A redis-backed Bayesian classifier
A JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework
Simple Naive Bayes classifier
A robust, full-featured Ruby implementation of Naive Bayes
A generalized rack framework for text classifications.
Naive Bayes text classification implementation as an OmniCat classifier strategy.
Ruby library for interfacing with FANN (Fast Artificial Neural Network)
Library for NLP with Ruby
Optical Character Recognition
A Ruby wrapper library to the tesseract-ocr API.
Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)
Full Text Search, Information Retrieval, Indexing
A Ruby client for Apache Solr
Solr-powered search for Ruby objects
Sphinx plugin for ActiveRecord/Rails
Ruby integrations for Elasticsearch
Elasticsearch integrations for ActiveModel/Record and Ruby on Rails
REST client for Google APIs
Language Aware String Manipulation
Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.
Fuzzy string matching library for Ruby
Ruby on Rails
Fuzzy document finding in Ruby
Unicode normalization library. (Mirror of Yoshida-san's code base to maintain the RubyGem.)
Find many kinds of common information in a string. CommonRegex port for Ruby.
Generate strings that match a given regular expression
Make difficult regular expressions easy! Ruby port of the awesome VerbalExpressions repo - https://github.com/jehna/VerbalExpressions
Hebrew - English Transliteration Engine
Ruby bindings to re2, an "efficient, principled regular expression library".
Regex to sample value. Ruby gem.
RegexSample.generate(/(a|b)/) #=> a or b
Russian transliteration using nalgeon/iuliia schemas
Articles, Posts, Talks, and Presentations
Projects and Code Examples
Distance Measurements are Awesome!
Named entity recognition with Stanford NER and Ruby
Needs your Help!
Ferret: the extensible information retrieval library for ruby.
A Ruby C wrapper for Open Text Summarizer
A list of Neural MT implementations
A collection of awesome Ruby libraries, tools, frameworks and software
A collection of links to Ruby Natural Language Processing (NLP) libraries, tools and software
A curated list of speech and natural language processing resources
Official gem repository: Ruby kernel for Jupyter/IPython Notebook
Links to awesome OCR projects
TensorFlow - A curated list of dedicated resources http://tensorflow.org