@digitallinguistics/tags2dlx

v0.4.0

A JavaScript (Node.js) library that converts a tagged (monolinear) text to DLx JSON format

Downloads

14

Readme

tags2dlx

Have a question or need to report an issue? Open an issue here.

Introduction

This is a JavaScript library for Node.js that converts a tagged linguistic text to DLx JSON format. It is useful for anybody working with a monolingual (single-language) corpus whose words have been tagged for some feature.

For example, here are two tagged utterances from the Open American National Corpus:

All_DT hotels_NNS accept_VBP major_JJ credit_NN cards_NNS ._.
I_PRP guess_VBP I_PRP was_VBD in_IN a_DT wing_NN of_IN the_DT hospital_NN ._.

Each word in this sentence has been tagged for part of speech by adding an underscore at the end of the word, followed by an abbreviation indicating the part of speech. For example, the word hotels has been tagged as a plural noun using the abbreviation NNS. Punctuation is typically tagged as well (._. in the above example). The text is divided into utterances, with each new utterance starting on a new line.
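To make this convention concrete, here is a minimal sketch of how such a text could be split into utterances (one per line) and word/tag pairs. It is not the library's implementation: parseTaggedText is a hypothetical helper, and splitting each item at the last underscore (so that a word which itself contains an underscore keeps it) is an assumption of this sketch.

```javascript
// Hypothetical helper, NOT part of @digitallinguistics/tags2dlx.
// Splits a tagged text into utterances (one per non-empty line),
// and each utterance into { token, tag } pairs.
function parseTaggedText(text, separator = '_') {
  return text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => line.split(/\s+/).map(item => {
      // Assumption: the tag follows the LAST occurrence of the separator.
      const index = item.lastIndexOf(separator);
      if (index <= 0) return { token: item, tag: null }; // untagged item
      return {
        token: item.slice(0, index),
        tag: item.slice(index + separator.length),
      };
    }));
}

const corpus = `All_DT hotels_NNS accept_VBP major_JJ credit_NN cards_NNS ._.
I_PRP guess_VBP I_PRP was_VBD in_IN a_DT wing_NN of_IN the_DT hospital_NN ._.`;

const utterances = parseTaggedText(corpus);
console.log(utterances.length); // 2
console.log(utterances[0][1]);  // { token: 'hotels', tag: 'NNS' }
console.log(utterances[0][6]);  // { token: '.', tag: '.' }
```

Splitting on the last separator means a token such as a_b_NN yields the word a_b with the tag NN; the real library may handle such cases differently.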

Using this library, you can convert texts tagged in this way to JSON like the following:

{
  "utterances": [
    {
      "words": [
        {
          "transcription": "All",
          "tags": {
            "pos": "DT"
          }
        },
        {
          "transcription": "hotels",
          "tags": {
            "pos": "NNS"
          }
        },
        …
      ]
    },
  ]
}

The format of the resulting JSON can be adjusted by passing options to the tags2dlx converter. See the Options section below.
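The JSON above nests words inside utterances, with each tag stored as a property of a tags object. The sketch below assembles that shape from already-parsed [transcription, tag] pairs. buildDLxText is a hypothetical helper, and its default tag name of pos only mirrors the example output; it is not a statement about the library's defaults.

```javascript
// Hypothetical helper illustrating the output shape; not the library's code.
// `utterances` is an array of utterances, each an array of [transcription, tag] pairs.
function buildDLxText(utterances, tagName = 'pos') {
  return {
    utterances: utterances.map(pairs => ({
      words: pairs.map(([transcription, tag]) => ({
        transcription,
        // Computed property name: the tag is stored under `tagName`.
        tags: { [tagName]: tag },
      })),
    })),
  };
}

const text = buildDLxText([
  [['All', 'DT'], ['hotels', 'NNS']],
]);

console.log(JSON.stringify(text, null, 2));
```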

Installation & Basic Usage

Installation:

npm install @digitallinguistics/tags2dlx

Usage in Node.js (latest stable release):

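import { writeFile } from 'node:fs/promises';
import convert from '@digitallinguistics/tags2dlx';

const text = `This_DEM is_V a_DET sentence_N ._.`;
const options = {}; // see the Options section below

// The output is a plain-old JavaScript object (POJO), formatted as a DLx Text object
const output = convert(text, options);

// Do something with the output, like write it to text.json
await writeFile(`text.json`, JSON.stringify(output, null, 2));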

Usage on the command line:

tags2dlx corpora/English

Options

The tags2dlx function accepts an options object as its second argument. The following options are supported:

Option       | Flag | Default    | Description
------------ | ---- | ---------- | -----------
metadata     | N/A  | {}         | An object containing additional metadata to add to the Text, such as title, etc. This metadata should adhere to the DLx Text format.
punctuation  | -p   | ,.!?"'‘’“” | Punctuation to ignore. Tagged items consisting of one of these characters will be removed from the output.
tagName      | -n   | null       | The name of the property to store the tag in.
tagSeparator | -s   | _          | The character(s) delimiting the word token from its tag.
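As a rough illustration of how these options could interact, the sketch below converts a single tagged line into word objects: each item is split at tagSeparator, items whose token is a single punctuation character are dropped, and the tag is stored under tagName. convertLine is hypothetical. The table lists null as the default for tagName without saying what that produces, so falling back to a key named tag is purely an assumption of this sketch.

```javascript
// Hypothetical sketch of how the documented options might be applied to one line.
// Defaults mirror the Options table; the `tag` fallback for a null tagName is an assumption.
const DEFAULTS = {
  punctuation: `,.!?"'‘’“”`,
  tagName: null,
  tagSeparator: '_',
};

function convertLine(line, options = {}) {
  const { punctuation, tagName, tagSeparator } = { ...DEFAULTS, ...options };
  const key = tagName ?? 'tag'; // assumption: real behaviour for null is not documented here

  return line
    .trim()
    .split(/\s+/)
    .filter(item => item.length > 0)
    .map(item => {
      // Split at the last separator; items without one are treated as untagged.
      const index = item.lastIndexOf(tagSeparator);
      return index > 0
        ? { transcription: item.slice(0, index), tag: item.slice(index + tagSeparator.length) }
        : { transcription: item, tag: null };
    })
    // Drop items whose token is exactly one of the ignored punctuation characters.
    .filter(({ transcription }) =>
      !(transcription.length === 1 && punctuation.includes(transcription)))
    .map(({ transcription, tag }) => ({ transcription, tags: { [key]: tag } }));
}

// Four words; the final "." is removed as punctuation.
const words = convertLine('This/DEM is/V a/DET sentence/N ./.', { tagName: 'pos', tagSeparator: '/' });
console.log(words);
```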

Command Line

The tags2dlx library can also be run from the command line. The script accepts one required argument: the path to either a file or a folder to convert. If a single file is passed, that file is converted to JSON and a new JSON file is generated alongside the original. If a folder is passed, the script recursively searches the directory and converts each file with a .txt extension to JSON, saving each new file alongside its original.
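The folder rules above can be sketched as a pure function: given the file paths found by walking a folder, keep only the .txt files and pair each with a .json path in the same location. planConversions is hypothetical, the directory walk itself is left out, and replacing the .txt extension (rather than appending .json) is an assumption about how the output file is named.

```javascript
// Hypothetical sketch of the CLI's folder rules; not the library's code.
// `paths` are file paths collected from a recursive directory walk.
function planConversions(paths) {
  return paths
    .filter(path => path.endsWith('.txt')) // only .txt files are converted
    .map(path => ({
      input: path,
      // Assumption: the output replaces the .txt extension with .json.
      output: path.slice(0, -'.txt'.length) + '.json',
    }));
}

const plan = planConversions([
  'corpora/English/a.txt',
  'corpora/English/notes.md',
  'corpora/English/sub/b.txt',
]);

console.log(plan);
```

Keeping path selection separate from file I/O makes the rule easy to check without touching the disk.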

The command line version supports all of the same options as the module version except metadata, which is not available on the command line.

If this library is installed globally, you should be able to run it from the command line simply by typing tags2dlx text.txt. Otherwise, you will need to run the script with Node directly: node node_modules/@digitallinguistics/tags2dlx/tags2dlx.js text.txt.

Contributing

Want to contribute to this project? Feel free to open an issue or create a pull request. If your pull request does anything other than fix a bug, please open an issue to discuss the change first.

Tests are run using Jasmine. They can be run from the command line with npm test.

This library is written and maintained by Daniel W. Hieber (@dwhieb) and made available under an MIT license. Please cite this library using the following model:

Hieber, Daniel W. 2019. digitallinguistics/tags2dlx. https://doi.org/10.5281/zenodo.3376957