Transforms the JSON output of TermSuite into indexed terms per document. Initially made for ISTEX purposes.

guibon authored on 7 Mar 2016

ISTEX-json2index

Transforms the JSON output of TermSuite into indexed terms per document. Initially made for ISTEX purposes. Now adapted for TermSuite 2.0!

Usage

java -jar ISTEX-json2index-2.0.jar -input [termsuite json output file path] -output [tsv file path]

You can also pass the -parse option to process the file in parsing mode, but I don't recommend it.

java -jar ISTEX-json2index-2.0.jar -parse -input [termsuite json output file path] -output [tsv file path]

For huge json files

If you have a huge JSON file to index, do not forget to set the maximum heap size with the -Xmx Java option. Note that the -parse option cannot handle huge JSON files.
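
As an illustration, a run with a larger heap might look like the following. The 8g value and the file names are placeholders chosen for this example, not recommendations from the tool; pick a heap size that fits your machine and your file.

```shell
# -Xmx is a JVM option, so it goes before -jar.
# 8g and both file names are illustrative placeholders.
java -Xmx8g -jar ISTEX-json2index-2.0.jar -input termsuite-output.json -output index.tsv
```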

Output

A TSV file with the following columns:

Spotting Rule | TermPilot | Document Frequency | Specificity

Each document is announced, before its terms are listed, by a line of the following form: cluster | Document : | id | path of the document
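
To make the layout concrete, here is a minimal sketch that counts the term lines listed under each document. It rests on assumptions that the description above does not spell out: that fields are separated by tabs, that the second field of a document line starts with "Document", and that the third field is the document id. The sample rows, the file name sample.tsv, and the function name count_terms_per_doc are all hypothetical.

```shell
# Build a tiny sample in the ASSUMED layout (tab-separated, made-up values).
printf 'cluster\tDocument :\tdoc1\t/path/to/doc1.txt\n' >  sample.tsv
printf 'rule-a\tterm one\t3\t1.5\n'                     >> sample.tsv
printf 'rule-b\tterm two\t1\t0.7\n'                     >> sample.tsv
printf 'cluster\tDocument :\tdoc2\t/path/to/doc2.txt\n' >> sample.tsv
printf 'rule-a\tterm three\t2\t4.2\n'                   >> sample.tsv

# Hypothetical helper: print "<document id><TAB><number of term lines>"
# for each document, in the order the documents appear in the file.
count_terms_per_doc() {
  awk -F '\t' '
    # Document line: remember its id and move on to the next line.
    $2 ~ /^Document/ { doc = $3; order[++n] = doc; next }
    # Term line: attribute it to the most recent document.
    NF == 4 && doc != "" { count[doc]++ }
    END { for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], count[order[i]] }
  ' "$1"
}

count_terms_per_doc sample.tsv
```

On the sample above this prints doc1 with a count of 2 and doc2 with a count of 1. If the real file uses a different separator or a different document line, the -F value and the /^Document/ test are the two places to adjust.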

Contacts

gael dot guibon at gmail.com, gael dot guibon at inist.fr, istex at inist.fr

@2015 ISTEX INIST-CNRS