Skip to content

utgwkk/Twitter-Text

Repository files navigation

Actions Status

NAME

Twitter::Text - Perl implementation of the twitter-text parsing library

SYNOPSIS

use Twitter::Text;

$result = parse_tweet('Hello world こんにちは世界');
print $result->{valid} ? 'valid tweet' : 'invalid tweet';

DESCRIPTION

Twitter::Text is a Perl implementation of the twitter-text parsing library.

WARNING

This library does not implement auto-linking and hit highlighting.

Please refer Implementation status for latest status.

FUNCTIONS

All functions below are exported by default.

Extraction

extract_hashtags

$hashtags = extract_hashtags($text);

Returns an array reference of extracted hashtag string from $text.

extract_hashtags_with_indices

$hashtags_with_indices = extract_hashtags_with_indices($text, [\%options]);

Returns an array reference of hash reference of extracted hashtag from $text.

Each hash reference consists of hashtag (hashtag string) and indices (range of hashtag).

extract_mentioned_screen_names

$screen_names = extract_mentioned_screen_names($text);

Returns an array reference of exctacted screen name string from $text.

extract_mentioned_screen_names_with_indices

$screen_names_with_indices = extract_mentioned_screen_names_with_indices($text);

Returns an array reference of hash reference of extracted screen name or list from $text.

Each hash reference consists of screen_name (screen name string) and indices (range of screen name).

extract_mentions_or_lists_with_indices

$mentions_or_lists_with_indices = extract_mentions_or_lists_with_indices($text);

Returns an array reference of hash reference of extracted screen name from $text.

Each hash reference consists of screen_name (screen name string) and indices (range of screen name or list). If it is a list, the hash reference also contains list_slug item.

extract_urls

$urls = extract_urls($text);

Returns an array reference of extracted URL string from $text.

extract_urls_with_indices

$urls = extract_urls_with_indices($text, [\%options]);

Returns an array reference of hash reference of extracted URL from $text.

Each hash reference consists of url (URL string) and indices (range of screen name).

Validation

parse_tweet

$parse_result = parse_tweet($text, [\%options]);

The parse_tweet function takes a $text string and optional \%options parameter and returns a hash reference with following values:

  • weighted_length

    The overall length of the tweet with code points weighted per the ranges defined in the configuration file.

  • permillage

    Indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.

  • valid

    Indicates if input text length corresponds to a valid result.

  • display_range_start, display_range_end

    An array of two unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet.

  • valid_range_start, valid_range_end

    An array of two unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet.

EXAMPLES

use Data::Dumper;
use Twitter::Text;

$result = parse_tweet('Hello world こんにちは世界');
print Dumper($result);
# $VAR1 = {
#       'weighted_length' => 33
#       'permillage' => 117,
#       'valid' => 1,
#       'display_range_start' => 0,
#       'display_range_end' => 32,
#       'valid_range_start' => 0,
#       'valid_range_end' => 32,
#     };

is_valid_hashtag

$valid = is_valid_hashtag($hashtag);

Validate $hashtag is a valid hashtag and returns a boolean value that indicates if given argument is valid.

is_valid_list

$valid = is_valid_list($username_list);

Validate $username_list is a valid @username/list and returns a boolean value that indicates if given argument corresponds to a valid result.

is_valid_url

$valid = is_valid_url($url, [unicode_domains => 1, require_protocol => 1]);

Validate $url is a valid URL and returns a boolean value that indicates if given argument is valid.

If unicode_domains argument is a truthy value, validate $url is a valid URL with Unicode characters. (default: true)

If require_protocol argument is a truthy value, validation requires a protocol of URL. (default: true)

is_valid_username

$valid = is_valid_username($username);

Validate $username is a valid username for Twitter and returns a boolean value that indicates if given argument is valid.

SEE ALSO

twitter-text. Implementation of Twitter::Text (this library) is heavily based on Ruby implementation of twitter-text.

https://developer.twitter.com/en/docs/counting-characters

COPYRIGHT & LICENSE

Copyright (C) Twitter, Inc and other contributors

Copyright (C) utgwkk.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

utgwkk utagawakiki@gmail.com