st: Twitter Message Sub-string Extraction?
From: Richard Fairbanks <[email protected]>
To: [email protected]
Subject: st: Twitter Message Sub-string Extraction?
Date: Sun, 12 Jun 2011 15:39:37 -0400
Dear Statalisters,
I'm preparing a dataset of ~ 2,000 tweets (Twitter messages) for social
network analysis. I'm trying to track who tweeted to whom and the theme
(hashtag) of the message.
Observations of the single variable look like this.
*@RndmUsername* I'm having a great time at #Ibiza! #summer2011 RT
@SomeOtherPerson15 @YetAnotherPerson
For those unfamiliar with Twitter:
@[Name] - Username of the person sending the tweet. Must be 20 characters or
fewer, made up of letters and/or digits in any position.
RT - "re-tweet" - Think of this like an email "Forward" option for tweets.
No help needed here, just making a dummy variable!
#[Name] - "hashtag" - An arbitrary code of letters and digits specifying
the topic or adding commentary.
Subsequent @[Name]s - These are people to whom the message is specifically
directed.
I know how to generate a new variable that contains the message sender
(always the first string after the "@" character) using regular expressions,
although there's probably a simpler way.
How can I generate a new variable that contains the #[Names] and @[Names] that
come after the first username or hashtag? (That is, using the example, I'm
having trouble extracting #summer2011, @SomeOtherPerson15 and
@YetAnotherPerson.)
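To make the target output concrete, here is a minimal sketch of the extraction in Python rather than Stata, purely as an illustration of the logic. It assumes that usernames and hashtags consist only of letters, digits, and underscores, and that the first @[Name] in the message is always the sender; the function name split_tweet and the token pattern are hypothetical, not part of any Stata or Twitter API.

```python
import re

# Assumed token pattern: '@' or '#' followed by letters, digits, or underscores.
TOKEN = re.compile(r"[@#]\w+")

def split_tweet(text):
    """Return (sender, later_mentions, hashtags, is_retweet) for one tweet."""
    tokens = TOKEN.findall(text)                       # all @names and #tags, in order
    mentions = [t for t in tokens if t.startswith("@")]
    hashtags = [t for t in tokens if t.startswith("#")]
    sender = mentions[0] if mentions else None         # assumption: first @name is the sender
    is_retweet = re.search(r"\bRT\b", text) is not None  # standalone "RT" marks a re-tweet
    return sender, mentions[1:], hashtags, is_retweet

example = ("*@RndmUsername* I'm having a great time at #Ibiza! #summer2011 RT "
           "@SomeOtherPerson15 @YetAnotherPerson")
sender, recipients, hashtags, is_retweet = split_tweet(example)
```

The key step is finding every match in the message instead of only the first one, then dropping the first @[Name] to separate the sender from the recipients. The sketch returns every hashtag, including the first.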
Thanks,
Richard Fairbanks