Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Strings dropped when importing Excel file
From
Friedrich Huebler <[email protected]>
To
[email protected]
Subject
Re: st: Strings dropped when importing Excel file
Date
Fri, 8 Nov 2013 21:00:41 -0500
Sergiy,
I solved the problem by adding quote marks to the problematic cells in
Excel and everything is now imported properly into Stata.
http://www.stata.com/statalist/archive/2013-11/msg00315.html
Friedrich
On Fri, Nov 8, 2013 at 4:15 PM, Sergiy Radyakin <[email protected]> wrote:
> Friedrich,
> even if the file contains no formulas, it might still contain errors.
>
> I know it sounds strange, but this is true. There is a content-type
> flag that indicates a cell as an error, and it might be set regardless
> of what content the cell has. Consider the following example:
>
> http://radyakin.org/statalist/2013110801/errs.xlsx
>
> All values (in column A) were typed directly from the keyboard. No
> tricks. Note formatting differs (alignment) for values RECOGNIZED as
> errors by Excel as errors. I didn't adjust that at all.
>
> Now if we look at how the file is saved, you will see here (extract):
> http://radyakin.org/statalist/2013110801/errs.png
> that some cells have t="e" and some cells have t="s"
>
> If it is an "e" - the cell is erroneous. Stata imports it as a missing
> (blank for strings).
> If it is an "s" - it is a string, Stata pulls the corresponding string
> from the strings repository (separate file, not shown).
>
> Stata itself doesn't enumerate particular error messages. And there
> are more, not only the ones you have mentioned so far, see here:
> http://www.techonthenet.com/excel/formulas/error_type.php
>
> Instead Stata seems to be looking at the flag of cell type. But it
> doesn't have to. Instead StataCorp could have decided to import the
> value as an extended missing. Wait! You can say there are no extended
> missing values in strings! Well in Stata 13 you can come up with a way
> to deal with it. Especially now that you have binary strings. Even
> without serious file format modifications, the values could have been
> preserved as e.g. "char(13)+value", since we know char(13) can't be
> part of the string in Excel. Later your code would simply truncate the
> marker (since you don't need it). Alternatively, StataCorp could have
> imported all values as valid, and let the user recode all #something!
> values into the missings in strings if needed. Or... I'll stop here.
>
> Again, there are multiple ways to resolve the problem in your current
> situation without waiting for Stata to evolve. What I suggested before
> is still valid - just create, say, another column and recompute the
> values there making sure the errors (and they are really errors, the
> ISERROR() function returns true for such cells) are replaced with
> something else.
>
> Largely the choice of method depends of whether you have one such
> file? or 100? or they are coming every day?
> There is no key or option in Stata's -import excel- facility that
> would help you with that. StataCorp itself might extend the
> functionality in the future, but as far as version 13.1 stands now, I
> see no other option but to manipulate the file externally, either with
> Excel itself or with another tool available. For example, if your file
> contains only strings, as you describe, save it from Excel as
> tab-separated txt file. Then insheet it into Stata.
>
> Best, Sergiy Radyakin
>
>
>
>
> On Fri, Nov 8, 2013 at 2:13 PM, Friedrich Huebler <[email protected]> wrote:
>> Thank you, Sergiy.
>>
>> The Excel file that I am trying to import contains only text, no
>> formulas. The entries #DIV/0! (which is also replaced by an empty cell
>> as I discovered), #N/A and #VALUE! are stored as text in their
>> respective cells, together with other text in the same columns. The
>> Excel file that you created contains a formula that yields a #DIV/0!
>> error and this formula (=7/A2) can be seen when one clicks on the
>> cell. My file is different, there are no formulas, and therefore I
>> would expect all variables to be imported without changes.
>>
>> Friedrich
>>
>> On Fri, Nov 8, 2013 at 1:52 PM, Sergiy Radyakin <[email protected]> wrote:
>>> Friedrich,
>>>
>>> to the best of my knowledge the importer would take the
>>> 'cached'/'computed' value for each cell. These are stored in the excel
>>> file by Excel itself. Sometimes, if you open really old file, Excel
>>> would tell you, that these values were saved after computing the
>>> formulas using an older version of Excel, and it needs to recompute
>>> them in the newer version (this warning has many reasons to be
>>> crucial, as recomputing simply may not succeed at all).
>>>
>>> Anyways, both values that you mention seem to be not 'values' but
>>> representation - something that Excel shows you in the cell, based not
>>> only on the genuine cell value, but also the formula, and other cell
>>> attributes (note that you have TEXT separate from VALUE separate from
>>> FORMULA for each cell). In this case these are representations of the
>>> errors.
>>>
>>> Depending on the complexity of the Excel file different suggestions
>>> might be suitable. My first suggestion would be to use Excel's
>>> functions ISERR() and IFERROR() to obtain values, that you can later
>>> manage in Stata.
>>> See the example here:
>>> http://radyakin.org/statalist/2013110801/func.xlsx
>>> (I expect you would have the problem you described with col B but not col C)
>>>
>>> If you have errors of different types, use the ERROR.TYPE() function
>>> to distinguish between them. Not sure which Excel versions have this
>>> function, it might be fairly recent. Based on this reference, after
>>> 2000:
>>> http://www.techonthenet.com/excel/formulas/error_type.php
>>>
>>> Hope this helps. Don't hesitate to ask if you need more information.
>>>
>>> Best, Sergiy Radyakin
>>>
>>>
>>>
>>>
>>> On Fri, Nov 8, 2013 at 1:23 PM, Friedrich Huebler <[email protected]> wrote:
>>>> I use Stata 12.1. When importing an Excel file with string variables
>>>> (cell format "Text"), the variables are correctly imported as strings
>>>> but the following entries are replaced by empty cells in Stata:
>>>>
>>>> #N/A
>>>> #VALUE!
>>>>
>>>> Is it possible to keep these strings? I found no information on this
>>>> in the documentation for -import excel-. The -allstring- option makes
>>>> no difference.
>>>>
>>>> Thanks,
>>>>
>>>> Friedrich
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/