Counting non-ASCII characters in string

glenjoy



Joined: 18 Oct 2004
Posts: 21

Posted: Tue Mar 01, 2005 11:37 pm

I am having trouble finding out how many ASCII or non-ASCII characters there are in a given string. I've noticed that strlen() only counts ASCII characters and stops at a NULL, so if there is a NULL somewhere in the middle of my array, it stops counting there and ignores the rest of the sequence.

Is there a function that will tell me the length, or the number of ASCII or non-ASCII characters, in an array?

Thanks.


Code:

{
   unsigned char x;
   unsigned char data[] = {'A','B','C','D','E','F','G','H','I'};
   unsigned char data_1[] = {"ABCDEFGHI"};

                                    // results
   printf(" %d ",strlen(data));     // -----> 18
   printf(" %d ",strlen(data_1));   // ----->  9

   printf("sf%d",sizeof(data));     // ----->  9
   printf(" sf%d",sizeof(data_1));  // -----> 10
}

 



Your help will be much appreciated.
Ttelmah
Guest







Posted: Wed Mar 02, 2005 4:46 am

The key thing to remember is that in C, '\0' is the _string_ terminator. Hence strlen will stop at this character.
Since characters are stored in single bytes, there is no such thing as a 'non-ASCII' character in this language (there is no support for things like Unicode). Every character stored in a character array is inherently 'ASCII'. If the array is 50 characters long, then it potentially holds 50 ASCII characters. Without a terminator character, there is no way to distinguish 'real' data from empty space in the array. The code for 'strlen' is in string.h, so if you want to use another character as a terminator, simply copy it and write your own version that tests for the other terminator. Otherwise, the number of 'ASCII' characters held in a 50 character array is simply the size of the array.
Here you 'abuse' the strlen function by handing it the address of an array that does not contain a string. strlen(data) returns 18 because 'data' does not contain a 'string', and so has no null terminator. The function therefore scans forward through memory until it hits the null terminator of 'data_1', which happens to sit directly after 'data' in memory. This gives the nine characters of 'data', followed by the nine characters of 'data_1', and a result of 18. This is an example of using a function incorrectly and getting garbage as a result (strlen is designed to handle 'strings'). If you want to have strings containing the null character, then you will have to handle this yourself. The classic example is how buffers that may contain any ASCII character are handled for serial I/O: a separate counter is used to give the input and output points, and inherently the 'size'.
The sizeof operator is also giving exactly the results expected: the character array is 9 characters long, while the string is ten characters long (the 9 text characters plus the null terminator).
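
As a rough illustration of the 'separate counter' approach, here is a minimal sketch in plain standard C (not CCS-specific). The names counted_buf, buf_put and ARRAY_LEN are made up for this example, and the 16 byte capacity is arbitrary. The length lives in its own field, so a 0x00 byte in the data is just another byte.

```c
#include <stddef.h>

/* Element count of a true array (does not work on a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical counted buffer: 'count' says how many bytes are valid,
   so no terminator character is needed. */
typedef struct {
    unsigned char bytes[16];  /* arbitrary capacity for the example */
    size_t count;             /* number of valid bytes in bytes[]  */
} counted_buf;

/* Append one byte. Returns 1 on success, 0 if the buffer is full. */
static int buf_put(counted_buf *b, unsigned char c)
{
    if (b->count >= ARRAY_LEN(b->bytes)) {
        return 0;
    }
    b->bytes[b->count++] = c;
    return 1;
}
```

After putting 'A', 0x00 and 'B' into such a buffer, count is 3, whereas strlen on the same bytes would stop at the embedded null and report 1.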

Best Wishes
SherpaDoug



Joined: 07 Sep 2003
Posts: 1640
Location: Cape Cod Mass USA


Posted: Wed Mar 02, 2005 8:05 am

In a nutshell...
Somehow you have to know when the string ends. C by convention uses a null to mark the end. If you don't use a null you must use something else to mark the end, or there is no end!
Also, bytes is bytes. Strings store bytes. ASCII is just a convention for translating bytes to characters. Strictly, ASCII only assigns characters to the values 0 to 127; values 128 to 255 are still ordinary bytes, and a char array stores them just the same.
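
To make the two ideas in this thread concrete, here is a small sketch in plain standard C. The function names count_ascii and len_until are invented for the example, and it assumes that 'ASCII' means the 7-bit range 0x00 to 0x7F. Both functions take an explicit length, so they never run off the end of the array the way strlen did on 'data'.

```c
#include <stddef.h>

/* Count how many of the first len bytes fall in the 7-bit ASCII
   range (0x00..0x7F). Embedded 0x00 bytes are counted like any other. */
static size_t count_ascii(const unsigned char *p, size_t len)
{
    size_t n = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        if (p[i] < 0x80) {
            n++;
        }
    }
    return n;
}

/* strlen-like scan with a caller-chosen terminator, bounded by max.
   Returns the number of bytes before 'term', or max if it is not found. */
static size_t len_until(const unsigned char *p, size_t max, unsigned char term)
{
    size_t i = 0;
    while (i < max && p[i] != term) {
        i++;
    }
    return i;
}
```

For the five bytes 'A', 0x00, 0xC8, 'B', '\r', count_ascii gives 4 and len_until with '\r' as the terminator also gives 4. The number of non-ASCII bytes is then just the length minus the ASCII count.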
_________________
The search for better is endless. Instead simply find very good and get the job done.
All times are GMT - 6 Hours