glenjoy
Joined: 18 Oct 2004 Posts: 21
Counting non-ASCII characters in string
Posted: Tue Mar 01, 2005 11:37 pm
I am having a problem working out how many ASCII or non-ASCII characters there are in a given string. I've noticed that strlen() only counts ASCII characters and stops at a NULL, so if there is a NULL in the middle of my array, it stops counting and ignores the rest of the sequence.
Is there a function that tells me the length, or the number of ASCII or non-ASCII characters, in an array?
Thanks.
Code:
{
   unsigned char x;
   unsigned char data[] = {'A','B','C','D','E','F','G','H','I'};
   unsigned char data_1[] = "ABCDEFGHI";
   // results
   printf(" %d ", strlen(data));     // ----> 18
   printf(" %d ", strlen(data_1));   // ----> 9
   printf("sf%d", sizeof(data));     // ----> 9
   printf(" sf%d", sizeof(data_1));  // ----> 10
}
Your help will be much appreciated.
Ttelmah Guest
Posted: Wed Mar 02, 2005 4:46 am
The key thing to remember is that in C, '\0' is the _string_ terminator. Hence strlen will stop at this character.
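To make the terminator behaviour concrete, here is a minimal sketch, assuming a hosted standard C compiler (the array contents and the function names `length_seen_by_strlen` and `bytes_stored` are hypothetical, made up for illustration). The array holds 8 bytes, but strlen() stops at the first '\0':

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example data: an 8-byte char array with a '\0' at index 3.
   strlen() stops at the first '\0', so it never sees 'D', 'E', 'F'. */
static const char embedded[] = {'A', 'B', 'C', '\0', 'D', 'E', 'F', '\0'};

/* Length as strlen() sees it: the bytes before the first '\0' (3 here). */
size_t length_seen_by_strlen(void)
{
    return strlen(embedded);
}

/* Bytes actually stored in the array, known at compile time (8 here). */
size_t bytes_stored(void)
{
    return sizeof embedded;
}
```

Nothing about 'D', 'E' or 'F' being "non-ASCII" matters here; strlen simply has no way of knowing they are there.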
Since characters are stored only as bytes, there is no such thing as a 'non-ASCII' character in this language (there is no support for things like Unicode). Every character stored in a character array is inherently 'ASCII'. If the array is 50 characters long, it can hold up to 50 ASCII characters. Without a terminator character, there is no way to distinguish 'real' data from unused space in the array. The code for strlen is in string.h, so if you want to use another character as a terminator, simply copy it and write your own version that tests for the other terminator. Otherwise, the number of 'ASCII' characters held in a 50-character array is simply the size of the array.
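A strlen-style function with a different terminator might look like the sketch below. It is not copied from any compiler's string.h; `len_until` is a hypothetical name, and the extra `max_len` bound is an assumption added so the scan cannot run past the end of the array if the terminator is missing. Written for a hosted standard C compiler.

```c
#include <stddef.h>

/* Count the bytes before the first occurrence of `terminator`,
   scanning at most `max_len` bytes so the loop stops at the end of
   the array even when no terminator is present.
   `len_until` is a hypothetical helper modelled on strlen(); it is
   not a standard library function. */
size_t len_until(const unsigned char *buf, size_t max_len,
                 unsigned char terminator)
{
    size_t n = 0;
    while (n < max_len && buf[n] != terminator) {
        n++;
    }
    return n;
}
```

For example, with `unsigned char d[] = {'A', 0x00, 'B', 0xFF, 'C'};`, `len_until(d, sizeof d, 0xFF)` gives 3 (the 0x00 byte is counted like any other), while passing 0x00 as the terminator gives 1, matching what strlen would report.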
Now, you are 'abusing' the strlen function by handing it the address of an array that does not contain a string. strlen(data) returns 18 because 'data' does not contain a 'string', and so has no null terminator. The function therefore scans forward through memory until it hits the null terminator of 'data_1', which in this case happens to sit directly after 'data' in memory. That gives the nine characters of 'data', followed by the nine characters of 'data_1', for a result of 18. This is an example of using a function incorrectly and getting garbage as a result (strlen is designed to handle strings); with a different memory layout the result could be anything. If you want strings that contain the null character, you will have to handle this yourself. The classic example is how buffers that may contain any ASCII character are handled for serial I/O: separate counters give the input and output points, and hence, inherently, the 'size'.
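A serial-style buffer that tracks its length with counters rather than a terminator can be sketched as below. This is an illustrative ring buffer for a hosted standard C compiler, not code from any particular driver; the names (`ring_buffer`, `rb_put`, `rb_get`, `rb_len`) and the 16-byte capacity are hypothetical. It keeps an explicit `count` alongside the input and output indices, which is one common way to tell "full" from "empty".

```c
#include <stddef.h>

#define RB_SIZE 16u  /* hypothetical capacity */

/* Byte buffer whose length comes from counters, not a terminator,
   so any byte value (including 0x00) can be stored. */
typedef struct {
    unsigned char data[RB_SIZE];
    size_t in;     /* index of the next slot to write */
    size_t out;    /* index of the next slot to read */
    size_t count;  /* number of bytes currently stored */
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    rb->in = 0;
    rb->out = 0;
    rb->count = 0;
}

/* Store one byte; returns 0 on success, -1 if the buffer is full. */
int rb_put(ring_buffer *rb, unsigned char c)
{
    if (rb->count == RB_SIZE) {
        return -1;
    }
    rb->data[rb->in] = c;
    rb->in = (rb->in + 1) % RB_SIZE;
    rb->count++;
    return 0;
}

/* Fetch the oldest byte; returns 0 on success, -1 if empty. */
int rb_get(ring_buffer *rb, unsigned char *c)
{
    if (rb->count == 0) {
        return -1;
    }
    *c = rb->data[rb->out];
    rb->out = (rb->out + 1) % RB_SIZE;
    rb->count--;
    return 0;
}

/* The 'size' is just the counter; nothing is scanned. */
size_t rb_len(const ring_buffer *rb)
{
    return rb->count;
}
```

Storing 'A', 0x00 and 0xFF leaves `rb_len` at 3, and the 0x00 byte comes back out of `rb_get` like any other value.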
The sizeof operator is also giving exactly the expected results: the character array is 9 bytes long, while the string occupies ten bytes (the 9 text characters plus the null terminator).
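The same two arrays from the question can be measured safely as in this sketch, assuming a hosted standard C compiler (the wrapper function names are hypothetical). Only the array initialised from a string literal gets a '\0', so only that one is a valid argument to strlen:

```c
#include <stddef.h>
#include <string.h>

/* The arrays from the question. data[] has no terminator; data_1[] is
   initialised from a string literal, so the compiler appends '\0'. */
static const char data[]   = {'A','B','C','D','E','F','G','H','I'};
static const char data_1[] = "ABCDEFGHI";

/* 9: nine characters, no terminator. */
size_t data_bytes(void)   { return sizeof data; }

/* 10: nine characters plus the '\0'. */
size_t data_1_bytes(void) { return sizeof data_1; }

/* 9: strlen does not count the '\0'. */
size_t data_1_chars(void) { return strlen(data_1); }

/* For data[], sizeof is the only meaningful length; strlen(data) is
   undefined behaviour because there is no '\0' to stop the scan. */
```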
Best Wishes
SherpaDoug
Joined: 07 Sep 2003 Posts: 1640 Location: Cape Cod Mass USA
Posted: Wed Mar 02, 2005 8:05 am
In a nutshell...
Somehow you have to know where the string ends. C by convention uses a null to mark the end. If you don't use a null, you must use something else to mark the end, or there is no end!
Also, bytes is bytes. Strings store bytes. There are no non-ASCII bytes. ASCII is just a convention for translating bytes to characters (strictly a 7-bit code, but with the usual 8-bit extended character sets there is a character for every possible byte).
_________________
The search for better is endless. Instead simply find very good and get the job done.
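If "non-ASCII" is taken to mean byte values above 127 (outside the 7-bit ASCII range), such bytes can be counted once the length is known by some other means, such as sizeof or a counter. A minimal sketch, assuming a hosted C99 compiler; `count_high_bytes` is a hypothetical name:

```c
#include <stddef.h>

/* Count bytes in buf[0..len-1] whose value is above 127, i.e. outside
   the 7-bit ASCII range. The caller supplies len (e.g. from sizeof or
   a counter) because the data may legitimately contain 0x00. */
size_t count_high_bytes(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7F) {
            n++;
        }
    }
    return n;
}
```

For `unsigned char d[] = {'A', 0x00, 0x80, 0xFF, 'z'};`, `count_high_bytes(d, sizeof d)` returns 2, and the ASCII-range count is simply `sizeof d` minus that.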