Howto use the eXpat XML parser

Page history last edited by Sergeant_Kolja 8 years, 9 months ago

 

(Based on a German discussion in the JA2 German forum between Realist and Sergeant Kolja, translated into English and moved to the new wiki by Sergeant Kolja.)

Question

Realist: "Can anybody tell me how the XML parsing by expat works? I can understand it neither from BP's code nor from the expat online documentation."

Answer

Yes, see below

Question

Realist: "Or does anybody know an XML parser for C which is handy? I'm trying hard with LIBXML, but this is also quite complicated."

Answer

Not surprisingly, both are tightly related ...

Answer

Take a look at the file .\Tactical\XML_ComboMergeInfo.cpp, and there at the function ReadInAttachmentComboMergeStats().
It is the smallest and simplest of them all.
Basically, it works this way:


/* define your own struct for the data from the XML file */
typedef struct
{
    ...
} attachmentcombomergeParseData;

...

/* create our Data object */
attachmentcombomergeParseData Data;

/* create a parser object */
XML_Parser parser = XML_ParserCreate(NULL);


 

The above just creates an object named 'parser'. This object has to be passed to all the following functions as the 1st argument. Think of it as a 'handle'.
Then a memory block the size of the XML file is allocated (maybe one can also read the file without holding it in RAM completely; never mind for now, perhaps you'll find out by yourself later).
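The memory block step can be sketched as follows. This is only an illustration using plain C stdio; the helper name readWholeFile() is hypothetical, and the game code may well use its own file routines instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a freshly allocated buffer.
   Returns NULL on failure. On success *outSize holds the number of bytes,
   the buffer is NUL-terminated, and the caller must free() it. */
static char *readWholeFile(const char *path, long *outSize)
{
    FILE *fp = fopen(path, "rb");
    char *buffer = NULL;
    long size = 0;

    if (fp == NULL)
        return NULL;

    /* determine the file size by seeking to the end and back */
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0)
    {
        buffer = malloc((size_t)size + 1);
        if (buffer != NULL)
        {
            if (fread(buffer, 1, (size_t)size, fp) == (size_t)size)
            {
                buffer[size] = '\0';
                *outSize = size;
            }
            else
            {
                /* short read: give up and release the block again */
                free(buffer);
                buffer = NULL;
            }
        }
    }

    fclose(fp);
    return buffer;
}
```

The returned buffer and size would then take the place of lpcBuffer and uiFSize in the XML_Parse() call. For files too large to hold in memory, expat also accepts the data in several pieces: call XML_Parse() once per piece and pass a non-zero last argument only for the final one.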

The calls

XML_SetElementHandler( parser, attachmentcombomergeStartElementHandle, attachmentcombomergeEndElementHandle);
XML_SetCharacterDataHandler( parser, attachmentcombomergeCharacterDataHandle );

now connect the object 'parser' with your 3 handler functions:

  • attachmentcombomergeStartElementHandle()
  • attachmentcombomergeEndElementHandle()
  • attachmentcombomergeCharacterDataHandle()

These 3 are called later on, for every XML data element which the parser finds in the file.
Now you need to know that Data is nothing but a helper structure!
The 'real' data of our example is located in AttachmentComboMerge, which is declared somewhere else as ComboMergeInfoStruct AttachmentComboMerge[MAXITEMS+1];.
So we need to tell 'Data' where 'AttachmentComboMerge' is in memory and how big it is.
And of course we have to tell the parser that we want a reference (a handle-like connection) to our Data in all of the 3 functions named above:

/* Init your own structure to ZERO */
memset( &Data, 0x00, sizeof(Data) );

/* Init some of the members to useful start values */
Data.curArray = AttachmentComboMerge;
Data.maxArraySize = MAXITEMS;

/* set parser to tell us the address of 'Data' as 1st arg on each call */
XML_SetUserData(parser, &Data);


 

Every time the parser calls one of the 3 functions, we get a pointer to 'Data' as the 1st argument (here its name is 'void* userData'). This pointer is of type void*, because eXpat cannot know the types of every application.
Thus, our 3 functions have to typecast the void pointer back to the application-specific type:


attachmentcombomergeParseData * pData = (attachmentcombomergeParseData*) userData;


This is not dangerous, because 'void* userData' was &Data all along, as you remember. It only lost its type for easier transport (in C++ one would use type templates and entirely avoid such back-and-forth casting, but eXpat is C).

 

'pData->' now points to all members of 'Data'. And 'pData->curArray' gives access to all elements of 'AttachmentComboMerge', the real data table. The latter is, by the way, an array of:


typedef struct
{
    UINT16 usItem;
    UINT16 usAttachment[2];
    UINT16 usResult;
    UINT32 uiIndex;
} ComboMergeInfoStruct;

...

ComboMergeInfoStruct AttachmentComboMerge[MAXITEMS+1];


 

5002 elements.

 

Question

And now? What follows after this?

Answer
Well, next comes the parsing of the whole memory block (which is indeed a 1:1 copy of the XML file).
After the parser has finished, the parser object is released.

Finally, the memory block is released as well (not shown here):


XML_Parse(parser, lpcBuffer, uiFSize, TRUE);

XML_ParserFree(parser);


 

Question

Yes ... but ... where does the actual reading happen? :confused:

Answer
This is simple: in the 3 callback functions which I have mentioned so often!

The XxxxStartElement() function is always called when the XML file 'opens' an element.
This is not only true for a simple variable like:


<uiIndex>0</uiIndex>


 

but also for any higher-level layer:


<ATTACHMENTCOMBOMERGELIST>

   <ATTACHMENTCOMBOMERGE>

      <uiIndex>0</uiIndex>


 

Every 'opening' layer calls the XxxxStartElement() function. In our example file that happens 3 times before we reach the data details. In our current implementation of XxxxStartElement() we just check whether we are on the same layer as the variables (uiIndex & Co.) or one or more layers 'up', and whether we are in the right section at all (that is, below <ATTACHMENTCOMBOMERGELIST>). One could also increment the array index here ... TIMTOWTDI (there is more than one way to do it) ...

 

Contrary to its name, the function XxxxCharacterDataHandle() does very little. It just appends the content of the current data element to pData->szCharData.

Having arrived on the first data layer, we expect to have "0" in pData->szCharData, because there is only one single data element 'uiIndex' with the value '0', namely: "<uiIndex>0</uiIndex>".

The 'real' evaluation of our data follows only after "</uiIndex>" has been found, that is, in XxxxEndElement().
This looks complicated, but it's smart, because only now do we know that the parser has delivered the final character of the data content (expat may hand over the text of a single element in several calls to the character data handler). In our examples the content is always a single number, which could already have been examined in XxxxCharacterDataHandle(), but this is not always the case in XML.


B.T.W.:
In JA2 1.13, the uiIndex elements are quite sensitive, because all XML reading functions take the array indexes for the internal C arrays directly from these uiIndex XML variables.
And nearly all of the externalized arrays are 5000 elements in size.
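Since the index comes straight from the file, it seems prudent to validate it before it is used as an array subscript. A minimal sketch, assuming a hypothetical helper parseArrayIndex() that is neither part of JA2 nor of expat:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts the collected text of a uiIndex element
   into an array index. Returns 1 and stores the index if the text is a
   decimal number below arraySize; returns 0 otherwise. */
static int parseArrayIndex(const char *text, unsigned long arraySize,
                           unsigned long *outIndex)
{
    char *end = NULL;
    unsigned long value;

    if (text == NULL || *text == '\0')
        return 0;

    errno = 0;
    value = strtoul(text, &end, 10);

    /* reject overflow, text without digits, and trailing garbage */
    if (errno != 0 || end == text || *end != '\0')
        return 0;

    /* reject anything that would land outside the table */
    if (value >= arraySize)
        return 0;

    *outIndex = value;
    return 1;
}
```

An end element handler could call this on pData->szCharData and skip the entry (or log a warning) when it returns 0, instead of writing past the end of the table.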

 


A final note:

Our XML files use the UTF-8 character set to allow multinational text. UTF-8 optionally allows a BOM (Byte Order Mark), which means the file starts with 3 bytes reading in hexadecimal as "EF BB BF".
At least when you use German text with umlauts (äÄöÖüÜß), some tools seem to go crazy (e.g. Beyond Compare) if you save the file once with BOM and once without.
It's tricky, because a lot of tools (not only Beyond Compare, Altova XML and Total Commander Viewer) hide the BOM from your eyes in the first place.

Hint: always use and save the files with BOM!
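Because so many tools hide the BOM, a few lines of C can tell whether a file really starts with it. A small sketch; the function name hasUtf8Bom() is made up for this example:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the buffer starts with the
   UTF-8 BOM (EF BB BF), 0 otherwise. */
static int hasUtf8Bom(const unsigned char *buffer, size_t length)
{
    return length >= 3 &&
           buffer[0] == 0xEF &&
           buffer[1] == 0xBB &&
           buffer[2] == 0xBF;
}
```

It can be run on the memory block that holds the XML file before the block is handed to the parser, for example to warn about files that were saved without a BOM.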

 

:erdbeerteechug:

 

 
