Help parsing XML data

Hi @ABG, I received your posts via notification on my smartphone and don' t know what I should say or how should I thank you. At least it triggered me to turn computer on while I was on the way to bed.

I will have a detailed look at it on weekend. I'm looking forward to it. Regarding the date / time I am very sure to know what it is. I deal with that in business programming. I guess it is a timestamp +0000 is an offset. A timestamp always is UTC. So the offset is 0000.

In Germany (CET) the timestamp would be +01:00 hour (normal time) +02:00 hours (summer time). I found e.g. that link It seams to be related to XML and an Oracle DB.

Best Regards
Thomas

1 Like

I had some free time to show an example. Compared the xml with the data shown in the app in local (CET) time. You can see, that currently 1 hour offsetz (we are currently in normal time not in summer time) must applied for display.

So... there are two options... convert the local time to utc and search the list. Then convert the UTC timestamp from search result to local time for display in the app. Or convert the UTC timestamp in XML to local time. But this is only unique when using the offset. Because last sunday in october some times are doubled e.g. 2:15 pm.

In business live the 1 approach turned out best. Convert local time to UTC, search, converte the search result back to local time for display. One aspect is: do the more complex conversion from local to UTC only one time (UI) then get the search result. Then convert the fount fewer records from utc to local.

Lets say: I want to see the whole program list for a sender from 00:00 to 23:59:59 for today. Then search sould be done for: 20260218230000 +0000 to 20260219225959 +0000 while a compareable timestamp in local would be: 20260219000000 +0100 to 20260219235959 +0100

Add Info: i found other links with epg data. But they turned out to be slower on the other hand have api' s (was not able to use them) or filter possibilities via URL parameters (I guess). The last other alternatives I found just now... Link 1 Here it would be possible to load channels separately. This version I tried also but it is very slow. XML larger.

We have been down this rabbit hole (wormhole?) before.

1 Like

Yes. I will have a look at it on weekend. The timestamp topic was only an information for you. This is the less important topic in my opinion.

image

There will need to be some conversion across time zones to make the lexigraphic start/stop comparisons work.

This requires further thought, which is in short supply on my end at the moment.

1 Like

I am absolutly fine with your support. Your input will help me if I will continue follow up this topic. I have some Arduino projects first. What I tried (after a nightly flash of inspiration) is to import that in Excel. Knowing that the newer versions internal deal with XML (.xlsx) suggest that it supports handlng capapilities. In fact there is a Power Query Editor. It handles the file of about 17 MB within seconds on PC. Of course not compareable to a tablet.

I'm still unsure whether AI2 can process this, or whether it needs to be programmed natively (Android Developer Studio).

Though AI2 has no Excel support, there is a Google Sheets Spreadsheet component and some history of using that with Google Scripting.

I found
https://www.google.com/search?q=google+sheets+import+xml+from+URI
to be of possible use if you want to filter and sort your xml data upstream for smaller downloads, in a hybrid approach.

I have no personal experience with Google Script yet, though.

hi @ABG, I made one last attempt to test the feasibility. I won't be pursuing this topic further for now. Using the following approach, I was able to achieve results that could be expanded upon (hybrid approach). But that's obviously not what I want.

Preparing the data in Excel. Problem: I only have a current version that supports Power Query on my work computer. I've now programmed this in VBA. Quick and "the word ist not allowed". This is also a good opportunity to address the UTC to CET issue. If you're already working on it, you don't have to do it later in the app.

I exported the data. CSV is impractical because the name "TV program" can contain a comma. The same applies to semicolons. I tried tab stops. I could not parse them. \n\r and '\t' and all the other tricks you can find online didn't work.

There are two versions: a short list and a long (complete) one. You can switch between them. Displaying a long list in the list viewer takes 30 seconds. I'm not going to test whether it can still filter the data (given the amount of data).

image

You'll find all the files attached if you're interested and want to try it out. Please don't be confused by the name of the AIA file. It's no longer a CSV file, but a text file with its own separator.

EPG_CSV_X1.aia (550.4 KB)

Excel File I can not post because of maybe containing personal data. I post some parts how conversion was done... converting the data ist very fast...

All marked data (column A rows 1 to n) will be analyzed... I trigger it directly in VBA but via Macros is also possible

This results in Tabelle1...

and is overworked with a formula. then copied directly from cells to a notepad document.

So... thats it so far from my side for that topic at the moment. Will continue build my mini retro Arduino game unit :slight_smile:

PS (VBA snippet)

Sub LoopThroughSelection()
    Dim cell As Range
    Dim parts() As String
    Dim curRow As Long
    Dim curCol As Long
    Dim count As Long
    
    Set regex = CreateObject("VBScript.RegExp")
    With regex
        .Global = False
        .IgnoreCase = True
        ' Muster: Zwischen diesen Tags
        .Pattern = ">(.*?)</title>"
    End With
    
    ' kein screen update
    Application.ScreenUpdating = False
    
    ' Sicherstellen, dass nur Zellen (keine Grafiken) durchlaufen werden
    If TypeName(Selection) = "Range" Then
        For Each cell In Selection
            If InStr(1, cell.Value, "<programme") > 0 Then
                ' Teilt den Text anhand des " Zeichens
                parts = Split(cell.Value, """")
                
                If UBound(parts) >= 2 Then
                    curRow = curRow + 1
                    ' channel
                    Worksheets("Tabelle1").Cells(curRow, 1).Value = parts(5)
                    ' start
                    ' Dieses Zeitstempel-Format (yyyymmddhhmmss +hhmm) ist eine Variante
                    ' der ISO 8601 Norm, speziell das sogenannte Basisformat (Basic Format)
                    ' für Datum und Uhrzeit. +hhmm (oder -hhmm): Zeitzonen-Offset (Stunden
                    ' und Minuten Abweichung von UTC
                    Worksheets("Tabelle1").Cells(curRow, 2).Value = parts(1)
                    ' stopp
                    Worksheets("Tabelle1").Cells(curRow, 3).Value = parts(3)
                End If
                
                ' jetzt in den nächsten zeilen nach dem titel suchen; aktuell
                ' werden maximal 3 folgezeilen ausgewertet
                count = 0
                Do
                    count = count + 1
                    
                    Set matches = regex.Execute(cell.Offset(count, 0).Value)
                    
                    If matches.count > 0 Then
                        Worksheets("Tabelle1").Cells(curRow, 4).Value = matches(0).SubMatches(0)
                        Exit Do
                    End If
                Loop Until count = 3
            End If
            
            ' Beispiel: Debug.Print cell.Value ' Wert im Direktfenster anzeigen
        Next cell
    End If
    
    ' screen update an
    Application.ScreenUpdating = True
End Sub