[Idea] Data File Component (Extended)

elatoskinas · March 4, 2020, 1:16am

Hello,

Recently I was thinking of extending the idea of a Data File component which could serve as an abstraction to provide ease of parsing certain common formats of data files, and provide all the necessary functionality to easily access and inspect the data in a more transparent way.

As part of the Chart Components project that I worked on, this component was already introduced as part of the project, but in a rather simplified form with the support of JSON & CSV files, providing the following blocks:

The main purpose of the component is that a user can read in a file or select a file source; if it is a valid CSV or JSON document, the component reads the data and stores it as rows and columns. For CSV this is pretty straightforward; for JSON, a column is treated as a single JSON key+value combination, and rows are constructed by transposing the list of columns. All of this can then be accessed quite simply via the block properties.
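To make the row/column model concrete, here is a minimal sketch in Python rather than App Inventor blocks. The function names are hypothetical, and the JSON part assumes a flat object whose values are lists of equal length; the actual component may treat other shapes differently.

```python
import csv
import io
import json


def table_from_csv(text):
    """Parse CSV text into (rows, columns); blank lines are skipped."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # Transposing the rows gives the columns (zip stops at the shortest row).
    columns = [list(column) for column in zip(*rows)]
    return rows, columns


def table_from_json(text):
    """Treat each top-level key plus its list of values as one column."""
    data = json.loads(text)
    # Assumption: every value is a list; the key becomes the column's first entry.
    columns = [[key, *values] for key, values in data.items()]
    # Rows are obtained by transposing the list of columns.
    rows = [list(row) for row in zip(*columns)]
    return rows, columns


csv_rows, csv_columns = table_from_csv("x,y\n1,2\n3,4\n")
json_rows, json_columns = table_from_json('{"x": [1, 3], "y": [2, 4]}')
```

Note that the CSV values stay strings while the JSON values keep their JSON types, so a real component would have to decide whether and where to normalise them.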

So far, the primary purpose of the component has been to support Charts, but I believe it could be extended beyond that by supporting the following (extended) features:

Support for JSON, CSV and XML (some more formats could be spreadsheet files, YAML data files, and so on) files to simplify direct file parsing from (semi)-structured files. This is also quite open to extension to new data file extensions that could be done quite transparently based on the provided file’s signature and extension.
Provide data access methods/data structures. With the newly introduced dictionary structure, this can now be quite powerful. For example, upon parsing a JSON file, one could retrieve a map that gives key -> value mappings.
Provide construction of a structured DataFile (perhaps the name is not accurate here) from given text input. This use case would essentially allow raw JSON/CSV/XML strings to be plugged into the component, so that the data can be accessed in a simpler way through the component (either by loading it into the component and providing procedure blocks to do so, or by returning a list of mappings (suitable for JSON) or some other data structure directly via a procedure block). This could save the trouble of going through multiple blocks and building the data file parsing logic from scratch for each individual application, and it would allow support for more of the desired formats. Moreover, this is quite open to extension to some more interesting sources of data, such as spreadsheets, which could be loaded in a more user-friendly and ready-to-use way.
Simplified updates to the file. Writing back a JSON/CSV/XML file could be tedious by hand, so the component could provide blocks to make modification easier.

Let me know your thoughts; I would be keen on taking this idea further, and working on a design document if this is deemed as an interesting use case for the community.

Peter · March 4, 2020, 5:25am

I like this idea very much. There are a lot of data files that could be usable for this. The first examples I thought of were GPX and KML for GPS data.

Edit: Other ideas

iCalendar/ics files

Red_Panda · March 4, 2020, 12:46pm

Great idea! I think that components which handle data will be very important in the future. Maybe you can write an extension to showcase what such a component would be capable of.

elatoskinas · March 8, 2020, 6:37pm

Good ideas! In general, the way XML-like documents are read could be simplified a lot by adding a component that could handle most of the work (although, in the end, we cannot support every file format out there, but we could cater to the typical use cases). I’m curious, from a power user perspective, how frequent are such use-cases (of the files you mentioned above, or of reading JSON/XML/etc. files) in general in the app developer community?

Red_Panda · March 30, 2020, 5:42pm

I started working on such a component and the good news is that the yaml reader is already working! But I want some feedback on the way the extension handles data. I have to make some choices and I want your opinion on them.

blocks style

We can either have direct conversion blocks or asynchronous blocks like in the File component. A third option could be to return the values as properties (see picture in first post).

return format

I’m returning either a list or a dictionary at the moment, depending on whether the format has key-value mappings or not. Alternatively, we could have separate GotData events for lists and dictionaries, allowing us to provide columns/rows or keys/values.

format families

I’m treating GPX and KML as XML at the moment. What would be the differences or advantages of treating it differently?

detection method

How should we detect the format? Based on mimetype or file extension?

state of development

I currently use the built-in parsers for XML, JSON, CSV and HTML. I’m planning on adding YAML, .ini and TSV support. Any file formats that I’m missing?

@Peter Do you have an idea on how to represent iCalendar files in App Inventor?

mike · March 31, 2020, 12:38am

@elatoskinas @Peter @Red_Panda this kind of thing is a good idea, I think, even if it is just the ability to load JSON, TOML and YAML files seamlessly. I.e., if you have a file called config.json and you decide, for whatever crazy reason, to delete it and replace it with the same file in YAML format, config.yaml, you wouldn’t have to change a single block for it to still work. All of the parsing is done magically by the extension.
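The "swap the file, keep the blocks" idea could be sketched roughly as follows in Python. Everything here is hypothetical (the `PARSERS` registry, `load_config`, and the choice of formats); INI stands in for YAML/TOML only because the Python standard library can parse it without third-party packages.

```python
import configparser
import json
import tempfile
from pathlib import Path


def parse_ini(text):
    """Turn INI text into a dictionary of sections."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical registry: file extension -> function turning text into a dictionary.
# A YAML or TOML reader would be registered in the same way.
PARSERS = {".json": json.loads, ".ini": parse_ini}


def load_config(directory, name):
    """Load `name` from whichever supported file exists, e.g. config.json or config.ini."""
    for extension, parse in PARSERS.items():
        path = Path(directory) / (name + extension)
        if path.exists():
            return parse(path.read_text(encoding="utf-8"))
    raise FileNotFoundError(f"no supported file found for {name!r}")


with tempfile.TemporaryDirectory() as tmp:
    # First the configuration lives in config.json ...
    (Path(tmp) / "config.json").write_text('{"app": {"theme": "dark"}}', encoding="utf-8")
    from_json = load_config(tmp, "config")
    # ... then it is replaced by an equivalent config.ini; the caller does not change.
    (Path(tmp) / "config.json").unlink()
    (Path(tmp) / "config.ini").write_text("[app]\ntheme = dark\n", encoding="utf-8")
    from_ini = load_config(tmp, "config")
```

The caller only ever asks for "config"; which parser runs is decided by the file that happens to exist.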

Peter · March 31, 2020, 7:14am

Sorry for my late response. I lost my father a few weeks ago so had little time on the community.

Questions about reading JSON files are asked here, and also frequently in other builder communities.

Questions about plotting routes are also asked. I use KML files myself frequently in my apps. I learned a lot from Carlos Pedroza on the Thunkable/Kodular community about plotting them on a Google Maps component. I haven't tried to use them with an OpenStreetMap component.

GPX and KML are XML-based, so that is great, I guess.
I think detecting the format based on the file extension will do the trick.

For FOSDEM, @Ghica made an app that showcased all talks and events. She used this XML file for that:
https://fosdem.org/2020/schedule/xml

There was also an ical available
https://fosdem.org/2020/schedule/ical

With the new extension it would be much easier to get all the values in a list or dictionary.

Ghica · March 31, 2020, 9:24am

Peter, my sincere condolences. It must be really hard to cope with this, while living in the middle of the C-pandemic. Regards, Ghica.

Peter · March 31, 2020, 9:37am

Thank you. Yes, there were many rules on how to organize the funeral and how many people could attend.

Ghica · March 31, 2020, 9:55am

Personally, I am not a big fan of JSON. I think XML is more robust, and if you use the right tools, easier to handle. But I know that I am in a small minority here.

The FOSDEM XML file is rather large, with details about the conference schedule, rooms, talks in a room, abstracts etc. My first attempt was to tackle it using XMLDecode in App Inventor. Things became really complicated and I had to code around a few bugs in the XMLDecode block.

Then, I switched to using JavaScript in an invisible WebViewer and everything fell into place.
I used jQuery to download the XML to the phone, and the JavaScript evaluate function to parse the XML with XPath and find the proper result.
WebViewString and the JavaScript setInterval() function are also key concepts.

The result is that the App Inventor blocks are now just those needed to build the user interface and a few to communicate with the WebViewer. The JavaScript is less than 200 lines and there are < 600 App Inventor blocks, while there were a few thousand before. Yes, you need to know XPath, but W3Schools is good for that.
Cheers, Ghica

elatoskinas · March 31, 2020, 10:16am

@Peter My condolences for your loss… I hope all the arrangements for the funeral went well in the time of the pandemic.

With regards to the files, are there also any popular extensions out there already to handle file reading? Or does reading e.g. JSON/CSV files usually come down to simpler methods (or even to using WebView, as @Ghica pointed out)? Depending on the use cases, it might be worthwhile in the end to have a simple component/block that reads in the data in only a few steps, so that all the ‘messy’ work is hidden from the common user.

@Red_Panda Great to hear that you are working on an extension and got the YAML reader to work! I would think the standard solution for the blocks style would be an asynchronous block that returns its contents, although it is a bit tricky because now you have different file formats to deal with, and the output might not always be the same. As for the properties that I have shown in the picture, it is very likely that they will be replaced by asynchronous blocks in the end (so we might have something like a getColumns block that will trigger an event), but this is yet to be decided.

For the return format, I think it would make sense to allow as much flexibility as possible. We could support both options in the end. For instance, if you think about something like CSV files, they will not have keys, so a dictionary might not be the best approach (unless your key is the entire column, which is also an interesting approach, but then not every CSV file has its first column designated as the key header).

With regards to detection, the ‘trick’ that we decided on with the developer team was to automatically detect the type based on the first contents of the file. While this is not the most robust option and might lead to errors in very rare cases, it did work for our use cases. There is code in the App Inventor code base that essentially checks the first contents of the file and, based on that, either reads or rejects it (in the case of JSON). Another option is to first try to read the file as JSON and, if that runs into an exception, try to read it as CSV; because the formats differ so much, one of the two fails really quickly, so the performance loss is more or less negligible. In the end I had a combination of such fallback methods and detection of the MIME type to determine which reader to use. Basing the reading on file extensions is not truly robust, since you might, for example, have CSV data in a .txt file even though the extension for CSV files is .csv (and using .txt is really not that uncommon, especially for machine learning data, to give one example). Another option is to set a property to the reader you want to use, so it’s less ambiguous to the user.

Peter · March 31, 2020, 10:40am

Thank you.

That's what I like about this idea: an easy method for the common user to get the content of different data file formats.

Red_Panda · March 31, 2020, 1:28pm

I’ve read all of your comments now. First of all, my sincere condolences @Peter. From what I’ve read, you all prefer an asynchronous and flexible version. This means for me, the way to go is to have an asynchronous function which reads the file’s content and separate methods to convert a string to either list or dictionary. I’m planning to use list of pairs as a fallback if dictionaries are not supported. Does anyone have a better idea?
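For what it's worth, the list-of-pairs fallback could look roughly like this, sketched in Python instead of the extension's own code. The helper name `to_pairs` is hypothetical.

```python
def to_pairs(value):
    """Recursively convert dictionaries into lists of [key, value] pairs."""
    if isinstance(value, dict):
        return [[key, to_pairs(item)] for key, item in value.items()]
    if isinstance(value, list):
        # Lists stay lists, but any dictionaries inside them are converted too.
        return [to_pairs(item) for item in value]
    return value


pairs = to_pairs({"name": "Belgian Waffles", "tags": ["sweet", {"calories": 650}]})
```

One thing this loses is the distinction between "a list of pairs" and "a list that happens to contain two-element lists", which may or may not matter for the blocks that consume the result.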

Red_Panda · April 4, 2020, 5:13pm

I’m also planning to add a simpler xml parser which ignores attributes.

<breakfast_menu version="1.0">
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>
      Two of our famous Belgian Waffles with plenty of real maple syrup
    </description>
    <calories>650</calories>
  </food>
</breakfast_menu>

would then be converted to the YailDictionary

{
  "breakfast_menu": {
    "food": {
      "name": "Belgian Waffles",
      "price": "$5.95",
      "description": "Two of our famous Belgian Waffles with plenty of real maple syrup",
      "calories": "650"
    }
  }
}
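A conversion like the one above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not the extension's implementation; `simple_dict_from_xml` and `element_to_value` are hypothetical names, and collecting repeated tags into a list is an assumption about how repeats might be handled.

```python
import xml.etree.ElementTree as ET


def element_to_value(element):
    """Convert an element to text (leaf) or a dictionary (has children); attributes are ignored."""
    children = list(element)
    if not children:
        # Leaf: collapse whitespace so multi-line text becomes a single line.
        return " ".join((element.text or "").split())
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            # Third or later occurrence of the same tag.
            result[child.tag].append(value)
        else:
            # Second occurrence: switch from a single value to a list (assumption).
            result[child.tag] = [result[child.tag], value]
    return result


def simple_dict_from_xml(text):
    root = ET.fromstring(text)
    return {root.tag: element_to_value(root)}


menu = simple_dict_from_xml(
    "<breakfast_menu version='1.0'><food><name>Belgian Waffles</name>"
    "<price>$5.95</price></food></breakfast_menu>"
)
```

With a single `<food>` element the result is a nested dictionary as in the example above; with several `<food>` elements this sketch would return a list under the "food" key instead.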

mike · April 4, 2020, 5:25pm

@Red_Panda nice

Red_Panda · April 11, 2020, 2:10pm

I’m proud to present a preview of this extension

It currently supports the following formats:
table a.k.a. list of lists

CSV
TSV

key-value pairs a.k.a. dictionaries (returns a list of pairs if dictionaries are not supported)

XML
JSON
YAML
TOML
INI (not standardized, might work for .conf and .cfg files as well)
ICS (currently not working)

I added a few useful functions for working with data

SimpleDictFromXml: Converts an xml string to a dictionary while ignoring attributes
Transpose: Transposes a list of lists

I’m planning on adding support for ods and xls/xlsx files
Here is a demo file datafile.aia (2,9 MB)

nitinseshadri · April 11, 2020, 4:26pm

Does this decode plist files? If your parser looks at the XML DTD then it might make it easier to implement that.

For reference: https://en.wikipedia.org/wiki/Property_list#Format

Red_Panda · April 11, 2020, 4:59pm

No. Is there any standardization of plist?

nitinseshadri · April 16, 2020, 10:13pm

This seems to be an Apple-only format, though it’s a variant of XML. There might be more documentation on Apple’s website.