! Please note that this is a snapshot of our old Bugzilla server, which is read only since May 29, 2020. Please go to gitlab.xfce.org for our new server !
Simplify detection that the selected file is an archive
Status:
RESOLVED: MOVED
Product:
Thunar-archive-plugin
Component:
General

Comments

Description Sergey Ponomarev 2019-09-02 17:53:42 CEST
Sorry guys, this is a big issue which involves the Thunar sources itself.
When clicking right button on an archive file it is shown a context menu with "Extract here" option.
Internally the plugin tries to determine that the selected file is an archive.
To do this the plugin contains a full list of archive MIME types:

static const gchar TAP_MIME_TYPES[][34] = {
  "application/x-7z-compressed",
  "application/x-gtar",
  "application/zip",
   ...
};

The problem here is that in fact there are more compressor algorithms. Recently was published ZStandard (zstd) *.zst and Brotli *.br and we need to support them too.
I wanted to add them but in fact I decided to find a way to get rid of the TAP_MIME_TYPES list at all.
To determine a content type internally Thunar uses GLib which uses the Shared MIME Info library https://www.freedesktop.org/wiki/Software/shared-mime-info/ which is kind of clone of libmagic + icons + translations.
Unfortunately the shared-mime-info doesn't have a clear flag that the type is an archive. But all archive types (tar, zip, gz etc) always have the same icon "package-x-generic".
So we can use the content type's icon name to determine that it is an archive or compressed file:

  generic_icon_name = g_content_type_get_generic_icon_name (content_type);
  file_is_archive = g_strcmp0 (generic_icon_name, "package-x-generic") == 0;

This looks like a hack but it always works and when zst and br or any other will be added to Shared MIME Info then the plugin starts to recognize them.

I already created a patch for Thunar Send To Email tool:
Thunar SendTo Email: improve archives detection
https://bugzilla.xfce.org/show_bug.cgi?id=15917

Please take a look on it first because then it will be easier to understand the changes proposed in this ticket.

Implementation:

First of all I decided that it would be better to extract archive detection from the thunar-archive-plugin into thunarx and create a new function thunarx_file_info_is_archive()
0001-thunarx-expose-a-new-function-thunarx_file_info_is_a.patch (This is a patch for Thunar itself. Do to forget to install the thunarx headers with "cd thunarx; make; make install")

The implementation will just always return FALSE but we'll just call the function from the thunar-archive-plugin. See patches 0001-tap-backend.c-Use-g_list_free_full-where-applicable.patch (refactoring to fix warnings) and 0002-Use-thunarx_file_info_is_archive-instead-of-detectin.patch for thunar-archive-plugin.

You can see the resulted branch on GitHub: https://github.com/stokito/thunar-archive-plugin/tree/thurarx_archive


Then I implemented the thunar_file_is_archive() function in the same manner as I did for send-to-email:

/**
 * thunar_file_is_archive:
 * @file : a #ThunarFile instance.
 *
 * Checks whether @file refers to an archive e.g. tar, zip, gz etc.
 *
 * Return value: %TRUE if @file is an archive.
 **/
gboolean
thunar_file_is_archive (const ThunarFile *file)
{
  const gchar *content_type;
  gchar *generic_icon_name;
  gboolean file_is_archive;

  if (thunar_file_is_directory (file))
    return FALSE;

  content_type = thunar_file_get_content_type (THUNAR_FILE (file));
  if (content_type == NULL)
  {
    return FALSE;
  }

  /* If icon for the content is "package-x-generic" then the type is archive.
   * GLib internally uses a shared-mime-info library to determine a content type and it's description.
   * Unfortunately the shared-mime-info doesn't have a clear flag that the type is an archive.
   * But all archive types (tar, zip, gz etc) always have the same icon "package-x-generic".
   * So we can use the content type's icon name to determine that it is an archive or compressed file.
   */
  generic_icon_name = g_content_type_get_generic_icon_name (content_type);
  if (generic_icon_name == NULL) {
    return FALSE;
  }
  file_is_archive = g_strcmp0 (generic_icon_name, "package-x-generic") == 0;
  g_free (generic_icon_name);
  return file_is_archive;
}

See 0002-thunarx-implement-thunarx_file_info_is_archive.patch

I checked and everything is working just fine!

You can see the resulted branch on GitHub: https://github.com/stokito/thunar/tree/thunarx_archive_simple


In fact there can make a performance improvement.
First of all we can use the same approach as for thunar_file_is_directory() and check only a file flag instead checking the file content type each time on the thunar_file_is_archive() call.
We need to add into ThunarFileFlags bitmask a new value THUNAR_FILE_FLAG_IS_ARCHIVE and then inside of thunar_file_is_archive() we can check FLAG_IS_SET(file, THUNAR_FILE_FLAG_IS_ARCHIVE)

Second improvement is that since we deal with an icon so the same icon is already resolved inside of thunar_file_get_icon_name(). There we call the g_themed_icon_get_names() function which returns a list of all icons. Internally the g_themed_icon_get_names calls the same g_content_type_get_generic_icon_name() as we already used.
So from the returned list we can check that if there is the "package-x-generic" then the file is an archive.

See 0002-thunarx-implement-thunarx_file_info_is_archive-by-ic.patch with the optimized version.

This is a great performance improvement because we already have this icons list and we can do the archive type check almost for free. Yes, it looks even more ugly but in the same time we'll use even less code and it will be faster.



I'll attach patches with the fix and small refactoring as comments because I don't now how to attach multiple patch files:
But you can also see it on github:
thunar-archive-plugin https://github.com/stokito/thunar-archive-plugin/tree/thurarx_archive
thunar simple:  https://github.com/stokito/thunar/tree/thunarx_archive_simple
thunar optimized:  https://github.com/stokito/thunar/tree/thurarx_archive_optimized
Comment 1 Sergey Ponomarev 2019-09-02 17:54:45 CEST
Created attachment 8985 
0001-thunarx-expose-a-new-function-thunarx_file_info_is_a.patch
Comment 2 Sergey Ponomarev 2019-09-02 17:57:09 CEST
Created attachment 8986 
0001-tap-backend.c-Use-g_list_free_full-where-applicable.patch
Comment 3 Sergey Ponomarev 2019-09-02 18:00:27 CEST
Created attachment 8987 
0002-Use-thunarx_file_info_is_archive-instead-of-detectin.patch
Comment 4 Sergey Ponomarev 2019-09-02 18:03:52 CEST
Created attachment 8988 
0002-thunarx-implement-thunarx_file_info_is_archive.patch
Comment 5 Sergey Ponomarev 2019-09-02 18:06:20 CEST
Created attachment 8989 
0002-thunarx-implement-thunarx_file_info_is_archive-by-ic.patch
Comment 6 alexxcons editbugs 2019-09-06 13:59:08 CEST
> First of all I decided that it would be better to extract archive detection from the thunar-archive-plugin into thunarx and
> create a new function thunarx_file_info_is_archive()
Why would it be better ? Sorry, it would be good if you would first talk to me or Andre Miranda before starting to write bigger patches like this one.

So we could agree on a strategy beforehand, and you would save alot of work.

IMO there are some fundamental problems with the patches (please proof me wrong!):
1) If the thunar-sendto-email plugin anyhow cannot make use of "thunarx_file_info_is_archive", so that "thunar-archive-plugin" is the only user of the method, why would you want to have the method in thunar itself at all ? ( Instead of just keeping it in thunar-archive-plugin ? )

2) So far there seems to be no strain to make "isArchive()" more performant. It's only used on the context-menu creation. Afaik the context menu creation does not suffer from slowness. So I would not want to add extra complexity to thunar-file in order to speed up "isArchive()". Some border cases may require to even update the THUNAR_FILE_FLAG_IS_ARCHIVE flag later on (E.g. what about mv myArchive.tar myArchive ? )

3) Even if other other plugins would use the code as well (In that case I would agree on a thunarx API extension) I would not put "isArchive()" into "thunar-file",  but into "thunar-gio-extensions".

Sorry to be a bit conservative here. Time has shown that usually new code brings new bugs and more maintenance work ... so IMO there should be a good reason to add new code to thunar. 

So as long as thunar-sendto-email plugin is not able to use thunarx in order to re-use the same code, I would prefer to just apply the same fix from thunar-sendto-email plugin to thunar-archive-plugin.
Comment 7 Sergey Ponomarev 2019-09-06 21:07:01 CEST
> why would you want to have the method in thunar itself at all ?

My first attempt was to do like this i.e. resolve everything on the plugin side. There was some problems but now I tried again and here is a patch 0001-Simplify-tap_is_archive-and-detect-that-the-file-is-.patch
So yes, looks like we can do this on the plugins side and the fix is simple.

From the other side you can put a debug point there and see that the function is called quite often. For some reason it's even call for each opened directory and for all it's subfolders so by skipping them we can improve performance.
Comment 8 Sergey Ponomarev 2019-09-06 21:07:39 CEST
Created attachment 9010 
0001-Simplify-tap_is_archive-and-detect-that-the-file-is-.patch
Comment 9 alexxcons editbugs 2019-09-08 23:31:20 CEST
Created attachment 9017 
patch which adds missing archive types to list

I would prefer to keep the same logic as for "thunar-sendto-email". For consistency I reused the same char array and the same method (as good as possible)

(TODO: I should add a comment to remember that the same impl. exists in TSE)

I as well checked what would be the effort to use the thunarx API in TSE (I hat duplicated code :F) That would be rather heavy, since the type "ThunarxFileInfo" cannot be used "just like that". It would need to be injected by thunarx.
One would first need to completely change paradigm for the TSE plugin, so that it just provides a service instead of providing its own binary. IMO by far too much effort for the gain.

I have one remaining problem: I fail to get it to work ;) Neither your patch, nor my own patch enables me to see "extract here" on the context menu for my "test.br" file.
Though the "test.br" archive is not shown as archive at all for me .. neither by thunar, nor by nautilus or dolphin.
Do I have to add support for the "*.br" mime type as well at some other, system wide location ?
Comment 10 Andre Miranda editbugs 2019-10-04 05:04:10 CEST
My to 2 cents: simplifying the code is always good, but I'm unwilling to merge the proposed patch which checks the icon name. Sorry, that's too hackish IMO.
Comment 11 Git Bot editbugs 2020-05-20 21:12:02 CEST
-- GitLab Migration Automatic Message --

This bug has been migrated to xfce.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.xfce.org/thunar-plugins/thunar-archive-plugin/-/issues/7.

Please create an account or use an existing account on one of our supported OAuth providers. 

If you want to fork to submit patches and merge requests please continue reading here: https://docs.xfce.org/contribute/dev/git/start#gitlab_forks_and_merge_requests

Also feel free to reach out to us on the mailing list https://mail.xfce.org/mailman/listinfo/xfce4-dev

Bug #15918

Reported by:
Sergey Ponomarev
Reported on: 2019-09-02
Last modified on: 2020-05-20

People

Assignee:
Jannis Pohlmann
CC List:
3 users

Version

Attachments

Additional information