#wikimedia-office log

21:02:51 <gwicke> #startmeeting https://phabricator.wikimedia.org/T164990  | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:02:51 <wm-labs-meetbot`> Meeting started Wed Jun 14 21:02:51 2017 UTC and is due to finish in 60 minutes.  The chair is gwicke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:51 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:51 <wm-labs-meetbot`> The meeting name has been set to 'https___phabricator_wikimedia_org_t164990___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
21:03:43 <gwicke> coreyfloyd or mdholloway: could you give a quick summary of what you are trying to do, and what you would like to get out of this meeting today?
21:03:57 <coreyfloyd> gwicke: sure…
21:04:37 <coreyfloyd> Reading wants to create a service for syncing private reading lists first within Android and then iOS and mobile web
21:04:58 <coreyfloyd> For what the feature will look like, you can check out the current Android app
21:05:44 <coreyfloyd> It will be much the same, except will allow users to sync their reading lists (which users use for bookmarking and offline reading) so that they will not lose them if they change or lose their device
21:06:03 <coreyfloyd> mdholloway: do you have a screen shot of the feature to post?
21:06:21 * mdholloway finds one quickly
21:06:32 <coreyfloyd> gwicke: does that answer the question well enough?
21:06:58 <DanielK_WMDE> coreyfloyd: would there be just one list per user, or should the system be designed with multiple lists per person in mind?
21:07:18 <coreyfloyd> DanielK_WMDE: it would be multiple lists per user
21:07:34 <DanielK_WMDE> all private? or public?
21:07:43 <coreyfloyd> Lists are all private
21:07:43 <gwicke> #info Reading wants to create a service for syncing private reading lists first within Android and then iOS and mobile web
21:07:57 <coreyfloyd> And so this would require a user to be authenticated to access their lists
21:07:58 <gwicke> #info it would be multiple lists per user, all private
21:09:03 <gwicke> coreyfloyd: which questions would you especially like to get feedback on / make decisions on?
21:09:11 <coreyfloyd> One major architectural decision we have to make - as laid out on the RFC - is whether we build this within MediaWiki as an extension using RESTBase as a proxy, or we build it as a separate node.js service
21:09:22 <TimStarling> so you have to log in with your wikipedia user account to get reading lists at all?
21:09:43 <dbrant> correct
21:09:47 <coreyfloyd> TimStarling: no, only for syncing purposes
21:09:54 <TimStarling> oh right
21:09:54 <dbrant> ^
21:10:13 <DanielK_WMDE> how would anon lists work?
21:10:14 <coreyfloyd> TimStarling: the feature will continue to work as it does now for those who do not want to log in
21:10:20 <coreyfloyd> DanielK_WMDE: ^
21:10:24 <coreyfloyd> They would be local only
21:11:08 <DanielK_WMDE> so the proposed REST service is only for logged in users, but the client side part would also support local-only lists for anons?
21:11:24 <dbrant> Without logging in, the user can still have reading lists locally. Then the first time they log in, the lists will start being synced.
21:11:25 <coreyfloyd> DanielK_WMDE: correct
21:11:53 <coreyfloyd> DanielK_WMDE: we have to work out UX, but the expectation is that users will need to opt in to syncing in some way
21:11:56 <tgr> for apps yes, for mobile web I don't think we have decided yet
21:12:09 <DanielK_WMDE> are the CRUD operations sufficient for reliable syncing? wouldn't a log be needed?
21:12:12 <tgr> the worry is that storage is much more limited there
21:12:18 <coreyfloyd> tgr: good point
21:12:30 <coreyfloyd> DanielK_WMDE: we have laid out some sync APIs for that purpose
21:12:42 <coreyfloyd> Basically getting “changes since”
21:12:42 <DanielK_WMDE> #info <dbrant> Without logging in, the user can still have reading lists locally. Then the first time they log in, the lists will start being synced.
21:13:21 <gwicke> DanielK_WMDE: I am guessing this would use If-Match, and resolve conflicts client side
21:13:22 <DanielK_WMDE> ah, update timestamp and soft delete...
21:13:35 <coreyfloyd> yep
21:13:35 <tgr> the GET /lists/reading/changes/since/{date} endpoint would be used for sync
21:13:45 <TimStarling> can the feature be extended to the desktop site?
21:13:47 <mdholloway> screenshots https://usercontent.irccloud-cdn.com/file/BZhR0mSq/Screenshot_20170614-171133.png https://usercontent.irccloud-cdn.com/file/Y2qMFmM0/Screenshot_20170614-171139.png
21:13:55 <coreyfloyd> TimStarling: yes it can be
21:14:09 <coreyfloyd> TimStarling: we wanted to start out with the Android client first as a testing ground
21:14:19 <TimStarling> ok
21:14:32 <coreyfloyd> And then we would do some analysis then move to iOS, then mobile web, and then desktop
21:14:42 <DanielK_WMDE> coreyfloyd: i'm blurry on sync algorithms. this sounds like it can work, but could get interesting in the nitty-gritty. is this following a standard sync strategy?
21:15:13 <DanielK_WMDE> does soft delete mean rows never get deleted?
21:15:34 <coreyfloyd> DanielK_WMDE: basically just a standard time stamp syncing strategy… clients keep a time stamp of the last sync, and must pass it to get changes since the last sync
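A minimal sketch of the timestamp strategy described above, assuming integer Unix timestamps and an in-memory list standing in for the server. The `Entry` class and the `changes_since` and `apply_changes` helpers are hypothetical names for illustration, not part of the RFC:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    updated: int           # last-modified time as a Unix timestamp (assumed)
    deleted: bool = False  # soft delete: the row is kept as a tombstone


def changes_since(entries, since):
    """Server side: every entry, tombstones included, modified after `since`."""
    return [e for e in entries if e.updated > since]


def apply_changes(local, changes):
    """Client side: apply server changes to a local {title: Entry} map."""
    for e in changes:
        if e.deleted:
            local.pop(e.title, None)  # a tombstone removes the local copy
        else:
            local[e.title] = e
    return local


server = [Entry("Dog", 100), Entry("Cat", 250), Entry("Fish", 300, deleted=True)]
local = {"Dog": Entry("Dog", 100), "Fish": Entry("Fish", 150)}
last_sync = 200  # timestamp the client stored after its previous sync

local = apply_changes(local, changes_since(server, last_sync))
titles = sorted(local)
```

Only the entries touched after `last_sync` travel over the wire; the tombstone is what lets the client learn that "Fish" was removed elsewhere.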
21:15:34 <Scott_WUaS> (coreyfloyd: What are your plans for Wikipedia's/Wikidata's 358 languages here?)
21:15:53 <gwicke> DanielK_WMDE: it's basically turned into a tombstone
21:16:01 <coreyfloyd> DanielK_WMDE: soft deletes will need to be cleaned up later…
21:16:12 <DanielK_WMDE> gwicke: and the tombstones are forever
21:16:22 <coreyfloyd> We haven’t decided on the strategy, it could be after a period of time - kill after 30 days
21:16:36 <gwicke> the overall strategy looks quite standard
21:16:52 <coreyfloyd> DanielK_WMDE: basically if a client hasn’t synced in a while, then it will be required to do a full sync
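One way the cleanup could look, purely as a sketch: the log says the strategy is undecided, so the 30-day window is only the example figure mentioned above, and `purge_tombstones` and `needs_full_sync` are hypothetical helpers. The point is that a client whose last sync predates the retention window may have missed purged tombstones and therefore cannot rely on an incremental sync:

```python
DAY = 24 * 3600
RETENTION_SECONDS = 30 * DAY  # assumed window; not decided in the RFC


def purge_tombstones(entries, now):
    """Drop soft-deleted entries whose tombstone is older than the window."""
    return [
        e for e in entries
        if not (e["deleted"] and now - e["updated"] > RETENTION_SECONDS)
    ]


def needs_full_sync(last_sync, now):
    """True if tombstones may have been purged since the client last synced."""
    return now - last_sync > RETENTION_SECONDS


now = 100 * DAY
entries = [
    {"title": "Dog", "updated": 10 * DAY, "deleted": False},  # live entry
    {"title": "Cat", "updated": 50 * DAY, "deleted": True},   # old tombstone
    {"title": "Fish", "updated": 90 * DAY, "deleted": True},  # recent tombstone
]
kept = [e["title"] for e in purge_tombstones(entries, now)]
stale_client = needs_full_sync(60 * DAY, now)
fresh_client = needs_full_sync(95 * DAY, now)
```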
21:16:53 <DanielK_WMDE> coreyfloyd: sounds a bit unreliable
21:17:03 <mdholloway> Scott_WUaS: these lists should not be wiki-specific; they can contain pages from any number of projects.
21:17:09 <coreyfloyd> DanielK_WMDE: can you be more specific in how it would be unreliable?
21:17:10 <DanielK_WMDE> a full sync means losing all local changes
21:17:48 <coreyfloyd> DanielK_WMDE: to be clear… and this has not been decided… a full sync would not be about losing changes, but since it would have been a disconnected client for so long, it is best to do a full sync
21:18:11 <coreyfloyd> DanielK_WMDE: fully syncing does not preclude doing a merge of local changes
21:18:16 <DanielK_WMDE> coreyfloyd: if i have unsynced local changes older than 30 days, i might lose them. I don't mean to pick on the details here. my concern is the storage schema
21:18:19 <gwicke> you can probably tell when there were local changes that weren't synced yet
21:18:26 <tgr> you can still use timestamps to merge on a full sync
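A sketch of what merging on a full sync could mean, assuming a last-writer-wins rule keyed on the update timestamp. That rule is an assumption for illustration; the log does not specify a conflict policy, and `merge` is a hypothetical helper:

```python
def merge(local, remote):
    """Last-writer-wins merge of two {title: (updated, deleted)} maps."""
    merged = dict(remote)
    for title, (updated, deleted) in local.items():
        # Keep the local version only if the server lacks the entry
        # or the local change is newer than the server's.
        if title not in merged or updated > merged[title][0]:
            merged[title] = (updated, deleted)
    return merged


local = {"Dog": (500, False), "Cat": (100, False)}   # "Dog" was added offline
remote = {"Cat": (300, True), "Bird": (200, False)}  # "Cat" deleted elsewhere
result = merge(local, remote)
```

Here the unsynced local addition of "Dog" survives the full sync, while the newer remote deletion of "Cat" wins over the older local copy.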
21:18:43 <DanielK_WMDE> it seems the timestamp/soft delete is baked in. i wonder if a log/event based storage should be considered instead.
21:18:46 <coreyfloyd> Scott_WUaS: as far as all languages, mdholloway answered… these are cross wiki
21:18:46 <DanielK_WMDE> but let's move on
21:19:24 <coreyfloyd> Scott_WUaS: so this will support all languages… and potentially other projects
21:20:00 <DanielK_WMDE> #info the timestamp/soft delete is baked into the storage model. perhaps a log/event based sync should also be considered.
21:20:01 <gwicke> #info <daniel> it seems the timestamp/soft delete is baked in. i wonder if a log/event based storage should be considered instead.
21:20:18 <gwicke> also, beware of race conditions
21:20:20 <DanielK_WMDE> gwicke: hehe
21:21:13 <coreyfloyd> Any thoughts on the 2 options in the RFC?
21:21:15 <tgr> this is private data so races are not very likely
21:21:30 <tgr> unless the user has a phone in each hand
21:21:38 <coreyfloyd> Tgr: lol
21:21:50 <gwicke> tgr: my joke was more about me & daniel info-ing the same thing than the RFC ;)
21:21:53 <coreyfloyd> Maybe not too uncommon nowadays
21:21:59 <TimStarling> option 1 seems fine to me
21:22:13 <TimStarling> I'm skeptical about RESTBase becoming the new enormous monolithic blob
21:22:26 <TimStarling> it can be a path router, that's simple, easy to maintain
21:22:44 <gwicke> that's the idea, I believe
21:22:55 <coreyfloyd> Yeah for proposal 1
21:23:07 <TimStarling> and it seems like jaime favoured option 1?
21:23:09 <coreyfloyd> The only other work from RESTBase would be injection of summaries
21:23:09 <DanielK_WMDE> tgr: a laptop and a phone and a tablet, all syncing periodically, when they have connectivity. used phone and tablet on the plane, both go online again at the same time...
21:23:11 <gwicke> I think somehow "RESTBase service" has become a synonym for "node service"
21:23:24 <DanielK_WMDE> tgr: otoh, offline utility of the app is limited ;)
21:23:40 <coreyfloyd> gwicke: I think you are right… I do that myself
21:24:03 <gwicke> to clarify, option 2 is proposing a stand-alone service, right?
21:24:22 <gwicke> proxied through RB, which would do auth & summary hydration
21:24:30 <TimStarling> jcrespo said "This has to be integrated into mediawiki or other existing service, as we will not have hw available for proxies or other middleware to have a dedicated service, specially at the beginning."
21:24:40 <mdholloway> that's right.  just using MediaWiki for the authentication layer.
21:24:49 <tgr> TimStarling: IIRC he did not have a strong opinion but predicted a larger hardware need for 2
21:25:16 <tgr> TimStarling: https://phabricator.wikimedia.org/T164805 has more on the subject
21:25:17 <coreyfloyd> gwicke: yes… it would exist outside of MediaWiki, but use MediaWiki for authentication. And how the service would access the DB is still under discussion
21:26:05 <gwicke> #info option 2 is proposing a stand-alone node service, proxied through RB, which would do auth & summary hydration
21:26:24 <DanielK_WMDE> coreyfloyd: why sql for storage? just because it's there and we know it?
21:26:29 <DanielK_WMDE> good reasons, for sure.
21:26:40 <DanielK_WMDE> but are there others? are there tempting alternatives?
21:26:50 <coreyfloyd> DanielK_WMDE: large queries, known performance and stability characteristics
21:27:04 <gwicke> range queries are important for this use case
21:27:13 <TimStarling> the task description says "Reading lists contain primary data (cannot be regenerated from other sources and losing it would have a major UX impact), and data needs to be fetched based on criteria other than the id (e.g. all lists containing a given page, all entries which have changed after a given date) so MariaDB will be used"
21:27:22 <gwicke> which don't scale as well in some of the distributed alternatives
21:27:37 <coreyfloyd> DanielK_WMDE: we did talk about Cassandra but did not seem to be a good fit
21:28:01 <DanielK_WMDE> range queries are a good point, yea
21:28:56 <gwicke> this service is also not as critical, so doesn't necessarily need to be active-active right away
21:29:58 <coreyfloyd> DanielK_WMDE: we also didn’t want to just store the data as a JSON blob
21:30:20 <coreyfloyd> We want to have each entry be a row, so we can do more interesting queries on the data
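To illustrate why one row per entry helps, here is a sketch using SQLite as a stand-in for MariaDB. The column names and sample data are assumptions, not the RFC's schema. It runs the two lookups quoted from the task description: all lists containing a given page, and all entries changed after a given date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reading_list_entry (
        user_id INTEGER NOT NULL,
        list_id INTEGER NOT NULL,
        project TEXT    NOT NULL,
        title   TEXT    NOT NULL,
        updated INTEGER NOT NULL,           -- Unix timestamp (assumed)
        deleted INTEGER NOT NULL DEFAULT 0  -- 1 marks a soft-deleted row
    )
""")
conn.executemany(
    "INSERT INTO reading_list_entry VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "en.wikipedia.org", "Dog", 100, 0),
        (2, 20, "en.wikipedia.org", "Dog", 150, 0),
        (2, 20, "en.wikipedia.org", "Cat", 160, 1),
        (3, 30, "de.wikipedia.org", "Hund", 170, 0),
    ],
)

# Which users currently have a given page on some list?
users_with_dog = [row[0] for row in conn.execute(
    "SELECT DISTINCT user_id FROM reading_list_entry "
    "WHERE project = ? AND title = ? AND deleted = 0 ORDER BY user_id",
    ("en.wikipedia.org", "Dog"),
)]

# Which of one user's entries changed after a given time (range query)?
changed = [row[0] for row in conn.execute(
    "SELECT title FROM reading_list_entry "
    "WHERE user_id = ? AND updated > ? ORDER BY updated",
    (2, 120),
)]
```

With the whole list stored as a single JSON blob per user, the first query would require reading and parsing every user's blob.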
21:30:26 <TimStarling> if you stored it as a JSON blob you could have combined it with T128602
21:30:26 <stashbot> T128602: Create and deploy an extension that implements an authenticated key-value store. - https://phabricator.wikimedia.org/T128602
21:31:15 <coreyfloyd> TimStarling: we looked at that, but then you lose granularity - for instance if we need to do push notifications, how do we know which users have which pages in a reading list?
21:31:24 <DanielK_WMDE> one big json blob would be bad. but how about one per list entry? not as a page, of course.
21:31:54 <coreyfloyd> Similar issues… if we want to push changes to users when a page changes, then there is no sensible way to query that information
21:31:56 <DanielK_WMDE> that would make it easy to add meta-data later. personal notes, last viewed, etc
21:32:32 <coreyfloyd> One part of this that isn’t clear, is that the clients use reading lists as a means to manage offline pages
21:32:39 <DanielK_WMDE> coreyfloyd: oh, i don't want to get rid of the db table. just have a blob in one of the fields.
21:32:49 <DanielK_WMDE> that makes it easy to add fields that we don't need to query by
21:33:02 <coreyfloyd> So an important part of having offline pages is updating them when the page changes
21:33:25 <tgr> TimStarling: also the feedback on that RfC was that people would prefer a dedicated API
21:33:26 <coreyfloyd> So being able to tell which clients need to be notified when the article “dog” changes becomes important
21:33:30 <tgr> some feedback at least
21:33:34 <Scott_WUaS> (Thanks)
21:33:46 <gwicke> #info <Tim> if you stored it as a JSON blob you could have combined it with T128602
21:33:46 <tgr> (I seem to remember brion)
21:34:05 <TimStarling> coreyfloyd and tgr will be the implementors?
21:34:16 <coreyfloyd> DanielK_WMDE: some fields could be in a blob if needed… just not the urls
21:34:24 <coreyfloyd> TimStarling: tgr would be implementing
21:34:40 <DanielK_WMDE> #idea for extensibility, have a field for a JSON blob in the reading_list_entry table
21:34:54 <TimStarling> right, and tgr prefers option 1, judging by the task description?
21:34:55 <DanielK_WMDE> coreyfloyd: sure
21:35:05 <tgr> yeah
21:35:18 <coreyfloyd> DanielK_WMDE: yeah that makes sense… we have some meta that will probably get added as clients need it over time
21:35:53 <DanielK_WMDE> coreyfloyd: querying "pages on this list that changed since X" turns this into a watchlist. that opens a pretty big can of worms.
21:35:54 <tgr> we had a rough consensus on 1 in the Reading Infra / Services discussion I think, but were sufficiently unsure to pose it as a question
21:36:03 <coreyfloyd> TimStarling: yeah… I think that it seems easier to do as option 1, but wanted to hear ideas here as well
21:36:47 <coreyfloyd> I believe gwicke (or someone on services) was talking about using this as an opportunity to extract a library that can access MariaDB directly without MediaWiki
21:37:21 <gwicke> that was just jaime & me talking about mysql proxying etc
21:37:34 <coreyfloyd> DanielK_WMDE: yeah - I think “any list of URLs” can be seen as a watchlist at some level
21:37:48 <DanielK_WMDE> only if you need to join against recentchanges
21:37:53 <DanielK_WMDE> that's the crux here
21:38:28 <coreyfloyd> DanielK_WMDE: changes for now are mostly about  adds/removes from the list
21:38:29 <DanielK_WMDE> or, more broadly: change propagation infrastructure
21:38:40 <legoktm> does this thing need to handle page moves?
21:38:54 <DanielK_WMDE> coreyfloyd: no, i mean edits to pages on your list. so you can update offline pages.
21:39:16 <gwicke> from a testing and fault isolation perspective a stand-alone has some advantages, but I think in this case there are a lot of pragmatic reasons to add this to MW
21:39:39 <coreyfloyd> DanielK_WMDE: I wouldn’t expect that to necessarily live in the reading list service
21:39:59 <mobrovac> agreed, conceptually option #2 is better, but involves a lot more work
21:40:06 <coreyfloyd> DanielK_WMDE: some other service may want to query reading lists to see if a particular user needs notified
21:40:44 <tgr> DanielK_WMDE: originally we were hoping to put it on x1 to support such joins, but jaime said it would not be a good idea
21:40:57 <TimStarling> ok, so I will propose that you remove option 2 and we put the RFC to last call in that form
21:40:58 <coreyfloyd> legoktm: I think that page moves are ok as long as we have the redirect follow
21:41:04 <tgr> but that was more of a "would be nice" thing, not a planned use case
21:41:55 <tgr> there are a bunch of "TBD" marks in the RfC, if someone would like to advise on them
21:42:05 <tgr> small things
21:42:16 <gwicke> #info <gabriel> from a testing and fault isolation perspective a stand-alone service has some advantages, but I think in this case there are a lot of pragmatic reasons to add this to MW
21:42:25 <tgr> (mostly DBA questions though)
21:42:35 <DanielK_WMDE> coreyfloyd: yea, some change propagation, somewhere. it's tricky to do at scale. but that's out of scope of this rfc, i suppose
21:42:35 <gwicke> #info <mobrovac> agreed, conceptually option #2 is better, but involves a lot more work
21:42:49 <gwicke> #info <Tim> ok, so I will propose that you remove option 2 and we put the RFC to last call in that form
21:42:49 <gwicke> C
21:43:47 <gwicke> any objections to TimStarling's proposal?
21:43:53 <coreyfloyd> nope
21:44:07 <gwicke> you can also move it to an "also considered" section
21:44:09 <DanielK_WMDE> #info <DanielK_WMDE> querying "pages on this list that changed since X" turns this into a watchlist; <coreyfloyd> some other service may want to query reading lists to see if a particular user needs notified
21:44:23 <gwicke> just so that it's clear which other options were considered later
21:44:39 <coreyfloyd> DanielK_WMDE: yeah… it's mostly forward looking… and just a reason we want to make sure we can query the individual pages later on for such things
21:45:14 <Scott_WUaS> (coreyfloyd: and mdholloway: any way to plan for or anticipate translation between wikipedia's 358 languages at this early stage and especially re querying ... re syncing private reading lists first within Android and then iOS and mobile web?)
21:45:58 <DanielK_WMDE> the goal is to make this available on the app first, and via the web interface later?
21:45:58 <coreyfloyd> Scott_WUaS: translation of what content?
21:46:29 <Scott_WUaS> (coreyfloyd: what will emerge in the reading lists)
21:46:39 <mdholloway> DanielK_WMDE: that's correct; and perhaps only tentatively on the web (coreyfloyd probably knows better about their plans)
21:46:57 <coreyfloyd> DanielK_WMDE: yes… Android, then iOS, then mobile web, then web - this gives us a reasonable ramp up of users and allows us to vet the project from both performance and product perspectives
21:47:25 <coreyfloyd> Scott_WUaS: I’m not sure I quite understand the question
21:47:44 <coreyfloyd> Scott_WUaS: there will be some UI in the apps/web interface that will go through the normal translate wiki process
21:47:44 <gwicke> #action Corey will update the RFC to make it clear that Option 1 is proposed
21:48:00 <DanielK_WMDE> the product perspective is probably quite different on the web interface. people have browser bookmarks, editors have watchlists and user pages
21:48:04 <coreyfloyd> Scott_WUaS: is that what you are asking about?
21:48:20 <Scott_WUaS> (coreyfloyd: I'll check out further https://phabricator.wikimedia.org/T164805 re what you mean by reading lists - thanks)
21:48:38 <coreyfloyd> DanielK_WMDE: yeah… Reading Lists are being heavily investigated by the product team in Readers currently
21:48:46 <coreyfloyd> And the designers…
21:49:08 <coreyfloyd> Rita ho just did a survey: https://goo.gl/nC5NpX
21:49:11 <DanielK_WMDE> why bake "reading" into the name, btw?
21:49:27 <Scott_WUaS> (re Reading Lists - https://phabricator.wikimedia.org/T164990)
21:49:30 <DanielK_WMDE> nothing in the functionality suggests "reading". just lists of pages
21:49:38 <DanielK_WMDE> could be my "deletion list" ;)
21:50:05 <gwicke> "killfile"
21:50:07 <mdholloway> :)
21:50:16 <mdholloway> just an artifact of the first planned use case, i think
21:50:17 <DanielK_WMDE> hehehe
21:50:19 <coreyfloyd> DanielK_WMDE: yeah it was specifically to scope this to reading
21:50:26 <coreyfloyd> And not watch lists for instance
21:51:06 <DanielK_WMDE> mdholloway: i'd prefer not to have "reading" in all the service names and tables... "page lists"?
21:51:07 <coreyfloyd> DanielK_WMDE: we are not sure that this would be the ideal infrastructure for other types of lists
21:51:42 <coreyfloyd> DanielK_WMDE: however, the route naming exists to not preclude the option of having other types of lists (lists/reading/…)
21:51:55 <coreyfloyd> DanielK_WMDE: but I am not against removing it
21:52:47 <gwicke> I guess there is a namespacing concern as well; if this service was to be used for all kinds of other lists, you'd need to be able to distinguish those from the reading lists
21:53:01 <TimStarling> I'm fine with reading lists, being specific makes it easier to add features
21:53:10 <Zppix> Why not let users customize the name?
21:53:34 <coreyfloyd> Zppix: users can customize the name of lists
21:53:51 <TimStarling> the MW extension could contain both the API for the mobile apps, and also the UI for the websites
21:54:17 <coreyfloyd> Zppix: this is just mostly about the name of the routes - and the name of the service itself
21:54:28 <Zppix> coreyfloyd:  oh i see my bad
21:54:29 <coreyfloyd> TimStarling: yeah… that is a possibility for sure
21:54:59 <Zppix> why not incorp this into mw itself?
21:55:06 <tgr> TimStarling: web (mobile at least) is moving towards relying on APIs instead of the MW skin
21:55:28 <gwicke> coreyfloyd: speaking of other list-of-title use cases, what is the plan for collections?
21:55:33 <TimStarling> if you make it generic then you have to imagine all possible use cases when you make b/c breaking changes
21:55:46 <tgr> where that effort will be at by the time reading lists reach the web interface (planned for Q4-ish) is an open question
21:56:00 <gwicke> (collections is the print-many-pages-to-PDF feature)
21:56:52 <gwicke> https://www.mediawiki.org/wiki/Extension:Collection
21:56:57 <coreyfloyd> gwicke: you mean for how collections relate to lists?
21:57:17 <gwicke> yeah, both need lists of pages
21:57:31 <gwicke> and some people might want to print their reading list.. just speculating
21:57:34 <TimStarling> OCG seems to be a favourite punching bag at the moment
21:57:35 <tgr> TimStarling: we considered building it on top of some kind of generic list feature in MediaWiki core but then decided for rule of three
21:57:38 <coreyfloyd> gwicke: ahh… well they are separate but being thought of…
21:57:45 <TimStarling> as an example of an unmaintained service
21:57:51 <tgr> not enough use cases / implementations yet to generalize
21:57:57 <coreyfloyd> gwicke: collections are public and lists are private… so we have talked about being able to convert between the 2
21:58:26 <TimStarling> and there were vague plans for sunsetting it, but it wasn't clear what that would mean for Collection
21:58:29 <coreyfloyd> gwicke: also we have talked about having a “make a pdf” button for reading lists that will use the collection extension
21:58:42 <gwicke> coreyfloyd: makes sense; some kind of unguessable list UUID could help with that, I guess
21:58:58 <coreyfloyd> TimStarling: yeah… I am not the expert on OCG, but it is being sunset and replaced with something else… that research is in process now
21:59:17 <TimStarling> perhaps OCG would be replaced by browser automation?
21:59:27 <tgr> TimStarling: current plan is to keep (refactor, hopefully) Collection and switch out OCG to Electron (or wkhtmltopdf, decision still pending)
21:59:39 <DanielK_WMDE> tgr: three, like watchlists, collections, and now reading lists?...
21:59:44 <gwicke> something browser-based, in any case
22:00:18 <tgr> DanielK_WMDE: well, Collection never had any serious list support
22:00:20 <gwicke> okay, we should wrap up soon
22:00:33 <tgr> session and wikitext hacks
22:00:47 <coreyfloyd> DanielK_WMDE: how these 3 work together and what can be eventually decommissioned is being looked at… they do have different use cases… but maybe we can get them to the same backend
22:01:06 <gwicke> I didn't hear any objections against Tim's proposal, and I think there were no major objections raised here that would make a last call inappropriate
22:01:20 <DanielK_WMDE> sgtm
22:01:26 <coreyfloyd> 👍
22:01:54 <gwicke> #agreed After changes to call out Option 1 as proposed, this RFC will enter its Final Comment Period.
22:02:43 <gwicke> final chance to add anything to the log..
22:02:50 <Zppix> Nope i agree with tim
22:03:08 <gwicke> #endmeeting