21:02:51 <gwicke> #startmeeting https://phabricator.wikimedia.org/T164990 | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
21:02:51 <wm-labs-meetbot`> Meeting started Wed Jun 14 21:02:51 2017 UTC and is due to finish in 60 minutes. The chair is gwicke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:51 <wm-labs-meetbot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:51 <wm-labs-meetbot`> The meeting name has been set to 'https___phabricator_wikimedia_org_t164990___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
21:03:43 <gwicke> coreyfloyd or mdholloway: could you give a quick summary of what you are trying to do, and what you would like to get out of this meeting today?
21:03:57 <coreyfloyd> gwicke: sure…
21:04:37 <coreyfloyd> Reading wants to create a service for syncing private reading lists first within Android and then iOS and mobile web
21:04:58 <coreyfloyd> For what the feature will look like, you can check out the current Android app
21:05:44 <coreyfloyd> It will be much the same, except will allow users to sync their reading lists (which users use for bookmarking and offline reading) so that they will not lose them if they change or lose their device
21:06:03 <coreyfloyd> mdholloway: do you have a screen shot of the feature to post?
21:06:21 * mdholloway finds one quickly
21:06:32 <coreyfloyd> gwicke: does that answer the question well enough?
21:06:58 <DanielK_WMDE> coreyfloyd: would there be just one list per user, or should the system be designed with multiple lists per person in mind?
21:07:18 <coreyfloyd> DanielK_WMDE: it would be multiple lists per user
21:07:34 <DanielK_WMDE> all private? or public?
21:07:43 <coreyfloyd> Lists are all private
21:07:43 <gwicke> #info Reading wants to create a service for syncing private reading lists first within Android and then iOS and mobile web
21:07:57 <coreyfloyd> And so this would require a user to be authenticated to access their lists
21:07:58 <gwicke> #info it would be multiple lists per user, all private
21:09:03 <gwicke> coreyfloyd: which questions would you especially like to get feedback on / make decisions on?
21:09:11 <coreyfloyd> One major architectural decision we have to make - as laid out on the RFC - is whether we build this within MediaWiki as an extension using RESTBase as a proxy, or we build it as a separate node.js service
21:09:22 <TimStarling> so you have to log in with your wikipedia user account to get reading lists at all?
21:09:43 <dbrant> correct
21:09:47 <coreyfloyd> TimStarling: no, only for syncing purposes
21:09:54 <TimStarling> oh right
21:09:54 <dbrant> ^
21:10:13 <DanielK_WMDE> how would anon lists work?
21:10:14 <coreyfloyd> TimStarling: the feature will continue to work as it does now for those who do not want to log in
21:10:20 <coreyfloyd> DanielK_WMDE: ^
21:10:24 <coreyfloyd> They would be local only
21:11:08 <DanielK_WMDE> so the proposed REST service is only for logged in users, but the client side part would also support local-only lists for anons?
21:11:24 <dbrant> Without logging in, the user can still have reading lists locally. Then the first time they log in, the lists will start being synced.
21:11:25 <coreyfloyd> DanielK_WMDE: correct
21:11:53 <coreyfloyd> DanielK_WMDE: we have to work out UX, but the expectation is that users will need to opt in to syncing in some way
21:11:56 <tgr> for apps yes, for mobile web I don't think we have decided yet
21:12:09 <DanielK_WMDE> are the CRUD operations sufficient for reliable syncing? wouldn't a log be needed?
21:12:12 <tgr> the worry is that storage is much more limited there
21:12:18 <coreyfloyd> tgr: good point
21:12:30 <coreyfloyd> DanielK_WMDE: we have laid out some sync APIs for that purpose
21:12:42 <coreyfloyd> Basically getting “changes since”
21:12:42 <DanielK_WMDE> #info <dbrant> Without logging in, the user can still have reading lists locally. Then the first time they log in, the lists will start being synced.
21:13:21 <gwicke> DanielK_WMDE: I am guessing this would use If-Match, and resolve conflicts client side
21:13:22 <DanielK_WMDE> ah, update timestamp and soft delete...
21:13:35 <coreyfloyd> yep
21:13:35 <tgr> the GET /lists/reading/changes/since/{date} endpoint would be used for sync
21:13:45 <TimStarling> can the feature be extended to the desktop site?
21:13:47 <mdholloway> screenshots https://usercontent.irccloud-cdn.com/file/BZhR0mSq/Screenshot_20170614-171133.png https://usercontent.irccloud-cdn.com/file/Y2qMFmM0/Screenshot_20170614-171139.png
21:13:55 <coreyfloyd> TimStarling: yes it can be
21:14:09 <coreyfloyd> TimStarling: we wanted to start out with the Android client first as a testing ground
21:14:19 <TimStarling> ok
21:14:32 <coreyfloyd> And then we would do some analysis then move to iOS, then mobile web, and then desktop
21:14:42 <DanielK_WMDE> coreyfloyd: i'm blurry on sync algorithms. this sounds like it can work, but could get interesting in the nitty gritty. is this following a standard sync strategy?
21:15:13 <DanielK_WMDE> does soft delete mean rows never get deleted?
21:15:34 <coreyfloyd> DanielK_WMDE: basically just a standard time stamp syncing strategy… clients keep a time stamp of the last sync, and must pass it to get changes since the last sync
21:15:34 <Scott_WUaS> (coreyfloyd: What are your plans for Wikipedia's/Wikidata's 358 languages here?)
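[Editorial sketch, not part of the meeting] The timestamp-plus-soft-delete strategy described above (clients keep the time of their last sync and ask for changes since then, with deletes kept as tombstones) could look roughly like the following. Everything here is an assumption made for illustration: the `Entry` fields, the `ListStore` class, and the helper names are hypothetical and are not taken from the RFC; only the "changes since" idea and the soft delete come from the discussion.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Entry:
    """One reading list entry; all field names are hypothetical."""
    entry_id: int
    url: str
    updated: datetime      # bumped on every create, update, or delete
    deleted: bool = False  # soft delete: the row stays behind as a tombstone


class ListStore:
    """In-memory stand-in for the server-side storage."""

    def __init__(self):
        self.entries = {}

    def upsert(self, entry_id, url, now):
        self.entries[entry_id] = Entry(entry_id, url, now)

    def soft_delete(self, entry_id, now):
        entry = self.entries[entry_id]
        entry.deleted = True
        entry.updated = now

    def changes_since(self, since):
        """Return every entry touched after `since`, tombstones included."""
        return sorted(
            (e for e in self.entries.values() if e.updated > since),
            key=lambda e: e.updated,
        )


def apply_changes(local, changes):
    """Apply server-side changes to a client's local {entry_id: url} copy."""
    for e in changes:
        if e.deleted:
            local.pop(e.entry_id, None)  # tombstone: drop the local copy
        else:
            local[e.entry_id] = e.url
    return local


def ts(minute):
    """Helper producing a UTC timestamp during the meeting hour."""
    return datetime(2017, 6, 14, 21, minute, tzinfo=timezone.utc)


EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

store = ListStore()
store.upsert(1, "https://en.wikipedia.org/wiki/Dog", ts(0))
store.upsert(2, "https://en.wikipedia.org/wiki/Cat", ts(1))

# First sync: the client has never synced, so it asks for everything.
first = store.changes_since(EPOCH)
local = apply_changes({}, first)
last_sync = first[-1].updated

# Later the server sees a delete and an add from another device.
store.soft_delete(2, ts(5))
store.upsert(3, "https://en.wikipedia.org/wiki/Bird", ts(6))

# Incremental sync: only the tombstone and the new entry come back.
changes = store.changes_since(last_sync)
local = apply_changes(local, changes)
```

The tombstone is what lets the second sync learn about the delete; if the row were removed outright, a client asking for "changes since" would never hear that entry 2 is gone.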
21:15:53 <gwicke> DanielK_WMDE: it's basically turned into a tombstone
21:16:01 <coreyfloyd> DanielK_WMDE: soft deletes will need to be cleaned up later…
21:16:12 <DanielK_WMDE> gwicke: and the tombstones are forever
21:16:22 <coreyfloyd> We haven’t decided on the strategy, it could be after a period of time - kill after 30 days
21:16:36 <gwicke> the overall strategy looks quite standard
21:16:52 <coreyfloyd> DanielK_WMDE: basically if a client hasn’t synced in a while, then it will be required to do a full sync
21:16:53 <DanielK_WMDE> coreyfloyd: sounds a bit unreliable
21:17:03 <mdholloway> Scott_WUaS: these lists should not be wiki-specific; they can contain pages from any number of projects.
21:17:09 <coreyfloyd> DanielK_WMDE: can you be more specific in how it would be unreliable?
21:17:10 <DanielK_WMDE> a full sync means losing all local changes
21:17:48 <coreyfloyd> DanielK_WMDE: to be clear… and this has not been decided… a full sync would not be about losing changes, but since it would have been a disconnected client for so long, it is best to do a full sync
21:18:11 <coreyfloyd> DanielK_WMDE: fully syncing does not preclude doing a merge of local changes
21:18:16 <DanielK_WMDE> coreyfloyd: if i have unsynced local changes older than 30 days, i might lose them. I don't mean to pick on the details here. my concern is the storage schema
21:18:19 <gwicke> you can probably tell when there were local changes that weren't synced yet
21:18:26 <tgr> you can still use timestamps to merge on a full sync
21:18:43 <DanielK_WMDE> it seems the timestamp/soft delete is baked in. i wonder if a log/event based storage should be considered instead.
21:18:46 <coreyfloyd> Scott_WUaS: as far as all languages, mdholloway answered… these are cross wiki
21:18:46 <DanielK_WMDE> but let's move on
21:19:24 <coreyfloyd> Scott_WUaS: so this will support all languages… and potentially other projects
21:20:00 <DanielK_WMDE> #info the timestamp/soft delete is baked into the storage model. perhaps a log/event based sync should also be considered.
21:20:01 <gwicke> #info <daniel> it seems the timestamp/soft delete is baked in. i wonder if a log/event based storage should be considered instead.
21:20:18 <gwicke> also, beware of race conditions
21:20:20 <DanielK_WMDE> gwicke: hehe
21:21:13 <coreyfloyd> Any thoughts on the 2 options in the RFC?
21:21:15 <tgr> this is private data so races are not very likely
21:21:30 <tgr> unless the user has a phone in each hand
21:21:38 <coreyfloyd> Tgr: lol
21:21:50 <gwicke> tgr: my joke was more about me & daniel info-ing the same thing than the RFC ;)
21:21:53 <coreyfloyd> Maybe not too uncommon nowadays
21:21:59 <TimStarling> option 1 seems fine to me
21:22:13 <TimStarling> I'm skeptical about RESTBase becoming the new enormous monolithic blob
21:22:26 <TimStarling> it can be a path router, that's simple, easy to maintain
21:22:44 <gwicke> that's the idea, I believe
21:22:55 <coreyfloyd> Yeah for proposal 1
21:23:07 <TimStarling> and it seems like jaime favoured option 1?
21:23:09 <coreyfloyd> The only other work from RESTBase would be injection of summaries
21:23:09 <DanielK_WMDE> tgr: a laptop and a phone and a tablet, all syncing periodically, when they have connectivity. used phone and tablet on the plane, both go online again at the same time...
21:23:11 <gwicke> I think somehow "RESTBase service" has become a synonym for "node service"
21:23:24 <DanielK_WMDE> tgr: otoh, offline utility of the app is limited ;)
21:23:40 <coreyfloyd> gwicke: I think you are right… I do that myself
21:24:03 <gwicke> to clarify, option 2 is proposing a stand-alone service, right?
21:24:22 <gwicke> proxied through RB, which would do auth & summary hydration
21:24:30 <TimStarling> jcrespo said "This has to be integrated into mediawiki or other existing service, as we will not have hw available for proxies or other middleware to have a dedicated service, specially at the beginning."
21:24:40 <mdholloway> that's right. just using MediaWiki for the authentication layer.
21:24:49 <tgr> TimStarling: IIRC he did not have a strong opinion but predicted a larger hardware need for 2
21:25:16 <tgr> TimStarling: https://phabricator.wikimedia.org/T164805 has more on the subject
21:25:17 <coreyfloyd> gwicke: yes… it would exist outside of MediaWiki, but use MediaWiki for authentication. And how the service would access the DB is still under discussion
21:26:05 <gwicke> #info option 2 is proposing a stand-alone node service, proxied through RB, which would do auth & summary hydration
21:26:24 <DanielK_WMDE> coreyfloyd: why sql for storage? just because it's there and we know it?
21:26:29 <DanielK_WMDE> good reasons, for sure.
21:26:40 <DanielK_WMDE> but are there others? are there tempting alternatives?
21:26:50 <coreyfloyd> DanielK_WMDE: large queries, known performance and stability characteristics
21:27:04 <gwicke> range queries are important for this use case
21:27:13 <TimStarling> the task description says "Reading lists contain primary data (cannot be regenerated from other sources and losing it would have a major UX impact), and data needs to be fetched based on criteria other than the id (e.g. all lists containing a given page, all entries which have changed after a given date) so MariaDB will be used"
21:27:22 <gwicke> which don't scale as well in some of the distributed alternatives
21:27:37 <coreyfloyd> DanielK_WMDE: we did talk about Cassandra but did not seem to be a good fit
21:28:01 <DanielK_WMDE> range queries are a good point, yea
21:28:56 <gwicke> this service is also not as critical, so doesn't necessarily need to be active-active right away
21:29:58 <coreyfloyd> DanielK_WMDE: we also didn’t want to just store the data as a JSON blob
21:30:20 <coreyfloyd> We want to have each entry be a row, so we can do more interesting queries on the data
21:30:26 <TimStarling> if you stored it as a JSON blob you could have combined it with T128602
21:30:26 <stashbot> T128602: Create and deploy an extension that implements an authenticated key-value store. - https://phabricator.wikimedia.org/T128602
21:31:15 <coreyfloyd> TimStarling: we looked at that, but then you lose granularity - for instance if we need to do push notifications, how do we know which users have which pages in a reading list?
21:31:24 <DanielK_WMDE> one big json blob would be bad. but how about one per list entry? not as a page, of course.
21:31:54 <coreyfloyd> Similar issues… if we want to push changes to users when a page changes, then there is no sensible way to query that information
21:31:56 <DanielK_WMDE> that would make it easy to add meta-data later. personal notes, last viewed, etc
21:32:32 <coreyfloyd> One part of this that isn’t clear, is that the clients use reading lists as a means to manage offline pages
21:32:39 <DanielK_WMDE> coreyfloyd: oh, i don't want to get rid of the db table. just have a blob in one of the fields.
21:32:49 <DanielK_WMDE> that makes it easy to add fields that we don't need to query by
21:33:02 <coreyfloyd> So an important part of having offline pages is updating them when the page changes
21:33:25 <tgr> TimStarling: also the feedback on that RfC was that people would prefer a dedicated API
21:33:26 <coreyfloyd> So being able to tell which clients need to be notified when the article “dog” changes becomes important
21:33:30 <tgr> some feedback at least
21:33:34 <Scott_WUaS> (Thanks)
21:33:46 <gwicke> #info <Tim> if you stored it as a JSON blob you could have combined it with T128602
21:33:46 <tgr> (I seem to remember brion)
21:34:05 <TimStarling> coreyfloyd and tgr will be the implementors?
21:34:16 <coreyfloyd> DanielK_WMDE: some fields could be in a blob if needed… just not the urls
21:34:24 <coreyfloyd> TimStarling: tgr would be implementing
21:34:40 <DanielK_WMDE> #idea for extensibility, have a field for a JSON blob in the reading_list_entry table
21:34:54 <TimStarling> right, and tgr prefers option 1, judging by the task description?
21:34:55 <DanielK_WMDE> coreyfloyd: sure
21:35:05 <tgr> yeah
21:35:18 <coreyfloyd> DanielK_WMDE: yeah that makes sense… we have some meta that will probably get added as clients need it over time
21:35:53 <DanielK_WMDE> coreyfloyd: querying "pages on this list that changed since X" turns this into a watchlist. that opens a pretty big can of worms.
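[Editorial sketch, not part of the meeting] The storage idea discussed above (one row per list entry so that URLs stay queryable, plus a JSON blob field for metadata nobody needs to query by) can be illustrated as follows. This uses Python's built-in `sqlite3` purely as a stand-in for MariaDB. The table name `reading_list_entry` appears in the #idea above; every column name, the index, and the `users_with_page` helper are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MariaDB database
conn.execute(
    """
    CREATE TABLE reading_list_entry (
        entry_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL,
        list_id  INTEGER NOT NULL,
        url      TEXT NOT NULL,              -- kept as a real column: queried by
        extra    TEXT NOT NULL DEFAULT '{}'  -- JSON blob: never queried by
    )
    """
)
conn.execute("CREATE INDEX entry_url ON reading_list_entry (url)")

rows = [
    (1, 10, "https://en.wikipedia.org/wiki/Dog", json.dumps({"note": "read later"})),
    (2, 20, "https://en.wikipedia.org/wiki/Dog", "{}"),
    (2, 21, "https://en.wikipedia.org/wiki/Cat", "{}"),
]
conn.executemany(
    "INSERT INTO reading_list_entry (user_id, list_id, url, extra) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def users_with_page(conn, url):
    """Which users have this page on any list (e.g. to notify them of an edit)."""
    cur = conn.execute(
        "SELECT DISTINCT user_id FROM reading_list_entry "
        "WHERE url = ? ORDER BY user_id",
        (url,),
    )
    return [row[0] for row in cur]


dog_users = users_with_page(conn, "https://en.wikipedia.org/wiki/Dog")

# Metadata in the blob is decoded by the application, not by SQL.
raw = conn.execute(
    "SELECT extra FROM reading_list_entry WHERE entry_id = 1"
).fetchone()[0]
note = json.loads(raw)
```

With a single JSON blob per user, answering "who has the article Dog on a list" would mean decoding every user's blob; with one row per entry it is an indexed lookup, while new optional fields can still be added to `extra` without a schema change.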
21:35:54 <tgr> we had a rough consensus on 1 in the Reading Infra / Services discussion I think, but were sufficiently unsure to pose it as a question
21:36:03 <coreyfloyd> TimStarling: yeah… I think that it seems easier to do as option 1, but wanted to hear ideas here as well
21:36:47 <coreyfloyd> I believe gwicke (or someone on services) was talking about using this as an opportunity to extract a library that can access MariaDB directly without MediaWiki
21:37:21 <gwicke> that was just jaime & me talking about mysql proxying etc
21:37:34 <coreyfloyd> DanielK_WMDE: yeah - I think “any list of URLs” can be seen as a watchlist at some level
21:37:48 <DanielK_WMDE> only if you need to join against recentchanges
21:37:53 <DanielK_WMDE> that's the crux here
21:38:28 <coreyfloyd> DanielK_WMDE: changes for now are mostly about adds/removes from the list
21:38:29 <DanielK_WMDE> or, more broadly: change propagation infrastructure
21:38:40 <legoktm> does this thing need to handle page moves?
21:38:54 <DanielK_WMDE> coreyfloyd: no, i mean edits to pages on your list. so you can update offline pages.
21:39:16 <gwicke> from a testing and fault isolation perspective a stand-alone service has some advantages, but I think in this case there are a lot of pragmatic reasons to add this to MW
21:39:39 <coreyfloyd> DanielK_WMDE: I wouldn’t expect that to necessarily live in the reading list service
21:39:59 <mobrovac> agreed, conceptually option #2 is better, but involves a lot more work
21:40:06 <coreyfloyd> DanielK_WMDE: some other service may want to query reading lists to see if a particular user needs notified
21:40:44 <tgr> DanielK_WMDE: originally we were hoping to put it on x1 to support such joins, but jaime said it would not be a good idea
21:40:57 <TimStarling> ok, so I will propose that you remove option 2 and we put the RFC to last call in that form
21:40:58 <coreyfloyd> legoktm: I think that page moves are ok as long as we have the redirect follow
21:41:04 <tgr> but that was more of a "would be nice" thing, not a planned use case
21:41:55 <tgr> there are a bunch of "TBD" marks in the RfC, if someone would like to advise on them
21:42:05 <tgr> small things
21:42:16 <gwicke> #info <gabriel> from a testing and fault isolation perspective a stand-alone service has some advantages, but I think in this case there are a lot of pragmatic reasons to add this to MW
21:42:25 <tgr> (mostly DBA questions though)
21:42:35 <DanielK_WMDE> coreyfloyd: yea, some change propagation, somewhere. it's tricky to do at scale. but that's out of scope of this rfc, i suppose
21:42:35 <gwicke> #info <mobrovac> agreed, conceptually option #2 is better, but involves a lot more work
21:42:49 <gwicke> #info <Tim> ok, so I will propose that you remove option 2 and we put the RFC to last call in that form
21:42:49 <gwicke> C
21:43:47 <gwicke> any objections to TimStarling's proposal?
21:43:53 <coreyfloyd> nope
21:44:07 <gwicke> you can also move it to an "also considered" section
21:44:09 <DanielK_WMDE> #info <DanielK_WMDE> querying "pages on this list that changed since X" turns this into a watchlist; <coreyfloyd> some other service may want to query reading lists to see if a particular user needs notified
21:44:23 <gwicke> just so that it's clear which other options were considered later
21:44:39 <coreyfloyd> DanielK_WMDE: yeah… it's mostly forward looking… and just a reason we want to make sure we can query the individual pages later on for such things
21:45:14 <Scott_WUaS> (coreyfloyd: and mdholloway: any way to plan for or anticipate translation between wikipedia's 358 languages at this early stage and especially re querying ... re syncing private reading lists first within Android and then iOS and mobile web?)
21:45:58 <DanielK_WMDE> the goal is to make this available on the app first, and via the web interface later?
21:45:58 <coreyfloyd> Scott_WUaS: translation of what content?
21:46:29 <Scott_WUaS> (coreyfloyd: what will emerge in the reading lists)
21:46:39 <mdholloway> DanielK_WMDE: that's correct; and perhaps only tentatively on the web (coreyfloyd probably knows better about their plans)
21:46:57 <coreyfloyd> DanielK_WMDE: yes… Android, then iOS, then mobile web, then web - this gives us a reasonable ramp up of users and allows us to vet it from the project from both performance and product perspectives
21:47:25 <coreyfloyd> Scott_WUaS: I’m not sure I quite understand the question
21:47:44 <coreyfloyd> Scott_WUaS: there will be some UI in the apps/web interface that will go through the normal translate wiki process
21:47:44 <gwicke> #action Corey will update the RFC to make it clear that Option 1 is proposed
21:48:00 <DanielK_WMDE> the product perspective is probably quite different on the web interface. people have browser bookmarks, editors have watchlists and user pages
21:48:04 <coreyfloyd> Scott_WUaS: is that what you are asking about?
21:48:20 <Scott_WUaS> (coreyfloyd: I'll check out further https://phabricator.wikimedia.org/T164805 re what you mean by reading lists - thanks)
21:48:38 <coreyfloyd> DanielK_WMDE: yeah… Reading Lists are being heavily investigated by the product team in Readers currently
21:48:46 <coreyfloyd> And the designers…
21:49:08 <coreyfloyd> Rita ho just did a survey: https://goo.gl/nC5NpX
21:49:11 <DanielK_WMDE> why bake "reading" into the name, btw?
21:49:27 <Scott_WUaS> (re Reading Lists - https://phabricator.wikimedia.org/T164990)
21:49:30 <DanielK_WMDE> nothing in the functionality suggests "reading". just lists of pages
21:49:38 <DanielK_WMDE> could be my "deletion list" ;)
21:50:05 <gwicke> "killfile"
21:50:07 <mdholloway> :)
21:50:16 <mdholloway> just an artifact of the first planned use case, i think
21:50:17 <DanielK_WMDE> hehehe
21:50:19 <coreyfloyd> DanielK_WMDE: yeah it was specifically to scope this to reading
21:50:26 <coreyfloyd> And not watch lists for instance
21:51:06 <DanielK_WMDE> mdholloway: i'd prefer not to have "reading" in all the service names and tables... "page lists"?
21:51:07 <coreyfloyd> DanielK_WMDE: we are not sure that this would be the ideal infrastructure for other types of lists
21:51:42 <coreyfloyd> DanielK_WMDE: however, the route naming exists to not preclude the option of having other types of lists (lists/reading/…)
21:51:55 <coreyfloyd> DanielK_WMDE: but I am not against removing it
21:52:47 <gwicke> I guess there is a namespacing concern as well; if this service was to be used for all kinds of other lists, you'd need to be able to distinguish those from the reading lists
21:53:01 <TimStarling> I'm fine with reading lists, being specific makes it easier to add features
21:53:10 <Zppix> Why not let users customize the name?
21:53:34 <coreyfloyd> Zppix: users can customize the name of lists
21:53:51 <TimStarling> the MW extension could contain both the API for the mobile apps, and also the UI for the websites
21:54:17 <coreyfloyd> Zppix: this is just mostly about the name of the routes - and the name of the service itself
21:54:28 <Zppix> coreyfloyd: oh i see my bad
21:54:29 <coreyfloyd> TimStarling: yeah… that is a possibility for sure
21:54:59 <Zppix> why not incorp this into mw itself?
21:55:06 <tgr> TimStarling: web (mobile at least) is moving towards relying on APIs instead of the MW skin
21:55:28 <gwicke> coreyfloyd: speaking of other list-of-title use cases, what is the plan for collections?
21:55:33 <TimStarling> if you make it generic then you have to imagine all possible use cases when you make b/c breaking changes
21:55:46 <tgr> where that effort will be at by the time reading lists reach the web interface (planned for Q4-ish) is an open question
21:56:00 <gwicke> (collections is the print-many-pages-to-PDF feature)
21:56:52 <gwicke> https://www.mediawiki.org/wiki/Extension:Collection
21:56:57 <coreyfloyd> gwicke: you mean for how collections relate to lists?
21:57:17 <gwicke> yeah, both need lists of pages
21:57:31 <gwicke> and some people might want to print their reading list.. just speculating
21:57:34 <TimStarling> OCG seems to be a favourite punching bag at the moment
21:57:35 <tgr> TimStarling: we considered building it on top of some kind of generic list feature in MediaWiki core but then decided for rule of three
21:57:38 <coreyfloyd> gwicke: ahh… well they are separate but being thought of…
21:57:45 <TimStarling> as an example of an unmaintained service
21:57:51 <tgr> not enough use cases / implementations yet to generalize
21:57:57 <coreyfloyd> gwicke: collections are public and lists are private… so we have talked about being able to convert between the 2
21:58:26 <TimStarling> and there were vague plans for sunsetting it, but it wasn't clear what that would mean for Collection
21:58:29 <coreyfloyd> gwicke: also we have talked about having a “make a pdf” button for reading lists that will use the collection extension
21:58:42 <gwicke> coreyfloyd: makes sense; some kind of unguessable list UUID could help with that, I guess
21:58:58 <coreyfloyd> TimStarling: yeah… I am not the expert on OCG, but it is being sunset and replaced with something else… that research is in process now
21:59:17 <TimStarling> perhaps OCG would be replaced by browser automation?
21:59:27 <tgr> TimStarling: current plan is to keep (refactor, hopefully) Collection and switch out OCG to Electron (or wkhtmltopdf, decision still pending)
21:59:39 <DanielK_WMDE> tgr: three, like watchlists, collections, and now reading lists?...
21:59:44 <gwicke> something browser-based, in any case
22:00:18 <tgr> DanielK_WMDE: well, Collection never had any serious list support
22:00:20 <gwicke> okay, we should wrap up soon
22:00:33 <tgr> session and wikitext hacks
22:00:47 <coreyfloyd> DanielK_WMDE: how these 3 work together and what can be eventually decommissioned is being looked at… they do have different use cases… but maybe we can get them to the same backend
22:01:06 <gwicke> I didn't hear any objections against Tim's proposal, and I think there were no major objections raised here that would make a last call inappropriate
22:01:20 <DanielK_WMDE> sgtm
22:01:26 <coreyfloyd> 👍
22:01:54 <gwicke> #agreed After changes to call out Option 1 as proposed, this RFC will enter its Final Comment Period.
22:02:43 <gwicke> final chance to add anything to the log..
22:02:50 <Zppix> Nope i agree with tim
22:03:08 <gwicke> #endmeeting